Building a YouTube Video Transcription API with Node.js

YouTube video transcripts are valuable for content analysis, accessibility, and creating searchable content archives. While npm libraries like youtube-transcript once provided easy access to this data, many have become unreliable due to YouTube's frequent internal interface changes. In this article, we'll build a YouTube transcription API using Node.js, Express, and Playwright as an alternative method.

Project Overview

Our API will consist of several key components:

Express.js server (server.js) - Main application entry point
Transcript extractor (transcript.js) - Playwright-based scraping logic
Error handling (errorHandler.js) - RFC 7807 compliant error responses
Security middleware (securityHandler.js) - API key authentication
Docker containerization for easy deployment

Setting Up the Development Environment

As mentioned in the Node.js and Express development environment setup article, we'll use a modern Node.js development setup with ESLint, Prettier, and Husky for code quality. Once the environment is ready, run the following command:

npm i express playwright express-healthcheck http-problem-details morgan

Key dependencies explained:

Express 5.x: Latest Express.js for the REST API.
Playwright: Reliable browser automation for scraping.
http-problem-details: RFC 7807 compliant error responses.
morgan: HTTP request logging.
express-healthcheck: Middleware that exposes a health-check endpoint.

Implementing Error Handling

Before building the main functionality, let's establish robust error handling using the RFC 7807 Problem Details standard in the errorHandler.js file:

import { ProblemDocument } from 'http-problem-details';

export class AppError extends Error {
  constructor(error, type, status, data = null) {
    super(error);
    this.type = type;
    this.status = status;
    this.detail = error;
    this.data = data;
  }
}

// eslint-disable-next-line no-unused-vars
export const errorHandler = (err, req, res, next) => {
  console.error(`Error ${err.status || 500}: ${err.message}`, {
    url: req.originalUrl,
    method: req.method,
    timestamp: new Date().toISOString(),
  });

  if (err instanceof AppError) {
    const problem = new ProblemDocument({
      type: '/problems/' + err.type,
      title: err.type,
      status: err.status,
      detail: err.detail,
      instance: req.originalUrl,
    });

    if (err.data) {
      Object.assign(problem, err.data);
    }

    res.status(err.status).json(problem);
  } else {
    res.status(500).json(
      new ProblemDocument({
        type: '/problems/internal-server-error',
        title: 'InternalServerError',
        status: 500,
        instance: req.originalUrl,
      })
    );
  }
};

This error handler provides:

Structured error responses following the RFC 7807 standard.
Detailed logging with request context.
Consistent error format across all endpoints.
Optional extension data, such as a page screenshot for debugging.
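To make the response shape concrete, here is a minimal sketch of the body the handler would produce for an AppError. It builds a plain object instead of calling http-problem-details, so toProblemBody is a hypothetical helper used only for illustration, and the sample values are assumptions.

```javascript
// Hypothetical helper: mirrors the fields errorHandler passes to
// ProblemDocument, but returns a plain object so it runs without dependencies.
function toProblemBody(err, instance) {
  const body = {
    type: '/problems/' + err.type,
    title: err.type,
    status: err.status,
    detail: err.detail,
    instance,
  };
  // Extension members (for example a screenshot) sit next to the standard ones.
  return err.data ? { ...body, ...err.data } : body;
}

// Sample error shaped like an AppError instance (values are made up).
const sampleError = {
  type: 'not_found',
  status: 404,
  detail: 'Video not found or unavailable',
  data: { screenshot: 'data:image/png;base64,...' },
};

const problemBody = toProblemBody(sampleError, '/transcript/abcdefghijk');
console.log(JSON.stringify(problemBody, null, 2));
```

RFC 7807 also registers the application/problem+json media type. Because res.json sends application/json, a stricter implementation might set the Content-Type header explicitly.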

Building the Transcript Extractor

The core functionality lies in the transcript.js file, which uses Playwright to extract transcripts from YouTube:

import { chromium } from 'playwright';
import { AppError } from './errorHandler.js';

const USER_AGENT =
  process.env.USER_AGENT ||
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';

const selectors = {
  expand: process.env.EXPAND_SELECTOR || 'tp-yt-paper-button#expand',
  notFound:
    process.env.NOT_FOUND_SELECTOR ||
    'div.promo-title:has-text("This video isn\'t available anymore"), div.promo-title:has-text("Este video ya no está disponible")',
  showTranscript:
    process.env.SHOW_TRANSCRIPT_SELECTOR ||
    'button[aria-label="Show transcript"], button[aria-label="Mostrar transcripción"]',
  viewCount: process.env.VIEW_COUNT_SELECTOR || 'yt-formatted-string#info span',
  transcriptSegment:
    process.env.TRANSCRIPT_SEGMENT_SELECTOR ||
    'ytd-transcript-segment-renderer',
  transcript: process.env.TRANSCRIPT_SELECTOR || 'ytd-transcript-renderer',
  text: process.env.TRANSCRIPT_TEXT_SELECTOR || '.segment-text',
};

The selector configuration approach provides several advantages:

Environment-based customization for different YouTube layouts.
Easy maintenance when YouTube changes its interface.
Multi-language support through configurable selectors.
Fallback defaults for common interface elements.
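As a rough illustration of the multi-language point, the sketch below builds a selector that matches a button labelled in any of several languages. The helper name ariaLabelSelector is hypothetical and not part of the original code; the two labels are the ones used in the default selector above.

```javascript
// Hypothetical helper: builds one selector that matches an element whose
// aria-label equals any of the given localized labels.
function ariaLabelSelector(tag, labels) {
  return labels
    .map(label => {
      // Escape backslashes and double quotes so the label is a valid CSS string.
      const escaped = label.replace(/["\\]/g, '\\$&');
      return `${tag}[aria-label="${escaped}"]`;
    })
    .join(', ');
}

const showTranscriptSelector = ariaLabelSelector('button', [
  'Show transcript',
  'Mostrar transcripción',
]);

console.log(showTranscriptSelector);
```

The resulting string could be assigned to SHOW_TRANSCRIPT_SELECTOR, so supporting another interface language would only require adding its label to the list.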

Here is the main extraction logic:

export default async function getTranscript(videoId) {
  const browser = await chromium.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });

  try {
    const context = await browser.newContext({
      userAgent: USER_AGENT,
    });

    const page = await context.newPage();

    await page.goto(`https://www.youtube.com/watch?v=${videoId}`, {
      waitUntil: 'networkidle',
      timeout: 30000,
    });

    const errorElement = await page.$(selectors.notFound);
    if (errorElement) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError('Video not found or unavailable', 'not_found', 404, {
        screenshot: `data:image/png;base64,${base64Screenshot}`,
      });
    }

    const expandButton = await page.$(selectors.expand);
    if (!expandButton) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError('Expand button not found', 'validation', 400, {
        screenshot: `data:image/png;base64,${base64Screenshot}`,
      });
    }

    await expandButton.click({ timeout: 5000 });

    const showTranscriptButton = await page.$(selectors.showTranscript);
    if (!showTranscriptButton) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError(
        'Show transcript button not found',
        'validation',
        400,
        {
          screenshot: `data:image/png;base64,${base64Screenshot}`,
        }
      );
    }

    await showTranscriptButton.click({ timeout: 5000 });

    await page.waitForSelector(selectors.transcript, { timeout: 10000 });

    const transcript = await page.$$eval(
      selectors.transcriptSegment,
      (nodes, textSelector) => {
        return nodes.map(n => n.querySelector(textSelector)?.innerText.trim());
      },
      selectors.text
    );

    const [viewsText] = await page.$$eval(selectors.viewCount, nodes =>
      nodes.map(n => n.innerText.trim())
    );

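    // viewsText is undefined when the view-count selector matches nothing.
    const views = parseInt((viewsText ?? '').replace(/[^0-9]/g, ''), 10) || 0;

    // Skip segments whose text node was missing so join() adds no empty gaps.
    return { transcript: transcript.filter(Boolean).join(' '), views };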
  } catch (error) {
    if (error instanceof AppError) {
      throw error;
    }
    throw new AppError(
      `Failed to fetch transcript: ${error.message}`,
      'error',
      500
    );
  } finally {
    await browser.close();
  }
}

Key implementation details:

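Browser Configuration: Headless Chromium launched with --no-sandbox and --disable-setuid-sandbox, which containers often require but which turn off Chromium's sandbox.
Robust Navigation: Waits for the network to go idle, with a 30-second timeout, before querying the page.
Error Detection: Checks for the "video unavailable" message before looking for the transcript controls.
Screenshot Debugging: Attaches a full-page screenshot to the error when an expected element is missing.
Resource Cleanup: The finally block always closes the browser, so failed requests do not leave Chromium processes running.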
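The screenshot logic appears three times in getTranscript. One possible way to factor it out is sketched below; captureDebugData is a hypothetical helper name, and fakePage is a stand-in object so the example runs without launching a browser.

```javascript
// Hypothetical helper: captures the page as a PNG data URL that can be
// attached to an error as debug data. `page` is expected to behave like a
// Playwright Page, whose screenshot() resolves to a Buffer.
async function captureDebugData(page) {
  const screenshot = await page.screenshot({ fullPage: true, type: 'png' });
  return {
    screenshot: `data:image/png;base64,${screenshot.toString('base64')}`,
  };
}

// Stand-in for a Playwright page (an assumption for this sketch only).
const fakePage = {
  screenshot: async () => Buffer.from('png-bytes'),
};

const debugDataPromise = captureDebugData(fakePage);
debugDataPromise.then(data => console.log(data.screenshot));
```

With such a helper, each failure branch could shrink to a single statement, for example: throw new AppError('Expand button not found', 'validation', 400, await captureDebugData(page)).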

Adding Security with API Key Authentication

The securityHandler.js file implements optional API key authentication:

import { AppError } from './errorHandler.js';

export const validateApiKey = (req, res, next) => {
  const apiKey = req.headers['x-api-key'];
  const expectedApiKey = process.env.API_KEY;

  if (!expectedApiKey) {
    return next();
  }

  if (!apiKey) {
    throw new AppError('API key is required', 'authentication', 401);
  }

  if (apiKey !== expectedApiKey) {
    throw new AppError('Invalid API key', 'authentication', 401);
  }

  next();
};

This middleware design allows for:

Optional authentication: requests pass through unchecked when API_KEY is not set.
Header-based authentication using the X-API-Key header.
Consistent error responses through our error handling system.
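The strict !== comparison in validateApiKey can, in principle, take slightly more or less time depending on where the two strings differ. A common hardening step is a constant-time comparison. The sketch below uses Node's built-in crypto module; safeEqual is a hypothetical helper name, not part of the original code.

```javascript
import { createHash, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: compares two API keys in constant time. Hashing both
// values first gives the buffers equal length, which timingSafeEqual requires.
function safeEqual(provided, expected) {
  const a = createHash('sha256').update(String(provided)).digest();
  const b = createHash('sha256').update(String(expected)).digest();
  return timingSafeEqual(a, b);
}

console.log(safeEqual('secret-key', 'secret-key')); // true
console.log(safeEqual('secret-kez', 'secret-key')); // false
```

In the middleware, the check if (apiKey !== expectedApiKey) could then become if (!safeEqual(apiKey, expectedApiKey)).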

Building the Express Server

The server.js file ties everything together:

import express from 'express';
import getTranscript from './transcript.js';
import morgan from 'morgan';
import { errorHandler, AppError } from './errorHandler.js';
import { validateApiKey } from './securityHandler.js';
import healthcheck from 'express-healthcheck';

const app = express();
const PORT = process.env.PORT || 5000;

const videoIdRegex = /^[a-zA-Z0-9_-]{11}$/;

app.use(morgan('dev'));
app.use(
  '/live',
  healthcheck({
    healthy: () => ({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: Date.now(),
    }),
  })
);
app.get('/transcript/:videoId', validateApiKey, async (req, res) => {
  const { videoId } = req.params;

  if (!videoId) {
    throw new AppError('Video ID is required', 'validation', 400);
  }

  if (!videoIdRegex.test(videoId)) {
    throw new AppError('Invalid video ID format', 'validation', 400);
  }

  const { transcript, views } = await getTranscript(videoId);

  res.status(200).json({
    transcript,
    views,
  });
});

app.use(errorHandler);

app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

The server implementation includes:

Input Validation: YouTube video ID format validation using regex.
Health Monitoring: /live endpoint for deployment health checks.
Request Logging: Morgan middleware for HTTP request logging.
Error Handling: Global error middleware handles errors thrown in route handlers; Express 5 forwards rejected promises from async handlers to it automatically.
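Because the video ID is interpolated into the URL that Playwright opens, the format check also keeps unexpected characters out of that URL. The short sketch below shows what the regex accepts and rejects; isValidVideoId is a hypothetical wrapper, and the sample inputs are illustrative.

```javascript
const videoIdRegex = /^[a-zA-Z0-9_-]{11}$/;

// Hypothetical wrapper around the same regex used in server.js.
function isValidVideoId(videoId) {
  return typeof videoId === 'string' && videoIdRegex.test(videoId);
}

// Sample inputs: one well-formed ID and three that should be rejected.
const samples = ['dQw4w9WgXcQ', 'short', 'dQw4w9WgXcQ&t=1', '../../etc/pw'];
for (const sample of samples) {
  console.log(`${sample} -> ${isValidVideoId(sample)}`);
}
```

Only the first sample passes: the others are too short, too long, or contain characters outside the allowed set.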

Environment Configuration

The .env.example file shows all configurable options:

PORT=5000
API_KEY=your-secret-api-key-here
USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
EXPAND_SELECTOR=tp-yt-paper-button#expand
NOT_FOUND_SELECTOR=div.promo-title:has-text("This video isn't available anymore"), div.promo-title:has-text("Este video ya no está disponible")
SHOW_TRANSCRIPT_SELECTOR=button[aria-label="Show transcript"], button[aria-label="Mostrar transcripción"]
VIEW_COUNT_SELECTOR=yt-formatted-string#info span
TRANSCRIPT_SEGMENT_SELECTOR=ytd-transcript-segment-renderer
TRANSCRIPT_SELECTOR=ytd-transcript-renderer
TRANSCRIPT_TEXT_SELECTOR=.segment-text
NODE_ENV=production

This configuration approach enables:

Deployment flexibility across different environments.
Quick adaptation to YouTube HTML changes.
Multi-language support through localized selectors.
Security configuration through environment variables.
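One way to catch configuration mistakes early is to validate the environment once at startup. The sketch below is an assumption about how that might look, not part of the original project; loadConfig is a hypothetical helper that takes an object shaped like process.env.

```javascript
// Hypothetical helper: reads a few of the settings from .env.example and
// fails fast when PORT is not a usable TCP port.
function loadConfig(env) {
  // An empty or missing PORT falls back to 5000, as in server.js.
  const port = Number.parseInt(env.PORT || '5000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    // An empty or missing API_KEY means authentication is disabled,
    // matching the behavior of validateApiKey.
    authEnabled: Boolean(env.API_KEY),
    nodeEnv: env.NODE_ENV || 'development',
  };
}

const config = loadConfig({
  PORT: '5000',
  API_KEY: 'your-secret-api-key-here',
  NODE_ENV: 'production',
});
console.log(config);
```

In server.js this would be called as loadConfig(process.env) before app.listen.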

Containerization with Docker

The Dockerfile creates a production-ready container:

# Use Node.js LTS version with Debian slim for better Playwright compatibility
FROM node:20-slim AS base
# Install system dependencies required for Playwright
RUN apt-get update && apt-get install -y \
    ca-certificates \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libc6 \
    libcairo2 \
    libcups2 \
    libdbus-1-3 \
    libexpat1 \
    libfontconfig1 \
    libgbm1 \
    libgcc1 \
    libglib2.0-0 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libstdc++6 \
    libx11-6 \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxi6 \
    libxrandr2 \
    libxrender1 \
    libxss1 \
    libxtst6 \
    lsb-release \
    wget \
    xdg-utils \
    && rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
FROM base AS dependencies
RUN npm ci --omit=dev --ignore-scripts && npm cache clean --force
# Install only Playwright browsers (without system deps)
RUN npx playwright install chromium
# Production stage
FROM base AS production
# Create non-root user for security
RUN groupadd -r nodejs && useradd -r -g nodejs nodejs
# Copy node_modules from dependencies stage
COPY --from=dependencies --chown=nodejs:nodejs /app/node_modules ./node_modules
# Copy Playwright browsers from dependencies stage
COPY --from=dependencies --chown=nodejs:nodejs /root/.cache/ms-playwright /home/nodejs/.cache/ms-playwright
# Copy application files
COPY --chown=nodejs:nodejs . .
# Remove development files if they exist
RUN rm -f .env.example .gitignore README.md
# Switch to non-root user
USER nodejs
# Expose port
EXPOSE 5000
# Start the application
CMD ["node", "server.js"]

The Docker setup provides:

Multi-stage builds for smaller final images.
Security hardening with a non-root user.
Playwright optimization with pre-installed browsers.
Production readiness with minimal attack surface.

The docker-compose.yml configuration is optimized for deployment with Coolify:

version: '3.8'

services:
  youtube-transcript-api:
    build: .
    ports:
      - '5000:5000'
    environment:
      - NODE_ENV=production
      - PORT=5000
    healthcheck:
      test:
        [
          'CMD',
          'wget',
          '--no-verbose',
          '--tries=1',
          '--spider',
          'http://localhost:5000/live',
        ]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped

Next Steps

To enhance this API further, consider implementing:

Browser Reuse: Reuse or pool browser instances instead of launching Chromium on every request, which matters most in high-traffic scenarios.
Caching: Cache frequently requested transcripts, or persist them, to avoid scraping the same video repeatedly.
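As a starting point for the caching idea, here is a minimal in-memory sketch with a time-to-live per entry. The names createTtlCache and withCache are hypothetical, and the approach assumes a single server process; a shared store would be needed across several instances.

```javascript
// Hypothetical in-memory cache with a time-to-live per entry. `now` is
// injectable so the expiry logic can be exercised without waiting.
function createTtlCache(ttlMs, now = () => Date.now()) {
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // Drop stale entries lazily on read.
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
  };
}

// Wraps any async fetcher (getTranscript in this article) with the cache.
function withCache(fetcher, cache) {
  return async key => {
    const cached = cache.get(key);
    if (cached !== undefined) return cached;
    const value = await fetcher(key);
    cache.set(key, value);
    return value;
  };
}
```

In server.js this could be wired up as const getCachedTranscript = withCache(getTranscript, createTtlCache(10 * 60 * 1000)), where the ten-minute lifetime is an arbitrary choice. The sketch does not bound the number of entries or deduplicate concurrent requests for the same video.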

You can find all the code here. Thanks, and happy coding.
