Building a YouTube Video Transcription API with Node.js

Somebody who likes to code

YouTube video transcripts are valuable for content analysis, accessibility, and creating searchable content archives. While npm libraries like youtube-transcript once provided easy access to this data, many have become unreliable due to YouTube's frequent internal interface changes. In this article, we'll build a YouTube transcription API using Node.js, Express, and Playwright as an alternative method.

Project Overview

Our API will consist of several key components:

  • Express.js server (server.js) - Main application entry point

  • Transcript extractor (transcript.js) - Playwright-based scraping logic

  • Error handling (errorHandler.js) - RFC 7807 compliant error responses

  • Security middleware (securityHandler.js) - API key authentication

  • Docker containerization for easy deployment

Setting Up the Development Environment

As mentioned in the Node.js and Express development environment setup article, we'll use a modern Node.js development setup with ESLint, Prettier, and Husky for code quality. Once the environment is ready, run the following command:

npm i express playwright express-healthcheck http-problem-details morgan

Key dependencies explained:

  • Express 5.x: Latest Express.js for the REST API.

  • Playwright: Reliable browser automation for scraping.

  • http-problem-details: RFC 7807 compliant error responses.

  • morgan: HTTP request logging.

  • express-healthcheck: Middleware that exposes a health check endpoint.

Implementing Error Handling

Before building the main functionality, let's establish robust error handling using the RFC 7807 Problem Details standard in the errorHandler.js file:

import { ProblemDocument } from 'http-problem-details';

export class AppError extends Error {
  constructor(error, type, status, data = null) {
    super(error);
    this.type = type;
    this.status = status;
    this.detail = error;
    this.data = data;
  }
}

// eslint-disable-next-line no-unused-vars
export const errorHandler = (err, req, res, next) => {
  console.error(`Error ${err.status || 500}: ${err.message}`, {
    url: req.originalUrl,
    method: req.method,
    timestamp: new Date().toISOString(),
  });

  if (err instanceof AppError) {
    const problem = new ProblemDocument({
      type: '/problems/' + err.type,
      title: err.type,
      status: err.status,
      detail: err.detail,
      instance: req.originalUrl,
    });

    if (err.data) {
      Object.assign(problem, err.data);
    }

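    // RFC 7807 defines a dedicated media type for problem responses
    res.status(err.status).type('application/problem+json').json(problem);
  } else {
    res
      .status(500)
      .type('application/problem+json')
      .json(
        new ProblemDocument({
          type: '/problems/internal-server-error',
          title: 'InternalServerError',
          status: 500,
          instance: req.originalUrl,
        })
      );
  }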
};

This error handler provides:

  • Structured error responses following the RFC 7807 standard.

  • Detailed logging with request context.

  • Consistent error format across all endpoints.

  • Optional debug data, such as a base64-encoded page screenshot.
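
To make the response shape concrete, the sketch below builds an equivalent body by hand, without the http-problem-details library. `toProblemBody` is a hypothetical helper, not part of the article's code; it only mirrors the fields the handler above sets.

```javascript
// Hypothetical stand-in for the handler's mapping; plain objects only.
function toProblemBody(err, instance) {
  const known = typeof err.type === 'string' && Number.isInteger(err.status);
  if (!known) {
    // Unknown errors: hide internals, as the handler's else branch does.
    return {
      type: '/problems/internal-server-error',
      title: 'InternalServerError',
      status: 500,
      instance,
    };
  }
  return {
    type: '/problems/' + err.type,
    title: err.type,
    status: err.status,
    detail: err.message,
    instance,
    ...(err.data ?? {}), // extension members such as `screenshot`
  };
}

const body = toProblemBody(
  { type: 'validation', status: 400, message: 'Invalid video ID format' },
  '/transcript/abc'
);
console.log(JSON.stringify(body, null, 2));
```

A client can branch on `type` or `status` and show `detail` to the user, regardless of which endpoint produced the error.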

Building the Transcript Extractor

The core functionality lies in the transcript.js file, which uses Playwright to extract transcripts from YouTube:

import { chromium } from 'playwright';
import { AppError } from './errorHandler.js';

const USER_AGENT =
  process.env.USER_AGENT ||
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';

const selectors = {
  expand: process.env.EXPAND_SELECTOR || 'tp-yt-paper-button#expand',
  notFound:
    process.env.NOT_FOUND_SELECTOR ||
    'div.promo-title:has-text("This video isn\'t available anymore"), div.promo-title:has-text("Este video ya no está disponible")',
  showTranscript:
    process.env.SHOW_TRANSCRIPT_SELECTOR ||
    'button[aria-label="Show transcript"], button[aria-label="Mostrar transcripción"]',
  viewCount: process.env.VIEW_COUNT_SELECTOR || 'yt-formatted-string#info span',
  transcriptSegment:
    process.env.TRANSCRIPT_SEGMENT_SELECTOR ||
    'ytd-transcript-segment-renderer',
  transcript: process.env.TRANSCRIPT_SELECTOR || 'ytd-transcript-renderer',
  text: process.env.TRANSCRIPT_TEXT_SELECTOR || '.segment-text',
};

The selector configuration approach provides several advantages:

  • Environment-based customization for different YouTube layouts.

  • Easy maintenance when YouTube changes its interface.

  • Multi-language support through configurable selectors.

  • Fallback defaults for common interface elements.
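
The multi-language defaults above are comma-separated selector lists, one entry per localized label. As a rough sketch, such a list could be generated from an array of labels; `buildAriaLabelSelector` is a hypothetical helper, and the two labels are the ones already present in the default configuration.

```javascript
// Hypothetical helper: turn a list of localized aria-labels into one
// comma-separated selector list, so any of the labels can match.
function buildAriaLabelSelector(labels, tag = 'button') {
  return labels
    .map(label => {
      // Escape quotes and backslashes so the label stays a valid CSS string.
      const escaped = label.replace(/["\\]/g, '\\$&');
      return `${tag}[aria-label="${escaped}"]`;
    })
    .join(', ');
}

const showTranscript = buildAriaLabelSelector([
  'Show transcript',
  'Mostrar transcripción',
]);
console.log(showTranscript);
```

Supporting another interface language then means appending one label to the array, or overriding SHOW_TRANSCRIPT_SELECTOR with the generated string.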

Here is the main extraction logic:

export default async function getTranscript(videoId) {
  const browser = await chromium.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });

  try {
    const context = await browser.newContext({
      userAgent: USER_AGENT,
    });

    const page = await context.newPage();

    await page.goto(`https://www.youtube.com/watch?v=${videoId}`, {
      waitUntil: 'networkidle',
      timeout: 30000,
    });

    const errorElement = await page.$(selectors.notFound);
    if (errorElement) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError('Video not found or unavailable', 'not_found', 404, {
        screenshot: `data:image/png;base64,${base64Screenshot}`,
      });
    }

    const expandButton = await page.$(selectors.expand);
    if (!expandButton) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError('Expand button not found', 'validation', 400, {
        screenshot: `data:image/png;base64,${base64Screenshot}`,
      });
    }

    await expandButton.click({ timeout: 5000 });

    const showTranscriptButton = await page.$(selectors.showTranscript);
    if (!showTranscriptButton) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError(
        'Show transcript button not found',
        'validation',
        400,
        {
          screenshot: `data:image/png;base64,${base64Screenshot}`,
        }
      );
    }

    await showTranscriptButton.click({ timeout: 5000 });

    await page.waitForSelector(selectors.transcript, { timeout: 10000 });

    const transcript = await page.$$eval(
      selectors.transcriptSegment,
      (nodes, textSelector) => {
        return nodes.map(n => n.querySelector(textSelector)?.innerText.trim());
      },
      selectors.text
    );

    const [viewsText] = await page.$$eval(selectors.viewCount, nodes =>
      nodes.map(n => n.innerText.trim())
    );

    // viewsText is undefined when the view count selector matches nothing
    const views = parseInt((viewsText ?? '').replace(/[^0-9]/g, ''), 10) || 0;

    return { transcript: transcript.join(' '), views };
  } catch (error) {
    if (error instanceof AppError) {
      throw error;
    }
    throw new AppError(
      `Failed to fetch transcript: ${error.message}`,
      'error',
      500
    );
  } finally {
    await browser.close();
  }
}
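
The view-count parsing near the end of the function strips every non-digit character before calling parseInt. The sketch below isolates that rule in a hypothetical `parseViews` function (the name and the sample strings are illustrative assumptions) to show what it handles and where it falls short.

```javascript
// Same digit-stripping rule as getTranscript, pulled out so it can be tried
// in isolation. `parseViews` is a hypothetical name, not the article's code.
function parseViews(viewsText) {
  const digits = String(viewsText ?? '').replace(/[^0-9]/g, '');
  return parseInt(digits, 10) || 0;
}

console.log(parseViews('1,234,567 views')); // 1234567
console.log(parseViews('1.234.567 visualizaciones')); // 1234567
console.log(parseViews(undefined)); // 0
// Abbreviated counts lose their magnitude: this yields 12, not 1200000.
console.log(parseViews('1.2M views')); // 12
```

If the page ever renders abbreviated counts, the suffix would need to be handled before the digits are extracted.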

Key implementation details:

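  • Browser Configuration: Headless Chromium launched with --no-sandbox and --disable-setuid-sandbox. These flags turn off Chromium's sandbox, which is often required in containers but lowers isolation.

  • Navigation: waitUntil: 'networkidle' waits until there have been no network connections for at least 500 ms. It is a heuristic (the Playwright docs discourage relying on it), so the explicit element checks that follow still matter.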

  • Error Detection: Proactive checking for video availability.

  • Screenshot Debugging: Captures page state for troubleshooting.

  • Resource Cleanup: Always closes the browser to prevent memory leaks.
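
The extractor repeats the same screenshot-and-throw steps three times. One possible way to factor that out is sketched below; `throwWithScreenshot` is a hypothetical helper, and the AppError class is re-declared here only so the snippet runs on its own.

```javascript
// Minimal stand-in for the article's AppError so the sketch is self-contained.
class AppError extends Error {
  constructor(error, type, status, data = null) {
    super(error);
    this.type = type;
    this.status = status;
    this.detail = error;
    this.data = data;
  }
}

// Hypothetical helper: capture the page and throw an AppError carrying it.
// `page` only needs a Playwright-style screenshot() that resolves to a Buffer.
async function throwWithScreenshot(page, message, type, status) {
  const screenshot = await page.screenshot({ fullPage: true, type: 'png' });
  throw new AppError(message, type, status, {
    screenshot: `data:image/png;base64,${screenshot.toString('base64')}`,
  });
}

// Demo with a fake page object; transcript.js would pass the real page.
const fakePage = { screenshot: async () => Buffer.from('fake-png-bytes') };

throwWithScreenshot(fakePage, 'Expand button not found', 'validation', 400).catch(
  err => console.log(err.status, err.type, err.data.screenshot.slice(0, 22))
);
```

Inside getTranscript, each check would then shrink to a single line such as `if (!expandButton) await throwWithScreenshot(page, 'Expand button not found', 'validation', 400);`.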

Adding Security with API Key Authentication

The securityHandler.js file implements optional API key authentication:

import { AppError } from './errorHandler.js';

export const validateApiKey = (req, res, next) => {
  const apiKey = req.headers['x-api-key'];
  const expectedApiKey = process.env.API_KEY;

  if (!expectedApiKey) {
    return next();
  }

  if (!apiKey) {
    throw new AppError('API key is required', 'authentication', 401);
  }

  if (apiKey !== expectedApiKey) {
    throw new AppError('Invalid API key', 'authentication', 401);
  }

  next();
};

This middleware design allows for:

  • Optional authentication: requests pass through without a key when API_KEY is not configured.

  • Header-based authentication using the X-API-Key header.

  • Consistent error responses through our error handling system.
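
The middleware compares keys with `!==`, which can stop at the first differing character. If timing side channels are a concern, a constant-time comparison is one option. The sketch below uses Node's built-in `crypto.timingSafeEqual`; `apiKeysMatch` is a hypothetical helper, not part of the article's code.

```javascript
import { createHash, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: compare two API keys in constant time.
// Hashing first gives both buffers the same length, which
// timingSafeEqual requires (it throws on a length mismatch).
function apiKeysMatch(provided, expected) {
  if (typeof provided !== 'string' || typeof expected !== 'string') {
    return false;
  }
  const a = createHash('sha256').update(provided).digest();
  const b = createHash('sha256').update(expected).digest();
  return timingSafeEqual(a, b);
}

console.log(apiKeysMatch('your-secret-api-key-here', 'your-secret-api-key-here'));
console.log(apiKeysMatch('wrong-key', 'your-secret-api-key-here'));
```

In validateApiKey, the check `apiKey !== expectedApiKey` could then become `!apiKeysMatch(apiKey, expectedApiKey)`.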

Building the Express Server

The server.js file ties everything together:

import express from 'express';
import getTranscript from './transcript.js';
import morgan from 'morgan';
import { errorHandler, AppError } from './errorHandler.js';
import { validateApiKey } from './securityHandler.js';
import healthcheck from 'express-healthcheck';

const app = express();
const PORT = process.env.PORT || 5000;

const videoIdRegex = /^[a-zA-Z0-9_-]{11}$/;

app.use(morgan('dev'));
app.use(
  '/live',
  healthcheck({
    healthy: () => ({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: Date.now(),
    }),
  })
);
app.get('/transcript/:videoId', validateApiKey, async (req, res) => {
  const { videoId } = req.params;

  if (!videoId) {
    throw new AppError('Video ID is required', 'validation', 400);
  }

  if (!videoIdRegex.test(videoId)) {
    throw new AppError('Invalid video ID format', 'validation', 400);
  }

  const { transcript, views } = await getTranscript(videoId);

  res.status(200).json({
    transcript,
    views,
  });
});

app.use(errorHandler);

app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

The server implementation includes:

  • Input Validation: YouTube video ID format validation using regex.

  • Health Monitoring: /live endpoint for deployment health checks.

  • Request Logging: Morgan middleware for HTTP request logging.

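  • Error Handling: Global error middleware handles errors from every route. Express 5 forwards rejected promises from async handlers to it automatically, so the route needs no try/catch.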
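
The endpoint accepts only a bare 11-character ID, while callers often start from a full URL. A client-side sketch for bridging that gap is shown below; `extractVideoId` is a hypothetical helper, the URL shapes it recognizes are assumptions, and the sample ID is a placeholder.

```javascript
const videoIdRegex = /^[a-zA-Z0-9_-]{11}$/;

// Hypothetical client-side helper: pull the 11-character ID out of common
// YouTube URL shapes, or accept a bare ID. Returns null when nothing matches.
function extractVideoId(input) {
  if (videoIdRegex.test(input)) return input;
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // neither a URL nor a bare ID
  }
  const host = url.hostname.replace(/^www\./, '');
  let candidate = null;
  if (host === 'youtu.be') {
    candidate = url.pathname.slice(1);
  } else if (host === 'youtube.com' || host === 'm.youtube.com') {
    candidate = url.searchParams.get('v');
  }
  // Reuse the server's rule so the client never sends an ID it would reject.
  return candidate && videoIdRegex.test(candidate) ? candidate : null;
}

console.log(extractVideoId('https://www.youtube.com/watch?v=abc123DEF_-'));
console.log(extractVideoId('https://youtu.be/abc123DEF_-?t=42'));
console.log(extractVideoId('not a video'));
```

Keeping the regex identical on both sides means a 400 from the API signals a genuine client bug rather than a formatting difference.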

Environment Configuration

The .env.example file shows all configurable options:

PORT=5000
API_KEY=your-secret-api-key-here
USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
EXPAND_SELECTOR=tp-yt-paper-button#expand
NOT_FOUND_SELECTOR=div.promo-title:has-text("This video isn't available anymore"), div.promo-title:has-text("Este video ya no está disponible")
SHOW_TRANSCRIPT_SELECTOR=button[aria-label="Show transcript"], button[aria-label="Mostrar transcripción"]
VIEW_COUNT_SELECTOR=yt-formatted-string#info span
TRANSCRIPT_SEGMENT_SELECTOR=ytd-transcript-segment-renderer
TRANSCRIPT_SELECTOR=ytd-transcript-renderer
TRANSCRIPT_TEXT_SELECTOR=.segment-text
NODE_ENV=production

This configuration approach enables:

  • Deployment flexibility across different environments.

  • Quick adaptation to YouTube HTML changes.

  • Multi-language support through localized selectors.

  • Security configuration through environment variables.

Containerization with Docker

The Dockerfile creates a production-ready container:

# Use Node.js LTS version with Debian slim for better Playwright compatibility
FROM node:20-slim AS base
# Install system dependencies required for Playwright
RUN apt-get update && apt-get install -y \
    ca-certificates \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libc6 \
    libcairo2 \
    libcups2 \
    libdbus-1-3 \
    libexpat1 \
    libfontconfig1 \
    libgbm1 \
    libgcc1 \
    libglib2.0-0 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libstdc++6 \
    libx11-6 \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxi6 \
    libxrandr2 \
    libxrender1 \
    libxss1 \
    libxtst6 \
    lsb-release \
    wget \
    xdg-utils \
    && rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
FROM base AS dependencies
RUN npm ci --omit=dev --ignore-scripts && npm cache clean --force
# Install only Playwright browsers (without system deps)
RUN npx playwright install chromium
# Production stage
FROM base AS production
# Create non-root user for security
RUN groupadd -r nodejs && useradd -r -g nodejs nodejs
# Copy node_modules from dependencies stage
COPY --from=dependencies --chown=nodejs:nodejs /app/node_modules ./node_modules
# Copy Playwright browsers from dependencies stage
COPY --from=dependencies --chown=nodejs:nodejs /root/.cache/ms-playwright /home/nodejs/.cache/ms-playwright
# Copy application files
COPY --chown=nodejs:nodejs . .
# Remove development files if they exist
RUN rm -f .env.example .gitignore README.md
# Switch to non-root user
USER nodejs
# Expose port
EXPOSE 5000
# Start the application
CMD ["node", "server.js"]

The Docker setup provides:

  • Multi-stage builds for smaller final images.

  • Security hardening with a non-root user.

  • Playwright optimization with pre-installed browsers.

  • Production readiness with minimal attack surface.

The docker-compose.yml configuration is optimized for deployment with Coolify:

version: '3.8'

services:
  youtube-transcript-api:
    build: .
    ports:
      - '5000:5000'
    environment:
      - NODE_ENV=production
      - PORT=5000
    healthcheck:
      test:
        [
          'CMD',
          'wget',
          '--no-verbose',
          '--tries=1',
          '--spider',
          'http://localhost:5000/live',
        ]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped

Next Steps

To enhance this API further, consider implementing:

  • Browser Reuse: Pool or reuse browser instances for high-traffic scenarios instead of launching Chromium on every request.

  • Caching: Cache frequently requested transcripts, or persist them.
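
As a starting point for the caching idea, here is a minimal in-memory sketch with a time-to-live. `withTtlCache` is a hypothetical wrapper, and the fake loader stands in for getTranscript; a production setup might prefer an external store and would need an eviction policy, since this Map grows without bound.

```javascript
// Hypothetical wrapper: cache the result of any async loader per key for
// ttlMs milliseconds. Concurrent calls for the same key share one in-flight
// promise, so only one browser would be launched per video at a time.
function withTtlCache(loader, ttlMs, now = Date.now) {
  const entries = new Map(); // key -> { promise, expiresAt }

  return function cached(key) {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.promise;
    }
    const promise = loader(key).catch(err => {
      // Do not cache failures; let the next request retry.
      if (entries.get(key)?.promise === promise) entries.delete(key);
      throw err;
    });
    entries.set(key, { promise, expiresAt: now() + ttlMs });
    return promise;
  };
}

// Demo with a fake loader standing in for getTranscript.
let calls = 0;
const fakeGetTranscript = async videoId => {
  calls += 1;
  return { transcript: `text for ${videoId}`, views: 0 };
};
const getTranscriptCached = withTtlCache(fakeGetTranscript, 10 * 60 * 1000);

Promise.all([
  getTranscriptCached('abc123DEF_-'),
  getTranscriptCached('abc123DEF_-'),
]).then(() => console.log(`loader calls: ${calls}`)); // loader calls: 1
```

In server.js, the route would call the wrapped function instead of getTranscript directly; the ten-minute TTL is an arbitrary example value.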

You can find all the code here. Thanks, and happy coding.

raulnq
