Building a YouTube Video Transcription API with Node.js

YouTube video transcripts are valuable for content analysis, accessibility, and creating searchable content archives. While npm libraries like youtube-transcript once provided easy access to this data, many have become unreliable due to YouTube's frequent internal interface changes. In this article, we'll build a YouTube transcription API using Node.js, Express, and Playwright as an alternative method.
Project Overview
Our API will consist of several key components:
Express.js server (server.js) - Main application entry point
Transcript extractor (transcript.js) - Playwright-based scraping logic
Error handling (errorHandler.js) - RFC 7807 compliant error responses
Security middleware (securityHandler.js) - API key authentication
Docker containerization for easy deployment
Setting Up the Development Environment
As mentioned in the Node.js and Express development environment setup article, we'll use a modern Node.js development setup with ESLint, Prettier, and Husky for code quality. Once the environment is ready, run the following command:
npm i express playwright express-healthcheck http-problem-details morgan
Key dependencies explained:
Express 5.x: Latest Express.js for the REST API.
Playwright: Reliable browser automation for scraping.
http-problem-details: RFC 7807 compliant error responses.
morgan: HTTP request logging.
express-healthcheck: Middleware that exposes a health check endpoint.
Implementing Error Handling
Before building the main functionality, let's establish robust error handling using the RFC 7807 Problem Details standard in the errorHandler.js file:
import { ProblemDocument } from 'http-problem-details';

export class AppError extends Error {
  constructor(error, type, status, data = null) {
    super(error);
    this.type = type;
    this.status = status;
    this.detail = error;
    this.data = data;
  }
}

// eslint-disable-next-line no-unused-vars
export const errorHandler = (err, req, res, next) => {
  console.error(`Error ${err.status || 500}: ${err.message}`, {
    url: req.originalUrl,
    method: req.method,
    timestamp: new Date().toISOString(),
  });

  if (err instanceof AppError) {
    const problem = new ProblemDocument({
      type: '/problems/' + err.type,
      title: err.type,
      status: err.status,
      detail: err.detail,
      instance: req.originalUrl,
    });
    if (err.data) {
      Object.assign(problem, err.data);
    }
    res.status(err.status).json(problem);
  } else {
    res.status(500).json(
      new ProblemDocument({
        type: '/problems/internal-server-error',
        title: 'InternalServerError',
        status: 500,
        instance: req.originalUrl,
      })
    );
  }
};
This error handler provides:
Structured error responses following the RFC 7807 standard.
Detailed logging with request context.
Consistent error format across all endpoints.
Optional debug data (like screenshots for debugging).
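To make the response shape concrete, here is a minimal, library-free sketch of the mapping the handler performs. `toProblem` is a hypothetical helper introduced only for illustration; it is not part of http-problem-details and simply mirrors the fields set in errorHandler.js.

```javascript
// Hypothetical helper: mirrors the field mapping used in errorHandler.js
// without depending on http-problem-details.
function toProblem(err, instance) {
  const problem = {
    type: `/problems/${err.type}`,
    title: err.type,
    status: err.status,
    detail: err.detail,
    instance,
  };
  // Extension members (for example a screenshot) sit next to the standard fields.
  return err.data ? { ...problem, ...err.data } : problem;
}

// Example input shaped like an AppError raised by the validation step.
const body = toProblem(
  { type: 'validation', status: 400, detail: 'Invalid video ID format', data: null },
  '/transcript/abc'
);
console.log(JSON.stringify(body, null, 2));
```

RFC 7807 also defines the `application/problem+json` media type for these bodies. The handler above sends them through `res.json`, so clients that care about the distinction should not rely on the content type alone.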
Building the Transcript Extractor
The core functionality lies in the transcript.js file, which uses Playwright to extract transcripts from YouTube:
import { chromium } from 'playwright';
import { AppError } from './errorHandler.js';

const USER_AGENT =
  process.env.USER_AGENT ||
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36';

const selectors = {
  expand: process.env.EXPAND_SELECTOR || 'tp-yt-paper-button#expand',
  notFound:
    process.env.NOT_FOUND_SELECTOR ||
    'div.promo-title:has-text("This video isn\'t available anymore"), div.promo-title:has-text("Este video ya no está disponible")',
  showTranscript:
    process.env.SHOW_TRANSCRIPT_SELECTOR ||
    'button[aria-label="Show transcript"], button[aria-label="Mostrar transcripción"]',
  viewCount: process.env.VIEW_COUNT_SELECTOR || 'yt-formatted-string#info span',
  transcriptSegment:
    process.env.TRANSCRIPT_SEGMENT_SELECTOR ||
    'ytd-transcript-segment-renderer',
  transcript: process.env.TRANSCRIPT_SELECTOR || 'ytd-transcript-renderer',
  text: process.env.TRANSCRIPT_TEXT_SELECTOR || '.segment-text',
};
The selector configuration approach provides several advantages:
Environment-based customization for different YouTube layouts.
Easy maintenance when YouTube changes its interface.
Multi-language support through configurable selectors.
Fallback defaults for common interface elements.
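As an illustration of the multi-language point, the sketch below generates a selector list like the default `showTranscript` value from a list of localized labels. `buildAriaLabelSelector` is a hypothetical helper that is not part of the project; the two labels are the ones already used in the defaults above.

```javascript
// Hypothetical helper: builds a comma-separated selector list that matches a
// button by any of the given localized aria-label values.
function buildAriaLabelSelector(labels) {
  return labels
    .map(label => {
      // Escape double quotes and backslashes so the label stays a valid CSS string.
      const escaped = label.replace(/["\\]/g, '\\$&');
      return `button[aria-label="${escaped}"]`;
    })
    .join(', ');
}

const showTranscriptSelector = buildAriaLabelSelector([
  'Show transcript',
  'Mostrar transcripción',
]);
console.log(showTranscriptSelector);
```

The resulting string could be supplied through `SHOW_TRANSCRIPT_SELECTOR`, so supporting another interface language only means appending its label to the list.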
Here is the main extraction logic:
export default async function getTranscript(videoId) {
  const browser = await chromium.launch({
    headless: true,
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  try {
    const context = await browser.newContext({
      userAgent: USER_AGENT,
    });
    const page = await context.newPage();
    await page.goto(`https://www.youtube.com/watch?v=${videoId}`, {
      waitUntil: 'networkidle',
      timeout: 30000,
    });

    // Bail out early if YouTube shows its "video unavailable" notice.
    const errorElement = await page.$(selectors.notFound);
    if (errorElement) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError('Video not found or unavailable', 'not_found', 404, {
        screenshot: `data:image/png;base64,${base64Screenshot}`,
      });
    }

    // Expand the description, which contains the "Show transcript" button.
    const expandButton = await page.$(selectors.expand);
    if (!expandButton) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError('Expand button not found', 'validation', 400, {
        screenshot: `data:image/png;base64,${base64Screenshot}`,
      });
    }
    await expandButton.click({ timeout: 5000 });

    const showTranscriptButton = await page.$(selectors.showTranscript);
    if (!showTranscriptButton) {
      const screenshot = await page.screenshot({
        fullPage: true,
        type: 'png',
      });
      const base64Screenshot = screenshot.toString('base64');
      throw new AppError(
        'Show transcript button not found',
        'validation',
        400,
        {
          screenshot: `data:image/png;base64,${base64Screenshot}`,
        }
      );
    }
    await showTranscriptButton.click({ timeout: 5000 });
    await page.waitForSelector(selectors.transcript, { timeout: 10000 });

    // Collect the text of every transcript segment, skipping empty ones.
    const transcript = await page.$$eval(
      selectors.transcriptSegment,
      (nodes, textSelector) => {
        return nodes
          .map(n => n.querySelector(textSelector)?.innerText.trim())
          .filter(Boolean);
      },
      selectors.text
    );

    const [viewsText] = await page.$$eval(selectors.viewCount, nodes =>
      nodes.map(n => n.innerText.trim())
    );
    // viewsText is undefined when the view count selector matches nothing.
    const views = parseInt((viewsText ?? '').replace(/[^0-9]/g, ''), 10) || 0;

    return { transcript: transcript.join(' '), views };
  } catch (error) {
    if (error instanceof AppError) {
      throw error;
    }
    throw new AppError(
      `Failed to fetch transcript: ${error.message}`,
      'error',
      500
    );
  } finally {
    await browser.close();
  }
}
Key implementation details:
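Browser Configuration: Headless Chromium launched with --no-sandbox and --disable-setuid-sandbox. These flags disable Chromium's sandbox, which is commonly required to run it inside containers.
Robust Navigation: Waits for network idle so the page has finished its initial loading before any selectors are queried.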
Error Detection: Proactive checking for video availability.
Screenshot Debugging: Captures page state for troubleshooting.
Resource Cleanup: Always closes the browser to prevent memory leaks.
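The screenshot-and-throw pattern appears three times in the function above. One way to factor it out is sketched below. `failWithScreenshot` is a hypothetical helper, and the `AppError` class here is a stand-in copy so the snippet runs on its own; the `page` argument only needs a Playwright-style `screenshot()` method that resolves to a Buffer.

```javascript
// Stand-in for the AppError class from errorHandler.js so this sketch is self-contained.
class AppError extends Error {
  constructor(error, type, status, data = null) {
    super(error);
    this.type = type;
    this.status = status;
    this.detail = error;
    this.data = data;
  }
}

// Hypothetical helper: captures the current page and throws an AppError that
// carries the screenshot as a base64 data URL.
async function failWithScreenshot(page, message, type, status) {
  const screenshot = await page.screenshot({ fullPage: true, type: 'png' });
  throw new AppError(message, type, status, {
    screenshot: `data:image/png;base64,${screenshot.toString('base64')}`,
  });
}

// Possible use inside getTranscript:
//   if (!expandButton) {
//     await failWithScreenshot(page, 'Expand button not found', 'validation', 400);
//   }
```

Because the helper always throws, each call site shrinks to a single line and the three error paths cannot drift apart.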
Adding Security with API Key Authentication
The securityHandler.js file implements optional API key authentication:
import { AppError } from './errorHandler.js';

export const validateApiKey = (req, res, next) => {
  const apiKey = req.headers['x-api-key'];
  const expectedApiKey = process.env.API_KEY;

  if (!expectedApiKey) {
    return next();
  }
  if (!apiKey) {
    throw new AppError('API key is required', 'authentication', 401);
  }
  if (apiKey !== expectedApiKey) {
    throw new AppError('Invalid API key', 'authentication', 401);
  }
  next();
};
This middleware design allows for:
Optional authentication: works without an API key if not configured.
Header-based authentication using the X-API-Key header.
Consistent error responses through our error handling system.
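The middleware compares keys with `!==`, which can return as soon as the first differing character is found. If that matters for your deployment, a constant-time comparison is a possible hardening step. The sketch below is an optional variation, not what the middleware above does; `safeKeyEquals` is a hypothetical helper built on Node's `crypto.timingSafeEqual`.

```javascript
import { createHash, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: compares two API keys in constant time.
// Hashing first gives both buffers the same length, which timingSafeEqual requires.
function safeKeyEquals(provided, expected) {
  const a = createHash('sha256').update(String(provided)).digest();
  const b = createHash('sha256').update(String(expected)).digest();
  return timingSafeEqual(a, b);
}

// Possible use inside validateApiKey:
//   if (!safeKeyEquals(apiKey, expectedApiKey)) {
//     throw new AppError('Invalid API key', 'authentication', 401);
//   }
```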
Building the Express Server
The server.js file ties everything together:
import express from 'express';
import getTranscript from './transcript.js';
import morgan from 'morgan';
import { errorHandler, AppError } from './errorHandler.js';
import { validateApiKey } from './securityHandler.js';
import healthcheck from 'express-healthcheck';

const app = express();
const PORT = process.env.PORT || 5000;
const videoIdRegex = /^[a-zA-Z0-9_-]{11}$/;

app.use(morgan('dev'));

app.use(
  '/live',
  healthcheck({
    healthy: () => ({
      status: 'healthy',
      uptime: process.uptime(),
      timestamp: Date.now(),
    }),
  })
);

app.get('/transcript/:videoId', validateApiKey, async (req, res) => {
  const { videoId } = req.params;
  if (!videoId) {
    throw new AppError('Video ID is required', 'validation', 400);
  }
  if (!videoIdRegex.test(videoId)) {
    throw new AppError('Invalid video ID format', 'validation', 400);
  }
  const { transcript, views } = await getTranscript(videoId);
  res.status(200).json({
    transcript,
    views,
  });
});

app.use(errorHandler);

app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});
The server implementation includes:
Input Validation: YouTube video ID format validation using regex.
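Health Monitoring: /live endpoint for deployment health checks.
Request Logging: Morgan middleware for HTTP request logging.
Error Handling: Global error middleware handles errors thrown in route handlers, including rejected promises from async handlers, which Express 5 forwards to it automatically.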
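To show what the ID check accepts and rejects, here is a small standalone sketch using the same regular expression as server.js. `isValidVideoId` is a hypothetical wrapper and the sample inputs are made up for illustration.

```javascript
// Same pattern as in server.js: exactly 11 characters from A-Z, a-z, 0-9, _ and -.
const videoIdPattern = /^[a-zA-Z0-9_-]{11}$/;

// Hypothetical wrapper around the regex test.
function isValidVideoId(value) {
  return typeof value === 'string' && videoIdPattern.test(value);
}

const samples = ['dQw4w9WgXcQ', 'short', 'dQw4w9WgXcQ&t=10', '../etc/passwd'];
for (const sample of samples) {
  console.log(sample, isValidVideoId(sample));
}
```

Only the first sample passes. Because the pattern is anchored at both ends, nothing but an 11-character ID can reach the URL that `getTranscript` passes to `page.goto`.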
Environment Configuration
The .env.example file shows all configurable options:
PORT=5000
API_KEY=your-secret-api-key-here
USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36
EXPAND_SELECTOR=tp-yt-paper-button#expand
NOT_FOUND_SELECTOR=div.promo-title:has-text("This video isn't available anymore"), div.promo-title:has-text("Este video ya no está disponible")
SHOW_TRANSCRIPT_SELECTOR=button[aria-label="Show transcript"], button[aria-label="Mostrar transcripción"]
VIEW_COUNT_SELECTOR=yt-formatted-string#info span
TRANSCRIPT_SEGMENT_SELECTOR=ytd-transcript-segment-renderer
TRANSCRIPT_SELECTOR=ytd-transcript-renderer
TRANSCRIPT_TEXT_SELECTOR=.segment-text
NODE_ENV=production
This configuration approach enables:
Deployment flexibility across different environments.
Quick adaptation to YouTube HTML changes.
Multi-language support through localized selectors.
Security configuration through environment variables.
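Since every setting comes from the environment, it can help to validate the values once at startup instead of discovering a typo at request time. The following is a minimal sketch under that assumption; `loadConfig` and the `authEnabled` flag are hypothetical names that do not exist in the project.

```javascript
// Hypothetical helper: reads and validates the settings server.js relies on.
function loadConfig(env = process.env) {
  // Mirror the server's fallback: an unset or empty PORT means 5000.
  const port = Number(env.PORT || 5000);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    // Authentication is only enforced when API_KEY is set (see securityHandler.js).
    authEnabled: Boolean(env.API_KEY),
  };
}

console.log(loadConfig({ PORT: '8080', API_KEY: 'your-secret-api-key-here' }));
```

Logging `authEnabled` at startup also makes it obvious when a deployment is running without an API key.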
Containerization with Docker
The Dockerfile creates a production-ready container:
# Use Node.js LTS version with Debian slim for better Playwright compatibility
FROM node:20-slim AS base
# Install system dependencies required for Playwright
RUN apt-get update && apt-get install -y \
    ca-certificates \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libc6 \
    libcairo2 \
    libcups2 \
    libdbus-1-3 \
    libexpat1 \
    libfontconfig1 \
    libgbm1 \
    libgcc1 \
    libglib2.0-0 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libstdc++6 \
    libx11-6 \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxi6 \
    libxrandr2 \
    libxrender1 \
    libxss1 \
    libxtst6 \
    lsb-release \
    wget \
    xdg-utils \
    && rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
FROM base AS dependencies
RUN npm ci --omit=dev --ignore-scripts && npm cache clean --force
# Install only Playwright browsers (without system deps)
RUN npx playwright install chromium
# Production stage
FROM base AS production
# Create non-root user for security
RUN groupadd -r nodejs && useradd -r -g nodejs nodejs
# Copy node_modules from dependencies stage
COPY --from=dependencies --chown=nodejs:nodejs /app/node_modules ./node_modules
# Copy Playwright browsers from dependencies stage
COPY --from=dependencies --chown=nodejs:nodejs /root/.cache/ms-playwright /home/nodejs/.cache/ms-playwright
# Copy application files
COPY --chown=nodejs:nodejs . .
# Remove development files if they exist
RUN rm -f .env.example .gitignore README.md
# Switch to non-root user
USER nodejs
# Expose port
EXPOSE 5000
# Start the application
CMD ["node", "server.js"]
The Docker setup provides:
Multi-stage builds for smaller final images.
Security hardening with a non-root user.
Playwright optimization with pre-installed browsers.
Production readiness with minimal attack surface.
The docker-compose.yml configuration is optimized for deployment with Coolify:
version: '3.8'
services:
  youtube-transcript-api:
    build: .
    ports:
      - '5000:5000'
    environment:
      - NODE_ENV=production
      - PORT=5000
    healthcheck:
      test:
        [
          'CMD',
          'wget',
          '--no-verbose',
          '--tries=1',
          '--spider',
          'http://localhost:5000/live',
        ]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    restart: unless-stopped
Next Steps
To enhance this API further, consider:
Browser Reuse: Pool or reuse browser instances for high-traffic scenarios, since the current code launches a new Chromium process for every request.
Caching: Cache frequently requested transcripts, or persist them so they only need to be scraped once.
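As a starting point for the caching idea, here is a minimal in-memory sketch. It assumes a single process and a simple time-to-live; `withCache`, its parameters, and the ten-minute default are hypothetical choices, not part of the project.

```javascript
// Hypothetical wrapper: caches the result of an async fetcher per key for ttlMs.
// The `now` parameter exists only to make expiry easy to test.
function withCache(fetcher, ttlMs = 10 * 60 * 1000, now = Date.now) {
  const cache = new Map();
  return async function cached(key) {
    const hit = cache.get(key);
    if (hit && hit.expires > now()) {
      return hit.value;
    }
    // Store the promise itself so concurrent requests for one key share a single fetch.
    const value = fetcher(key);
    cache.set(key, { value, expires: now() + ttlMs });
    try {
      return await value;
    } catch (error) {
      // Do not keep failed lookups around.
      if (cache.get(key)?.value === value) {
        cache.delete(key);
      }
      throw error;
    }
  };
}

// Possible use in server.js:
//   const getCachedTranscript = withCache(getTranscript);
//   const { transcript, views } = await getCachedTranscript(videoId);
```

This cache grows without bound and disappears on restart, so a production setup would likely add a size limit or move to an external store.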
You can find all the code here. Thanks, and happy coding.




