Why We Built an All-in-One PDF API (and Why You Should Stop Using 3 Different Tools)
The hidden costs of cobbling together Puppeteer, pdfcpu, and Ghostscript for PDF tasks. How a single API replaces your entire PDF toolchain.
Two years ago, our PDF stack looked like this: Puppeteer for HTML-to-PDF conversion, pdfcpu for merging and splitting, Ghostscript for compression, qpdf for password protection, and a homegrown Node script that orchestrated all of them. Five tools, three languages, and a Docker image that took 8 minutes to build.
Every week something broke. Puppeteer would hang on a page with an infinite CSS animation. Ghostscript would segfault on a PDF that pdfcpu handled fine. The Node orchestrator would leak file handles when Ghostscript timed out. We spent more time maintaining the PDF toolchain than building our actual product.
That's why we built LightningPDF. Not because the world needed another PDF API, but because we were tired of being glue-code engineers for five different PDF tools that were never designed to work together.
The real cost of "just use open-source tools"
On paper, the open-source approach sounds reasonable. Each tool is free, well-documented, and good at its specific job. The costs are hidden elsewhere: in your Docker image, your memory footprint, and your maintenance calendar.
Docker image bloat
Here's what happens to your Docker image when you add PDF tools:
| Tool | Image size added | System dependencies |
|---|---|---|
| Puppeteer + Chromium | ~400MB | libnss3, libatk, libcups, fonts |
| Ghostscript | ~45MB | libtiff, libjpeg, fontconfig |
| pdfcpu | ~15MB | None (Go binary) |
| qpdf | ~8MB | libqpdf, zlib |
| LibreOffice (for DOCX) | ~680MB | Java runtime, fonts |
| Total | ~1.15GB | 20+ system packages |
Our production Docker image went from 120MB (our app alone) to 1.3GB. Build times tripled. Cold starts on AWS Lambda were brutal — 12 seconds before the first request could be served. We tried multi-stage builds, Alpine variants, and Lambda layers. Each optimization shaved off 50-100MB but added complexity.
With a single API call to LightningPDF, your Docker image stays at whatever your app needs. Zero PDF dependencies. Zero system packages.
Memory usage in production
Chromium is the big offender. Each Puppeteer browser instance uses 150-300MB of RAM. If you're generating PDFs concurrently (and you will be, once your users discover the "export all" button), you need to manage a browser pool, set concurrency limits, and handle OOM kills.
We tracked our memory usage over a month:
- Idle: 180MB (Chromium sitting there, waiting)
- Single PDF: 320MB
- 5 concurrent PDFs: 1.1GB
- 10 concurrent PDFs: OOM killed on a 2GB instance
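To stay under that OOM ceiling you end up writing pool-management code yourself. Here is a minimal sketch of the pattern, using a stdlib semaphore to cap concurrent renders — `render_pdf` is a stand-in for a real Playwright/Puppeteer call, and the slot count of 5 is just the number our memory figures above suggested:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Each headless-browser render costs roughly 150-300MB of RAM, so cap
# concurrency to avoid OOM kills. Real code would also have to recycle
# crashed browser instances and time out hung pages.
MAX_CONCURRENT_RENDERS = 5
_render_slots = threading.Semaphore(MAX_CONCURRENT_RENDERS)

def render_pdf(html: str) -> bytes:
    """Stand-in for a real Playwright/Puppeteer render call."""
    return b"%PDF-1.4 fake " + html.encode()

def render_with_limit(html: str) -> bytes:
    with _render_slots:  # block until a browser slot is free
        return render_pdf(html)

def render_all(documents: list[str]) -> list[bytes]:
    # Threads are fine here: the heavy work happens in the browser process.
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(render_with_limit, documents))
```

This is the code you get to delete when the browser pool runs on someone else's infrastructure.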
With LightningPDF, you're making HTTP requests. Memory usage is whatever requests or fetch needs — a few megabytes at most. The browser pool runs on our infrastructure, not yours.
Maintenance overhead
Every tool has its own release cycle, breaking changes, and security patches.
In the past year alone:
- Puppeteer v21 changed the default Chrome channel and broke our font rendering
- Ghostscript had a critical CVE (CVE-2024-29510) that required an emergency patch
- pdfcpu changed its CLI flags in a minor version bump
- Chromium dropped support for some TrueType font hinting, making our invoices look different
Each of these required debugging, testing, and deploying a fix. That's engineering time that doesn't ship features.
What LightningPDF replaces
Here's the full feature set, mapped to what you'd use instead:
| Feature | LightningPDF | Without LightningPDF |
|---|---|---|
| HTML to PDF | `POST /v1/pdf/generate` | Puppeteer/Playwright + Chromium |
| Merge PDFs | `POST /v1/pdf/merge` | pdfcpu, PyPDF, pdfunite |
| Split PDF | `POST /v1/pdf/split` | pdfcpu, pdftk |
| Compress PDF | `POST /v1/pdf/compress` | Ghostscript |
| Password protect | `POST /v1/pdf/protect` | qpdf, pdftk |
| PDF/A conversion | `POST /v1/pdf/pdfa` | Ghostscript + ICC profiles |
| PDF to image | `POST /v1/pdf/image` | Poppler (pdftoppm), ImageMagick |
| Batch generation | `POST /v1/pdf/batch` | Custom job queue + workers |
| Async generation | Webhooks + polling | Redis/RabbitMQ + custom workers |
| Templates | `POST /v1/templates` | Handlebars/Jinja + file system |
| Markdown to PDF | `markdown` field in request | Markdown parser + Puppeteer |
| PDF info/metadata | `POST /v1/pdf/info` | pdfcpu, PyPDF, exiftool |
| WordPress plugin | Paperbolt plugin | Custom WP integration |
| Tailwind CSS support | Built-in | CDN link + Puppeteer config |
| Webhook delivery | Built-in | Custom HTTP callback logic |
That's 15 capabilities behind a single API key and a single HTTP client. No system packages, no Docker bloat, no version conflicts.
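For the async row, the client side of a polling workflow reduces to a small loop. A sketch — note that the `status` field and its values here are assumptions about the job payload, not documented API, and `fetch_status` is injectable so you can wire in whatever the real status call looks like:

```python
import time

def wait_for_job(job_id, fetch_status, interval=1.0, timeout=60.0):
    """Poll fetch_status(job_id) until the job finishes or the timeout hits.

    fetch_status is expected to return a dict like {"status": "..."};
    the exact field names are an assumption about the API's job payload.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

With `requests`, `fetch_status` might be a lambda wrapping a GET to a job-status endpoint; webhooks skip the loop entirely by pushing the result to you.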
Code comparison: the honest version
Let's generate a PDF, merge it with another, compress the result, and add a password. Real task, real code.
The multi-tool approach
```python
import os
import subprocess
import tempfile

from playwright.sync_api import sync_playwright

def generate_and_process(html_content, existing_pdf_path, password):
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: Generate PDF from HTML using Playwright
        generated = os.path.join(tmp, "generated.pdf")
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.set_content(html_content, wait_until="networkidle")
            page.pdf(path=generated, format="A4", print_background=True)
            browser.close()

        # Step 2: Merge with existing PDF using pdfcpu
        merged = os.path.join(tmp, "merged.pdf")
        subprocess.run(
            ["pdfcpu", "merge", merged, generated, existing_pdf_path],
            check=True, capture_output=True
        )

        # Step 3: Compress using Ghostscript
        compressed = os.path.join(tmp, "compressed.pdf")
        subprocess.run([
            "gs", "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=/ebook",
            "-dNOPAUSE", "-dBATCH", "-dQUIET",
            f"-sOutputFile={compressed}",
            merged
        ], check=True, capture_output=True)

        # Step 4: Password protect using qpdf
        final = os.path.join(tmp, "final.pdf")
        subprocess.run([
            "qpdf", "--encrypt", password, password, "256",
            "--", compressed, final
        ], check=True, capture_output=True)

        with open(final, "rb") as f:
            return f.read()
```
That's roughly 40 lines of code, 4 subprocess calls, 4 temporary files, and dependencies on Chromium, pdfcpu, Ghostscript, and qpdf. If any of those binaries is missing from your Docker image or has a version mismatch, you get a cryptic subprocess error at runtime.
The LightningPDF approach
```python
import base64

import requests

API = "https://lightningpdf.dev/api/v1"
HEADERS = {"Authorization": "Bearer lpdf_your_key"}

def generate_and_process(html_content, existing_pdf_path, password):
    # Step 1: Generate PDF from HTML
    gen_resp = requests.post(f"{API}/pdf/generate", headers=HEADERS,
        json={"html": html_content, "options": {"format": "A4", "print_background": True}})
    gen_resp.raise_for_status()  # fail loudly instead of encoding an error body
    generated_b64 = base64.b64encode(gen_resp.content).decode()

    # Step 2: Merge with existing PDF
    with open(existing_pdf_path, "rb") as f:
        existing_b64 = base64.b64encode(f.read()).decode()
    merge_resp = requests.post(f"{API}/pdf/merge", headers=HEADERS,
        json={"pdfs": [
            {"source": "base64", "data": generated_b64},
            {"source": "base64", "data": existing_b64}
        ]})
    merge_resp.raise_for_status()
    merged_b64 = base64.b64encode(merge_resp.content).decode()

    # Step 3: Compress
    comp_resp = requests.post(f"{API}/pdf/compress", headers=HEADERS,
        json={"pdf": {"source": "base64", "data": merged_b64}})
    comp_resp.raise_for_status()
    compressed_b64 = base64.b64encode(comp_resp.content).decode()

    # Step 4: Password protect
    protect_resp = requests.post(f"{API}/pdf/protect", headers=HEADERS,
        json={"pdf": {"source": "base64", "data": compressed_b64},
              "password": password})
    protect_resp.raise_for_status()
    return protect_resp.content
```
Same result. No subprocess calls, no temp files, no system dependencies. The only import beyond requests is base64, which is in the standard library.
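In production you will also want retries around transient failures — timeouts, 429s, 5xx responses. A minimal sketch with exponential backoff; the set of retryable status codes here is a judgment call on our part, not something the API mandates, and `do_post` is any zero-argument callable returning a response-like object:

```python
import time

def post_with_retry(do_post, retries=3, backoff=0.5,
                    retryable=frozenset({429, 500, 502, 503, 504})):
    """Call do_post() (e.g. a bound requests.post) and retry transient errors.

    do_post must return an object with a .status_code attribute, like a
    requests.Response. Backoff doubles each attempt: 0.5s, 1s, 2s, ...
    """
    for attempt in range(retries + 1):
        resp = do_post()
        if resp.status_code not in retryable:
            return resp
        if attempt < retries:
            time.sleep(backoff * (2 ** attempt))
    return resp  # exhausted retries; hand the last response to the caller
```

Wrap any of the calls above as `post_with_retry(lambda: requests.post(...))` and the happy path stays unchanged.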
Latency and throughput
"But an API call is slower than a local binary." Sometimes. Let's look at actual numbers.
For a single PDF generation (10-page HTML document):
| Method | Time |
|---|---|
| Local Puppeteer (cold browser) | 3.2s |
| Local Puppeteer (warm browser) | 1.4s |
| LightningPDF API | 1.1s |
LightningPDF keeps a warm browser pool. You're not paying the cold start cost. For single operations, the API is competitive with a warm local browser and faster than a cold one.
For batch operations (100 invoices):
| Method | Total time | Per document |
|---|---|---|
| Local Puppeteer (5 concurrent) | 48s | 480ms |
| LightningPDF batch endpoint | 22s | 220ms |
The batch endpoint parallelizes across our infrastructure. You'd need to manage your own worker pool and concurrency limits to match that locally.
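Client-side, a batch submission is one request carrying many documents. A sketch of building that payload — the `{"documents": [...]}` shape is our assumption about the batch endpoint's request body, so check the API docs for the real field names:

```python
def build_batch_payload(html_documents, options=None):
    """Build a single batch-request payload for many HTML documents.

    The {"documents": [...], "options": {...}} shape is an assumed
    request body for the batch endpoint, not confirmed documentation.
    """
    return {
        "documents": [{"html": html} for html in html_documents],
        "options": options or {"format": "A4"},
    }

# 100 invoices become one HTTP request instead of 100 browser renders.
payload = build_batch_payload(f"<h1>Invoice {i}</h1>" for i in range(100))
```

One request in, one webhook (or poll) out — the fan-out across workers happens server-side.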
Where the API is slower: when you already run a warm browser pool and your documents are simple enough that the network round trip dominates. In that case, local generation might save you 50-100ms per document. Whether that matters depends on your use case; for most applications, it doesn't.
When the API approach doesn't work
Being honest: there are cases where a local toolchain is the right call.
Air-gapped environments. If your PDFs contain sensitive data that can't leave your network, you need local tools. LightningPDF processes documents on our servers. We don't store them, but the data does transit our infrastructure.
Sub-10ms latency requirements. If you're generating PDFs in a hot loop and every millisecond counts, local generation with a pre-warmed browser wins. This is rare — most PDF generation is triggered by user actions where 1-2 seconds is fine.
Extreme volume at low margin. If you're generating millions of PDFs per month and your margins are thin, the API cost might exceed the engineering cost of maintaining local tools. Do the math for your specific case.
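"Do the math" is a five-line script. Every number below is an illustrative assumption — not LightningPDF pricing, not your infra bill — plug in your own:

```python
# Back-of-the-envelope break-even: API cost vs. self-hosting.
# ALL figures are illustrative assumptions; substitute your real numbers.
def monthly_cost_api(pdfs_per_month, price_per_pdf=0.005):
    return pdfs_per_month * price_per_pdf

def monthly_cost_selfhosted(pdfs_per_month, infra=150.0,
                            maintenance_hours=6, hourly_rate=100.0):
    # Infra scales only weakly with volume; engineer time dominates.
    return infra + maintenance_hours * hourly_rate

for volume in (10_000, 100_000, 1_000_000):
    api = monthly_cost_api(volume)
    local = monthly_cost_selfhosted(volume)
    print(f"{volume:>9} PDFs/mo  API ${api:>8.0f}  self-hosted ${local:>8.0f}")
```

Under these made-up numbers the API wins comfortably at 10k and 100k documents a month and loses at a million — which is exactly the shape of the trade-off described above.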
For everyone else — teams that generate hundreds to hundreds of thousands of PDFs per month, that need multiple PDF operations, and that would rather ship features than debug Ghostscript segfaults — a single API is the right trade-off.
Getting started
The free tier includes 50 PDF generations per month. Enough to test everything and build your integration before committing.
```bash
# Generate a PDF
curl -X POST https://lightningpdf.dev/api/v1/pdf/generate \
  -H "Authorization: Bearer lpdf_your_key" \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>Hello</h1><p>First PDF from a single API.</p>"}'

# Merge two PDFs
curl -X POST https://lightningpdf.dev/api/v1/pdf/merge \
  -H "Authorization: Bearer lpdf_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "pdfs": [
      {"source": "url", "url": "https://example.com/doc1.pdf"},
      {"source": "url", "url": "https://example.com/doc2.pdf"}
    ]
  }'
```
One API key. One HTTP client. Fifteen PDF operations. That's the pitch — and it's also the reality we built because we needed it ourselves.
Check out the full API docs or grab an API key at lightningpdf.dev.
LightningPDF Team
Building fast, reliable PDF generation tools for developers.