Why We Built an All-in-One PDF API (and Why You Should Stop Using 3 Different Tools)
The hidden costs of cobbling together Puppeteer, pdfcpu, and Ghostscript for PDF tasks. How a single API replaces your entire PDF toolchain.
Two years ago, our PDF stack looked like this: Puppeteer for HTML-to-PDF conversion, pdfcpu for merging and splitting, Ghostscript for compression, qpdf for password protection, and a homegrown Node script that orchestrated all of them. Five tools, three languages, and a Docker image that took 8 minutes to build.
Every week something broke. Puppeteer would hang on a page with an infinite CSS animation. Ghostscript would segfault on a PDF that pdfcpu handled fine. The Node orchestrator would leak file handles when Ghostscript timed out. We spent more time maintaining the PDF toolchain than building our actual product.
That's why we built LightningPDF. Not because the world needed another PDF API, but because we were tired of being glue-code engineers for five different PDF tools that were never designed to work together.
The real cost of "just use open-source tools"
On paper, the open-source approach sounds reasonable. Each tool is free, well-documented, and good at its specific job. The costs are hidden elsewhere: in your Docker image, your memory footprint, and your maintenance calendar.
Docker image bloat
Here's what happens to your Docker image when you add PDF tools:
| Tool | Image size added | System dependencies |
|---|---|---|
| Puppeteer + Chromium | ~400MB | libnss3, libatk, libcups, fonts |
| Ghostscript | ~45MB | libtiff, libjpeg, fontconfig |
| pdfcpu | ~15MB | None (Go binary) |
| qpdf | ~8MB | libqpdf, zlib |
| LibreOffice (for DOCX) | ~680MB | Java runtime, fonts |
| Total | ~1.15GB | 20+ system packages |
Our production Docker image went from 120MB (our app alone) to 1.3GB. Build times tripled. Cold starts on AWS Lambda were brutal — 12 seconds before the first request could be served. We tried multi-stage builds, Alpine variants, and Lambda layers. Each optimization shaved off 50-100MB but added complexity.
With a single API call to LightningPDF, your Docker image stays at whatever your app needs. Zero PDF dependencies. Zero system packages.
Memory usage in production
Chromium is the big offender. Each Puppeteer browser instance uses 150-300MB of RAM. If you're generating PDFs concurrently (and you will be, once your users discover the "export all" button), you need to manage a browser pool, set concurrency limits, and handle OOM kills.
We tracked our memory usage over a month:
- Idle: 180MB (Chromium sitting there, waiting)
- Single PDF: 320MB
- 5 concurrent PDFs: 1.1GB
- 10 concurrent PDFs: OOM killed on a 2GB instance
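To stay under that OOM ceiling you end up writing pool-management code yourself. Here is a minimal sketch of the pattern, using a stdlib semaphore to cap concurrent renders — `render_pdf` is a stand-in for a real Playwright/Puppeteer call, and the slot count of 5 is just the number our memory figures above suggested:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Each headless-browser render costs roughly 150-300MB of RAM, so cap
# concurrency to avoid OOM kills. Real code would also have to recycle
# crashed browser instances and time out hung pages.
MAX_CONCURRENT_RENDERS = 5
_render_slots = threading.Semaphore(MAX_CONCURRENT_RENDERS)

def render_pdf(html: str) -> bytes:
    """Stand-in for a real Playwright/Puppeteer render call."""
    return b"%PDF-1.4 fake " + html.encode()

def render_with_limit(html: str) -> bytes:
    with _render_slots:  # block until a browser slot is free
        return render_pdf(html)

def render_all(documents: list[str]) -> list[bytes]:
    # Threads are fine here: the heavy work happens in the browser process.
    with ThreadPoolExecutor(max_workers=10) as pool:
        return list(pool.map(render_with_limit, documents))
```

This is the code you get to delete when the browser pool runs on someone else's infrastructure.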
With LightningPDF, you're making HTTP requests. Memory usage is whatever requests or fetch needs — a few megabytes at most. The browser pool runs on our infrastructure, not yours.
Maintenance overhead
Every tool has its own release cycle, breaking changes, and security patches.
In the past year alone:
- Puppeteer v21 changed the default Chrome channel and broke our font rendering
- Ghostscript had a critical CVE (CVE-2024-29510) that required an emergency patch
- pdfcpu changed its CLI flags in a minor version bump
- Chromium dropped support for some TrueType font hinting, making our invoices look different
Each of these required debugging, testing, and deploying a fix. That's engineering time that doesn't ship features.
What LightningPDF replaces
Here's the full feature set, mapped to what you'd use instead:
| Feature | LightningPDF | Without LightningPDF |
|---|---|---|
| HTML to PDF | `POST /v1/pdf/generate` | Puppeteer/Playwright + Chromium |
| Merge PDFs | `POST /v1/pdf/merge` | pdfcpu, PyPDF, pdfunite |
| Split PDF | `POST /v1/pdf/split` | pdfcpu, pdftk |
| Compress PDF | `POST /v1/pdf/compress` | Ghostscript |
| Password protect | `POST /v1/pdf/protect` | qpdf, pdftk |
| PDF/A conversion | `POST /v1/pdf/pdfa` | Ghostscript + ICC profiles |
| PDF to image | `POST /v1/pdf/image` | Poppler (pdftoppm), ImageMagick |
| Batch generation | `POST /v1/pdf/batch` | Custom job queue + workers |
| Async generation | Webhooks + polling | Redis/RabbitMQ + custom workers |
| Templates | `POST /v1/templates` | Handlebars/Jinja + file system |
| Markdown to PDF | `markdown` field in request | Markdown parser + Puppeteer |
| PDF info/metadata | `POST /v1/pdf/info` | pdfcpu, PyPDF, exiftool |
| WordPress plugin | Paperbolt plugin | Custom WP integration |
| Tailwind CSS support | Built-in | CDN link + Puppeteer config |
| Webhook delivery | Built-in | Custom HTTP callback logic |
That's 15 capabilities behind a single API key and a single HTTP client. No system packages, no Docker bloat, no version conflicts.
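For the async row, the client side of a polling workflow reduces to a small loop. A sketch — note that the `status` field and its values here are assumptions about the job payload, not documented API, and `fetch_status` is injectable so you can wire in whatever the real status call looks like:

```python
import time

def wait_for_job(job_id, fetch_status, interval=1.0, timeout=60.0):
    """Poll fetch_status(job_id) until the job finishes or the timeout hits.

    fetch_status is expected to return a dict like {"status": "..."};
    the exact field names are an assumption about the API's job payload.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

With `requests`, `fetch_status` might be a lambda wrapping a GET to a job-status endpoint; webhooks skip the loop entirely by pushing the result to you.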
Code comparison: the honest version
Let's generate a PDF, merge it with another, compress the result, and add a password. Real task, real code.
The multi-tool approach
```python
import os
import subprocess
import tempfile

from playwright.sync_api import sync_playwright

def generate_and_process(html_content, existing_pdf_path, password):
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: Generate PDF from HTML using Playwright
        generated = os.path.join(tmp, "generated.pdf")
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.set_content(html_content, wait_until="networkidle")
            page.pdf(path=generated, format="A4", print_background=True)
            browser.close()

        # Step 2: Merge with existing PDF using pdfcpu
        merged = os.path.join(tmp, "merged.pdf")
        subprocess.run(
            ["pdfcpu", "merge", merged, generated, existing_pdf_path],
            check=True, capture_output=True
        )

        # Step 3: Compress using Ghostscript
        compressed = os.path.join(tmp, "compressed.pdf")
        subprocess.run([
            "gs", "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=/ebook",
            "-dNOPAUSE", "-dBATCH", "-dQUIET",
            f"-sOutputFile={compressed}",
            merged
        ], check=True, capture_output=True)

        # Step 4: Password protect using qpdf
        final = os.path.join(tmp, "final.pdf")
        subprocess.run([
            "qpdf", "--encrypt", password, password, "256",
            "--", compressed, final
        ], check=True, capture_output=True)

        with open(final, "rb") as f:
            return f.read()
```
That's roughly 40 lines of code, 4 subprocess calls, 4 temporary files, and dependencies on Chromium, pdfcpu, Ghostscript, and qpdf. If any of those binaries is missing from your Docker image or has a version mismatch, you get a cryptic subprocess error at runtime.
The LightningPDF approach
```python
import base64

import requests

API = "https://lightningpdf.dev/api/v1"
HEADERS = {"Authorization": "Bearer lpdf_your_key"}

def generate_and_process(html_content, existing_pdf_path, password):
    # Step 1: Generate PDF from HTML
    gen_resp = requests.post(f"{API}/pdf/generate", headers=HEADERS,
        json={"html": html_content, "options": {"format": "A4", "print_background": True}})
    gen_resp.raise_for_status()  # fail loudly instead of encoding an error body
    generated_b64 = base64.b64encode(gen_resp.content).decode()

    # Step 2: Merge with existing PDF
    with open(existing_pdf_path, "rb") as f:
        existing_b64 = base64.b64encode(f.read()).decode()
    merge_resp = requests.post(f"{API}/pdf/merge", headers=HEADERS,
        json={"pdfs": [
            {"source": "base64", "data": generated_b64},
            {"source": "base64", "data": existing_b64}
        ]})
    merge_resp.raise_for_status()
    merged_b64 = base64.b64encode(merge_resp.content).decode()

    # Step 3: Compress
    comp_resp = requests.post(f"{API}/pdf/compress", headers=HEADERS,
        json={"pdf": {"source": "base64", "data": merged_b64}})
    comp_resp.raise_for_status()
    compressed_b64 = base64.b64encode(comp_resp.content).decode()

    # Step 4: Password protect
    protect_resp = requests.post(f"{API}/pdf/protect", headers=HEADERS,
        json={"pdf": {"source": "base64", "data": compressed_b64},
              "password": password})
    protect_resp.raise_for_status()
    return protect_resp.content
```
Same result. No subprocess calls, no temp files, no system dependencies. The only import beyond requests is base64, which is in the standard library.
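In production you will also want retries around transient failures — timeouts, 429s, 5xx responses. A minimal sketch with exponential backoff; the set of retryable status codes here is a judgment call on our part, not something the API mandates, and `do_post` is any zero-argument callable returning a response-like object:

```python
import time

def post_with_retry(do_post, retries=3, backoff=0.5,
                    retryable=frozenset({429, 500, 502, 503, 504})):
    """Call do_post() (e.g. a bound requests.post) and retry transient errors.

    do_post must return an object with a .status_code attribute, like a
    requests.Response. Backoff doubles each attempt: 0.5s, 1s, 2s, ...
    """
    for attempt in range(retries + 1):
        resp = do_post()
        if resp.status_code not in retryable:
            return resp
        if attempt < retries:
            time.sleep(backoff * (2 ** attempt))
    return resp  # exhausted retries; hand the last response to the caller
```

Wrap any of the calls above as `post_with_retry(lambda: requests.post(...))` and the happy path stays unchanged.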
Latency and throughput
"But an API call is slower than a local binary." Sometimes. Let's look at actual numbers.
For a single PDF generation (10-page HTML document):
| Method | Time |
|---|---|
| Local Puppeteer (cold browser) | 3.2s |
| Local Puppeteer (warm browser) | 1.4s |
| LightningPDF API | 1.1s |
LightningPDF keeps a warm browser pool. You're not paying the cold start cost. For single operations, the API is competitive with a warm local browser and faster than a cold one.
For batch operations (100 invoices):
| Method | Total time | Per document |
|---|---|---|
| Local Puppeteer (5 concurrent) | 48s | 480ms |
| LightningPDF batch endpoint | 22s | 220ms |
The batch endpoint parallelizes across our infrastructure. You'd need to manage your own worker pool and concurrency limits to match that locally.
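Client-side, a batch submission is one request carrying many documents. A sketch of building that payload — the `{"documents": [...]}` shape is our assumption about the batch endpoint's request body, so check the API docs for the real field names:

```python
def build_batch_payload(html_documents, options=None):
    """Build a single batch-request payload for many HTML documents.

    The {"documents": [...], "options": {...}} shape is an assumed
    request body for the batch endpoint, not confirmed documentation.
    """
    return {
        "documents": [{"html": html} for html in html_documents],
        "options": options or {"format": "A4"},
    }

# 100 invoices become one HTTP request instead of 100 browser renders.
payload = build_batch_payload(f"<h1>Invoice {i}</h1>" for i in range(100))
```

One request in, one webhook (or poll) out — the fan-out across workers happens server-side.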
Where the API is slower: when you already run a warm browser pool and your documents are simple enough that the network round trip dominates. In that case, local generation might save you 50-100ms per document. Whether that matters depends on your use case; for most applications, it doesn't.
When the API approach doesn't work
Being honest: there are cases where a local toolchain is the right call.
Air-gapped environments. If your PDFs contain sensitive data that can't leave your network, you need local tools. LightningPDF processes documents on our servers. We don't store them, but the data does transit our infrastructure.
Sub-10ms latency requirements. If you're generating PDFs in a hot loop and every millisecond counts, local generation with a pre-warmed browser wins. This is rare — most PDF generation is triggered by user actions where 1-2 seconds is fine.
Extreme volume at low margin. If you're generating millions of PDFs per month and your margins are thin, the API cost might exceed the engineering cost of maintaining local tools. Do the math for your specific case.
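"Do the math" is a five-line script. Every number below is an illustrative assumption — not LightningPDF pricing, not your infra bill — plug in your own:

```python
# Back-of-the-envelope break-even: API cost vs. self-hosting.
# ALL figures are illustrative assumptions; substitute your real numbers.
def monthly_cost_api(pdfs_per_month, price_per_pdf=0.005):
    return pdfs_per_month * price_per_pdf

def monthly_cost_selfhosted(pdfs_per_month, infra=150.0,
                            maintenance_hours=6, hourly_rate=100.0):
    # Infra scales only weakly with volume; engineer time dominates.
    return infra + maintenance_hours * hourly_rate

for volume in (10_000, 100_000, 1_000_000):
    api = monthly_cost_api(volume)
    local = monthly_cost_selfhosted(volume)
    print(f"{volume:>9} PDFs/mo  API ${api:>8.0f}  self-hosted ${local:>8.0f}")
```

Under these made-up numbers the API wins comfortably at 10k and 100k documents a month and loses at a million — which is exactly the shape of the trade-off described above.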
For everyone else — teams that generate hundreds to hundreds of thousands of PDFs per month, that need multiple PDF operations, and that would rather ship features than debug Ghostscript segfaults — a single API is the right trade-off.
Getting started
The free tier includes 50 PDF generations per month. Enough to test everything and build your integration before committing.
```bash
# Generate a PDF
curl -X POST https://lightningpdf.dev/api/v1/pdf/generate \
  -H "Authorization: Bearer lpdf_your_key" \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>Hello</h1><p>First PDF from a single API.</p>"}'

# Merge two PDFs
curl -X POST https://lightningpdf.dev/api/v1/pdf/merge \
  -H "Authorization: Bearer lpdf_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "pdfs": [
      {"source": "url", "url": "https://example.com/doc1.pdf"},
      {"source": "url", "url": "https://example.com/doc2.pdf"}
    ]
  }'
```
One API key. One HTTP client. Fifteen PDF operations. That's the pitch — and it's also the reality we built because we needed it ourselves.
Check out the full API docs or grab an API key at lightningpdf.dev.
LightningPDF Team
Building fast, reliable PDF generation tools for developers.