Self-Hosted PDF Generation for Government & Enterprise
Deploy a self-hosted PDF generation API for government and enterprise environments. On-premise options, Docker deployment, and data sovereignty compliance.
Note: Self-hosted deployment is currently in development. Contact us if you're interested in early access. The cloud API at lightningpdf.dev is available now.
Most SaaS teams are perfectly served by a cloud PDF API. Send HTML, get a PDF, move on. But certain organizations cannot send their data to external servers. Government agencies handling classified documents. Healthcare systems processing patient records. Financial institutions under strict data residency requirements. Defense contractors operating air-gapped networks.
Self-hosted PDF generation is the practice of running a PDF rendering engine on infrastructure you control, whether on-premise servers, private cloud VPCs, or air-gapped networks, so that document data never leaves your security perimeter.
For these organizations, self-hosted PDF generation is not a preference. It is a compliance requirement. This guide covers the self-hosted options available in 2026, how to deploy them, and how to architect a PDF generation service that meets enterprise security and scale requirements.
Why Self-Host PDF Generation
There are four primary reasons organizations choose self-hosted PDF generation over cloud APIs. Understanding which apply to your situation determines the right solution.
Data Sovereignty and Residency
Government agencies and regulated businesses must keep data within specific geographic boundaries. The EU's GDPR, Canada's PIPEDA, Australia's Privacy Act, and sector-specific regulations like ITAR (International Traffic in Arms Regulations) in the US all impose data residency requirements.
When you use a cloud PDF API, your document data travels to the API provider's servers. If those servers are in a different country, or even a different jurisdiction within the same country, you may be violating your compliance obligations. Self-hosting puts the rendering engine inside your own infrastructure, ensuring data never crosses boundaries.
Air-Gapped Networks
Military installations, intelligence agencies, certain financial trading systems, and critical infrastructure operators run networks with no internet connectivity. These air-gapped environments cannot reach external APIs by definition. The only option is a PDF engine that runs entirely within the isolated network.
Compliance and Audit Requirements
Regulations like HIPAA (healthcare), SOX (financial reporting), FedRAMP (US government cloud), and PCI DSS (payment processing) require detailed audit trails of how data is processed. When you self-host, every document generation event is logged in your SIEM, subject to your retention policies, and auditable by your compliance team. With a third-party API, you depend on the provider's logging and may not have the access you need for audits.
Performance and Cost at Extreme Scale
Organizations generating millions of PDFs per month may find self-hosting cheaper than per-document API pricing. If you already run Kubernetes clusters with spare capacity, a PDF engine deployed as one more service adds only marginal cost. The best PDF API comparison covers cloud pricing tiers in detail, but above roughly 500,000 documents per month, self-hosting often wins on cost alone.
Self-Hosted Options Compared
Several open-source and commercial tools offer self-hosted PDF generation. Here is how they compare.
Gotenberg
Gotenberg is an open-source, Docker-based API for converting HTML, Markdown, URLs, and Office documents to PDF. It wraps Chromium and LibreOffice behind a clean REST API.
Strengths:
- Mature open-source project with active community
- Converts HTML, URL, Markdown, and Office documents
- Clean REST API design
- Good Docker support with official images
Weaknesses:
- Chromium-only rendering (1-3 seconds per PDF, no fast path)
- High memory usage (500MB-1GB per instance due to Chromium)
- No template system or variable substitution
- No visual designer
- No native engine for simple documents
- Requires significant infrastructure tuning for high volumes
Best for: Organizations needing Office document conversion alongside HTML-to-PDF, at moderate volumes.
Stirling PDF
Stirling PDF is a self-hosted web application for PDF manipulation: merge, split, rotate, compress, convert, and OCR. It is not a PDF generation API in the traditional sense.
Strengths:
- Full PDF manipulation toolkit
- Good web UI for manual operations
- Docker deployment
- Active open-source development
Weaknesses:
- Not designed for programmatic HTML-to-PDF generation
- No REST API for document generation from templates
- Oriented toward manual/interactive use
- No template marketplace or variable substitution
Best for: IT departments needing a self-hosted PDF manipulation tool for manual workflows, not automated document generation.
WeasyPrint
WeasyPrint is a Python library that converts HTML/CSS to PDF. It does not use a browser engine; instead, it implements its own CSS rendering engine focused on the CSS Paged Media specification.
Strengths:
- No browser dependency (its own layout engine written in Python, with Pango for text rendering)
- Good CSS Paged Media support (headers, footers, page counters)
- Low memory footprint compared to Chromium-based tools
- Can be wrapped in a Flask/FastAPI service
Weaknesses:
- Partial modern CSS support (flexbox and grid are implemented but less complete than in browsers)
- Slow for complex documents (2-5 seconds)
- No template marketplace
- No batch API
- Requires building your own API wrapper
- Python-only (performance ceiling)
Best for: Organizations with simple, print-focused document layouts and Python expertise. This is a viable wkhtmltopdf alternative for teams already in the Python ecosystem.
wkhtmltopdf
wkhtmltopdf was the standard self-hosted HTML-to-PDF tool for over a decade. It uses an old version of WebKit to render HTML.
Strengths:
- Extremely well-known and widely deployed
- Low resource usage compared to Chromium
- Simple command-line interface
Weaknesses:
- Officially deprecated and unmaintained
- Uses WebKit from 2012 (no flexbox, grid, modern CSS)
- Known security vulnerabilities (SSRF, file disclosure)
- Rendering inconsistencies with modern HTML
- No template system
Best for: Legacy systems that already use it. New deployments should choose a modern alternative. See the wkhtmltopdf alternative guide for migration paths.
LightningPDF (Self-Hosted)
LightningPDF offers a Docker-based self-hosted edition that runs the same dual-engine platform (native Go engine + Chromium) on your infrastructure.
Strengths:
- Dual engine: native Go engine (<100ms for invoices) + Chromium for complex layouts
- Same API as the cloud version (zero code changes to migrate)
- Template marketplace access (sync templates to your instance)
- Batch API for high-volume generation
- Visual template designer included
- Single Docker image, minimal configuration
- Commercial support with SLA
Weaknesses:
- Commercial license required for self-hosted (open-core model)
- Smaller community than Gotenberg (newer product)
Best for: Enterprise and government organizations that want reliable PDF generation with templates, fast native rendering, and commercial support, all running on their own infrastructure.
Comparison Matrix
| Feature | LightningPDF | Gotenberg | WeasyPrint | wkhtmltopdf | Stirling PDF |
|---|---|---|---|---|---|
| HTML to PDF | Yes | Yes | Yes | Yes | Limited |
| Native fast path | <100ms | No | No | No | No |
| Chromium engine | Yes | Yes | No | No (WebKit) | No |
| Template system | Yes | No | No | No | No |
| Visual designer | Yes | No | No | No | No |
| Batch API | Yes | No | No | No | No |
| Office conversion | No | Yes | No | No | Yes |
| PDF manipulation | No | No | No | No | Yes |
| Modern CSS | Full | Full | Partial | Minimal | N/A |
| Memory per instance | 100-300MB | 500MB-1GB | 50-100MB | 50-100MB | 200-500MB |
| Docker image | Yes | Yes | DIY | DIY | Yes |
| Commercial support | Yes (SLA) | Community | Community | Abandoned | Community |
| Maintained | Yes | Yes | Yes | No | Yes |
| License | Commercial | MIT | BSD | LGPL | AGPL |
Docker Deployment Guide
This section walks through deploying LightningPDF's self-hosted edition using Docker Compose. The same principles apply to Kubernetes, ECS, or any container orchestration platform.
Prerequisites
- Docker 20.10+ and Docker Compose 2.0+
- At least 2GB RAM (4GB recommended for concurrent generation)
- Disk space for font caches and temporary files (10GB recommended)
- Network access to pull the Docker image (one-time, or pre-load for air-gapped)
Basic Docker Compose Setup
```yaml
version: '3.8'
services:
  lightningpdf:
    image: lightningpdf/enterprise:latest
    ports:
      - "8080:8080"
    environment:
      - LIGHTNING_LICENSE_KEY=${LIGHTNING_LICENSE_KEY}
      - LIGHTNING_ENGINE=dual            # native + chromium
      - LIGHTNING_MAX_CONCURRENT=10      # concurrent PDF generations
      - LIGHTNING_TIMEOUT=30000          # max generation time (ms)
      - LIGHTNING_LOG_LEVEL=info
      - LIGHTNING_METRICS_ENABLED=true   # Prometheus metrics
    volumes:
      - ./templates:/app/templates       # custom templates
      - ./fonts:/app/fonts               # custom fonts
      - pdf-cache:/app/cache             # generation cache
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2.0'
        reservations:
          memory: 2G
          cpus: '1.0'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
volumes:
  pdf-cache:
```
Starting the Service
```bash
# Set your license key
export LIGHTNING_LICENSE_KEY="your-enterprise-license-key"

# Start the service
docker compose up -d

# Verify it is running
curl http://localhost:8080/health
# {"status": "ok", "version": "2.4.0", "engines": ["native", "chromium"]}
```
Generating Your First PDF
The self-hosted API is identical to the cloud API. The only difference is the endpoint URL:
```bash
curl -X POST http://localhost:8080/api/v1/pdf/generate \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<h1>Self-Hosted PDF</h1><p>Generated on your own infrastructure.</p>",
    "options": {"format": "A4"}
  }'
```
If you are migrating from the cloud API, the only change in your code is the base URL. Templates, options, and response format are all identical. This is covered in detail in the API documentation.
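Because only the base URL differs, one way to keep the switch out of application code is to read it from configuration. The following is a minimal sketch using only the Python standard library. The environment variable names `LIGHTNINGPDF_BASE_URL` and `LIGHTNINGPDF_API_KEY` and the helper name `build_generate_request` are assumptions for illustration; the endpoint path, the `X-API-Key` header, and the payload shape mirror the curl call above.

```python
import json
import os
import urllib.request
from typing import Optional

# Hypothetical default; matches the Docker Compose port mapping above.
DEFAULT_BASE_URL = "http://localhost:8080"


def build_generate_request(html: str,
                           base_url: Optional[str] = None,
                           api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generate endpoint."""
    # Fall back to environment configuration, then to the local default.
    base = base_url or os.environ.get("LIGHTNINGPDF_BASE_URL", DEFAULT_BASE_URL)
    key = api_key or os.environ.get("LIGHTNINGPDF_API_KEY", "")
    payload = json.dumps({"html": html, "options": {"format": "A4"}})
    return urllib.request.Request(
        url=base.rstrip("/") + "/api/v1/pdf/generate",
        data=payload.encode("utf-8"),
        headers={"X-API-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a single call such as `urllib.request.urlopen(request, timeout=30)`, and pointing the same code at a different deployment only requires changing the configured base URL.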
Air-Gapped Deployment
For networks without internet access, pre-load the Docker image:
```bash
# On an internet-connected machine
docker pull lightningpdf/enterprise:latest
docker save lightningpdf/enterprise:latest -o lightningpdf-enterprise.tar

# Transfer the tar file to the air-gapped network (USB, secure transfer, etc.)

# On the air-gapped machine
docker load -i lightningpdf-enterprise.tar
docker compose up -d
```
Pre-load any custom fonts and templates the same way. The self-hosted edition does not require internet access after the initial setup.
Architecture for Enterprise Scale
The reference architectures below cover a single-node deployment, a multi-node deployment, and a Kubernetes deployment.
Single-Node Architecture (Up to 1,000 PDFs/Hour)
For smaller deployments, a single Docker instance behind your existing load balancer is sufficient:
```text
[Your Application] --> [Nginx/HAProxy] --> [LightningPDF Container]
                                                       |
                                               [Template Volume]
                                                 [Font Volume]
```
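At 1,000 PDFs per hour, a request arrives every 3.6 seconds on average. With the native engine rendering in about 100ms, a single instance spends under 3% of its time rendering, which leaves ample headroom for slower Chromium renders.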
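The headroom figure can be checked with a little arithmetic. This sketch assumes renders are processed one at a time and arrive evenly spread across the hour, which is a simplification; the function names are illustrative.

```python
def seconds_between_requests(pdfs_per_hour: float) -> float:
    """Average gap between requests at a steady arrival rate."""
    return 3600.0 / pdfs_per_hour


def utilization(pdfs_per_hour: float, render_ms: float) -> float:
    """Fraction of each hour a single render slot is busy."""
    busy_seconds = pdfs_per_hour * render_ms / 1000.0
    return busy_seconds / 3600.0


if __name__ == "__main__":
    gap = seconds_between_requests(1000)
    busy = utilization(1000, 100)
    print(f"one request every {gap:.1f}s, render slot busy {busy:.1%} of the time")
```

Substituting a Chromium render time (the comparison above cites 1 to 3 seconds per PDF) shows how quickly utilization rises when more documents need the slower engine.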
Multi-Node Architecture (Up to 100,000 PDFs/Hour)
For high-volume enterprise deployments, run multiple instances behind a load balancer with shared storage:
```text
                [Load Balancer]
               /       |       \
[LightningPDF]  [LightningPDF]  [LightningPDF]
       |               |               |
     [Shared NFS / S3-compatible Storage]
      (Templates, Fonts, Generated PDFs)
                       |
                 [Redis Cache]
        (Template cache, rate limiting)
                       |
             [PostgreSQL / MySQL]
         (Audit logs, usage tracking)
```
Kubernetes Deployment
For Kubernetes environments, use a Deployment with horizontal pod autoscaling:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lightningpdf
  labels:
    app: lightningpdf
spec:
  replicas: 3
  selector:
    matchLabels:
      app: lightningpdf
  template:
    metadata:
      labels:
        app: lightningpdf
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: lightningpdf
          image: lightningpdf/enterprise:latest
          ports:
            - containerPort: 8080
          env:
            - name: LIGHTNING_LICENSE_KEY
              valueFrom:
                secretKeyRef:
                  name: lightningpdf-secrets
                  key: license-key
            - name: LIGHTNING_ENGINE
              value: "dual"
            - name: LIGHTNING_MAX_CONCURRENT
              value: "10"
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          volumeMounts:
            - name: templates
              mountPath: /app/templates
            - name: fonts
              mountPath: /app/fonts
      volumes:
        - name: templates
          persistentVolumeClaim:
            claimName: lightningpdf-templates
        - name: fonts
          persistentVolumeClaim:
            claimName: lightningpdf-fonts
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightningpdf-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightningpdf
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
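To see how the autoscaler would react to a given load, it helps to work through the scaling rule by hand. The sketch below applies the core formula from the Kubernetes HPA documentation, `ceil(currentReplicas * currentMetric / targetMetric)`, takes the larger result when several metrics are configured, and clamps to the 3 to 20 replica range from the manifest. It deliberately ignores the controller's tolerance band and stabilization windows, so treat it as an approximation rather than a simulation; the function names are illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 3, max_replicas: int = 20) -> int:
    """Core HPA rule for one metric, clamped to the configured bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


def desired_for_cpu_and_memory(current_replicas: int,
                               cpu_utilization: float,
                               memory_utilization: float) -> int:
    """With several metrics, the largest proposed replica count wins."""
    by_cpu = desired_replicas(current_replicas, cpu_utilization, 70)
    by_memory = desired_replicas(current_replicas, memory_utilization, 80)
    return max(by_cpu, by_memory)
```

For example, three pods averaging 140% of their CPU request against the 70% target would be scaled to six, while a quiet cluster never drops below the three-replica floor.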
Security Considerations
Enterprise PDF generation requires attention to security at every layer.
Input Sanitization
User-supplied HTML can contain malicious content. Sanitize all input before passing it to the rendering engine:
```javascript
const sanitizeHtml = require('sanitize-html');

function sanitizePdfInput(html) {
  return sanitizeHtml(html, {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat([
      'img', 'style', 'table', 'thead', 'tbody', 'tr', 'th', 'td',
      'div', 'span', 'h1', 'h2', 'h3', 'h4', 'p', 'br', 'hr',
    ]),
    allowedAttributes: {
      '*': ['style', 'class', 'id'],
      'img': ['src', 'alt', 'width', 'height'],
      'td': ['colspan', 'rowspan'],
      'th': ['colspan', 'rowspan'],
    },
    allowedSchemes: ['https', 'data'], // Block http:// and file://
  });
}
```
Server-Side Request Forgery (SSRF) Prevention
Chromium-based rendering engines can be tricked into making requests to internal services. Configure network policies to prevent the PDF engine from accessing internal endpoints:
```yaml
# Kubernetes NetworkPolicy: restrict LightningPDF egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: lightningpdf-egress
spec:
  podSelector:
    matchLabels:
      app: lightningpdf
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8       # Block internal RFC1918
              - 172.16.0.0/12    # Block internal RFC1918
              - 192.168.0.0/16   # Block internal RFC1918
              - 169.254.0.0/16   # Block link-local / metadata
      ports:
        - protocol: TCP
          port: 443
```
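An application-level check can complement the network policy by rejecting obviously internal URLs before the HTML reaches the renderer. The following is a sketch under stated assumptions: the function names are illustrative, the blocked ranges are copied from the policy above, and loopback is added because a NetworkPolicy does not govern traffic inside the pod. Hostnames are not resolved here, so a name that points at an internal address would pass; the network policy remains the actual enforcement layer.

```python
import ipaddress
from urllib.parse import urlparse

# Same CIDR ranges the NetworkPolicy excludes from egress.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
]


def is_blocked_ip(address: str) -> bool:
    """True for loopback addresses and addresses inside a blocked range."""
    ip = ipaddress.ip_address(address)  # raises ValueError for non-IP input
    return ip.is_loopback or any(ip in network for network in BLOCKED_NETWORKS)


def is_allowed_url(url: str) -> bool:
    """Allow only https URLs whose host is not a blocked IP literal."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        return not is_blocked_ip(parsed.hostname)
    except ValueError:
        # Host is a DNS name, not an IP literal; it is not resolved here.
        return True
```

A check like this is most useful for resource URLs extracted from user-supplied HTML, such as image sources, where a request to a cloud metadata address is a common SSRF target.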
Resource Limits
Prevent denial-of-service by limiting what the PDF engine can consume:
```yaml
environment:
  - LIGHTNING_MAX_HTML_SIZE=5242880        # 5MB max input
  - LIGHTNING_TIMEOUT=30000                # 30s max generation
  - LIGHTNING_MAX_PAGES=500                # Max pages per document
  - LIGHTNING_MAX_CONCURRENT=10            # Concurrent renders
  - LIGHTNING_CHROMIUM_DISABLE_JS=false    # Set true if JS not needed
```
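Callers can also reject oversized input before it is sent, which avoids a wasted round trip. This is a small sketch that mirrors the 5MB value above on the client side; it assumes the limit is measured in bytes of the UTF-8 encoded HTML, which the configuration does not state, and the names are illustrative.

```python
# Mirrors LIGHTNING_MAX_HTML_SIZE from the configuration above.
MAX_HTML_BYTES = 5_242_880


def check_html_size(html: str, limit: int = MAX_HTML_BYTES) -> int:
    """Return the encoded size in bytes, or raise if it exceeds the limit."""
    size = len(html.encode("utf-8"))
    if size > limit:
        raise ValueError(f"HTML is {size} bytes, limit is {limit} bytes")
    return size
```

Measuring the encoded bytes rather than the character count matters for documents with non-ASCII text, where one character can occupy several bytes.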
Audit Logging
Enable structured logging for compliance audits. Every PDF generation event should be logged with who requested it, what document type, when, and the credential or reference ID:
```yaml
environment:
  - LIGHTNING_LOG_FORMAT=json
  - LIGHTNING_LOG_LEVEL=info
  - LIGHTNING_AUDIT_LOG=true
```
This produces structured JSON logs that feed into your SIEM (Splunk, ELK, Datadog):
```json
{
  "timestamp": "2026-02-23T14:30:00Z",
  "event": "pdf.generated",
  "engine": "native",
  "duration_ms": 85,
  "pages": 1,
  "template_id": "invoice-standard",
  "request_id": "req-abc123",
  "client_ip": "10.0.1.50",
  "user_agent": "billing-service/2.1.0"
}
```
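Structured logs like this are straightforward to process outside a SIEM as well, for example in a quick audit script. The sketch below assumes the service writes one JSON object per line (the sample above is shown expanded for readability) and uses only the field names that appear in it; the function name is illustrative.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_audit_log(lines: Iterable[str]) -> dict:
    """Count pdf.generated events per engine and find the slowest render."""
    per_engine: Counter = Counter()
    slowest_ms = None
    slowest_id = None
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(event, dict) or event.get("event") != "pdf.generated":
            continue
        per_engine[event.get("engine", "unknown")] += 1
        duration = event.get("duration_ms", 0)
        if slowest_ms is None or duration > slowest_ms:
            slowest_ms = duration
            slowest_id = event.get("request_id")
    return {
        "per_engine": dict(per_engine),
        "slowest_ms": slowest_ms,
        "slowest_request_id": slowest_id,
    }
```

Passing an open log file to `summarize_audit_log` works directly, because iterating over a file yields its lines.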
TLS and Authentication
In enterprise environments, always run the service behind TLS. Use mutual TLS (mTLS) for service-to-service authentication:
```yaml
services:
  lightningpdf:
    environment:
      - LIGHTNING_TLS_ENABLED=true
      - LIGHTNING_TLS_CERT=/certs/server.crt
      - LIGHTNING_TLS_KEY=/certs/server.key
      - LIGHTNING_TLS_CA=/certs/ca.crt          # For mTLS
      - LIGHTNING_TLS_CLIENT_AUTH=require       # Require client certs
    volumes:
      - ./certs:/certs:ro
```
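On the calling side, a service needs to trust the same CA and present its own certificate. The following sketch shows how a Python client might build such a TLS context with the standard `ssl` module. The helper name and any file paths passed to it (for example a hypothetical `/certs/client.crt`) are assumptions, not part of the product configuration.

```python
import ssl
from typing import Optional


def build_client_context(ca_file: Optional[str] = None,
                         cert_file: Optional[str] = None,
                         key_file: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context that verifies the server and, when a client
    certificate is supplied, presents it for mutual TLS."""
    # The default context enables certificate and hostname verification.
    # With ca_file=None the system trust store is used instead of a private CA.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if cert_file:
        # Certificate and private key presented to the server in the handshake.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context
```

The resulting context can be passed to `urllib.request.urlopen(request, context=context)` or to any HTTP client that accepts an `ssl.SSLContext`.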
Cloud vs Self-Hosted: Decision Framework
| Factor | Cloud API | Self-Hosted |
|---|---|---|
| Data leaves your network | Yes | No |
| Setup time | 5 minutes | 2-4 hours |
| Maintenance | Zero | Ongoing (updates, monitoring) |
| Air-gapped support | No | Yes |
| Compliance audit | Provider-dependent | Full control |
| Cost at 1K/mo | $9 (Starter plan) | Infrastructure + license |
| Cost at 10K/mo | $29 (Pro plan) | Infrastructure + license |
| Cost at 500K+/mo | $249+ | Often cheaper |
| Scaling | Automatic | Manual (K8s HPA helps) |
| Uptime SLA | 99.9% (managed) | Your responsibility |
| Template marketplace | Full access | Sync to your instance |
| Support | Included | Enterprise SLA available |
Choose cloud if you do not have data sovereignty requirements, want zero maintenance, and generate under 500,000 documents per month.
Choose self-hosted if you must keep data on-premise, operate in air-gapped environments, need full audit control, or generate at extreme volumes where self-hosting is more economical.
Hybrid approach: Many enterprises use the cloud API for non-sensitive documents (marketing materials, public reports) and self-hosted for sensitive data (financial statements, healthcare documents, contracts). The API is identical, so your application code routes requests to the appropriate endpoint based on document classification.
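The routing decision in a hybrid setup can be a very small function. This sketch assumes documents carry a classification label and that only explicitly public material may go to the cloud; the label values, the cloud URL placeholder, and the function name are all hypothetical. Anything unrecognized falls through to the self-hosted endpoint, so a missing or misspelled label never sends data off-premise.

```python
# Placeholder endpoints; substitute the real base URLs for each deployment.
CLOUD_BASE_URL = "https://cloud-pdf-api.example"
SELF_HOSTED_BASE_URL = "http://lightningpdf:8080"

# Only labels listed here are eligible for the cloud API.
CLOUD_ELIGIBLE = {"public", "marketing"}


def base_url_for(classification: str) -> str:
    """Pick the endpoint for a document, failing closed to self-hosted."""
    label = classification.strip().lower()
    if label in CLOUD_ELIGIBLE:
        return CLOUD_BASE_URL
    return SELF_HOSTED_BASE_URL
```

Using an allowlist for the cloud path, rather than a blocklist of sensitive labels, keeps the safe behavior as the default when new document types are introduced.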
Migration from Other Self-Hosted Tools
If you are currently running Gotenberg, wkhtmltopdf, or WeasyPrint and want to migrate to LightningPDF's self-hosted edition, the process takes about 10 minutes.
From Gotenberg
Gotenberg uses a multipart form upload API. LightningPDF uses JSON. The migration requires changing your HTTP client code:
```javascript
// Before: Gotenberg
const form = new FormData();
form.append('files', htmlBuffer, 'index.html');
const response = await fetch('http://gotenberg:3000/forms/chromium/convert/html', {
  method: 'POST',
  body: form,
});
```

```javascript
// After: LightningPDF
const response = await fetch('http://lightningpdf:8080/api/v1/pdf/generate', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.LIGHTNINGPDF_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    html: htmlString,
    options: { format: 'A4' },
  }),
});
```
From wkhtmltopdf
If you are calling wkhtmltopdf as a subprocess, replace the shell command with an HTTP call:
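```python
# Before: wkhtmltopdf subprocess
import subprocess

subprocess.run(['wkhtmltopdf', '--page-size', 'A4', 'input.html', 'output.pdf'])

# After: LightningPDF API
import base64
import os

import requests

# Read the source HTML and close the file handle afterwards
with open('input.html', encoding='utf-8') as f:
    html = f.read()

response = requests.post(
    'http://lightningpdf:8080/api/v1/pdf/generate',
    headers={
        'X-API-Key': os.environ['LIGHTNINGPDF_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'html': html,
        'options': {'format': 'A4'},
    },
)
response.raise_for_status()  # fail loudly on HTTP errors

# The PDF is returned base64-encoded in the JSON body
pdf_bytes = base64.b64decode(response.json()['data']['pdf'])
```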
For a comprehensive guide to migrating away from wkhtmltopdf, including CSS compatibility notes, see the wkhtmltopdf alternative guide.
Real-World Use Cases
Government: Citizen-Facing Documents
A state government agency generates tax forms, permit approvals, and correspondence for millions of residents. All data must remain within government-controlled infrastructure. They deploy LightningPDF in their private cloud, generate documents via the native engine at 80ms each, and store PDFs in their on-premise document management system.
Finance: Regulatory Statements
A fintech company generates monthly account statements for 200,000 customers. PCI DSS requires that cardholder data never leaves their certified environment. They run LightningPDF in their PCI-compliant Kubernetes cluster, use the batch API for monthly statement runs, and the native engine keeps the entire 200,000-statement batch under 6 hours.
Healthcare: Patient Records
A hospital system generates discharge summaries, lab reports, and referral letters containing PHI (Protected Health Information). HIPAA requires that PHI is processed only on BAA-covered infrastructure. Self-hosted LightningPDF runs in their HIPAA-compliant AWS GovCloud VPC, with audit logging feeding into their compliance platform. For more on this use case, see the healthcare document generation guide.
Next Steps
- Contact sales for enterprise licensing and self-hosted pricing
- Review the API documentation (identical for cloud and self-hosted)
- Try the cloud API free to validate your templates before deploying on-premise
- Browse the template marketplace for pre-built document designs
- Review the pricing page to compare cloud vs self-hosted economics
Frequently Asked Questions
Why would I self-host PDF generation instead of using a cloud API?
Self-hosting is necessary when your data cannot leave your security perimeter due to regulations like HIPAA, GDPR, FedRAMP, or ITAR. It is also required for air-gapped networks without internet access. At extreme volumes above 500,000 documents per month, self-hosting can also be more cost-effective than cloud API pricing.
Can I run LightningPDF in an air-gapped network with no internet access?
Yes. Export the Docker image as a tarball on an internet-connected machine, transfer it via secure media to your air-gapped environment, and load it with docker load. The self-hosted edition requires no internet connectivity after initial deployment. Templates and fonts are bundled locally.
How does self-hosted LightningPDF compare to Gotenberg for enterprise use?
LightningPDF offers a native Go engine that generates simple documents in under 100 milliseconds, compared to Gotenberg's Chromium-only approach at 1-3 seconds. LightningPDF also includes a template system, visual designer, batch API, and commercial support with SLA. Gotenberg is free and open-source but requires more infrastructure tuning at scale.
Related Reading
- Best PDF Generation APIs in 2026 — Cloud API comparison and pricing
- wkhtmltopdf Alternative — Migration guide from deprecated tools
- LightningPDF vs Puppeteer — Build vs buy for self-hosted Chromium
- Healthcare Document Generation — HIPAA-compliant PDF workflows
- PDF Generation for Fintech — Financial document compliance
- Generate PDFs in Go — Go integration for self-hosted deployments
- HTML to PDF: The Complete Guide — All rendering approaches compared
- PDF Contract Generation — Enterprise document generation patterns
LightningPDF Team