Self-Hosted PDF Generation for Government & Enterprise

Deploy a self-hosted PDF generation API for government and enterprise environments. On-premise options, Docker deployment, and data sovereignty compliance.

By LightningPDF Team · 11 min read
TL;DR: Government and enterprise organizations need self-hosted PDF generation for data sovereignty and compliance. LightningPDF is building a self-hosted Docker deployment option — contact us for early access.

Note: Self-hosted deployment is currently in development. Contact us if you're interested in early access. The cloud API at lightningpdf.dev is available now.

Most SaaS teams are perfectly served by a cloud PDF API. Send HTML, get a PDF, move on. But certain organizations cannot send their data to external servers. Government agencies handling classified documents. Healthcare systems processing patient records. Financial institutions under strict data residency requirements. Defense contractors operating air-gapped networks.

Self-hosted PDF generation is the practice of running a PDF rendering engine on infrastructure you control, whether on-premise servers, private cloud VPCs, or air-gapped networks, so that document data never leaves your security perimeter.

For these organizations, self-hosted PDF generation is not a preference. It is a compliance requirement. This guide covers the self-hosted options available in 2026, how to deploy them, and how to architect a PDF generation service that meets enterprise security and scale requirements.

Why Self-Host PDF Generation

There are four primary reasons organizations choose self-hosted PDF generation over cloud APIs. Understanding which apply to your situation determines the right solution.

Data Sovereignty and Residency

Government agencies and regulated businesses often must keep data within specific geographic or jurisdictional boundaries. Frameworks such as the EU's GDPR, Canada's PIPEDA, Australia's Privacy Act, and sector-specific regulations like ITAR (International Traffic in Arms Regulations) in the US restrict where data may be stored, who may access it, or how it may be transferred across borders.

When you use a cloud PDF API, your document data travels to the API provider's servers. If those servers are in a different country, or even a different jurisdiction within the same country, you may be violating your compliance obligations. Self-hosting puts the rendering engine inside your own infrastructure, ensuring data never crosses boundaries.

Air-Gapped Networks

Military installations, intelligence agencies, certain financial trading systems, and critical infrastructure operators run networks with no internet connectivity. These air-gapped environments cannot reach external APIs by definition. The only option is a PDF engine that runs entirely within the isolated network.

Compliance and Audit Requirements

Regulations like HIPAA (healthcare), SOX (financial reporting), FedRAMP (US government cloud), and PCI DSS (payment processing) require detailed audit trails of how data is processed. When you self-host, every document generation event is logged in your SIEM, subject to your retention policies, and auditable by your compliance team. With a third-party API, you depend on the provider's logging and may not have the access you need for audits.

Performance and Cost at Extreme Scale

Organizations generating millions of PDFs per month may find that self-hosting is cheaper than API pricing. If you already have Kubernetes clusters with spare capacity, running a PDF engine as another service adds only marginal cost. The best PDF API comparison guide covers cloud pricing tiers in detail; as a rough rule of thumb, above about 500,000 documents per month self-hosting often wins on cost alone.
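To make the break-even reasoning concrete, here is a rough sketch in Python. The function name and every input value are hypothetical placeholders, not LightningPDF pricing; substitute your own per-document cloud price and your own monthly infrastructure and license costs before drawing conclusions.

```python
import math


def break_even_volume(cloud_price_per_doc: float,
                      fixed_monthly_cost: float,
                      self_hosted_cost_per_doc: float = 0.0) -> int:
    """Smallest monthly volume at which self-hosting costs no more than the cloud API.

    fixed_monthly_cost covers license, servers, and operations time.
    self_hosted_cost_per_doc covers any marginal cost per render (often near zero).
    """
    saving_per_doc = cloud_price_per_doc - self_hosted_cost_per_doc
    if saving_per_doc <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return math.ceil(fixed_monthly_cost / saving_per_doc)


# Hypothetical inputs for illustration only.
print(break_even_volume(cloud_price_per_doc=0.5, fixed_monthly_cost=1000.0))
```

Below the returned volume the cloud API is the cheaper option; above it, the fixed cost of self-hosting is spread over enough documents to come out ahead.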

Self-Hosted Options Compared

Several open-source and commercial tools offer self-hosted PDF generation. Here is how they compare.

Gotenberg

Gotenberg is an open-source, Docker-based API for converting HTML, Markdown, URLs, and Office documents to PDF. It wraps Chromium and LibreOffice behind a clean REST API.

Strengths:

  • Mature open-source project with active community
  • Converts HTML, URL, Markdown, and Office documents
  • Clean REST API design
  • Good Docker support with official images

Weaknesses:

  • Chromium-only rendering (1-3 seconds per PDF, no fast path)
  • High memory usage (500MB-1GB per instance due to Chromium)
  • No template system or variable substitution
  • No visual designer
  • No native engine for simple documents
  • Requires significant infrastructure tuning for high volumes

Best for: Organizations needing Office document conversion alongside HTML-to-PDF, at moderate volumes.

Stirling PDF

Stirling PDF is a self-hosted web application for PDF manipulation: merge, split, rotate, compress, convert, and OCR. It is not a PDF generation API in the traditional sense.

Strengths:

  • Full PDF manipulation toolkit
  • Good web UI for manual operations
  • Docker deployment
  • Active open-source development

Weaknesses:

  • Not designed for programmatic HTML-to-PDF generation
  • No REST API for document generation from templates
  • Oriented toward manual/interactive use
  • No template marketplace or variable substitution

Best for: IT departments needing a self-hosted PDF manipulation tool for manual workflows, not automated document generation.

WeasyPrint

WeasyPrint is a Python library that converts HTML/CSS to PDF. It does not use a browser engine; instead, it implements its own CSS rendering engine focused on the CSS Paged Media specification.

Strengths:

  • No browser dependency (pure Python rendering)
  • Good CSS Paged Media support (headers, footers, page counters)
  • Low memory footprint compared to Chromium-based tools
  • Can be wrapped in a Flask/FastAPI service

Weaknesses:

  • Incomplete support for modern CSS layout (flexbox and grid lag behind browser engines)
  • Slow for complex documents (2-5 seconds)
  • No template marketplace
  • No batch API
  • Requires building your own API wrapper
  • Python-only (performance ceiling)

Best for: Organizations with simple, print-focused document layouts and Python expertise. This is a viable wkhtmltopdf alternative for teams already in the Python ecosystem.

wkhtmltopdf

wkhtmltopdf was the standard self-hosted HTML-to-PDF tool for over a decade. It uses an old version of WebKit to render HTML.

Strengths:

  • Extremely well-known and widely deployed
  • Low resource usage compared to Chromium
  • Simple command-line interface

Weaknesses:

  • Officially deprecated and unmaintained
  • Uses WebKit from 2012 (no flexbox, grid, modern CSS)
  • Known security vulnerabilities (SSRF, file disclosure)
  • Rendering inconsistencies with modern HTML
  • No template system

Best for: Legacy systems that already use it. New deployments should choose a modern alternative. See the wkhtmltopdf alternative guide for migration paths.

LightningPDF (Self-Hosted)

LightningPDF's self-hosted edition (in development at the time of writing; see the note at the top of this guide) is Docker-based and runs the same dual-engine platform (native Go engine + Chromium) on your infrastructure.

Strengths:

  • Dual engine: native Go engine (<100ms for invoices) + Chromium for complex layouts
  • Same API as the cloud version (zero code changes to migrate)
  • Template marketplace access (sync templates to your instance)
  • Batch API for high-volume generation
  • Visual template designer included
  • Single Docker image, minimal configuration
  • Commercial support with SLA

Weaknesses:

  • Commercial license required for self-hosted (open-core model)
  • Smaller community than Gotenberg (newer product)

Best for: Enterprise and government organizations that want reliable PDF generation with templates, fast native rendering, and commercial support, all running on their own infrastructure.

Comparison Matrix

Feature              LightningPDF  Gotenberg  WeasyPrint  wkhtmltopdf  Stirling PDF
HTML to PDF          Yes           Yes        Yes         Yes          Limited
Native fast path     <100ms        No         No          No           No
Chromium engine      Yes           Yes        No          No (WebKit)  No
Template system      Yes           No         No          No           No
Visual designer      Yes           No         No          No           No
Batch API            Yes           No         No          No           No
Office conversion    No            Yes        No          No           Yes
PDF manipulation     No            No         No          No           Yes
Modern CSS           Full          Full       Partial     Minimal      N/A
Memory per instance  100-300MB     500MB-1GB  50-100MB    50-100MB     200-500MB
Docker image         Yes           Yes        DIY         DIY          Yes
Commercial support   Yes (SLA)     Community  Community   Abandoned    Community
Maintained           Yes           Yes        Yes         No           Yes
License              Commercial    MIT        BSD         LGPL         AGPL

Docker Deployment Guide

This section walks through deploying LightningPDF's self-hosted edition using Docker Compose. The same principles apply to Kubernetes, ECS, or any container orchestration platform.

Prerequisites

  • Docker 20.10+ and Docker Compose 2.0+
  • At least 2GB RAM (4GB recommended for concurrent generation)
  • Disk space for font caches and temporary files (10GB recommended)
  • Network access to pull the Docker image (one-time, or pre-load for air-gapped)

Basic Docker Compose Setup

version: '3.8'

services:
  lightningpdf:
    image: lightningpdf/enterprise:latest
    ports:
      - "8080:8080"
    environment:
      - LIGHTNING_LICENSE_KEY=${LIGHTNING_LICENSE_KEY}
      - LIGHTNING_ENGINE=dual           # native + chromium
      - LIGHTNING_MAX_CONCURRENT=10     # concurrent PDF generations
      - LIGHTNING_TIMEOUT=30000         # max generation time (ms)
      - LIGHTNING_LOG_LEVEL=info
      - LIGHTNING_METRICS_ENABLED=true  # Prometheus metrics
    volumes:
      - ./templates:/app/templates      # custom templates
      - ./fonts:/app/fonts              # custom fonts
      - pdf-cache:/app/cache            # generation cache
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: '2.0'
        reservations:
          memory: 2G
          cpus: '1.0'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

volumes:
  pdf-cache:

Starting the Service

# Set your license key
export LIGHTNING_LICENSE_KEY="your-enterprise-license-key"

# Start the service
docker compose up -d

# Verify it is running
curl http://localhost:8080/health
# {"status": "ok", "version": "2.4.0", "engines": ["native", "chromium"]}

Generating Your First PDF

The self-hosted API is identical to the cloud API. The only difference is the endpoint URL:

curl -X POST http://localhost:8080/api/v1/pdf/generate \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<h1>Self-Hosted PDF</h1><p>Generated on your own infrastructure.</p>",
    "options": {"format": "A4"}
  }'

If you are migrating from the cloud API, the only change in your code is the base URL. Templates, options, and response format are all identical. This is covered in detail in the API documentation.
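As a sketch of what "only the base URL changes" can look like in application code, the Python below reads the base URL from an environment variable. The variable name LIGHTNINGPDF_BASE_URL and the helper names are assumptions made for illustration; the endpoint path, headers, and payload mirror the curl call above, and the base64 `data.pdf` response field follows the Python migration example later in this guide.

```python
import base64
import json
import os
import urllib.request

# Assumption: the deployment target is selected via an environment variable we name ourselves.
BASE_URL = os.environ.get("LIGHTNINGPDF_BASE_URL", "http://localhost:8080")


def build_generate_request(html: str, api_key: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request; only base_url differs between cloud and self-hosted."""
    payload = json.dumps({"html": html, "options": {"format": "A4"}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/pdf/generate",
        data=payload,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def generate_pdf(html: str, api_key: str,
                 base_url: str = BASE_URL, timeout: float = 30.0) -> bytes:
    """Send the request and decode the base64-encoded PDF from the JSON body."""
    request = build_generate_request(html, api_key, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return base64.b64decode(body["data"]["pdf"])
```

Keeping the base URL in configuration rather than in code means a later move between cloud and self-hosted becomes a deployment change, not a code change.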

Air-Gapped Deployment

For networks without internet access, pre-load the Docker image:

# On an internet-connected machine
docker pull lightningpdf/enterprise:latest
docker save lightningpdf/enterprise:latest -o lightningpdf-enterprise.tar

# Transfer the tar file to the air-gapped network (USB, secure transfer, etc.)

# On the air-gapped machine
docker load -i lightningpdf-enterprise.tar
docker compose up -d

Pre-load any custom fonts and templates the same way. The self-hosted edition does not require internet access after the initial setup.
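Because the image archive crosses a trust boundary on removable media, it is worth confirming that it arrived unmodified before running docker load. The following is a minimal sketch using only the Python standard library; it is our own suggestion rather than part of LightningPDF's tooling, and the function names are illustrative.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte image archives fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as archive:
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image_archive(path: str, expected_hex: str) -> bool:
    """Compare the archive's digest with the one recorded before transfer."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Record the digest of lightningpdf-enterprise.tar on the internet-connected machine, carry it separately from the archive, and call verify_image_archive on the air-gapped side before loading the image.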

Architecture for Enterprise Scale

Below are two reference architectures, sized by throughput, followed by a Kubernetes manifest for the multi-node case.

Single-Node Architecture (Up to 1,000 PDFs/Hour)

For smaller deployments, a single Docker instance behind your existing load balancer is sufficient:

[Your Application] --> [Nginx/HAProxy] --> [LightningPDF Container]
                                                    |
                                            [Template Volume]
                                            [Font Volume]

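At 1,000 PDFs per hour, requests arrive on average once every 3.6 seconds. With the native engine at roughly 100ms per document, the renderer is busy for under 3% of the time, which leaves ample headroom for slower Chromium renders.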
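The headroom estimate for a single node is simple arithmetic, sketched here in Python. The function name is illustrative, and the calculation assumes evenly spaced requests, which real traffic rarely is.

```python
def render_utilization(pdfs_per_hour: int, render_ms: float) -> tuple[float, float]:
    """Return (average seconds between requests, fraction of the hour spent rendering)."""
    seconds_between_requests = 3600 / pdfs_per_hour
    busy_seconds = pdfs_per_hour * render_ms / 1000
    return seconds_between_requests, busy_seconds / 3600


gap, busy = render_utilization(pdfs_per_hour=1000, render_ms=100)
print(f"one request every {gap:.1f}s, renderer busy {busy:.1%} of the hour")
# prints: one request every 3.6s, renderer busy 2.8% of the hour
```

Bursty workloads, such as month-end statement runs, should be sized against peak rate rather than the hourly average.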

Multi-Node Architecture (Up to 100,000 PDFs/Hour)

For high-volume enterprise deployments, run multiple instances behind a load balancer with shared storage:

                        [Load Balancer]
                       /       |       \
              [LightningPDF] [LightningPDF] [LightningPDF]
                  |              |              |
              [Shared NFS / S3-compatible Storage]
              (Templates, Fonts, Generated PDFs)
                              |
                       [Redis Cache]
              (Template cache, rate limiting)
                              |
                    [PostgreSQL / MySQL]
              (Audit logs, usage tracking)
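To estimate how many instances the multi-node layout needs, one common approach is Little's law: renders in flight equal arrival rate times average render time. The sketch below applies it in Python. The 70% target utilization and the example latencies are assumptions chosen for illustration; the default of 10 concurrent renders matches the LIGHTNING_MAX_CONCURRENT value used in this guide.

```python
import math


def replicas_needed(pdfs_per_hour: float,
                    avg_render_seconds: float,
                    max_concurrent_per_instance: int = 10,
                    target_utilization: float = 0.7) -> int:
    """Estimate the instance count from throughput and average render latency."""
    arrivals_per_second = pdfs_per_hour / 3600
    # Little's law: average number of renders in progress at any moment.
    renders_in_flight = arrivals_per_second * avg_render_seconds
    # Keep some slots free so bursts do not queue immediately.
    usable_slots = max_concurrent_per_instance * target_utilization
    return max(1, math.ceil(renders_in_flight / usable_slots))


# 100,000 PDFs/hour with an assumed 2-second average (Chromium-heavy mix).
print(replicas_needed(100_000, avg_render_seconds=2.0))
# The same volume on an assumed 100ms native-engine average.
print(replicas_needed(100_000, avg_render_seconds=0.1))
```

The estimate is only a starting point: measure real render times for your own templates, then let autoscaling absorb the difference.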

Kubernetes Deployment

For Kubernetes environments, use a Deployment with horizontal pod autoscaling:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: lightningpdf
  labels:
    app: lightningpdf
spec:
  replicas: 3
  selector:
    matchLabels:
      app: lightningpdf
  template:
    metadata:
      labels:
        app: lightningpdf
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: lightningpdf
          image: lightningpdf/enterprise:latest
          ports:
            - containerPort: 8080
          env:
            - name: LIGHTNING_LICENSE_KEY
              valueFrom:
                secretKeyRef:
                  name: lightningpdf-secrets
                  key: license-key
            - name: LIGHTNING_ENGINE
              value: "dual"
            - name: LIGHTNING_MAX_CONCURRENT
              value: "10"
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          volumeMounts:
            - name: templates
              mountPath: /app/templates
            - name: fonts
              mountPath: /app/fonts
      volumes:
        - name: templates
          persistentVolumeClaim:
            claimName: lightningpdf-templates
        - name: fonts
          persistentVolumeClaim:
            claimName: lightningpdf-fonts
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightningpdf-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightningpdf
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Security Considerations

Enterprise PDF generation requires attention to security at every layer.

Input Sanitization

User-supplied HTML can contain malicious content. Sanitize all input before passing it to the rendering engine:

const sanitizeHtml = require('sanitize-html');

function sanitizePdfInput(html) {
  return sanitizeHtml(html, {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat([
      'img', 'style', 'table', 'thead', 'tbody', 'tr', 'th', 'td',
      'div', 'span', 'h1', 'h2', 'h3', 'h4', 'p', 'br', 'hr',
    ]),
    allowedAttributes: {
      '*': ['style', 'class', 'id'],
      'img': ['src', 'alt', 'width', 'height'],
      'td': ['colspan', 'rowspan'],
      'th': ['colspan', 'rowspan'],
    },
    allowedSchemes: ['https', 'data'],  // Block http:// and file://
  });
}

Server-Side Request Forgery (SSRF) Prevention

Chromium-based rendering engines can be tricked into making requests to internal services. Configure network policies to prevent the PDF engine from accessing internal endpoints:

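# Kubernetes NetworkPolicy: restrict LightningPDF egress
# Note: only TCP 443 to public addresses is allowed, so cluster DNS (port 53) is
# blocked too; add a DNS egress rule if the engine must resolve hostnames.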
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: lightningpdf-egress
spec:
  podSelector:
    matchLabels:
      app: lightningpdf
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8        # Block internal RFC1918
              - 172.16.0.0/12     # Block internal RFC1918
              - 192.168.0.0/16    # Block internal RFC1918
              - 169.254.0.0/16    # Block link-local / metadata
      ports:
        - protocol: TCP
          port: 443
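Network policy is the outer layer of defense; an application-level check on resolved addresses can reject suspicious URLs before they ever reach the renderer. The Python sketch below mirrors the CIDR ranges in the policy above using the standard ipaddress module. The loopback and IPv6 ranges are additions of ours, not part of the policy, and the function name is illustrative.

```python
import ipaddress

# Ranges from the NetworkPolicy above, plus loopback and IPv6 equivalents (our additions).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "169.254.0.0/16",   # link-local, includes cloud metadata endpoints
        "127.0.0.0/8",      # loopback
        "::1/128",          # IPv6 loopback
        "fc00::/7",         # IPv6 unique local
        "fe80::/10",        # IPv6 link-local
    )
]


def is_blocked_address(address: str) -> bool:
    """Return True if the IP address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    # Membership tests across IP versions simply evaluate to False.
    return any(ip in network for network in BLOCKED_NETWORKS)
```

A caller would resolve each hostname referenced in the HTML, check every returned address, and reject the document if any address is blocked. This complements the network policy rather than replacing it, since DNS answers can change between the check and the actual request.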

Resource Limits

Prevent denial-of-service by limiting what the PDF engine can consume:

environment:
  - LIGHTNING_MAX_HTML_SIZE=5242880       # 5MB max input
  - LIGHTNING_TIMEOUT=30000               # 30s max generation
  - LIGHTNING_MAX_PAGES=500               # Max pages per document
  - LIGHTNING_MAX_CONCURRENT=10           # Concurrent renders
  - LIGHTNING_CHROMIUM_DISABLE_JS=false   # Set true if JS not needed

Audit Logging

Enable structured logging for compliance audits. Every PDF generation event should be logged with who requested it, what document type, when, and the credential or reference ID:

environment:
  - LIGHTNING_LOG_FORMAT=json
  - LIGHTNING_LOG_LEVEL=info
  - LIGHTNING_AUDIT_LOG=true

This produces structured JSON logs that feed into your SIEM (Splunk, ELK, Datadog):

{
  "timestamp": "2026-02-23T14:30:00Z",
  "event": "pdf.generated",
  "engine": "native",
  "duration_ms": 85,
  "pages": 1,
  "template_id": "invoice-standard",
  "request_id": "req-abc123",
  "client_ip": "10.0.1.50",
  "user_agent": "billing-service/2.1.0"
}
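For quick compliance spot checks outside the SIEM, audit events like the one above can be summarized with a few lines of Python. This sketch assumes the service writes one JSON object per line (the example above is pretty-printed for readability) and uses only the field names shown in that example; the function name is illustrative.

```python
import json
from collections import defaultdict


def summarize_audit_log(lines):
    """Count pdf.generated events and average their duration, grouped by engine."""
    totals = defaultdict(lambda: {"count": 0, "total_ms": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners
        if not isinstance(event, dict) or event.get("event") != "pdf.generated":
            continue
        bucket = totals[event.get("engine", "unknown")]
        bucket["count"] += 1
        bucket["total_ms"] += event.get("duration_ms", 0)
    return {
        engine: {"count": t["count"], "avg_ms": t["total_ms"] / t["count"]}
        for engine, t in totals.items()
    }
```

Passing an open log file works directly, because iterating over a file object yields one line at a time.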

TLS and Authentication

In enterprise environments, always run the service behind TLS. Use mutual TLS (mTLS) for service-to-service authentication:

services:
  lightningpdf:
    environment:
      - LIGHTNING_TLS_ENABLED=true
      - LIGHTNING_TLS_CERT=/certs/server.crt
      - LIGHTNING_TLS_KEY=/certs/server.key
      - LIGHTNING_TLS_CA=/certs/ca.crt         # For mTLS
      - LIGHTNING_TLS_CLIENT_AUTH=require       # Require client certs
    volumes:
      - ./certs:/certs:ro

Cloud vs Self-Hosted: Decision Framework

Factor                    Cloud API           Self-Hosted
Data leaves your network  Yes                 No
Setup time                5 minutes           2-4 hours
Maintenance               Zero                Ongoing (updates, monitoring)
Air-gapped support        No                  Yes
Compliance audit          Provider-dependent  Full control
Cost at 1K/mo             $9 (Starter plan)   Infrastructure + license
Cost at 10K/mo            $29 (Pro plan)      Infrastructure + license
Cost at 500K+/mo          $249+               Often cheaper
Scaling                   Automatic           Manual (K8s HPA helps)
Uptime SLA                99.9% (managed)     Your responsibility
Template marketplace      Full access         Sync to your instance
Support                   Included            Enterprise SLA available

Choose cloud if you do not have data sovereignty requirements, want zero maintenance, and generate under 500,000 documents per month.

Choose self-hosted if you must keep data on-premise, operate in air-gapped environments, need full audit control, or generate at extreme volumes where self-hosting is more economical.

Hybrid approach: Many enterprises use the cloud API for non-sensitive documents (marketing materials, public reports) and self-hosted for sensitive data (financial statements, healthcare documents, contracts). The API is identical, so your application code routes requests to the appropriate endpoint based on document classification.
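The routing decision in a hybrid setup can be as small as one function. In this Python sketch the classification labels are hypothetical, and the cloud base URL is an assumption based on the lightningpdf.dev domain mentioned earlier; check the API documentation for the actual endpoint.

```python
# Hostname as used in the migration examples in this guide.
SELF_HOSTED_BASE_URL = "http://lightningpdf:8080"
# Assumption: the real cloud API base URL may differ.
CLOUD_BASE_URL = "https://lightningpdf.dev"

# Hypothetical labels: only documents explicitly marked public may leave the network.
PUBLIC_CLASSIFICATIONS = {"marketing", "public_report"}


def base_url_for(classification: str) -> str:
    """Pick the endpoint for a document; anything not known to be public stays on-premise."""
    if classification.strip().lower() in PUBLIC_CLASSIFICATIONS:
        return CLOUD_BASE_URL
    return SELF_HOSTED_BASE_URL
```

The function fails closed: a missing or misspelled classification routes to the self-hosted instance, so a labeling mistake cannot send sensitive data to the cloud.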

Migration from Other Self-Hosted Tools

If you are currently running Gotenberg, wkhtmltopdf, or WeasyPrint and want to migrate to LightningPDF's self-hosted edition, the process takes about 10 minutes.

From Gotenberg

Gotenberg uses a multipart form upload API. LightningPDF uses JSON. The migration requires changing your HTTP client code:

// Before: Gotenberg
const form = new FormData();
form.append('files', htmlBuffer, 'index.html');
const response = await fetch('http://gotenberg:3000/forms/chromium/convert/html', {
  method: 'POST',
  body: form,
});

// After: LightningPDF
const response = await fetch('http://lightningpdf:8080/api/v1/pdf/generate', {
  method: 'POST',
  headers: {
    'X-API-Key': process.env.LIGHTNINGPDF_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    html: htmlString,
    options: { format: 'A4' },
  }),
});

From wkhtmltopdf

If you are calling wkhtmltopdf as a subprocess, replace the shell command with an HTTP call:

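# Before: wkhtmltopdf subprocess
import subprocess
subprocess.run(['wkhtmltopdf', '--page-size', 'A4', 'input.html', 'output.pdf'])

# After: LightningPDF API
import base64
import os

import requests

# Read the HTML source, closing the file handle when done
with open('input.html', encoding='utf-8') as f:
    html = f.read()

response = requests.post(
    'http://lightningpdf:8080/api/v1/pdf/generate',
    headers={
        'X-API-Key': os.environ['LIGHTNINGPDF_API_KEY'],
        'Content-Type': 'application/json',
    },
    json={
        'html': html,
        'options': {'format': 'A4'},
    },
)
response.raise_for_status()  # surface HTTP errors before decoding

# The PDF is returned base64-encoded in the JSON body
pdf_bytes = base64.b64decode(response.json()['data']['pdf'])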

For a comprehensive guide to migrating away from wkhtmltopdf, including CSS compatibility notes, see the wkhtmltopdf alternative guide.

Real-World Use Cases

Government: Citizen-Facing Documents

A state government agency generates tax forms, permit approvals, and correspondence for millions of residents. All data must remain within government-controlled infrastructure. They deploy LightningPDF in their private cloud, generate documents via the native engine at 80ms each, and store PDFs in their on-premise document management system.

Finance: Regulatory Statements

A fintech company generates monthly account statements for 200,000 customers. PCI DSS requires that cardholder data never leaves their certified environment. They run LightningPDF in their PCI-compliant Kubernetes cluster, use the batch API for monthly statement runs, and the native engine keeps the entire 200,000-statement batch under 6 hours.

Healthcare: Patient Records

A hospital system generates discharge summaries, lab reports, and referral letters containing PHI (Protected Health Information). HIPAA requires that PHI is processed only on BAA-covered infrastructure. Self-hosted LightningPDF runs in their HIPAA-compliant AWS GovCloud VPC, with audit logging feeding into their compliance platform. For more on this use case, see the healthcare document generation guide.

Next Steps

  1. Contact sales for enterprise licensing and self-hosted pricing
  2. Review the API documentation (identical for cloud and self-hosted)
  3. Try the cloud API free to validate your templates before deploying on-premise
  4. Browse the template marketplace for pre-built document designs
  5. Review the pricing page to compare cloud vs self-hosted economics

Frequently Asked Questions

Why would I self-host PDF generation instead of using a cloud API?

Self-hosting is necessary when your data cannot leave your security perimeter due to regulations like HIPAA, GDPR, FedRAMP, or ITAR. It is also required for air-gapped networks without internet access. At extreme volumes above 500,000 documents per month, self-hosting can also be more cost-effective than cloud API pricing.

Can I run LightningPDF in an air-gapped network with no internet access?

Yes. Export the Docker image as a tarball on an internet-connected machine, transfer it via secure media to your air-gapped environment, and load it with docker load. The self-hosted edition requires no internet connectivity after initial deployment. Templates and fonts are bundled locally.

How does self-hosted LightningPDF compare to Gotenberg for enterprise use?

LightningPDF offers a native Go engine that generates simple documents in under 100 milliseconds, compared to Gotenberg's Chromium-only approach at 1-3 seconds. LightningPDF also includes a template system, visual designer, batch API, and commercial support with SLA. Gotenberg is free and open-source but requires more infrastructure tuning at scale.
