HTML to PDF - The Complete Guide for 2026
Comprehensive guide to converting HTML to PDF in 2026. Compare browser-based, library-based, and API approaches with code examples and best practices.
Converting HTML to PDF is one of the most common requirements in modern web development. Whether you're building invoices, reports, receipts, or certificates, you need a reliable way to transform HTML into print-ready PDFs.
This comprehensive guide covers everything you need to know about HTML-to-PDF conversion in 2026, including the different approaches, common pitfalls, performance considerations, and how to choose the right solution for your needs.
Why HTML to PDF?
HTML is the universal language of the web. It offers:
- Rich formatting: CSS for styling, layouts, and responsive design
- Dynamic content: Template engines for variable data
- Familiar tooling: Every developer knows HTML/CSS
- Reusability: Same HTML can power web pages and PDFs
The challenge? Browsers render HTML to screens, not paper. Converting HTML to PDF requires specialized tools that understand print layouts, page breaks, and PDF specifications.
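To make the "template engines for variable data" point concrete, here is a minimal sketch in plain JavaScript with no template library. The function names `escapeHtml` and `renderInvoice` and the shape of the invoice object are assumptions made for illustration; they are not part of any tool covered in this guide.

```javascript
// Hypothetical helpers for illustration; not part of any library in this guide.

// Escape the characters that are significant in HTML text and attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build an invoice document from plain data (assumed shape: number, customer, items).
function renderInvoice(invoice) {
  const rows = invoice.items
    .map(item => `<tr><td>${escapeHtml(item.name)}</td><td>${item.price.toFixed(2)}</td></tr>`)
    .join('');
  const total = invoice.items.reduce((sum, item) => sum + item.price, 0);
  return `<!DOCTYPE html>
<html>
<body>
<h1>Invoice #${escapeHtml(invoice.number)}</h1>
<p>Billed to: ${escapeHtml(invoice.customer)}</p>
<table>${rows}</table>
<p>Total: ${total.toFixed(2)}</p>
</body>
</html>`;
}

const html = renderInvoice({
  number: 'INV-001',
  customer: 'Acme <Corp>',
  items: [{ name: 'Widget', price: 19.5 }, { name: 'Gadget', price: 5.25 }]
});
```

The resulting string can be handed to any of the converters discussed in this guide. Escaping variable data before interpolation keeps a customer name like `Acme <Corp>` from being parsed as markup.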
The Three Approaches
There are three main approaches to HTML-to-PDF conversion, each with tradeoffs:
1. Browser-Based (Headless Chrome/Chromium)
How it works: Launch a headless browser, load HTML, trigger print-to-PDF.
Tools: Puppeteer, Playwright, Selenium
Pros:
- ✅ Excellent CSS support (same as Chrome)
- ✅ JavaScript execution
- ✅ Renders exactly like browser
- ✅ Handles modern web features (flexbox, grid, animations)
Cons:
- ❌ Slow (1-3 seconds per PDF)
- ❌ Memory-intensive (100-300MB per instance)
- ❌ Requires infrastructure (Docker, K8s)
- ❌ Complex page break handling
Best for: Complex web layouts, modern CSS, JavaScript-heavy content
2. Library-Based (wkhtmltopdf, PrinceXML)
How it works: Specialized rendering engines that convert HTML to PDF without a full browser.
Tools: wkhtmltopdf, PrinceXML, WeasyPrint, pdfkit
Pros:
- ✅ Faster than browsers (500ms-2s)
- ✅ Lower memory usage
- ✅ Better page break control (CSS Paged Media)
- ✅ Designed for print
Cons:
- ❌ Limited CSS support (especially modern features)
- ❌ No JavaScript execution (mostly)
- ❌ Inconsistent rendering vs browsers
- ❌ Licensing costs (PrinceXML: $3,800+)
Best for: Print-focused documents, advanced page layout, when CSS Paged Media features are needed
3. API-Based (Managed Services)
How it works: Cloud-hosted services handle all infrastructure, scaling, and rendering.
Tools: LightningPDF, DocRaptor, PDFShift, CraftMyPDF
Pros:
- ✅ Zero infrastructure management
- ✅ Fast (sub-100ms with native engines)
- ✅ Template marketplaces
- ✅ Batch processing
- ✅ Automatic scaling
Cons:
- ❌ Recurring costs (vs one-time library purchase)
- ❌ External dependency
- ❌ Data leaves your infrastructure (unless self-hosted)
Best for: Production applications, high volumes, teams wanting to focus on product not infrastructure
Comparison Table
| Approach | Speed | Cost | Maintenance | CSS Support | Scaling |
|---|---|---|---|---|---|
| Puppeteer | ⭐⭐ (1-3s) | Free* | ⭐⭐ High | ⭐⭐⭐⭐⭐ Full | Manual |
| wkhtmltopdf | ⭐⭐⭐ (500ms) | Free | ⭐⭐⭐ Low | ⭐⭐ Limited | Manual |
| PrinceXML | ⭐⭐⭐ (1-2s) | $3,800+ | ⭐⭐⭐ Low | ⭐⭐⭐⭐ Excellent | Manual |
| LightningPDF | ⭐⭐⭐⭐⭐ (<100ms) | $0.01/doc | ⭐⭐⭐⭐⭐ None | ⭐⭐⭐⭐⭐ Full | Automatic |
*Free library, but infrastructure costs $200-500/month for production
Common Pitfalls and Solutions
1. Page Breaks
Problem: Content splits awkwardly across pages
Solution: Use CSS page break properties
/* Avoid page breaks inside elements */
.invoice-item {
  page-break-inside: avoid;
}
/* Force page break before element */
.new-section {
  page-break-before: always;
}
/* Control orphan/widow lines */
p {
  orphans: 3;
  widows: 3;
}
2. Missing Fonts
Problem: PDFs show default fonts instead of custom fonts
Solution: Embed fonts with @font-face or use web-safe fonts
@font-face {
  font-family: 'CustomFont';
  src: url('https://yoursite.com/fonts/custom.woff2') format('woff2');
}
body {
  font-family: 'CustomFont', 'Arial', sans-serif;
}
3. Images Not Loading
Problem: Images appear as broken in PDF
Solution: Use absolute URLs and ensure images load before PDF generation
<!-- Bad: Relative path -->
<img src="/images/logo.png">
<!-- Good: Absolute URL -->
<img src="https://yoursite.com/images/logo.png">
<!-- Better: Base64 embedded -->
<img src="data:image/png;base64,iVBORw0KG...">
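The Base64 option can be produced at build or request time. The sketch below assumes Node.js and uses a made-up helper name, `toDataUri`; in real use the bytes would come from reading the image file rather than the four-byte stand-in shown here.

```javascript
// Hypothetical helper: turn raw image bytes into a data: URI for an <img> tag.
function toDataUri(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// Stand-in bytes (the first four bytes of the PNG signature).
// In practice, read the whole file, for example with fs.readFileSync('logo.png').
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
const imgTag = `<img src="${toDataUri(pngSignature, 'image/png')}" alt="Logo">`;
```

Embedding removes the network fetch at render time, at the cost of a larger HTML payload: Base64 output is roughly a third bigger than the original bytes.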
4. CSS Not Applied
Problem: Styles don't render in PDF
Solution: Use inline styles or embedded <style> tags
<!-- External stylesheets may not load -->
<link rel="stylesheet" href="/styles.css">
<!-- Embed styles directly -->
<style>
body { font-family: Arial; }
.header { background: #4F46E5; }
</style>
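If your CSS lives in a separate file, embedding it can be automated. This is a minimal sketch that assumes the stylesheet has already been read into a string; `embedStyles` is a hypothetical helper, not an API of any converter.

```javascript
// Hypothetical helper: place a CSS string inside a <style> tag in the document head.
function embedStyles(html, css) {
  const styleTag = `<style>\n${css}\n</style>`;
  // Insert just before </head> when the document has one. A replacer function
  // is used so that "$" sequences in the CSS are not treated as patterns.
  if (/<\/head>/i.test(html)) {
    return html.replace(/<\/head>/i, () => `${styleTag}\n</head>`);
  }
  // Otherwise prepend the styles to the fragment.
  return `${styleTag}\n${html}`;
}

const styled = embedStyles(
  '<html><head></head><body><p class="header">Hi</p></body></html>',
  '.header { background: #4F46E5; }'
);
```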
5. Slow Generation
Problem: Each PDF takes several seconds to generate
Solution: Use native engines for simple documents
// Slow: always use Chromium (1-3s)
const chromiumPdf = await page.pdf();
// Fast: use the native engine for invoices (<100ms)
const nativePdf = await lightningpdf.generate({
  template: 'invoice',
  engine: 'native' // Sub-100ms for simple layouts
});
Code Examples
Puppeteer (Node.js)
const puppeteer = require('puppeteer');

async function generatePDF(html) {
  const browser = await puppeteer.launch({
    headless: true,
    // --no-sandbox is needed in some containers, but it disables Chromium's
    // sandbox, so only use it when the HTML is trusted
    args: ['--no-sandbox']
  });
  try {
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: 'networkidle0' });
    return await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }
    });
  } finally {
    // Always close the browser, even if rendering throws
    await browser.close();
  }
}

// Usage (inside an async function or an ES module)
const html = '<h1>Hello PDF</h1>';
const pdf = await generatePDF(html);
Performance: 2-3 seconds, 200MB memory
wkhtmltopdf (Shell)
# Install
apt-get install wkhtmltopdf
# Generate PDF
wkhtmltopdf \
  --page-size A4 \
  --margin-top 20mm \
  input.html output.pdf
Performance: 500ms-1s, 50MB memory
LightningPDF (Any Language)
# cURL
curl https://lightningpdf.dev/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"html": "<h1>Hello PDF</h1>"}' \
  -o output.pdf
// JavaScript (Node.js 18+ or any runtime with built-in fetch)
const response = await fetch('https://lightningpdf.dev/api/v1/generate', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ html: '<h1>Hello PDF</h1>' })
});
// Built-in fetch has no response.buffer(); convert the ArrayBuffer instead
const pdf = Buffer.from(await response.arrayBuffer());
# Python
import requests
response = requests.post(
    'https://lightningpdf.dev/api/v1/generate',
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    json={'html': '<h1>Hello PDF</h1>'}
)
response.raise_for_status()  # fail loudly on HTTP errors
pdf = response.content
Performance: <100ms for simple HTML (native engine), 10 lines of code
Advanced Techniques
Headers and Footers
<style>
@page {
  margin: 2cm;
  @top-center {
    content: "Company Report";
  }
  @bottom-right {
    content: "Page " counter(page) " of " counter(pages);
  }
}
</style>
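Support for `@page` margin boxes differs between rendering engines, so it is worth testing the CSS above with the engine you use. With Puppeteer, a common alternative is the `displayHeaderFooter`, `headerTemplate`, and `footerTemplate` options of `page.pdf()`. The sketch below only builds the options object; `buildPdfOptions` is a hypothetical helper, and it assumes the title is trusted text.

```javascript
// Hypothetical helper that builds an options object for Puppeteer's page.pdf().
// displayHeaderFooter, headerTemplate, footerTemplate, and the pageNumber /
// totalPages classes are Puppeteer features; the function itself is illustrative.
function buildPdfOptions(title) {
  // Templates do not inherit the page's styles, so set the font size explicitly.
  const style = 'font-size: 9px; width: 100%; padding: 0 1cm;';
  return {
    format: 'A4',
    printBackground: true,
    displayHeaderFooter: true,
    headerTemplate: `<div style="${style} text-align: center;">${title}</div>`,
    footerTemplate:
      `<div style="${style} text-align: right;">` +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
    // Margins must leave room for the header and footer.
    margin: { top: '2cm', right: '2cm', bottom: '2cm', left: '2cm' }
  };
}

const pdfOptions = buildPdfOptions('Company Report');
// Usage with an existing Puppeteer page: await page.pdf(pdfOptions);
```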
Conditional Page Breaks
/* Keep tables together */
table {
  page-break-inside: avoid;
}
/* Start chapters on new page */
.chapter {
  page-break-before: always;
}
/* Prevent lonely headings */
h2, h3 {
  page-break-after: avoid;
}
Responsive Print Layouts
@media print {
  /* Hide navigation in PDFs */
  nav { display: none; }
  /* Adjust colors for print */
  body { color: #000; background: #fff; }
  /* Show URLs for links */
  a::after { content: " (" attr(href) ")"; }
}
Performance Optimization
1. Minimize HTML Size
<!-- Bad: Lots of unused CSS -->
<link rel="stylesheet" href="bootstrap.min.css">
<!-- Good: Only styles you need -->
<style>
.invoice { font-family: Arial; }
</style>
2. Preload Images
// Wait for images to finish loading (or fail) before generating the PDF
await page.evaluate(() => {
  return Promise.all(
    Array.from(document.images).map(img =>
      img.complete
        ? Promise.resolve()
        : new Promise(resolve => {
            img.onload = () => resolve();
            img.onerror = () => resolve(); // don't hang on broken images
          })
    )
  );
});
3. Reuse Browser Instances
// Assumes a generatePDF(browser, html) variant that opens a page on the given browser
// Bad: Launch new browser for each PDF (slow)
for (const html of documents) {
  const browser = await puppeteer.launch();
  await generatePDF(browser, html);
  await browser.close(); // launch and shutdown overhead on every PDF
}
// Good: Reuse browser (fast)
const browser = await puppeteer.launch();
for (const html of documents) {
  await generatePDF(browser, html);
}
await browser.close();
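Once a single browser is shared, the next question is usually how many pages to render at once. One possible approach is a small concurrency limiter, sketched below. `mapWithConcurrency` is a hypothetical helper written for this guide, and the right limit depends on your memory budget.

```javascript
// Hypothetical helper: run an async task over items with at most `limit` in flight.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  // Each worker claims the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage sketch with a shared browser (generatePDF(browser, html) as above):
// const pdfs = await mapWithConcurrency(documents, 4, html => generatePDF(browser, html));
```

Results come back in input order regardless of which task finishes first, which keeps it easy to match each PDF to its source document.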
4. Use Native Engines for Simple Documents
// Illustrative pseudocode: `chromium` and `native` stand for two rendering engines
// Slow: Chromium for everything (1-3s each)
const invoiceSlow = await chromium.generate(invoiceHTML);
const receiptSlow = await chromium.generate(receiptHTML);
// Fast: native engine for simple docs (<100ms each)
const invoiceFast = await native.generate(invoiceHTML); // about 80ms
const receiptFast = await native.generate(receiptHTML); // about 75ms
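Deciding which documents count as "simple" can be automated with a routing rule. The sketch below is one hedged way to do it: the engine names and the specific feature checks are assumptions chosen for illustration, not any vendor's documented behavior, and a real rule would need tuning against your own templates.

```javascript
// Hypothetical routing rule: engine names and feature checks are assumptions.
function chooseEngine(html) {
  const needsBrowser = [
    /<script\b/i,          // JavaScript must run before rendering
    /<canvas\b/i,          // canvas content is drawn by scripts
    /display\s*:\s*grid/i  // treat CSS grid as a "complex layout" signal
  ].some(pattern => pattern.test(html));
  return needsBrowser ? 'chromium' : 'native';
}

const invoiceEngine = chooseEngine('<h1>Invoice</h1><table><tr><td>Widget</td></tr></table>');
const dashboardEngine = chooseEngine('<div id="chart"></div><script>drawChart()</script>');
```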
Choosing the Right Solution
Use Puppeteer/Playwright if you:
- Need exact browser rendering
- Already have infrastructure
- Generate <100 PDFs/day
- Have JavaScript-heavy content
- Need full control over rendering
Use wkhtmltopdf if you:
- Need simple, fast conversion
- Don't require modern CSS
- Have static content
- Want a free, open-source solution and can accept that the upstream project is archived and no longer maintained
- Can tolerate rendering differences
Use PrinceXML if you:
- Need advanced print features (running headers, footnotes)
- Generate professional publications (books, magazines)
- Require PDF/A compliance
- Have budget for commercial license
Use LightningPDF if you:
- Generate invoices, receipts, reports at scale
- Want sub-100ms generation
- Need template marketplace
- Want batch processing (1000s of PDFs per call)
- Prefer managed service over self-hosting
- Need predictable $0.01/doc pricing
Real-World Use Case: E-commerce Invoices
Requirement: Generate 10,000 invoices/month
Option 1: Puppeteer (Self-Hosted)
Cost: $300/month infrastructure + 40 hours setup
Time: 2s × 10,000 = about 5.6 hours/month
Code: 500+ lines (API, queue, storage, scaling)
Maintenance: 5 hours/month
Option 2: LightningPDF
Cost: $29/month (Pro plan)
Time: 0.08s × 10,000 = 13 minutes/month
Code: 10 lines
Maintenance: 0 hours (managed service)
Winner: LightningPDF saves about $270/month, 40 hours of one-time setup, and 5 hours/month of maintenance, while generating each PDF 25x faster.
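The generation-time and cost arithmetic behind this comparison can be checked with a few lines of JavaScript. The inputs are the figures quoted above; the function name is made up for this sketch.

```javascript
// Total generation time per month, from per-PDF latency and monthly volume.
function monthlyGenerationSeconds(secondsPerPdf, pdfsPerMonth) {
  return secondsPerPdf * pdfsPerMonth;
}

const volume = 10000;
const puppeteerSeconds = monthlyGenerationSeconds(2, volume);    // 20,000 seconds
const lightningSeconds = monthlyGenerationSeconds(0.08, volume); // 800 seconds

const puppeteerHours = puppeteerSeconds / 3600;      // about 5.6 hours
const lightningMinutes = lightningSeconds / 60;      // about 13.3 minutes
const speedup = puppeteerSeconds / lightningSeconds; // 25x
const monthlySavings = 300 - 29;                     // 271 dollars per month
```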
Security Considerations
Input Sanitization
// Sanitize user-provided HTML
const sanitizeHTML = require('sanitize-html');
const cleanHTML = sanitizeHTML(userHTML, {
  // Include row and cell tags; otherwise they are dropped and the table structure is lost
  allowedTags: ['h1', 'h2', 'p', 'strong', 'em', 'table', 'thead', 'tbody', 'tr', 'th', 'td'],
  allowedAttributes: { 'td': ['colspan'], 'th': ['colspan'] }
});
const pdf = await generatePDF(cleanHTML);
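Because the renderer fetches whatever URLs the HTML references, tag sanitization is often paired with a check on where resources may load from. The sketch below shows one possible policy; the function name and the host allowlist are assumptions for illustration, and wiring it into a sanitizer or request interceptor is left out.

```javascript
// Hypothetical policy check for resource URLs found in user-supplied HTML.
function isAllowedResourceUrl(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // relative or malformed URLs are rejected outright
  }
  // Only HTTPS: this excludes file:, ftp:, data:, and plain http: resources.
  if (url.protocol !== 'https:') return false;
  return allowedHosts.includes(url.hostname);
}

const allowedHosts = ['yoursite.com'];
const logoOk = isAllowedResourceUrl('https://yoursite.com/images/logo.png', allowedHosts);
const localFileOk = isAllowedResourceUrl('file:///etc/passwd', allowedHosts);
```

Rejecting `file:` URLs and unknown hosts limits what a malicious document can make the rendering server read or contact.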
Resource Limits
// Prevent malicious HTML from consuming resources
await page.setContent(html, {
  timeout: 5000, // 5s max
  waitUntil: 'domcontentloaded' // Don't wait for everything
});
Sandboxing
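// Keep Chromium's sandbox enabled (the default): do not pass
// --no-sandbox or --disable-setuid-sandbox when rendering untrusted HTML
const browser = await puppeteer.launch({
  args: [
    '--disable-dev-shm-usage' // use /tmp instead of /dev/shm, which is small in many containers
  ]
});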
Testing Your PDFs
Visual Regression Testing
const { toMatchImageSnapshot } = require('jest-image-snapshot');
expect.extend({ toMatchImageSnapshot });
test('invoice renders correctly', async () => {
  const pdf = await generatePDF(invoiceHTML);
  // convertPDFToImage is a helper you supply to rasterize the first page to a PNG buffer
  const image = await convertPDFToImage(pdf);
  expect(image).toMatchImageSnapshot();
});
Content Validation
const pdfParse = require('pdf-parse');
test('invoice contains correct data', async () => {
  const pdf = await generateInvoice({ total: 1000 });
  const data = await pdfParse(pdf);
  expect(data.text).toContain('$1,000.00');
  expect(data.text).toContain('Invoice #');
});
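Before running heavier checks, a cheap sanity test is to confirm that the output is a PDF at all, since an error page or JSON error body can otherwise slip through. A minimal sketch, assuming the result is a Buffer or Uint8Array; `looksLikePdf` is a hypothetical helper.

```javascript
// Hypothetical sanity check: PDF files conventionally start with the bytes "%PDF-".
function looksLikePdf(bytes) {
  const header = Buffer.from(bytes).subarray(0, 5).toString('latin1');
  return header === '%PDF-';
}

const goodOutput = looksLikePdf(Buffer.from('%PDF-1.7\n'));
const errorPage = looksLikePdf(Buffer.from('<html><body>Error</body></html>'));
```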
Conclusion
HTML-to-PDF conversion in 2026 offers multiple approaches:
- Browser-based (Puppeteer): Best for complex layouts, JavaScript-heavy content
- Library-based (wkhtmltopdf): Good for simple, static content
- API-based (LightningPDF): Best for production apps wanting speed, templates, and zero maintenance
For most applications that generate invoices, receipts, or reports at volume, an API like LightningPDF is a strong choice:
- 10-25x faster than DIY
- Template marketplace (save 10+ hours per template)
- Batch processing
- Zero infrastructure management
- Predictable $0.01/doc pricing
Ready to start? Try LightningPDF free — 50 PDFs/month, full template marketplace, no credit card required.