Blog

Tutorials, guides, and insights about PDF generation

Indic script shaping in PDF: why most PDF tools render Hindi as boxes

The reason Hindi, Bengali, Tamil, and other Indic scripts come out as tofu boxes in DOMPDF, mPDF, TCPDF, and most other PDF libraries is not a font problem. It is a text-shaping problem. Here is what shaping does, why HarfBuzz exists, and what running Chromium gets you for free.

By LightningPDF Jun 4, 2026 5 min read

internationalization indic harfbuzz pdf

Why our PDF API uses two engines

A single-engine PDF API forces a trade-off between speed and capability. Splitting the engine into a Go-native fast path and a Chromium fallback removes the trade-off for the most common shapes. Here is how we route, what we measured, and what we got wrong.

By LightningPDF Jun 4, 2026 5 min read

architecture performance pdf

Validator-passing Peppol BIS 3.0 without an enterprise SDK

How we built a Peppol BIS Billing 3.0 emitter that passes the official Schematron in around 200 lines of Go. The schema is shorter than the integration guide. The integration guide is short.

By LightningPDF Jun 3, 2026 4 min read

e-invoicing peppol belgium ubl go

WordPress 7.0 Is Here. Will Your PDF Invoice Plugin Survive It?

WordPress 7.0 dropped PHP 7.2/7.3, moved the editor into an iframe, and shipped a new admin theme. Here is what actually breaks invoice plugins, and why cloud-rendered PDFs do not care.

By LightningPDF Team May 21, 2026 4 min read

wordpress woocommerce compatibility invoice plugin

Extract Tables from PDFs: 5 Methods That Actually Work

A hands-on comparison of five ways to extract tables from PDFs in Python: pdfplumber, Camelot, Tabula, AWS Textract, and manual regex. With code, benchmarks, and honest pros and cons.

By LightningPDF Team Apr 1, 2026 5 min read

pdf python tables extraction data

Build a Document Pipeline in Python: From Database to PDF

A complete tutorial for building a Python document pipeline that queries a database, formats data with Jinja2, generates PDFs via API, and delivers them via email or S3.

By LightningPDF Team Apr 1, 2026 3 min read

python tutorial automation pipeline database

PDF to JSON: How to Extract Structured Data from PDFs

Three practical approaches to extracting structured data from PDFs into JSON: regex on raw text, template-based extraction, and AI-powered extraction, with code for each.

By LightningPDF Team Apr 1, 2026 4 min read

pdf json python extraction api

Why We Built an All-in-One PDF API (and Why You Should Stop Using 3 Different Tools)

The hidden costs of cobbling together Puppeteer, pdfcpu, and Ghostscript for PDF tasks. How a single API replaces your entire PDF toolchain.

By LightningPDF Team Apr 1, 2026 6 min read

pdf api devtools product

OCR PDF API: When You Need It and When You Don't

A practical guide to PDF OCR: how to check if a PDF actually needs OCR, Tesseract vs cloud APIs, and when you should skip OCR entirely by generating PDFs with real text layers.

By LightningPDF Team Apr 1, 2026 5 min read

pdf ocr api python tesseract