Automate Invoice Processing: From Raw Data to Branded PDF
Build an automated invoice processing pipeline that turns raw transaction data into branded PDF invoices. Complete working example with HTML template and API integration.
A SaaS company I worked with was generating invoices by hand until they hit 200 customers. The finance person would open a Google Doc template, manually type in the customer name, line items, and totals, export to PDF, and email it. It took about 8 minutes per invoice. At 200 invoices per month, that's 26 hours — more than three full workdays — spent on a task that a script handles in under a minute.
The fix isn't complicated. You need four things: a data source, an HTML template, a PDF generation API, and a delivery mechanism. This post walks through all four with working code.
The pipeline
Here's the full flow:
```text
Database/API  →  Python script   →  HTML template  →  PDF API     →  Email/S3
raw data         fetch + format     inject data       render PDF     deliver
```
Each step is independent. You can swap the data source (Stripe API instead of a database), the template engine (Jinja2, Handlebars, plain f-strings), or the delivery method (email, S3, webhook) without touching the other parts.
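Each stage only consumes the previous stage's output, so the whole flow can be expressed as four functions passed into a small driver. Here is a minimal sketch. The names (`run_pipeline` and its four stage parameters) are hypothetical and not part of any library.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")  # whatever record type the data source yields (e.g. Invoice)


def run_pipeline(
    fetch: Callable[[], Iterable[T]],
    render: Callable[[T], str],
    to_pdf: Callable[[str], bytes],
    deliver: Callable[[T, bytes], None],
) -> int:
    """Push each record through render -> to_pdf -> deliver; return how many were delivered."""
    delivered = 0
    for record in fetch():
        pdf = to_pdf(render(record))
        deliver(record, pdf)
        delivered += 1
    return delivered


# Stand-in stages; replacing any one of them leaves the others untouched
outbox = []
count = run_pipeline(
    fetch=lambda: ["INV-000001", "INV-000002"],
    render=lambda number: f"<h1>{number}</h1>",
    to_pdf=lambda html: html.encode(),  # placeholder for a real PDF API call
    deliver=lambda number, pdf: outbox.append((number, pdf)),
)
print(count)  # 2
```

Swapping Stripe for Postgres means passing a different `fetch`. Swapping email for S3 means passing a different `deliver`.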
Step 1: Define your invoice data
Start with a clear data structure. Every invoice needs these fields at minimum:
```python
from dataclasses import dataclass, field
from datetime import date
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional

CENT = Decimal("0.01")


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: Decimal
    tax_rate: Decimal = Decimal("0.00")  # Decimal("0.0825") means 8.25%

    @property
    def subtotal(self) -> Decimal:
        return self.quantity * self.unit_price

    @property
    def tax(self) -> Decimal:
        # Round to whole cents per line so the printed lines add up to the
        # printed totals (some tax rules require rounding per invoice instead)
        return (self.subtotal * self.tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)

    @property
    def total(self) -> Decimal:
        return self.subtotal + self.tax


@dataclass
class Invoice:
    invoice_number: str
    issue_date: date
    due_date: date
    customer_name: str
    customer_email: str
    customer_address: str
    items: list[LineItem] = field(default_factory=list)
    currency: str = "USD"
    notes: Optional[str] = None

    @property
    def subtotal(self) -> Decimal:
        # Start from Decimal("0") so an invoice with no items still returns a Decimal
        return sum((item.subtotal for item in self.items), Decimal("0"))

    @property
    def total_tax(self) -> Decimal:
        return sum((item.tax for item in self.items), Decimal("0"))

    @property
    def total(self) -> Decimal:
        return self.subtotal + self.total_tax
```
Using Decimal instead of float matters. Binary floating point can't represent most decimal fractions exactly, so 0.1 + 0.2 evaluates to 0.30000000000000004. On an invoice, that's a bug. Decimal("0.10") + Decimal("0.20") gives you exactly Decimal("0.30").
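Decimal does not decide how many places to keep. Multiplying by a tax rate can produce more than two decimals, so amounts still need rounding to cents somewhere. Where that rounding happens can move the total by a cent. The sketch below uses the standard library's `quantize` to show the difference. Half-up rounding is an assumption here: the rule you need, and whether you round per line or per invoice, depends on your jurisdiction.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    """Round to two places, rounding halves away from zero."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


subtotals = [Decimal("0.10")] * 3  # three cheap line items
rate = Decimal("0.05")             # 5% tax

# Each line's tax is 0.005: rounding per line gives 0.01 three times,
# while rounding once on the invoice rounds 0.015 instead
per_line = sum((to_cents(s * rate) for s in subtotals), Decimal("0"))
per_invoice = to_cents(sum((s * rate for s in subtotals), Decimal("0")))
print(per_line, per_invoice)  # 0.03 0.02
```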
Step 2: Fetch data from your source
Here's a realistic example pulling from a PostgreSQL database. Adapt this to your data source — Stripe API, a CSV file, whatever.
```python
from datetime import date, timedelta
from decimal import Decimal

import psycopg2  # conn below is a psycopg2 connection

# Invoice and LineItem are the dataclasses from Step 1


def fetch_unbilled_orders(conn, since: date) -> list[Invoice]:
    with conn.cursor() as cursor:
        cursor.execute("""
            SELECT
                o.id, o.created_at,
                c.name, c.email, c.address,
                oi.description, oi.quantity, oi.unit_price, oi.tax_rate
            FROM orders o
            JOIN customers c ON c.id = o.customer_id
            JOIN order_items oi ON oi.order_id = o.id
            WHERE o.invoiced = false AND o.created_at >= %s
            ORDER BY o.id, oi.id
        """, (since,))
        rows = cursor.fetchall()

    # The query returns one row per line item; group them into one Invoice per order
    invoices = {}
    for (order_id, _created_at, name, email, address,
         description, quantity, unit_price, tax_rate) in rows:
        if order_id not in invoices:
            invoices[order_id] = Invoice(
                invoice_number=f"INV-{order_id:06d}",
                issue_date=date.today(),
                due_date=date.today() + timedelta(days=30),
                customer_name=name,
                customer_email=email,
                customer_address=address,
            )
        invoices[order_id].items.append(LineItem(
            description=description,
            quantity=quantity,
            unit_price=Decimal(str(unit_price)),
            tax_rate=Decimal(str(tax_rate)),
        ))
    return list(invoices.values())
```
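The query orders rows by order ID, so the grouping step can also be written with `itertools.groupby` and kept apart from the database call. That makes it easy to test with hand-written tuples. Here is a sketch under that assumption. `group_rows_by_order` is a hypothetical helper, and it assumes the tuple layout of the SELECT above: order ID first, line-item columns last.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter
from typing import Any, Sequence


def group_rows_by_order(rows: Sequence[Sequence[Any]]) -> dict[int, dict]:
    """Group flat (order, customer, line item) rows into one dict per order.

    Assumes rows are already sorted by order ID (ORDER BY o.id guarantees it);
    groupby only merges *adjacent* rows that share a key.
    """
    orders: dict[int, dict] = {}
    for order_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        first = group[0]
        orders[order_id] = {
            "customer": {"name": first[2], "email": first[3], "address": first[4]},
            "items": [
                {
                    "description": r[5],
                    "quantity": r[6],
                    "unit_price": Decimal(str(r[7])),
                    "tax_rate": Decimal(str(r[8])),
                }
                for r in group
            ],
        }
    return orders
```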
If you're pulling from Stripe instead:
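```python
import os
from datetime import date, timedelta
from decimal import Decimal

import stripe

# Invoice and LineItem are the dataclasses from Step 1


def format_stripe_address(addr) -> str:
    # customer_address is an address object (line1, city, ...) or None, not a string
    if not addr:
        return ""
    parts = [addr.line1, addr.line2, addr.city, addr.state, addr.postal_code, addr.country]
    return ", ".join(part for part in parts if part)


def fetch_stripe_invoices(since_timestamp: int) -> list[Invoice]:
    # Keep sk_live_ keys out of source; the env var name is an assumption
    stripe.api_key = os.environ["STRIPE_SECRET_KEY"]
    stripe_invoices = stripe.Invoice.list(
        created={"gte": since_timestamp},
        status="open",
        limit=100,
    )

    invoices = []
    # auto_paging_iter() keeps fetching pages past the first 100 results
    for si in stripe_invoices.auto_paging_iter():
        items = []
        for line in si.lines.auto_paging_iter():  # an invoice's lines are paginated too
            quantity = line.quantity or 1
            # line.amount is the whole line's total in cents (two-decimal currency
            # assumed), so divide by quantity to get the unit price
            unit_price = (Decimal(line.amount) / 100 / quantity).quantize(Decimal("0.01"))
            items.append(LineItem(
                description=line.description or "Subscription",
                quantity=quantity,
                unit_price=unit_price,
            ))
        invoices.append(Invoice(
            invoice_number=si.number or f"INV-{si.id[-8:]}",
            issue_date=date.fromtimestamp(si.created),
            due_date=date.fromtimestamp(si.due_date) if si.due_date else date.today() + timedelta(days=30),
            customer_name=si.customer_name or "",
            customer_email=si.customer_email or "",
            customer_address=format_stripe_address(si.customer_address),
            items=items,
            currency=si.currency.upper(),
        ))
    return invoices
```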
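Stripe reports amounts in the currency's smallest unit. Dividing by 100 is therefore only right for two-decimal currencies such as USD or EUR. For a zero-decimal currency like JPY, an amount of 500 means ¥500. The helper below sketches the conversion. The currency set is an assumed partial list, so check it against Stripe's zero-decimal currency documentation before relying on it. Every other currency is assumed to use two decimals.

```python
from decimal import Decimal

# Partial, assumed list of zero-decimal currencies; verify against Stripe's docs
ZERO_DECIMAL_CURRENCIES = {"bif", "clp", "jpy", "krw", "pyg", "vnd", "xaf", "xof"}


def minor_units_to_decimal(amount: int, currency: str) -> Decimal:
    """Convert an integer amount in the smallest currency unit to a Decimal."""
    if currency.lower() in ZERO_DECIMAL_CURRENCIES:
        return Decimal(amount)
    # Dividing as Decimal avoids the float detour of amount / 100
    return Decimal(amount) / Decimal(100)
```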
Step 3: Build the HTML template
This is where the invoice goes from data to something that looks professional. I use Jinja2 because it handles loops, conditionals, and formatting without fighting you.
```shell
pip install jinja2 requests
```
Here's a complete invoice template. It's self-contained: all styles live in the embedded <style> block, so the PDF renderer doesn't need external CSS files.
INVOICE_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
color: #1a1a2e;
font-size: 13px;
line-height: 1.5;
padding: 48px;
}
.header {
display: flex;
justify-content: space-between;
margin-bottom: 40px;
padding-bottom: 20px;
border-bottom: 2px solid #4F46E5;
}
.company h1 { font-size: 24px; color: #4F46E5; margin-bottom: 4px; }
.company p { color: #64748b; font-size: 12px; }
.invoice-title { text-align: right; }
.invoice-title h2 {
font-size: 28px; text-transform: uppercase;
letter-spacing: 2px; color: #1a1a2e;
}
.invoice-title .number { font-size: 14px; color: #4F46E5; margin-top: 4px; }
.details { display: flex; justify-content: space-between; margin-bottom: 32px; }
.details-block h3 { font-size: 11px; text-transform: uppercase; color: #94a3b8; margin-bottom: 6px; }
.details-block p { font-size: 13px; }
table { width: 100%; border-collapse: collapse; margin-bottom: 32px; }
thead th {
background: #f8fafc; padding: 10px 12px; text-align: left;
font-size: 11px; text-transform: uppercase; color: #64748b;
border-bottom: 2px solid #e2e8f0;
}
thead th:last-child, thead th:nth-child(2), thead th:nth-child(3), thead th:nth-child(4) {
text-align: right;
}
tbody td {
padding: 10px 12px; border-bottom: 1px solid #f1f5f9;
}
tbody td:last-child, tbody td:nth-child(2), tbody td:nth-child(3), tbody td:nth-child(4) {
text-align: right;
}
.totals { display: flex; justify-content: flex-end; }
.totals-table { width: 280px; }
.totals-table .row {
display: flex; justify-content: space-between;
padding: 6px 0; font-size: 13px;
}
.totals-table .total-row {
border-top: 2px solid #1a1a2e;
padding-top: 10px; margin-top: 6px;
font-size: 18px; font-weight: 700;
}
.notes { margin-top: 40px; padding: 16px; background: #f8fafc; border-radius: 6px; }
.notes h3 { font-size: 11px; text-transform: uppercase; color: #94a3b8; margin-bottom: 4px; }
.footer {
margin-top: 40px; padding-top: 16px;
border-top: 1px solid #e2e8f0;
text-align: center; color: #94a3b8; font-size: 11px;
}
</style>
</head>
<body>
<div class="header">
<div class="company">
<h1>{{ company_name }}</h1>
<p>{{ company_address }}</p>
<p>{{ company_email }}</p>
</div>
<div class="invoice-title">
<h2>Invoice</h2>
<div class="number">{{ invoice.invoice_number }}</div>
</div>
</div>
<div class="details">
<div class="details-block">
<h3>Bill To</h3>
<p><strong>{{ invoice.customer_name }}</strong></p>
<p>{{ invoice.customer_address }}</p>
<p>{{ invoice.customer_email }}</p>
</div>
<div class="details-block">
<h3>Invoice Date</h3>
<p>{{ invoice.issue_date.strftime('%B %d, %Y') }}</p>
<h3 style="margin-top: 12px;">Due Date</h3>
<p>{{ invoice.due_date.strftime('%B %d, %Y') }}</p>
</div>
</div>
<table>
<thead>
<tr>
<th>Description</th>
<th>Qty</th>
<th>Unit Price</th>
<th>Tax</th>
<th>Amount</th>
</tr>
</thead>
<tbody>
{% for item in invoice.items %}
<tr>
<td>{{ item.description }}</td>
<td>{{ item.quantity }}</td>
<td>{{ currency_symbol }}{{ "%.2f"|format(item.unit_price) }}</td>
<td>{{ "%.0f"|format(item.tax_rate * 100) }}%</td>
<td>{{ currency_symbol }}{{ "%.2f"|format(item.total) }}</td>
</tr>
{% endfor %}
</tbody>
</table>
<div class="totals">
<div class="totals-table">
<div class="row">
<span>Subtotal</span>
<span>{{ currency_symbol }}{{ "%.2f"|format(invoice.subtotal) }}</span>
</div>
<div class="row">
<span>Tax</span>
<span>{{ currency_symbol }}{{ "%.2f"|format(invoice.total_tax) }}</span>
</div>
<div class="row total-row">
<span>Total</span>
<span>{{ currency_symbol }}{{ "%.2f"|format(invoice.total) }}</span>
</div>
</div>
</div>
{% if invoice.notes %}
<div class="notes">
<h3>Notes</h3>
<p>{{ invoice.notes }}</p>
</div>
{% endif %}
<div class="footer">
<p>Thank you for your business. Payment is due by {{ invoice.due_date.strftime('%B %d, %Y') }}.</p>
<p>{{ company_name }} · {{ company_address }}</p>
</div>
</body>
</html>
"""
This template handles variable-length line items, optional notes, computed totals, and tax breakdowns. The CSS is designed for print — no media queries needed, no viewport dependencies.
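The same self-contained rule applies to images. Don't point an `<img>` at a file the PDF service can't reach. Instead, embed the logo as a base64 data URI and pass it to the template as one more variable. In this sketch, `logo_data_uri` and a `logo_src` template variable are hypothetical names. The template above would also need an `<img src="{{ logo_src }}">` added to its header.

```python
import base64
import mimetypes
from pathlib import Path


def logo_data_uri(path: str) -> str:
    """Read an image file and return it as a data: URI usable in <img src>."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"can't guess the image type of {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Usage would look like `template.render(..., logo_src=logo_data_uri("logo.png"))`. Expect the PDF to grow by roughly the logo's size plus a third for base64 overhead.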
Step 4: Render HTML and generate PDF
Now connect the data to the template and send it to a PDF API.
```python
import os

import requests
from jinja2 import Template

# Read the API key from the environment (the variable name is an assumption)
LIGHTNINGPDF_KEY = os.environ.get("LIGHTNINGPDF_KEY", "lpdf_your_api_key")
COMPANY_NAME = "Acme Corp"
COMPANY_ADDRESS = "123 Main St, San Francisco, CA 94102"
COMPANY_EMAIL = "billing@acmecorp.com"
CURRENCY_SYMBOLS = {"USD": "$", "EUR": "\u20ac", "GBP": "\u00a3"}

# Compile the template once. A bare Template does NOT escape HTML unless
# autoescape=True, so a customer named "Smith & <Sons>" would break the markup.
INVOICE_JINJA = Template(INVOICE_TEMPLATE, autoescape=True)


def render_invoice_html(invoice: Invoice) -> str:
    return INVOICE_JINJA.render(
        invoice=invoice,
        company_name=COMPANY_NAME,
        company_address=COMPANY_ADDRESS,
        company_email=COMPANY_EMAIL,
        currency_symbol=CURRENCY_SYMBOLS.get(invoice.currency, "$"),
    )


def generate_pdf(html: str) -> bytes:
    response = requests.post(
        "https://lightningpdf.dev/api/v1/pdf/generate",
        headers={
            "Authorization": f"Bearer {LIGHTNINGPDF_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "html": html,
            "options": {
                "format": "A4",
                "print_background": True,
                "margin": {"top": "0.5in", "right": "0.5in", "bottom": "0.5in", "left": "0.5in"},
            },
        },
        timeout=30,
    )
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.content  # raw PDF bytes
```
That's it. render_invoice_html fills in the template, generate_pdf sends it to the API and gets back raw PDF bytes. The API handles Chromium, font rendering, and page layout. You don't install anything beyond requests and jinja2.
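Hosted API calls occasionally fail transiently with timeouts, 429s, or 5xx responses. In a monthly batch, one flaky request shouldn't cost a customer their invoice. A generic retry-with-backoff wrapper is one way to handle that. This sketch uses a hypothetical helper, `with_retries`. Which exceptions count as retryable is an assumption you'd tune. A plain `requests.RequestException` also covers 4xx errors from `raise_for_status()`, and retrying those won't help.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

R = TypeVar("R")


def with_retries(
    fn: Callable[[], R],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call fn(), retrying on retry_on exceptions with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the batch loop record the failure
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In the pipeline, the call would become `pdf_bytes = with_retries(lambda: generate_pdf(html), retry_on=(requests.ConnectionError, requests.Timeout))`.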
Step 5: Deliver the invoice
Three common delivery methods. Pick the one that fits your workflow.
Email with SMTP
```python
import os
import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def email_invoice(invoice: Invoice, pdf_bytes: bytes):
    msg = MIMEMultipart()
    msg["From"] = "billing@acmecorp.com"
    msg["To"] = invoice.customer_email
    msg["Subject"] = f"Invoice {invoice.invoice_number} from Acme Corp"

    symbol = CURRENCY_SYMBOLS.get(invoice.currency, "$")
    body = (
        f"Hi {invoice.customer_name},\n\n"
        f"Please find attached invoice {invoice.invoice_number} for {symbol}{invoice.total:.2f}.\n"
        f"Payment is due by {invoice.due_date.strftime('%B %d, %Y')}.\n\n"
        "Thank you for your business.\n"
    )
    msg.attach(MIMEText(body, "plain"))

    attachment = MIMEBase("application", "pdf")
    attachment.set_payload(pdf_bytes)
    encoders.encode_base64(attachment)
    # Passing filename as a parameter lets the email library quote it correctly
    attachment.add_header(
        "Content-Disposition", "attachment", filename=f"{invoice.invoice_number}.pdf"
    )
    msg.attach(attachment)

    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        # Keep the SMTP password out of source (the env var name is an assumption)
        server.login("billing@acmecorp.com", os.environ["SMTP_PASSWORD"])
        server.send_message(msg)
```
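The standard library's newer `email.message.EmailMessage` API can build the same message with less MIME plumbing. Its `add_attachment()` base64-encodes the PDF and writes the Content-Disposition header itself. The sketch below only builds the message; sending still goes through `smtplib.SMTP.send_message` as above. `build_invoice_email` is a hypothetical helper.

```python
from email.message import EmailMessage


def build_invoice_email(
    sender: str, recipient: str, invoice_number: str, body: str, pdf_bytes: bytes
) -> EmailMessage:
    """Build a plain-text email with the invoice PDF attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Invoice {invoice_number}"
    msg.set_content(body)
    # Converts the message to multipart/mixed and base64-encodes the PDF
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=f"{invoice_number}.pdf"
    )
    return msg
```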
Upload to S3
```python
import boto3


def upload_to_s3(invoice: Invoice, pdf_bytes: bytes) -> str:
    s3 = boto3.client("s3")
    key = f"invoices/{invoice.issue_date.year}/{invoice.issue_date.month:02d}/{invoice.invoice_number}.pdf"
    s3.put_object(
        Bucket="acme-invoices",
        Key=key,
        Body=pdf_bytes,
        ContentType="application/pdf",
        Metadata={
            "invoice-number": invoice.invoice_number,
            "customer": invoice.customer_name,
            "amount": str(invoice.total),
        },
    )
    return f"s3://acme-invoices/{key}"
```
Save to disk (for testing)
```python
from pathlib import Path


def save_locally(invoice: Invoice, pdf_bytes: bytes) -> Path:
    output_dir = Path("generated_invoices")
    output_dir.mkdir(exist_ok=True)
    path = output_dir / f"{invoice.invoice_number}.pdf"
    path.write_bytes(pdf_bytes)
    return path
```
Step 6: Tie it all together
Here's the complete script that fetches unbilled orders, generates PDFs, emails them, and archives a copy to S3:
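```python
import os
from datetime import date, timedelta

import psycopg2

# fetch_unbilled_orders, render_invoice_html, generate_pdf, email_invoice and
# upload_to_s3 are the functions from the steps above


def run_invoice_batch():
    conn = psycopg2.connect(
        host="db.example.com",
        dbname="acme_production",
        user="invoice_reader",
        password=os.environ["DB_PASSWORD"],  # env var name is an assumption
    )
    # Start from the 1st of the previous month. A fixed 30-day lookback run on
    # the 1st misses orders placed on the 1st of any 31-day month.
    first_of_this_month = date.today().replace(day=1)
    since = (first_of_this_month - timedelta(days=1)).replace(day=1)

    results = {"sent": 0, "failed": 0, "errors": []}
    try:
        invoices = fetch_unbilled_orders(conn, since=since)
        print(f"Found {len(invoices)} unbilled orders")
        for invoice in invoices:
            try:
                html = render_invoice_html(invoice)
                pdf_bytes = generate_pdf(html)
                email_invoice(invoice, pdf_bytes)
                upload_to_s3(invoice, pdf_bytes)  # archive a copy
                results["sent"] += 1
                print(f"  Sent {invoice.invoice_number} to {invoice.customer_email}")
            except Exception as e:
                # Keep going: one bad invoice shouldn't stop the whole batch
                results["failed"] += 1
                results["errors"].append(f"{invoice.invoice_number}: {e}")
                print(f"  FAILED {invoice.invoice_number}: {e}")
    finally:
        conn.close()  # close even if the query itself fails

    print(f"\nDone: {results['sent']} sent, {results['failed']} failed")
    return results


if __name__ == "__main__":
    run_invoice_batch()
```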
Run this as a cron job and you have automated invoicing:
# Run at 9 AM on the 1st of every month
0 9 1 * * cd /opt/invoicing && python generate_invoices.py >> /var/log/invoicing.log 2>&1
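`fetch_unbilled_orders` only skips orders whose `invoiced` flag is true, and the batch loop never sets that flag. A second run would pick up and email the same orders again. The sketch below shows the missing update, meant to run right after each invoice is delivered. It makes two assumptions. First, the connection needs write access, and the `invoice_reader` user name suggests it may be read-only. Second, it uses `sqlite3` so it runs standalone; with psycopg2 the placeholder is `%s` instead of `?`. `mark_invoiced` is a hypothetical helper.

```python
import sqlite3
from typing import Iterable


def mark_invoiced(conn: sqlite3.Connection, order_ids: Iterable[int]) -> None:
    """Flag orders as invoiced so the next batch run skips them."""
    with conn:  # commits on success, rolls back if the UPDATE raises
        conn.executemany(
            "UPDATE orders SET invoiced = ? WHERE id = ?",
            [(True, order_id) for order_id in order_ids],
        )
```

The order ID is not stored on `Invoice`. You would either parse it back out of the `INV-{order_id:06d}` number or add an `order_id` field to the dataclass. The second option is sturdier.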
curl: the quick test
Before writing all that Python, you can test the PDF generation with a single curl command:
curl -X POST https://lightningpdf.dev/api/v1/pdf/generate \
-H "Authorization: Bearer lpdf_your_key" \
-H "Content-Type: application/json" \
-o test-invoice.pdf \
-d '{
"html": "<div style=\"padding: 40px; font-family: sans-serif;\"><h1 style=\"color: #4F46E5;\">INVOICE</h1><p><strong>Invoice #:</strong> INV-000001</p><p><strong>Date:</strong> April 1, 2026</p><hr style=\"margin: 20px 0;\"><table style=\"width: 100%; border-collapse: collapse;\"><tr style=\"background: #f8fafc;\"><th style=\"text-align: left; padding: 8px;\">Item</th><th style=\"text-align: right; padding: 8px;\">Amount</th></tr><tr><td style=\"padding: 8px;\">Monthly subscription</td><td style=\"text-align: right; padding: 8px;\">$49.00</td></tr><tr><td style=\"padding: 8px;\">API overage (1,200 calls)</td><td style=\"text-align: right; padding: 8px;\">$12.00</td></tr><tr style=\"border-top: 2px solid #1a1a2e; font-weight: bold;\"><td style=\"padding: 8px;\">Total</td><td style=\"text-align: right; padding: 8px;\">$61.00</td></tr></table></div>",
"options": {"format": "A4", "print_background": true}
}'
Open test-invoice.pdf and you'll have a formatted invoice in about 1 second. From there, it's just a matter of making the HTML template prettier and wiring up your data source.
Performance at scale
Some numbers from actual production use:
- Single invoice generation: 800ms-1.2s (API call, including network latency)
- Batch of 500 invoices (using the batch endpoint): ~90 seconds total, 180ms per invoice
- PDF file size: 40-80KB for a typical 1-page invoice with no images
- With company logo (embedded base64 PNG): 90-150KB
If you're generating more than 50 invoices at once, use the batch endpoint instead of individual calls. It handles parallelization on the server side and returns all PDFs in a single response.
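You may prefer to stay on the single-document endpoint; the batch endpoint's request format isn't covered in this post. In that case, a thread pool can overlap the HTTP calls on the client side, since each call spends most of its time waiting on the network. The sketch below uses `concurrent.futures`. `generate_all` is a hypothetical helper that takes your `generate_pdf` function as a parameter. `max_workers=8` is an arbitrary default to tune against your plan's rate limit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def generate_all(
    html_docs: dict[str, str],
    generate_pdf: Callable[[str], bytes],
    max_workers: int = 8,
) -> tuple[dict[str, bytes], dict[str, Exception]]:
    """Generate PDFs for {invoice_number: html} on a thread pool.

    Returns (pdfs, errors) so one failed request doesn't abort the batch.
    """
    pdfs: dict[str, bytes] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(generate_pdf, html): number for number, html in html_docs.items()}
        for future in as_completed(futures):
            number = futures[future]
            try:
                pdfs[number] = future.result()
            except Exception as exc:
                errors[number] = exc
    return pdfs, errors
```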
Common gotchas
Currency formatting. Don't use Python's built-in format for currencies — it doesn't handle locale-specific rules (comma vs period for decimals, symbol placement). The babel library does this correctly:
from babel.numbers import format_currency
amount_str = format_currency(invoice.total, invoice.currency, locale="en_US")
# "$1,234.56"
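Sequential invoice numbers. Generate them with a database-side counter, not a timestamp or random ID. Many jurisdictions require sequential, gapless invoice numbers for tax compliance. A plain PostgreSQL sequence doesn't guarantee that on its own, because nextval() is never rolled back, so a failed transaction leaves a gap. An atomic counter updated in the same transaction that stores the invoice does stay gapless.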
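Here is a sketch of the atomic-counter option: a single row incremented inside the same transaction that records the invoice. A rollback undoes the increment too, and concurrent writers wait on the lock instead of sharing a number. It assumes a hypothetical `invoice_counter` table and a generic DB-API connection. The two SQL statements take no parameters, so they should read the same on PostgreSQL. The demo uses `sqlite3` so it runs as-is.

```python
import sqlite3


def next_invoice_number(conn, prefix: str = "INV-") -> str:
    """Increment the shared counter and return the formatted invoice number.

    Run this inside the same transaction that inserts the invoice and commit
    both together; if that transaction rolls back, the number is reissued.
    """
    cur = conn.cursor()
    # The UPDATE locks the counter (a row lock in PostgreSQL) until commit/rollback
    cur.execute("UPDATE invoice_counter SET value = value + 1 WHERE name = 'invoice'")
    cur.execute("SELECT value FROM invoice_counter WHERE name = 'invoice'")
    (value,) = cur.fetchone()
    return f"{prefix}{value:06d}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoice_counter (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
    conn.execute("INSERT INTO invoice_counter VALUES ('invoice', 0)")
    conn.commit()
    print(next_invoice_number(conn))  # INV-000001
    conn.rollback()                   # the invoice insert failed...
    print(next_invoice_number(conn))  # ...so INV-000001 is issued again
    conn.commit()
```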
Date formatting. Always include the timezone or use UTC. An invoice generated at 11:30 PM Pacific on March 31 is dated April 1 in UTC. Pick one and be consistent.
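In code, that means deriving the issue date from an explicit timezone. `date.today()`, which the fetch functions above use, follows whatever timezone the server happens to run in. Here is a sketch. `billing_date` is a hypothetical helper, and UTC is the assumed billing timezone.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import Optional


def billing_date(now: Optional[datetime] = None, tz: tzinfo = timezone.utc) -> date:
    """Return the calendar date of `now` (default: the current moment) in the billing timezone."""
    if now is None:
        now = datetime.now(timezone.utc)
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    return now.astimezone(tz).date()


# 11:30 PM Pacific Daylight Time (UTC-7) on March 31 is already April 1 in UTC
pdt = timezone(timedelta(hours=-7))
print(billing_date(datetime(2026, 3, 31, 23, 30, tzinfo=pdt)))  # 2026-04-01
```

Replacing `issue_date=date.today()` with `issue_date=billing_date()` would make every invoice in a batch agree on the date, wherever the cron job runs.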
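HTML escaping. If customer names or descriptions contain <, >, or &, your HTML breaks. Jinja2 can escape them for you, which is why I use it instead of f-strings for templates, but only when autoescaping is on. It's off by default for a plain Template or Environment, so pass autoescape=True (or configure select_autoescape()).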
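If you do build fragments with f-strings, the standard library's `html.escape` handles the escaping by hand. It turns markup-significant characters into entities, and with its default `quote=True` it also covers quotes inside attributes. `customer_row` in this small sketch is a hypothetical helper.

```python
from html import escape


def customer_row(name: str, description: str) -> str:
    """Build one table row with untrusted text escaped."""
    return f"<tr><td>{escape(name)}</td><td>{escape(description)}</td></tr>"


print(customer_row("Smith & Sons", "<b>Rush</b> order"))
# <tr><td>Smith &amp; Sons</td><td>&lt;b&gt;Rush&lt;/b&gt; order</td></tr>
```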
The whole pipeline — data query, template rendering, PDF generation, email delivery — runs in under 2 seconds per invoice. For most businesses, that turns a multi-day monthly chore into a script that finishes before your coffee gets cold.
LightningPDF Team
Building fast, reliable PDF generation tools for developers.