Automate Invoice Processing: From Raw Data to Branded PDF

Build an automated invoice processing pipeline that turns raw transaction data into branded PDF invoices. Complete working example with HTML template and API integration.

By LightningPDF Team · 4 min read

A SaaS company I worked with was generating invoices by hand until they hit 200 customers. The finance person would open a Google Doc template, manually type in the customer name, line items, and totals, export to PDF, and email it. It took about 8 minutes per invoice. At 200 invoices per month, that's nearly 27 hours, more than three full workdays, spent on a task that a script handles in under a minute.

The fix isn't complicated. You need four things: a data source, an HTML template, a PDF generation API, and a delivery mechanism. This post walks through all four with working code.

The pipeline

Here's the full flow:

Database/API → Python script → HTML template → PDF API → Email/S3
     |              |               |              |          |
  raw data     fetch + format   inject data    render PDF   deliver

Each step is independent. You can swap the data source (Stripe API instead of a database), the template engine (Jinja2, Handlebars, plain f-strings), or the delivery method (email, S3, webhook) without touching the other parts.
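A minimal sketch of that independence, assuming each stage is just a plain function: the pipeline below only knows the stage signatures, so swapping Stripe for Postgres or S3 for email means passing a different function. `Pipeline` and its field names are hypothetical, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")  # the invoice type, e.g. the Invoice dataclass from Step 1


@dataclass
class Pipeline(Generic[T]):
    """Hypothetical wiring: each stage is a plain function you can swap out."""

    fetch: Callable[[], Iterable[T]]     # database, Stripe, CSV, ...
    render: Callable[[T], str]           # invoice -> HTML
    generate: Callable[[str], bytes]     # HTML -> PDF bytes
    deliver: Callable[[T, bytes], None]  # email, S3, disk, ...

    def run(self) -> int:
        """Push each fetched invoice through every stage; return how many were delivered."""
        delivered = 0
        for invoice in self.fetch():
            html = self.render(invoice)
            pdf = self.generate(html)
            self.deliver(invoice, pdf)
            delivered += 1
        return delivered


# Stub stages stand in for the real ones, so the wiring can be tested offline.
outbox = []
pipeline = Pipeline(
    fetch=lambda: ["INV-000001", "INV-000002"],
    render=lambda number: f"<h1>{number}</h1>",
    generate=lambda html: html.encode(),  # stand-in for a PDF API call
    deliver=lambda number, pdf: outbox.append((number, pdf)),
)
print(pipeline.run())  # 2
```

In a real run you would pass the functions from the steps below (for example `render_invoice_html`, `generate_pdf`, and `email_invoice`) instead of the stubs.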

Step 1: Define your invoice data

Start with a clear data structure. Every invoice needs these fields at minimum:

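from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CENT = Decimal("0.01")

@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: Decimal
    tax_rate: Decimal = Decimal("0.00")

    @property
    def subtotal(self) -> Decimal:
        return self.quantity * self.unit_price

    @property
    def tax(self) -> Decimal:
        # Round each line's tax to whole cents so the printed line amounts
        # add up to the printed totals. Per-line ROUND_HALF_UP is one common
        # convention; check what your tax jurisdiction requires.
        return (self.subtotal * self.tax_rate).quantize(CENT, rounding=ROUND_HALF_UP)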

    @property
    def total(self) -> Decimal:
        return self.subtotal + self.tax

@dataclass
class Invoice:
    invoice_number: str
    issue_date: date
    due_date: date
    customer_name: str
    customer_email: str
    customer_address: str
    items: list[LineItem] = field(default_factory=list)
    currency: str = "USD"
    notes: Optional[str] = None

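    @property
    def subtotal(self) -> Decimal:
        # Start the sum at Decimal("0") so an invoice with no items
        # still returns a Decimal instead of the int 0.
        return sum((item.subtotal for item in self.items), Decimal("0"))

    @property
    def total_tax(self) -> Decimal:
        return sum((item.tax for item in self.items), Decimal("0"))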

    @property
    def total(self) -> Decimal:
        return self.subtotal + self.total_tax

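Using Decimal instead of float matters. Binary floating point can't represent most decimal fractions exactly, so in Python 1.1 + 2.2 evaluates to 3.3000000000000003. On an invoice, that's a bug. Decimal("1.10") + Decimal("2.20") gives you Decimal("3.30"), and Decimal("19.99") * 3 gives you exactly Decimal("59.97").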
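A quick check of those claims, plus the one rounding step Decimal won't do for you: multiplying by a tax rate adds decimal places, so you still have to round to cents explicitly. The 8.25% rate and ROUND_HALF_UP are example choices, not a rule; rounding requirements vary by jurisdiction.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats can't hold most cent amounts exactly.
print(1.1 + 2.2)                          # 3.3000000000000003
print(Decimal("1.10") + Decimal("2.20"))  # 3.30

# Build Decimals from strings: Decimal(1.1) inherits the float's error.
print(Decimal(1.1) == Decimal("1.1"))     # False

# Multiplying by a tax rate adds digits; round to cents explicitly.
line_tax = Decimal("59.97") * Decimal("0.0825")
print(line_tax)                           # 4.947525
print(line_tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 4.95
```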

Step 2: Fetch data from your source

Here's a realistic example pulling from a PostgreSQL database. Adapt this to your data source — Stripe API, a CSV file, whatever.

import psycopg2
from datetime import date, timedelta
from decimal import Decimal

def fetch_unbilled_orders(conn, since: date) -> list[Invoice]:
    cursor = conn.cursor()
    cursor.execute("""
        SELECT
            o.id, o.created_at,
            c.name, c.email, c.address,
            oi.description, oi.quantity, oi.unit_price, oi.tax_rate
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN order_items oi ON oi.order_id = o.id
        WHERE o.invoiced = false AND o.created_at >= %s
        ORDER BY o.id, oi.id
    """, (since,))

    invoices = {}
    for row in cursor.fetchall():
        order_id = row[0]
        if order_id not in invoices:
            invoices[order_id] = Invoice(
                invoice_number=f"INV-{order_id:06d}",
                issue_date=date.today(),
                due_date=date.today() + timedelta(days=30),
                customer_name=row[2],
                customer_email=row[3],
                customer_address=row[4],
            )
        invoices[order_id].items.append(LineItem(
            description=row[5],
            quantity=row[6],
            unit_price=Decimal(str(row[7])),
            tax_rate=Decimal(str(row[8])),
        ))

    return list(invoices.values())

If you're pulling from Stripe instead:

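import stripe
from datetime import date, timedelta
from decimal import Decimal

def format_stripe_address(addr) -> str:
    # customer_address is a Stripe address object (or None), not a string,
    # so join its populated fields into one line for the template.
    if not addr:
        return ""
    parts = [addr.line1, addr.line2, addr.city, addr.state, addr.postal_code, addr.country]
    return ", ".join(p for p in parts if p)

def fetch_stripe_invoices(since_timestamp: int) -> list[Invoice]:
    stripe.api_key = "sk_live_..."
    stripe_invoices = stripe.Invoice.list(
        created={"gte": since_timestamp},
        status="open",
        limit=100,
    )

    invoices = []
    # auto_paging_iter() keeps fetching pages; .data would stop at the first 100.
    for si in stripe_invoices.auto_paging_iter():
        items = []
        for line in si.lines.data:
            qty = line.quantity or 1
            items.append(LineItem(
                description=line.description or "Subscription",
                quantity=qty,
                # line.amount is the line's total in the smallest currency unit
                # (cents for USD), so divide by 100 and then by the quantity.
                # Zero-decimal currencies such as JPY must skip the / 100.
                unit_price=Decimal(line.amount) / 100 / qty,
            ))
        invoices.append(Invoice(
            invoice_number=si.number or f"INV-{si.id[-8:]}",
            issue_date=date.fromtimestamp(si.created),
            due_date=date.fromtimestamp(si.due_date) if si.due_date else date.today() + timedelta(days=30),
            customer_name=si.customer_name or "",
            customer_email=si.customer_email or "",
            customer_address=format_stripe_address(si.customer_address),
            items=items,
            currency=si.currency.upper(),
        ))
    return invoices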

Step 3: Build the HTML template

This is where the invoice goes from data to something that looks professional. I use Jinja2 because it handles loops, conditionals, and formatting without fighting you.

pip install jinja2

Here's a complete invoice template. It's self-contained: all styles live in an embedded <style> block, so the PDF renderer doesn't need to fetch external CSS files.

INVOICE_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
<style>
  * { margin: 0; padding: 0; box-sizing: border-box; }
  body {
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
    color: #1a1a2e;
    font-size: 13px;
    line-height: 1.5;
    padding: 48px;
  }
  .header {
    display: flex;
    justify-content: space-between;
    margin-bottom: 40px;
    padding-bottom: 20px;
    border-bottom: 2px solid #4F46E5;
  }
  .company h1 { font-size: 24px; color: #4F46E5; margin-bottom: 4px; }
  .company p { color: #64748b; font-size: 12px; }
  .invoice-title { text-align: right; }
  .invoice-title h2 {
    font-size: 28px; text-transform: uppercase;
    letter-spacing: 2px; color: #1a1a2e;
  }
  .invoice-title .number { font-size: 14px; color: #4F46E5; margin-top: 4px; }
  .details { display: flex; justify-content: space-between; margin-bottom: 32px; }
  .details-block h3 { font-size: 11px; text-transform: uppercase; color: #94a3b8; margin-bottom: 6px; }
  .details-block p { font-size: 13px; }
  table { width: 100%; border-collapse: collapse; margin-bottom: 32px; }
  thead th {
    background: #f8fafc; padding: 10px 12px; text-align: left;
    font-size: 11px; text-transform: uppercase; color: #64748b;
    border-bottom: 2px solid #e2e8f0;
  }
  thead th:last-child, thead th:nth-child(2), thead th:nth-child(3), thead th:nth-child(4) {
    text-align: right;
  }
  tbody td {
    padding: 10px 12px; border-bottom: 1px solid #f1f5f9;
  }
  tbody td:last-child, tbody td:nth-child(2), tbody td:nth-child(3), tbody td:nth-child(4) {
    text-align: right;
  }
  .totals { display: flex; justify-content: flex-end; }
  .totals-table { width: 280px; }
  .totals-table .row {
    display: flex; justify-content: space-between;
    padding: 6px 0; font-size: 13px;
  }
  .totals-table .total-row {
    border-top: 2px solid #1a1a2e;
    padding-top: 10px; margin-top: 6px;
    font-size: 18px; font-weight: 700;
  }
  .notes { margin-top: 40px; padding: 16px; background: #f8fafc; border-radius: 6px; }
  .notes h3 { font-size: 11px; text-transform: uppercase; color: #94a3b8; margin-bottom: 4px; }
  .footer {
    margin-top: 40px; padding-top: 16px;
    border-top: 1px solid #e2e8f0;
    text-align: center; color: #94a3b8; font-size: 11px;
  }
</style>
</head>
<body>

<div class="header">
  <div class="company">
    <h1>{{ company_name }}</h1>
    <p>{{ company_address }}</p>
    <p>{{ company_email }}</p>
  </div>
  <div class="invoice-title">
    <h2>Invoice</h2>
    <div class="number">{{ invoice.invoice_number }}</div>
  </div>
</div>

<div class="details">
  <div class="details-block">
    <h3>Bill To</h3>
    <p><strong>{{ invoice.customer_name }}</strong></p>
    <p>{{ invoice.customer_address }}</p>
    <p>{{ invoice.customer_email }}</p>
  </div>
  <div class="details-block">
    <h3>Invoice Date</h3>
    <p>{{ invoice.issue_date.strftime('%B %d, %Y') }}</p>
    <h3 style="margin-top: 12px;">Due Date</h3>
    <p>{{ invoice.due_date.strftime('%B %d, %Y') }}</p>
  </div>
</div>

<table>
  <thead>
    <tr>
      <th>Description</th>
      <th>Qty</th>
      <th>Unit Price</th>
      <th>Tax</th>
      <th>Amount</th>
    </tr>
  </thead>
  <tbody>
    {% for item in invoice.items %}
    <tr>
      <td>{{ item.description }}</td>
      <td>{{ item.quantity }}</td>
      <td>{{ currency_symbol }}{{ "%.2f"|format(item.unit_price) }}</td>
      <td>{{ "%g"|format(item.tax_rate * 100) }}%</td>
      <td>{{ currency_symbol }}{{ "%.2f"|format(item.total) }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>

<div class="totals">
  <div class="totals-table">
    <div class="row">
      <span>Subtotal</span>
      <span>{{ currency_symbol }}{{ "%.2f"|format(invoice.subtotal) }}</span>
    </div>
    <div class="row">
      <span>Tax</span>
      <span>{{ currency_symbol }}{{ "%.2f"|format(invoice.total_tax) }}</span>
    </div>
    <div class="row total-row">
      <span>Total</span>
      <span>{{ currency_symbol }}{{ "%.2f"|format(invoice.total) }}</span>
    </div>
  </div>
</div>

{% if invoice.notes %}
<div class="notes">
  <h3>Notes</h3>
  <p>{{ invoice.notes }}</p>
</div>
{% endif %}

<div class="footer">
  <p>Thank you for your business. Payment is due within 30 days.</p>
  <p>{{ company_name }} &middot; {{ company_address }}</p>
</div>

</body>
</html>
"""

This template handles variable-length line items, optional notes, computed totals, and tax breakdowns. The CSS is designed for print — no media queries needed, no viewport dependencies.

Step 4: Render HTML and generate PDF

Now connect the data to the template and send it to a PDF API.

import requests
from jinja2 import Environment

LIGHTNINGPDF_KEY = "lpdf_your_api_key"
COMPANY_NAME = "Acme Corp"
COMPANY_ADDRESS = "123 Main St, San Francisco, CA 94102"
COMPANY_EMAIL = "billing@acmecorp.com"

CURRENCY_SYMBOLS = {"USD": "$", "EUR": "\u20ac", "GBP": "\u00a3"}

# Autoescaping is off by default in Jinja2; turn it on so characters like
# < and & in customer data are HTML-escaped instead of breaking the markup.
# Compile the template once and reuse it for every invoice.
INVOICE_JINJA_TEMPLATE = Environment(autoescape=True).from_string(INVOICE_TEMPLATE)

def render_invoice_html(invoice: Invoice) -> str:
    return INVOICE_JINJA_TEMPLATE.render(
        invoice=invoice,
        company_name=COMPANY_NAME,
        company_address=COMPANY_ADDRESS,
        company_email=COMPANY_EMAIL,
        currency_symbol=CURRENCY_SYMBOLS.get(invoice.currency, "$"),
    )

def generate_pdf(html: str) -> bytes:
    response = requests.post(
        "https://lightningpdf.dev/api/v1/pdf/generate",
        headers={
            "Authorization": f"Bearer {LIGHTNINGPDF_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "html": html,
            "options": {
                "format": "A4",
                "print_background": True,
                "margin": {
                    "top": "0.5in",
                    "right": "0.5in",
                    "bottom": "0.5in",
                    "left": "0.5in"
                }
            }
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.content

That's it. render_invoice_html fills in the template, generate_pdf sends it to the API and gets back raw PDF bytes. The API handles Chromium, font rendering, and page layout. You don't install anything beyond requests and jinja2.
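generate_pdf is the one stage that crosses the network, so in a batch run it can pay to retry transient failures instead of marking an invoice failed on the first timeout. A minimal sketch with exponential backoff; which failures count as transient (connection errors, timeouts, and 429/5xx responses here) is an assumption, not documented LightningPDF behavior, and `with_retries` and `is_transient` are hypothetical helpers.

```python
import time
from typing import Callable, TypeVar

import requests

T = TypeVar("T")

# Assumption: these statuses are worth retrying; adjust to the API's documented behavior.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_transient(exc: Exception) -> bool:
    """Network hiccups and rate-limit/server errors are retried; anything else is not."""
    if isinstance(exc, (requests.ConnectionError, requests.Timeout)):
        return True
    if isinstance(exc, requests.HTTPError) and exc.response is not None:
        return exc.response.status_code in RETRYABLE_STATUS
    return False


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient failures with exponential backoff (1s, 2s, 4s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors a retry won't fix.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Usage: `pdf_bytes = with_retries(lambda: generate_pdf(html))`. The injectable `sleep` parameter exists only so the backoff can be tested without waiting.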

Step 5: Deliver the invoice

Three common delivery methods. Pick the one that fits your workflow.

Email with SMTP

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email import encoders

def email_invoice(invoice: Invoice, pdf_bytes: bytes):
    msg = MIMEMultipart()
    msg["From"] = "billing@acmecorp.com"
    msg["To"] = invoice.customer_email
    msg["Subject"] = f"Invoice {invoice.invoice_number} from Acme Corp"

    body = f"""Hi {invoice.customer_name},

Please find attached invoice {invoice.invoice_number} for {CURRENCY_SYMBOLS.get(invoice.currency, '$')}{invoice.total:.2f}.

Payment is due by {invoice.due_date.strftime('%B %d, %Y')}.

Thank you for your business.
"""
    msg.attach(MIMEText(body, "plain"))

    attachment = MIMEBase("application", "pdf")
    attachment.set_payload(pdf_bytes)
    encoders.encode_base64(attachment)
    # Passing filename as a parameter lets the email library quote it correctly.
    attachment.add_header(
        "Content-Disposition",
        "attachment",
        filename=f"{invoice.invoice_number}.pdf",
    )
    msg.attach(attachment)

    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        server.login("billing@acmecorp.com", "smtp_password")
        server.send_message(msg)
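The same message can also be built with the standard library's newer `email.message.EmailMessage` API, which handles the base64 encoding and filename quoting for you. Building the message in its own function, separate from the SMTP call, lets you test it without a mail server. `build_invoice_email` and its parameters are hypothetical; it takes plain values so it doesn't depend on the Invoice class.

```python
from email.message import EmailMessage


def build_invoice_email(
    sender: str, recipient: str, invoice_number: str, body: str, pdf_bytes: bytes
) -> EmailMessage:
    """Assemble a plain-text email with the invoice PDF attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Invoice {invoice_number}"
    msg.set_content(body)  # text/plain body
    # Adding an attachment converts the message to multipart/mixed.
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=f"{invoice_number}.pdf",
    )
    return msg
```

Sending stays the same: `server.send_message(build_invoice_email(...))` inside the `smtplib.SMTP` block.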

Upload to S3

import boto3

def upload_to_s3(invoice: Invoice, pdf_bytes: bytes) -> str:
    s3 = boto3.client("s3")
    key = f"invoices/{invoice.issue_date.year}/{invoice.issue_date.month:02d}/{invoice.invoice_number}.pdf"

    s3.put_object(
        Bucket="acme-invoices",
        Key=key,
        Body=pdf_bytes,
        ContentType="application/pdf",
        Metadata={
            "invoice-number": invoice.invoice_number,
            "customer": invoice.customer_name,
            "amount": str(invoice.total),
        }
    )
    return f"s3://acme-invoices/{key}"

Save to disk (for testing)

from pathlib import Path

def save_locally(invoice: Invoice, pdf_bytes: bytes) -> Path:
    output_dir = Path("generated_invoices")
    output_dir.mkdir(exist_ok=True)
    path = output_dir / f"{invoice.invoice_number}.pdf"
    path.write_bytes(pdf_bytes)
    return path

Step 6: Tie it all together

Here's the complete script that fetches unbilled orders, generates PDFs, and emails them:

import psycopg2
from datetime import date, timedelta

def run_invoice_batch():
    conn = psycopg2.connect(
        host="db.example.com",
        dbname="acme_production",
        user="invoice_reader",
        password="db_password",
    )

    results = {"sent": 0, "failed": 0, "errors": []}
    try:
        invoices = fetch_unbilled_orders(conn, since=date.today() - timedelta(days=30))
        print(f"Found {len(invoices)} unbilled orders")

        for invoice in invoices:
            try:
                html = render_invoice_html(invoice)
                pdf_bytes = generate_pdf(html)
                # Archive before emailing, so an invoice reported as failed
                # is always one whose email was not sent.
                upload_to_s3(invoice, pdf_bytes)
                email_invoice(invoice, pdf_bytes)
                results["sent"] += 1
                print(f"  Sent {invoice.invoice_number} to {invoice.customer_email}")
            except Exception as e:
                # Keep going: one bad invoice shouldn't stop the batch.
                results["failed"] += 1
                results["errors"].append(f"{invoice.invoice_number}: {e}")
                print(f"  FAILED {invoice.invoice_number}: {e}")
    finally:
        # Close the connection even if the query itself raises.
        conn.close()

    print(f"\nDone: {results['sent']} sent, {results['failed']} failed")
    return results

if __name__ == "__main__":
    run_invoice_batch()
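One gap to close before scheduling this: fetch_unbilled_orders selects orders WHERE o.invoiced = false, but nothing in the batch sets that flag, so every run would invoice the same orders again. A minimal sketch of the missing step, assuming psycopg2 and the orders.invoiced column from Step 2; `mark_invoiced` and `order_id_from_number` are hypothetical helpers. Note that it needs a database user with UPDATE rights on orders, which the invoice_reader name suggests the example user may not have.

```python
def mark_invoiced(conn, order_ids: list[int]) -> int:
    """Flag orders as invoiced once their PDFs have gone out; return rows updated.

    psycopg2 adapts a Python list to a PostgreSQL array, so ANY(%s)
    matches every id in a single statement.
    """
    if not order_ids:
        return 0
    with conn.cursor() as cursor:
        cursor.execute(
            "UPDATE orders SET invoiced = true WHERE id = ANY(%s) AND invoiced = false",
            (order_ids,),
        )
        updated = cursor.rowcount
    conn.commit()
    return updated


def order_id_from_number(invoice_number: str) -> int:
    """Recover the order id from numbers built as f"INV-{order_id:06d}"."""
    return int(invoice_number.removeprefix("INV-"))
```

Calling it per invoice right after a successful delivery, e.g. `mark_invoiced(conn, [order_id_from_number(invoice.invoice_number)])`, means a crash mid-batch only re-sends the invoices that hadn't gone out yet.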

Run this as a cron job and you have automated invoicing:

# Run at 9 AM on the 1st of every month
0 9 1 * * cd /opt/invoicing && python generate_invoices.py >> /var/log/invoicing.log 2>&1

curl: the quick test

Before writing all that Python, you can test the PDF generation with a single curl command:

curl -X POST https://lightningpdf.dev/api/v1/pdf/generate \
  -H "Authorization: Bearer lpdf_your_key" \
  -H "Content-Type: application/json" \
  -o test-invoice.pdf \
  -d '{
    "html": "<div style=\"padding: 40px; font-family: sans-serif;\"><h1 style=\"color: #4F46E5;\">INVOICE</h1><p><strong>Invoice #:</strong> INV-000001</p><p><strong>Date:</strong> April 1, 2026</p><hr style=\"margin: 20px 0;\"><table style=\"width: 100%; border-collapse: collapse;\"><tr style=\"background: #f8fafc;\"><th style=\"text-align: left; padding: 8px;\">Item</th><th style=\"text-align: right; padding: 8px;\">Amount</th></tr><tr><td style=\"padding: 8px;\">Monthly subscription</td><td style=\"text-align: right; padding: 8px;\">$49.00</td></tr><tr><td style=\"padding: 8px;\">API overage (1,200 calls)</td><td style=\"text-align: right; padding: 8px;\">$12.00</td></tr><tr style=\"border-top: 2px solid #1a1a2e; font-weight: bold;\"><td style=\"padding: 8px;\">Total</td><td style=\"text-align: right; padding: 8px;\">$61.00</td></tr></table></div>",
    "options": {"format": "A4", "print_background": true}
  }'

Open test-invoice.pdf and you'll have a formatted invoice in about 1 second. From there, it's just a matter of making the HTML template prettier and wiring up your data source.

Performance at scale

Some numbers from actual production use:

  • Single invoice generation: 800ms-1.2s (API call, including network latency)
  • Batch of 500 invoices (using the batch endpoint): ~90 seconds total, 180ms per invoice
  • PDF file size: 40-80KB for a typical 1-page invoice with no images
  • With company logo (embedded base64 PNG): 90-150KB

If you're generating more than 50 invoices at once, use the batch endpoint instead of individual calls. It handles parallelization on the server side and returns all PDFs in a single response.
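If you stay on individual calls, you can still overlap network latency on the client side. This swaps in a different technique from the batch endpoint: a standard-library thread pool, which suits I/O-bound HTTP requests. `generate_all` is a hypothetical helper that accepts generate_pdf or any other function from HTML to bytes, and the worker count is an assumption to tune against your plan's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def generate_all(
    html_by_number: dict[str, str],
    generate: Callable[[str], bytes],
    max_workers: int = 8,  # assumption: keep this below the API's rate limit
) -> tuple[dict[str, bytes], dict[str, Exception]]:
    """Generate many PDFs concurrently; return (pdfs, errors) keyed by invoice number."""
    pdfs: dict[str, bytes] = {}
    errors: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(generate, html): number
            for number, html in html_by_number.items()
        }
        for future in as_completed(futures):
            number = futures[future]
            try:
                pdfs[number] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[number] = exc
    return pdfs, errors
```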

Common gotchas

Currency formatting. Don't use Python's built-in format for currencies — it doesn't handle locale-specific rules (comma vs period for decimals, symbol placement). The babel library does this correctly:

from babel.numbers import format_currency
amount_str = format_currency(invoice.total, invoice.currency, locale="en_US")
# "$1,234.56"

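Sequential invoice numbers. Use a database-backed counter, not a timestamp or random ID. Many jurisdictions require sequential, gapless invoice numbers for tax compliance. Be aware that a PostgreSQL SEQUENCE is sequential but not gapless: a value taken by a transaction that later rolls back is never reused. If you need gapless numbers, increment a counter row in the same transaction that records the invoice.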
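A minimal sketch of a gapless counter, using sqlite3 only so it runs anywhere; the table and function names are hypothetical. The same idea carries over to PostgreSQL: bump a single counter row inside the transaction that inserts the invoice, so a rollback also rolls back the number. The row lock this takes serializes invoice creation, which is the price of having no gaps.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    # One counter row (id is pinned to 1) plus the invoices it numbers.
    conn.execute(
        "CREATE TABLE invoice_counter ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " last_number INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO invoice_counter (id, last_number) VALUES (1, 0)")
    conn.execute(
        "CREATE TABLE invoices (number TEXT PRIMARY KEY, customer TEXT NOT NULL)"
    )
    conn.commit()


def create_invoice(conn: sqlite3.Connection, customer: str) -> str:
    # `with conn` commits on success and rolls back on any exception, so the
    # counter bump and the invoice row are kept or discarded together.
    with conn:
        conn.execute(
            "UPDATE invoice_counter SET last_number = last_number + 1 WHERE id = 1"
        )
        (n,) = conn.execute(
            "SELECT last_number FROM invoice_counter WHERE id = 1"
        ).fetchone()
        number = f"INV-{n:06d}"
        conn.execute(
            "INSERT INTO invoices (number, customer) VALUES (?, ?)", (number, customer)
        )
    return number
```

If the insert fails (say, a NULL customer trips the NOT NULL constraint), the counter update is rolled back with it, so the next successful invoice takes the number the failed one would have used.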

Date formatting. Always include the timezone or use UTC. An invoice generated at 11:30 PM Pacific on March 31 is dated April 1 in UTC. Pick one and be consistent.
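A short sketch of pinning the invoice date to one zone with the standard library's zoneinfo (Python 3.9+; some platforms, notably Windows, also need the tzdata package). America/Los_Angeles stands in for whatever zone your billing entity uses, and `invoice_date` is a hypothetical helper. Note that date.today(), used in the fetch functions above, reads the server's local clock, which is exactly the kind of inconsistency to avoid.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

BILLING_TZ = ZoneInfo("America/Los_Angeles")  # assumption: your billing zone


def invoice_date(moment: datetime, tz: ZoneInfo = BILLING_TZ) -> date:
    """Return the calendar date of an aware datetime in the billing zone."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime, e.g. datetime.now(timezone.utc)")
    return moment.astimezone(tz).date()


# 11:30 PM Pacific on March 31 is already April 1 in UTC.
moment = datetime(2026, 4, 1, 6, 30, tzinfo=timezone.utc)
print(moment.date())         # 2026-04-01
print(invoice_date(moment))  # 2026-03-31
```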

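HTML escaping. If customer names or descriptions contain <, >, or &, your HTML breaks. A template engine can escape these for you, but Jinja2's autoescaping is off by default: a bare jinja2.Template or a plain Environment() inserts values as-is. Create the template from Environment(autoescape=True), or escape individual values with the |e filter. That is the real reason to use Jinja2 rather than f-strings, which never escape anything.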
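A quick way to see the default in action. `Template` and `Environment` are Jinja2's standard entry points; the customer name is made up.

```python
from jinja2 import Environment, Template

name = "Smith & <Sons>"

# A bare Template (like a plain Environment()) leaves values unescaped.
print(Template("<p>{{ name }}</p>").render(name=name))
# <p>Smith & <Sons></p>

# With autoescape on, every {{ value }} is HTML-escaped.
env = Environment(autoescape=True)
print(env.from_string("<p>{{ name }}</p>").render(name=name))
# <p>Smith &amp; &lt;Sons&gt;</p>
```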

The whole pipeline — data query, template rendering, PDF generation, email delivery — runs in under 2 seconds per invoice. For most businesses, that turns a multi-day monthly chore into a script that finishes before your coffee gets cold.


LightningPDF Team

Building fast, reliable PDF generation tools for developers.
