Automate PDF Report Generation from Your Data
Step-by-step guide to automating PDF report generation from databases, APIs, and spreadsheets. Python and Node.js examples included.
Automate PDF Report Generation from Your Data
Most businesses produce weekly or monthly reports -- sales summaries, financial snapshots, project updates. Most teams produce them, and most still rely on someone opening a spreadsheet, copying numbers into a document, and exporting a PDF by hand. This manual process is slow, error-prone, and impossible to scale.
Automated PDF report generation is the process of programmatically pulling data from databases, APIs, or spreadsheets, merging it into an HTML template, and converting the result into a polished PDF document via an API, eliminating manual formatting and data entry.
This guide walks you through building a complete automated reporting pipeline. You will connect to data sources, build dynamic HTML templates, generate PDFs through the LightningPDF API, and schedule reports to run automatically. Code examples are provided in Python and Node.js.
Why Automate Report Generation?
The Cost of Manual Reporting
Consider a SaaS company that sends monthly performance reports to 500 clients. Each report requires pulling metrics from the database, formatting tables and charts, adding the client's branding, and exporting to PDF. If each report takes 10 minutes to produce manually, that is over 83 hours of labor every month -- more than two full-time work weeks dedicated to a task that a script can handle in minutes.
Beyond the time cost, manual reporting introduces errors. A misplaced decimal, an outdated chart, or a copy-paste mistake in one report undermines credibility. When a client questions a number, tracking down the source of the error in a manually assembled document is painful.
What Automated Reporting Gives You
- Consistency: Every report follows the exact same template, formatting, and calculation logic.
- Speed: Generate hundreds of reports in minutes instead of days.
- Accuracy: Data flows directly from the source into the report with no manual transcription.
- Timeliness: Schedule reports to generate and deliver automatically, never missing a deadline.
- Audit trail: Every report is generated from a known data query at a known time, making discrepancies easy to trace.
Common Use Cases
Here's where automated reports show up:
- SaaS analytics: Monthly usage reports for customers showing API calls, storage, active users
- Financial services: Portfolio performance reports, transaction summaries, compliance documents
- E-commerce: Sales reports, inventory summaries, supplier performance reviews
- Healthcare: Patient visit summaries, lab result reports, billing statements
- Education: Student progress reports, course completion certificates, attendance records
- Marketing: Campaign performance reports, SEO audits, social media analytics
Connecting to Data Sources
Before you can generate a report, you need to pull the data that populates it. Here are the most common patterns.
From a Database (PostgreSQL/MySQL)
import psycopg2
from datetime import datetime, timedelta
def get_monthly_sales_data(client_id, month, year):
conn = psycopg2.connect(
host="db.example.com",
database="analytics",
user="report_reader",
password="secure-password"
)
cursor = conn.cursor()
cursor.execute("""
SELECT
product_name,
SUM(quantity) as total_qty,
SUM(revenue) as total_revenue,
COUNT(DISTINCT order_id) as order_count
FROM orders
WHERE client_id = %s
AND EXTRACT(MONTH FROM order_date) = %s
AND EXTRACT(YEAR FROM order_date) = %s
GROUP BY product_name
ORDER BY total_revenue DESC
""", (client_id, month, year))
columns = [desc[0] for desc in cursor.description]
rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
cursor.close()
conn.close()
return rows
From a REST API
import requests
def get_analytics_data(api_token, date_range):
response = requests.get(
"https://analytics.example.com/api/v2/metrics",
headers={"Authorization": f"Bearer {api_token}"},
params={
"start_date": date_range["start"],
"end_date": date_range["end"],
"metrics": "pageviews,sessions,conversions,revenue"
}
)
return response.json()["data"]
From a Spreadsheet (CSV/Excel)
import pandas as pd
def get_spreadsheet_data(file_path):
df = pd.read_excel(file_path, sheet_name="Monthly Data")
# Clean and aggregate
summary = df.groupby("category").agg({
"revenue": "sum",
"units": "sum",
"profit_margin": "mean"
}).reset_index()
return summary.to_dict("records")
Building Dynamic Report Templates
HTML-based reports let you use the full web stack -- CSS for styling, JavaScript charting libraries for visualizations, and template engines for dynamic content. For a thorough grounding in HTML-to-PDF conversion approaches, our HTML-to-PDF complete guide covers the fundamentals.
A Complete Report Template
Here is a professional report template with a header, summary metrics, a data table, and a footer:
<!DOCTYPE html>
<html>
<head>
<style>
@page {
size: A4;
margin: 20mm;
@bottom-center {
content: "Page " counter(page) " of " counter(pages);
font-size: 10px;
color: #94a3b8;
}
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: 'Helvetica Neue', Arial, sans-serif;
color: #1e293b;
font-size: 13px;
line-height: 1.6;
}
.report-header {
background: linear-gradient(135deg, #4F46E5 0%, #7C3AED 100%);
color: #fff;
padding: 30px 40px;
margin: -20mm -20mm 30px -20mm;
display: flex;
justify-content: space-between;
align-items: center;
}
.report-header h1 { font-size: 24px; font-weight: 700; }
.report-header .meta { text-align: right; opacity: 0.9; font-size: 13px; }
.summary-cards {
display: flex;
gap: 16px;
margin-bottom: 30px;
}
.card {
flex: 1;
background: #f8fafc;
border: 1px solid #e2e8f0;
border-radius: 8px;
padding: 20px;
text-align: center;
}
.card .value {
font-size: 28px;
font-weight: 700;
color: #4F46E5;
}
.card .label {
font-size: 11px;
text-transform: uppercase;
letter-spacing: 1px;
color: #64748b;
margin-top: 4px;
}
.card .change {
font-size: 12px;
margin-top: 8px;
}
.change.up { color: #16a34a; }
.change.down { color: #dc2626; }
h2 {
font-size: 16px;
color: #1e293b;
border-bottom: 2px solid #4F46E5;
padding-bottom: 8px;
margin: 30px 0 16px 0;
}
table {
width: 100%;
border-collapse: collapse;
margin-bottom: 20px;
page-break-inside: auto;
}
thead { display: table-header-group; }
tr { page-break-inside: avoid; }
th {
background: #4F46E5;
color: #fff;
padding: 10px 14px;
text-align: left;
font-size: 11px;
text-transform: uppercase;
letter-spacing: 0.5px;
}
td {
padding: 10px 14px;
border-bottom: 1px solid #e2e8f0;
}
tbody tr:nth-child(even) { background: #f8fafc; }
.chart-placeholder {
background: #f1f5f9;
border: 2px dashed #cbd5e1;
border-radius: 8px;
padding: 40px;
text-align: center;
color: #64748b;
margin: 20px 0;
}
.footer {
margin-top: 40px;
padding-top: 16px;
border-top: 1px solid #e2e8f0;
font-size: 11px;
color: #94a3b8;
text-align: center;
}
</style>
</head>
<body>
<div class="report-header">
<div>
<h1>Monthly Sales Report</h1>
<p>Acme Corp -- Client Analytics Division</p>
</div>
<div class="meta">
<p>Report Period: February 2026</p>
<p>Generated: Feb 23, 2026</p>
<p>Prepared for: Widget Industries</p>
</div>
</div>
<div class="summary-cards">
<div class="card">
<div class="value">$124,500</div>
<div class="label">Total Revenue</div>
<div class="change up">+12.3% vs last month</div>
</div>
<div class="card">
<div class="value">1,847</div>
<div class="label">Orders</div>
<div class="change up">+8.1% vs last month</div>
</div>
<div class="card">
<div class="value">$67.40</div>
<div class="label">Avg Order Value</div>
<div class="change up">+3.9% vs last month</div>
</div>
<div class="card">
<div class="value">94.2%</div>
<div class="label">Fulfillment Rate</div>
<div class="change down">-0.8% vs last month</div>
</div>
</div>
<h2>Revenue by Product Category</h2>
<table>
<thead>
<tr>
<th>Category</th>
<th>Units Sold</th>
<th>Revenue</th>
<th>Margin</th>
<th>MoM Change</th>
</tr>
</thead>
<tbody>
<tr>
<td>Electronics</td><td>423</td><td>$52,875</td>
<td>32.1%</td><td style="color:#16a34a">+15.2%</td>
</tr>
<tr>
<td>Accessories</td><td>891</td><td>$35,640</td>
<td>45.3%</td><td style="color:#16a34a">+9.8%</td>
</tr>
<tr>
<td>Software Licenses</td><td>156</td><td>$23,400</td>
<td>89.0%</td><td style="color:#16a34a">+22.1%</td>
</tr>
<tr>
<td>Support Plans</td><td>89</td><td>$12,585</td>
<td>72.5%</td><td style="color:#dc2626">-3.4%</td>
</tr>
</tbody>
</table>
<div class="footer">
<p>This report was auto-generated. Data sourced from analytics database.</p>
<p>Acme Corp -- Confidential</p>
</div>
</body>
</html>
Notice the CSS page-break-inside: avoid on table rows and display: table-header-group on thead -- these ensure tables reflow cleanly across pages. For more on handling page breaks, see our page break troubleshooting guide.
Generating Reports with LightningPDF
With your data source connected and your template built, generating the PDF takes one API call. Send the populated HTML to the LightningPDF API and receive a PDF back.
Python: Full Report Pipeline
import requests
import base64
from datetime import datetime
API_URL = "https://api.lightningpdf.dev/api/v1/pdf/generate"
API_KEY = "your-api-key"
def build_report_html(data, client_name, period):
"""Merge data into the report HTML template."""
# Build summary cards
total_revenue = sum(row["total_revenue"] for row in data)
total_orders = sum(row["order_count"] for row in data)
avg_order = total_revenue / total_orders if total_orders else 0
# Build table rows
table_rows = ""
for row in data:
margin = (row["total_revenue"] - row.get("cost", 0)) / row["total_revenue"] * 100
table_rows += f"""
<tr>
<td>{row['product_name']}</td>
<td>{row['total_qty']:,}</td>
<td>${row['total_revenue']:,.2f}</td>
<td>{margin:.1f}%</td>
</tr>"""
return f"""<!DOCTYPE html>
<html><head><style>
body {{ font-family: Helvetica, sans-serif; padding: 40px; color: #1e293b; }}
.header {{
background: linear-gradient(135deg, #4F46E5, #7C3AED);
color: #fff; padding: 30px; margin: -40px -40px 30px -40px;
}}
.header h1 {{ font-size: 24px; margin-bottom: 4px; }}
.cards {{ display: flex; gap: 16px; margin-bottom: 30px; }}
.card {{
flex: 1; background: #f8fafc; border: 1px solid #e2e8f0;
border-radius: 8px; padding: 20px; text-align: center;
}}
.card .value {{ font-size: 28px; font-weight: 700; color: #4F46E5; }}
.card .label {{
font-size: 11px; text-transform: uppercase;
letter-spacing: 1px; color: #64748b;
}}
h2 {{ border-bottom: 2px solid #4F46E5; padding-bottom: 8px;
margin: 24px 0 12px 0; font-size: 16px; }}
table {{ width: 100%; border-collapse: collapse; }}
th {{ background: #4F46E5; color: #fff; padding: 10px 14px;
text-align: left; font-size: 11px; text-transform: uppercase; }}
td {{ padding: 10px 14px; border-bottom: 1px solid #e2e8f0; }}
tr:nth-child(even) {{ background: #f8fafc; }}
tr {{ page-break-inside: avoid; }}
</style></head><body>
<div class="header">
<h1>Monthly Performance Report</h1>
<p>{client_name} -- {period}</p>
<p>Generated: {datetime.now().strftime('%B %d, %Y')}</p>
</div>
<div class="cards">
<div class="card">
<div class="value">${total_revenue:,.0f}</div>
<div class="label">Total Revenue</div>
</div>
<div class="card">
<div class="value">{total_orders:,}</div>
<div class="label">Total Orders</div>
</div>
<div class="card">
<div class="value">${avg_order:,.2f}</div>
<div class="label">Avg Order Value</div>
</div>
</div>
<h2>Revenue by Product</h2>
<table>
<thead><tr><th>Product</th><th>Units</th><th>Revenue</th><th>Margin</th></tr></thead>
<tbody>{table_rows}</tbody>
</table>
</body></html>"""
def generate_report(html, filename):
"""Send HTML to LightningPDF and save the PDF."""
response = requests.post(
API_URL,
headers={
"X-API-Key": API_KEY,
"Content-Type": "application/json"
},
json={
"html": html,
"options": {
"format": "A4",
"print_background": True,
"margin": {
"top": "20mm",
"bottom": "20mm",
"left": "15mm",
"right": "15mm"
}
}
}
)
if response.status_code == 200:
result = response.json()
pdf_bytes = base64.b64decode(result["data"]["pdf"])
with open(filename, "wb") as f:
f.write(pdf_bytes)
print(f"Report saved: {filename} ({result['data']['generation_time_ms']}ms)")
return pdf_bytes
else:
raise Exception(f"Generation failed: {response.status_code} {response.text}")
# Usage: generate a single report
data = get_monthly_sales_data(client_id=42, month=2, year=2026)
html = build_report_html(data, "Widget Industries", "February 2026")
generate_report(html, "report_widget_industries_feb_2026.pdf")
Node.js: Report Generation
const fs = require('fs');
const API_URL = 'https://api.lightningpdf.dev/api/v1/pdf/generate';
const API_KEY = 'your-api-key';
async function generateReport(data, clientName, period) {
const totalRevenue = data.reduce((sum, row) => sum + row.revenue, 0);
const totalOrders = data.reduce((sum, row) => sum + row.orders, 0);
const tableRows = data.map(row => `
<tr>
<td>${row.category}</td>
<td>${row.orders.toLocaleString()}</td>
<td>$${row.revenue.toLocaleString('en-US', {minimumFractionDigits: 2})}</td>
<td>${row.growth > 0 ? '+' : ''}${row.growth.toFixed(1)}%</td>
</tr>
`).join('');
const html = `<!DOCTYPE html>
<html><head><style>
body { font-family: Helvetica, sans-serif; padding: 40px; }
.header { background: linear-gradient(135deg, #4F46E5, #7C3AED);
color: #fff; padding: 30px; margin: -40px -40px 30px; }
.cards { display: flex; gap: 16px; margin-bottom: 30px; }
.card { flex: 1; background: #f8fafc; border: 1px solid #e2e8f0;
border-radius: 8px; padding: 20px; text-align: center; }
.card .value { font-size: 28px; font-weight: 700; color: #4F46E5; }
.card .label { font-size: 11px; text-transform: uppercase;
letter-spacing: 1px; color: #64748b; }
table { width: 100%; border-collapse: collapse; }
th { background: #4F46E5; color: #fff; padding: 10px; text-align: left; }
td { padding: 10px; border-bottom: 1px solid #e2e8f0; }
</style></head><body>
<div class="header">
<h1>Monthly Report</h1>
<p>${clientName} -- ${period}</p>
</div>
<div class="cards">
<div class="card">
<div class="value">$${totalRevenue.toLocaleString()}</div>
<div class="label">Revenue</div>
</div>
<div class="card">
<div class="value">${totalOrders.toLocaleString()}</div>
<div class="label">Orders</div>
</div>
</div>
<table>
<thead><tr><th>Category</th><th>Orders</th><th>Revenue</th><th>Growth</th></tr></thead>
<tbody>${tableRows}</tbody>
</table>
</body></html>`;
const response = await fetch(API_URL, {
method: 'POST',
headers: {
'X-API-Key': API_KEY,
'Content-Type': 'application/json'
},
body: JSON.stringify({
html,
options: { format: 'A4', print_background: true }
})
});
if (!response.ok) throw new Error(`API error: ${response.status}`);
const result = await response.json();
const pdfBuffer = Buffer.from(result.data.pdf, 'base64');
const filename = `report_${clientName.toLowerCase().replace(/\s+/g, '_')}_${period.replace(/\s+/g, '_')}.pdf`;
fs.writeFileSync(filename, pdfBuffer);
console.log(`Report saved: ${filename} (${result.data.generation_time_ms}ms)`);
return pdfBuffer;
}
Batch Report Generation
When you need to generate reports for many clients at once, the batch API is significantly more efficient than making individual calls. Instead of one HTTP request per report, you send a single request containing all the report data:
def generate_batch_reports(clients_data):
"""Generate reports for multiple clients in one API call."""
items = []
for client in clients_data:
html = build_report_html(
client["data"],
client["name"],
client["period"]
)
items.append({
"html": html,
"options": {"format": "A4", "print_background": True},
"metadata": {"client_id": client["id"]}
})
response = requests.post(
"https://api.lightningpdf.dev/api/v1/pdf/batch",
headers={
"X-API-Key": API_KEY,
"Content-Type": "application/json"
},
json={"items": items}
)
batch = response.json()
print(f"Queued {batch['data']['total']} reports for generation")
return batch["data"]["batch_id"]
For language-specific patterns beyond what is shown here, see our guides for Python, Node.js, and Go.
Scheduling Automated Reports
The final piece of the pipeline is scheduling. You want reports to generate and deliver themselves without any human intervention.
Cron Job (Linux/macOS)
The simplest approach is a cron job that runs your script on a schedule:
# Generate monthly reports on the 1st of every month at 6:00 AM
0 6 1 * * /usr/bin/python3 /opt/reports/generate_monthly_reports.py >> /var/log/reports.log 2>&1
Python Script with Schedule
For more control, use a scheduling library:
import schedule
import time
def monthly_report_job():
"""Generate and email monthly reports for all clients."""
clients = get_active_clients()
for client in clients:
data = get_monthly_sales_data(client["id"], month=2, year=2026)
html = build_report_html(data, client["name"], "February 2026")
pdf_bytes = generate_report(html, f"report_{client['id']}.pdf")
send_report_email(client["email"], pdf_bytes, client["name"])
print(f"Sent {len(clients)} monthly reports")
# Run on the 1st of each month
schedule.every().month.at("06:00").do(monthly_report_job)
while True:
schedule.run_pending()
time.sleep(60)
Node.js with node-cron
const cron = require('node-cron');
// Run at 6:00 AM on the 1st of every month
cron.schedule('0 6 1 * *', async () => {
console.log('Starting monthly report generation...');
const clients = await getActiveClients();
for (const client of clients) {
const data = await getClientData(client.id);
const pdf = await generateReport(data, client.name, 'February 2026');
await sendEmail(client.email, pdf);
}
console.log(`Sent ${clients.length} reports`);
});
Webhook-Triggered Reports
For event-driven reports (for example, generating a report each time a data import completes), use webhooks:
from flask import Flask, request
app = Flask(__name__)
@app.route("/webhook/data-import-complete", methods=["POST"])
def handle_import_complete():
payload = request.json
client_id = payload["client_id"]
period = payload["period"]
data = get_monthly_sales_data(client_id, period["month"], period["year"])
html = build_report_html(data, payload["client_name"], period["label"])
generate_report(html, f"report_{client_id}_{period['label']}.pdf")
return {"status": "report_generated"}, 200
LightningPDF vs Alternatives for Report Generation
| Feature | LightningPDF | Puppeteer (Self-Hosted) | DocRaptor | PDFShift |
|---|---|---|---|---|
| Generation speed | <100ms native, <2s Chromium | 1-3 seconds | 2-5 seconds | 1-3 seconds |
| Batch API | Up to 100 per call | Manual loops | Not available | Not available |
| Template marketplace | 15+ starter templates | None | None | None |
| CSS support | Full (flexbox, grid, gradients) | Full Chrome | PrinceXML subset | Full Chrome |
| Cost (5K reports/mo) | $29 (Pro) | ~$300 infra | $79 | $29 |
| Scheduling built-in | Webhooks + batch | DIY | Webhooks | Webhooks |
| Chart rendering | Inline SVG, Chart.js via Chromium | Full JS support | Limited JS | Full JS |
| Page break handling | Automatic + CSS control | CSS only | CSS Paged Media | CSS only |
For deeper comparisons, see LightningPDF vs Puppeteer for the build-vs-buy analysis, LightningPDF vs DocRaptor for print-quality considerations, and LightningPDF vs PDFShift for a feature-by-feature breakdown. Our full API comparison benchmarks all seven major PDF APIs across speed, features, and pricing.
Tips for Production-Quality Reports
Embedding Charts as SVG
Render charts as inline SVG -- they scale perfectly and don't need a network request. Unlike JavaScript-based charting libraries that require a browser runtime, SVG is static markup that renders identically in every PDF engine:
<svg width="400" height="200" viewBox="0 0 400 200">
<!-- Simple bar chart -->
<rect x="20" y="40" width="60" height="140" fill="#4F46E5" rx="4"/>
<rect x="100" y="80" width="60" height="100" fill="#7C3AED" rx="4"/>
<rect x="180" y="20" width="60" height="160" fill="#4F46E5" rx="4"/>
<rect x="260" y="100" width="60" height="80" fill="#7C3AED" rx="4"/>
<!-- Labels -->
<text x="50" y="195" text-anchor="middle" font-size="11" fill="#64748b">Q1</text>
<text x="130" y="195" text-anchor="middle" font-size="11" fill="#64748b">Q2</text>
<text x="210" y="195" text-anchor="middle" font-size="11" fill="#64748b">Q3</text>
<text x="290" y="195" text-anchor="middle" font-size="11" fill="#64748b">Q4</text>
</svg>
Handling Large Tables
When your report contains tables with hundreds of rows, page breaks become critical. Use CSS to keep the table header visible on every page and prevent rows from splitting:
thead { display: table-header-group; }
tfoot { display: table-footer-group; }
tr { page-break-inside: avoid; }
Multi-Page Layouts
For reports with distinct sections, force page breaks between them:
.section { page-break-before: always; }
.section:first-child { page-break-before: auto; }
Getting Started
- Sign up for a free LightningPDF account -- 50 PDFs per month, no credit card required.
- Grab your API key from the dashboard.
- Connect to your data source using the patterns above.
- Build or customize a report template from the marketplace, or design your own in the visual designer.
- Wire up generation with the Python or Node.js examples in this guide.
- Schedule your reports with cron, node-cron, or webhook triggers.
When you are ready to scale beyond the free tier, the Starter plan at $9/month covers 2,000 reports, and the Pro plan at $29/month handles 10,000 reports.
Read the full API documentation for additional options including landscape orientation, custom page sizes, header and footer injection, and watermarks.
Frequently Asked Questions
How do I automate PDF report generation?
Connect your data source to an HTML template using Python or Node.js, then send the populated HTML to the LightningPDF API. The API returns a base64-encoded PDF. Schedule the script with cron or a task scheduler to run automatically on a daily, weekly, or monthly basis without manual intervention.
Can I include charts and graphs in automated PDF reports?
Yes. The most reliable method is rendering charts as inline SVG directly in your HTML template. You can also use JavaScript charting libraries like Chart.js, which LightningPDF renders through its Chromium engine. For maximum compatibility and speed, pre-render charts server-side and embed them as SVG or base64 PNG images.
How many reports can I generate per month with LightningPDF?
The free tier includes 50 PDFs per month. The Starter plan at $9 per month covers 2,000 reports, the Pro plan at $29 per month handles 10,000, and higher tiers scale to 150,000 and beyond. Batch API calls let you generate up to 100 reports in a single request for maximum throughput.
Related Reading
- How to Auto-Generate PDF Invoices -- Invoice-specific automation patterns
- Best PDF Generation APIs in 2026 -- Compare all seven top PDF APIs
- Generate PDFs in Python -- Python integration guide
- Generate PDFs in Node.js -- Node.js tutorial with async examples
- Generate PDFs in Go -- Go tutorial with template examples
- HTML to PDF: The Complete Guide -- All conversion approaches compared
- How to Fix PDF Page Breaks -- Solve table and section break issues
- LightningPDF vs Puppeteer -- Build vs buy cost analysis
- WordPress PDF Plugin Guide -- Generate reports from WordPress
LightningPDF Team
Building fast, reliable PDF generation tools for developers.