# Compression
Reduce PDF file sizes by 50-90% through intelligent image recompression and stream optimization. FolioPDF's two-phase compression pipeline handles photo-heavy documents, scanned pages, and bloated report PDFs without destroying visual quality.
## Overview
PDF compression in FolioPDF runs in two phases:
- Phase 1: Image recompression (PDFium + Skia) — Extracts every embedded image, re-encodes opaque images as JPEG at a target quality, and replaces the original only when the result is smaller.
- Phase 2: Stream optimization (qpdf) — Re-compresses internal PDF content streams with Flate deflation and packs structure data into object streams.
Both phases are conservative: they only replace data when the result is actually smaller. Compression never makes a file larger.
## Quick Start

```csharp
using FolioPDF.Toolkit.Pdfium;

// Compress with defaults (JPEG quality 65, stream optimization on)
byte[] smaller = PdfCompressor.Compress(File.ReadAllBytes("large-report.pdf"));
File.WriteAllBytes("large-report-compressed.pdf", smaller);
```
## Three API Entry Points

### 1. Static Method (Byte Array)

```csharp
byte[] compressed = PdfCompressor.Compress(pdfBytes);

// With options
byte[] compressedWithOptions = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 50,
    RecompressStreams = true
});
```
### 2. File-to-File

```csharp
PdfCompressor.CompressFile("input.pdf", "output.pdf");

// With options
PdfCompressor.CompressFile("input.pdf", "output.pdf", new CompressionOptions
{
    ImageQuality = 40,
    RecompressStreams = true
});
```
### 3. PdfEditor Fluent Chain

Compress as part of a larger editing pipeline:

```csharp
using FolioPDF.Fluent;

PdfEditor.Open("invoice.pdf")
    .SetTitle("Invoice #42")
    .SetAuthor("Billing Dept")
    .Compress(new CompressionOptions { ImageQuality = 65 })
    .Encrypt(new Encryption256Bit { OwnerPassword = "secret" })
    .Save("invoice-final.pdf");
```
## Compression Options

| Property | Type | Default | Description |
|---|---|---|---|
| `ImageQuality` | int | 65 | JPEG quality for image recompression (1-100). Lower values produce smaller files with more compression artifacts. Set to 100 to skip image recompression entirely. |
| `DownsampleDpi` | int | 0 (disabled) | Target DPI for downsampling high-resolution images. Images above this resolution are scaled down before recompression. Common values: 150 for screen, 300 for print. Note: this option is accepted but has no effect in the current release — images are recompressed at their native resolution. True downsampling is a planned follow-up. |
| `RecompressStreams` | bool | true | Re-compress PDF content streams with Flate and use object streams for internal structure. Typically saves an additional 5-15% on top of image recompression. |
| `RemoveStructureTree` | bool | false | Remove the PDF structure tree (`/StructTreeRoot`). Saves space but destroys accessibility data (screen reader support). Only use when the document will not be consumed by assistive technology. |
## Quality Level Guide

Choose the right `ImageQuality` value for your use case:
| Quality | Use Case | Typical Reduction | Visual Impact |
|---|---|---|---|
| 30-40 | Aggressive archival, legal discovery, email-friendly | 80-95% | Noticeable JPEG artifacts on photos. Text remains readable. |
| 50-60 | Standard archival, internal reports, bulk processing | 60-85% | Minor artifacts on close inspection. Good for documents where images are supplementary. |
| 65 (default) | General purpose — invoices, reports, contracts | 50-80% | Good balance of size and quality. Artifacts are subtle even on full-page photos. |
| 75-85 | Photography-heavy documents, marketing materials | 30-60% | Minimal visible difference from the original. |
| 90-95 | Near-lossless, prepress where JPEG is acceptable | 10-30% | Virtually indistinguishable from original. |
| 100 | Skip image recompression entirely | 5-15% (streams only) | None — only stream optimization runs. |
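If you pick qualities programmatically, the guide above can be folded into a small helper. This is a convenience sketch, not part of the FolioPDF API — the `CompressionProfile` enum and `QualityFor` method are illustrative names:

```csharp
// Hypothetical helper mapping common scenarios to ImageQuality values
// from the guide above. Not FolioPDF API; adjust the numbers to taste.
public enum CompressionProfile { Email, Archive, General, Marketing, Prepress }

public static class QualityGuide
{
    public static int QualityFor(CompressionProfile profile) => profile switch
    {
        CompressionProfile.Email     => 40, // aggressive, smallest files
        CompressionProfile.Archive   => 55, // standard archival
        CompressionProfile.General   => 65, // library default
        CompressionProfile.Marketing => 80, // photography-heavy
        CompressionProfile.Prepress  => 90, // near-lossless
        _ => 65
    };
}
```

Usage: `new CompressionOptions { ImageQuality = QualityGuide.QualityFor(CompressionProfile.Email) }`.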
## How Image Compression Works
The image recompression pipeline processes every page in the document:
- Scan page objects. PDFium walks the page's content stream and identifies all image objects (`FPDF_PAGEOBJ_IMAGE`).
- Skip tiny images. Images smaller than 32x32 pixels (icons, bullets, decorations) are left untouched — recompressing them yields negligible savings.
- Check for transparency. The rendered bitmap is inspected for meaningful alpha (non-opaque pixels). Images with transparency are skipped because JPEG cannot represent alpha, and replacing a well-compressed PNG with raw BGRA pixels would typically increase file size.
- Render to pixels. PDFium decodes the image (regardless of original format: JPEG, PNG/Flate, JPEG 2000, CCITT, raw) into a BGRA32 pixel buffer.
- Re-encode as JPEG. Skia's JPEG encoder compresses the pixel buffer at the target quality.
- Compare sizes. The new JPEG is only used if it is strictly smaller than the original compressed stream. Otherwise the original image is preserved.
- Regenerate content. PDFium rewrites only the pages where at least one image was replaced.
Alpha image handling: Images with transparency (PNG with alpha channel, TIFF with alpha) are never recompressed. JPEG cannot encode alpha, so replacing a transparent PNG with JPEG would require storing the alpha channel separately — which adds complexity for minimal savings. These images are preserved as-is.
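The steps above can be sketched as a single per-page loop. The types and helpers here (`PageImages`, `RenderToBgra`, `HasMeaningfulAlpha`, `EncodeJpeg`, `ReplaceImageStream`, `RegenerateContent`) are hypothetical stand-ins for the internal PDFium/Skia plumbing, not public FolioPDF API:

```csharp
// Illustrative sketch of the per-image decision pipeline described above.
// All helper names are hypothetical stand-ins for internal plumbing.
void RecompressPageImages(Page page, int quality)
{
    bool anyReplaced = false;
    foreach (var image in PageImages(page)) // FPDF_PAGEOBJ_IMAGE objects
    {
        if (image.Width < 32 || image.Height < 32)
            continue; // icons/bullets: negligible savings

        byte[] bgra = RenderToBgra(image); // decodes JPEG/Flate/JPX/CCITT/raw
        if (HasMeaningfulAlpha(bgra))
            continue; // JPEG cannot carry alpha; keep the original

        byte[] jpeg = EncodeJpeg(bgra, image.Width, image.Height, quality);
        if (jpeg.Length >= image.CompressedLength)
            continue; // only replace when strictly smaller

        ReplaceImageStream(image, jpeg);
        anyReplaced = true;
    }

    if (anyReplaced)
        RegenerateContent(page); // rewrite only modified pages
}
```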
## How Stream Optimization Works
The stream optimization phase uses qpdf to:
- Re-compress Flate streams with optimal deflation parameters (some PDF generators use sub-optimal compression levels).
- Pack structure data into object streams, which reduces internal overhead by grouping small indirect objects into single compressed containers.
- Normalize stream delimiters to remove redundant whitespace before `endstream` markers.
Stream optimization alone typically saves 5-15% for well-structured PDFs and up to 30% for PDFs generated by tools with poor compression (e.g. older versions of Microsoft Print to PDF).
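The "sub-optimal compression levels" point is easy to demonstrate with .NET's own `DeflateStream`, independent of any PDF library. This standalone sketch deflates the same repetitive, content-stream-like data at a fast and at a thorough level; re-deflating a fast-compressed stream at the thorough level is essentially what Phase 2 does for Flate streams:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

// Repetitive text standing in for a typical PDF content stream.
byte[] data = Encoding.UTF8.GetBytes(string.Concat(
    Enumerable.Repeat("0 0 612 792 re f\n", 2000)));

byte[] fast = Deflate(data, CompressionLevel.Fastest);
byte[] best = Deflate(data, CompressionLevel.SmallestSize); // .NET 6+

// The thorough level typically produces a noticeably smaller stream.
Console.WriteLine($"Fastest: {fast.Length:N0} bytes, SmallestSize: {best.Length:N0} bytes");

static byte[] Deflate(byte[] input, CompressionLevel level)
{
    using var ms = new MemoryStream();
    using (var ds = new DeflateStream(ms, level))
        ds.Write(input, 0, input.Length);
    return ms.ToArray();
}
```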
## Practical Examples

### Batch Compression

```csharp
var options = new CompressionOptions { ImageQuality = 60 };

foreach (string file in Directory.GetFiles("invoices/", "*.pdf"))
{
    string outFile = Path.Combine("compressed/", Path.GetFileName(file));
    PdfCompressor.CompressFile(file, outFile, options);

    var original = new FileInfo(file).Length;
    var compressed = new FileInfo(outFile).Length;
    double ratio = 1.0 - (double)compressed / original;
    Console.WriteLine($"{Path.GetFileName(file)}: {original:N0} -> {compressed:N0} ({ratio:P0} reduction)");
}
```
### Compress Before Email

```csharp
byte[] pdfBytes = GenerateMonthlyReport();

// Aggressive compression for email attachment
byte[] small = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 45,
    RecompressStreams = true
});

Console.WriteLine($"Original: {pdfBytes.Length:N0} bytes");
Console.WriteLine($"Compressed: {small.Length:N0} bytes");
Console.WriteLine($"Reduction: {1.0 - (double)small.Length / pdfBytes.Length:P0}");

SendEmail("report@company.com", "Monthly Report", small);
```
### Compress with Generate Pipeline

```csharp
using FolioPDF;
using FolioPDF.Fluent;
using FolioPDF.Helpers;
using FolioPDF.Toolkit.Pdfium;

// Generate -> compress -> linearize -> save
PdfEditor.Create(doc =>
{
    doc.Page(page =>
    {
        page.Size(PageSizes.A4);
        page.Margin(40);
        page.Content().Column(col =>
        {
            col.Item().Text("Annual Report 2026").FontSize(24).Bold();
            col.Item().Image(File.ReadAllBytes("hero-photo.jpg"));
            col.Item().Text("Lorem ipsum dolor sit amet...");
        });
    });
})
.Compress(new CompressionOptions
{
    ImageQuality = 70,
    RecompressStreams = true
})
.Linearize()
.Save("annual-report.pdf");
```
### Stream-Only Optimization

When images are already well-compressed (e.g. from a professional design tool) and you only want structural optimization:

```csharp
byte[] optimized = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 100,      // skip image recompression
    RecompressStreams = true // only do stream optimization
});
```
### Maximum Compression (Archival)

```csharp
byte[] archived = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 40,
    RecompressStreams = true,
    RemoveStructureTree = true // WARNING: destroys accessibility
});
```
**Accessibility warning:** Setting `RemoveStructureTree = true` strips the document's semantic structure tree (`/StructTreeRoot`). This destroys screen reader support and makes the document non-compliant with PDF/UA. Only use this option for internal archival where accessibility is not required.
## Size Comparison Example
Typical compression results for a 10-page document with embedded photographs:
| Configuration | Original | Compressed | Reduction |
|---|---|---|---|
| Streams only (quality 100) | 12.4 MB | 11.2 MB | 10% |
| Quality 85 | 12.4 MB | 5.8 MB | 53% |
| Quality 65 (default) | 12.4 MB | 3.1 MB | 75% |
| Quality 50 | 12.4 MB | 2.0 MB | 84% |
| Quality 40 | 12.4 MB | 1.5 MB | 88% |
Results vary based on image content, original compression, and the ratio of images to text. Text-heavy documents with few images see smaller improvements.
## What Compression Does Not Do
- Does not remove pages or content. Every page and every visible element is preserved.
- Does not downsample images (yet). The `DownsampleDpi` option is accepted but has no effect in the current release. Images are recompressed at their native resolution.
- Does not convert vector art to raster. Vector paths, text, and gradients pass through unchanged.
- Does not strip fonts. Embedded fonts remain embedded.
- Does not modify form fields or annotations. Interactive elements are preserved.
- Never increases file size. Both phases compare output to input and use the smaller result. A no-op compression returns the original bytes.
## Error Handling

| Exception | Cause |
|---|---|
| `ArgumentNullException` | Null PDF bytes |
| `ArgumentException` | Empty PDF bytes or empty/whitespace file paths |
| `FileNotFoundException` | Input file not found (file-based overload) |
If either phase encounters an internal error (corrupt page, unreadable image), it falls back to the uncompressed input rather than throwing. This ensures compression is always safe to call on arbitrary input.
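Given that internal failures degrade gracefully, a defensive batch loop only needs to handle input-validation exceptions. A minimal sketch, assuming files may disappear between directory enumeration and compression:

```csharp
// Only input-validation exceptions need handling; internal errors
// (corrupt pages, unreadable images) make the compressor fall back
// to the original bytes instead of throwing.
foreach (string file in Directory.GetFiles("in/", "*.pdf"))
{
    try
    {
        PdfCompressor.CompressFile(file, Path.Combine("out/", Path.GetFileName(file)));
    }
    catch (FileNotFoundException)
    {
        // File was removed between enumeration and compression
        Console.Error.WriteLine($"Skipped missing file: {file}");
    }
}
```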