Compression

Reduce PDF file sizes by 50-90% through intelligent image recompression and stream optimization. FolioPDF's two-phase compression pipeline handles photo-heavy documents, scanned pages, and bloated report PDFs without destroying visual quality.

Overview

PDF compression in FolioPDF runs in two phases:

  1. Phase 1: Image recompression (PDFium + Skia) — Extracts every embedded image, re-encodes opaque images as JPEG at a target quality, and replaces the original only when the result is smaller.
  2. Phase 2: Stream optimization (qpdf) — Re-compresses internal PDF content streams with Flate deflation and packs structure data into object streams.

Both phases are conservative: they only replace data when the result is actually smaller. Compression never makes a file larger.
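The control flow can be sketched in a few lines; note that RecompressImages and OptimizeStreams are hypothetical stand-ins for phase 1 and phase 2, not FolioPDF APIs:

```csharp
// Conceptual shape of the two-phase pipeline. The two helpers are
// hypothetical placeholders for phase 1 (PDFium + Skia) and phase 2 (qpdf).
byte[] CompressPipeline(byte[] input, CompressionOptions options)
{
    byte[] afterImages = RecompressImages(input, options.ImageQuality); // phase 1
    byte[] afterStreams = options.RecompressStreams
        ? OptimizeStreams(afterImages)                                  // phase 2
        : afterImages;

    // Conservative guarantee: never return something larger than the input.
    return afterStreams.Length < input.Length ? afterStreams : input;
}
```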

Quick Start

using FolioPDF.Toolkit.Pdfium;

// Compress with defaults (JPEG quality 65, stream optimization on)
byte[] smaller = PdfCompressor.Compress(File.ReadAllBytes("large-report.pdf"));
File.WriteAllBytes("large-report-compressed.pdf", smaller);

Three API Entry Points

1. Static Method (Byte Array)

byte[] compressed = PdfCompressor.Compress(pdfBytes);

// With options
byte[] compressed = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 50,
    RecompressStreams = true
});

2. File-to-File

PdfCompressor.CompressFile("input.pdf", "output.pdf");

// With options
PdfCompressor.CompressFile("input.pdf", "output.pdf", new CompressionOptions
{
    ImageQuality = 40,
    RecompressStreams = true
});

3. PdfEditor Fluent Chain

Compress as part of a larger editing pipeline:

using FolioPDF.Fluent;

PdfEditor.Open("invoice.pdf")
    .SetTitle("Invoice #42")
    .SetAuthor("Billing Dept")
    .Compress(new CompressionOptions { ImageQuality = 65 })
    .Encrypt(new Encryption256Bit { OwnerPassword = "secret" })
    .Save("invoice-final.pdf");

Compression Options

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| ImageQuality | int | 65 | JPEG quality for image recompression (1-100). Lower values produce smaller files with more compression artifacts. Set to 100 to skip image recompression entirely. |
| DownsampleDpi | int | 0 (disabled) | Target DPI for downsampling high-resolution images. Images above this resolution are scaled down before recompression. Common values: 150 for screen, 300 for print. Note: this option is accepted but has no effect in the current release — images are recompressed at their native resolution. True downsampling is a planned follow-up. |
| RecompressStreams | bool | true | Re-compress PDF content streams with Flate and use object streams for internal structure. Typically saves an additional 5-15% on top of image recompression. |
| RemoveStructureTree | bool | false | Remove the PDF structure tree (/StructTreeRoot). Saves space but destroys accessibility data (screen reader support). Only use when the document will not be consumed by assistive technology. |

Quality Level Guide

Choose the right ImageQuality value for your use case:

| Quality | Use Case | Typical Reduction | Visual Impact |
| --- | --- | --- | --- |
| 30-40 | Aggressive archival, legal discovery, email-friendly | 80-95% | Noticeable JPEG artifacts on photos. Text remains readable. |
| 50-60 | Standard archival, internal reports, bulk processing | 60-85% | Minor artifacts on close inspection. Good for documents where images are supplementary. |
| 65 (default) | General purpose — invoices, reports, contracts | 50-80% | Good balance of size and quality. Artifacts are subtle even on full-page photos. |
| 75-85 | Photography-heavy documents, marketing materials | 30-60% | Minimal visible difference from the original. |
| 90-95 | Near-lossless, prepress where JPEG is acceptable | 10-30% | Virtually indistinguishable from original. |
| 100 | Skip image recompression entirely | 5-15% (streams only) | None — only stream optimization runs. |
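The table can be collapsed into a small preset helper; QualityFor is a hypothetical convenience function for illustration, not part of the FolioPDF API:

```csharp
using System;

// Hypothetical helper mapping the use cases above to an ImageQuality value.
static int QualityFor(string useCase) => useCase switch
{
    "email"     => 40,  // aggressive: smallest files, visible artifacts
    "archive"   => 55,  // standard archival / bulk processing
    "general"   => 65,  // the library default
    "marketing" => 80,  // photography-heavy documents
    "prepress"  => 95,  // near-lossless
    _           => 65,  // fall back to the default
};

Console.WriteLine(QualityFor("email"));
```

Used as, for example, `new CompressionOptions { ImageQuality = QualityFor("email") }`.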

How Image Compression Works

The image recompression pipeline processes every page in the document:

  1. Scan page objects. PDFium walks the page's content stream and identifies all image objects (FPDF_PAGEOBJ_IMAGE).
  2. Skip tiny images. Images smaller than 32x32 pixels (icons, bullets, decorations) are left untouched — recompressing them yields negligible savings.
  3. Check for transparency. The rendered bitmap is inspected for meaningful alpha (non-opaque pixels). Images with transparency are skipped because JPEG cannot represent alpha, and replacing a well-compressed PNG with raw BGRA pixels would typically increase file size.
  4. Render to pixels. PDFium decodes the image (regardless of original format: JPEG, PNG/Flate, JPEG 2000, CCITT, raw) into a BGRA32 pixel buffer.
  5. Re-encode as JPEG. Skia's JPEG encoder compresses the pixel buffer at the target quality.
  6. Compare sizes. The new JPEG is only used if it is strictly smaller than the original compressed stream. Otherwise the original image is preserved.
  7. Regenerate content. PDFium rewrites only the pages where at least one image was replaced.

Alpha image handling: Images with transparency (PNG with alpha channel, TIFF with alpha) are never recompressed. JPEG cannot encode alpha, so replacing a transparent PNG with JPEG would require storing the alpha channel separately — which adds complexity for minimal savings. These images are preserved as-is.
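The seven steps above amount to a per-image loop. In this sketch, every member name (GetImages, HasMeaningfulAlpha, RenderToBgra, EncodeJpeg, ReplaceStream, RegenerateContent) is a hypothetical stand-in for the underlying PDFium/Skia calls, not public FolioPDF API:

```csharp
// Illustrative sketch of the image recompression pipeline.
foreach (var page in document.Pages)
{
    bool pageDirty = false;
    foreach (var image in page.GetImages())                    // 1. scan page objects
    {
        if (image.Width < 32 || image.Height < 32) continue;   // 2. skip tiny images
        if (image.HasMeaningfulAlpha()) continue;              // 3. skip transparency

        byte[] pixels = image.RenderToBgra();                  // 4. decode to BGRA32
        byte[] jpeg = EncodeJpeg(pixels, image.Width,          // 5. re-encode via Skia
                                 image.Height, quality);
        if (jpeg.Length >= image.StreamLength) continue;       // 6. keep original if not smaller

        image.ReplaceStream(jpeg);
        pageDirty = true;
    }
    if (pageDirty) page.RegenerateContent();                   // 7. rewrite only dirty pages
}
```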

How Stream Optimization Works

The stream optimization phase uses qpdf to:

  • Re-compress Flate streams with optimal deflation parameters (some PDF generators use sub-optimal compression levels).
  • Pack structure data into object streams, which reduces internal overhead by grouping small indirect objects into single compressed containers.
  • Normalize stream delimiters to remove redundant whitespace before endstream markers.

Stream optimization alone typically saves 5-15% for well-structured PDFs and up to 30% for PDFs generated by tools with poor compression (e.g. older versions of Microsoft Print to PDF).
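FolioPDF drives qpdf through library bindings, but a similar class of optimization can be approximated with the standalone qpdf command line. The flags below are documented qpdf CLI options; the exact parameters FolioPDF uses internally are an assumption:

```shell
# Approximate the stream-optimization phase with the qpdf CLI:
#   --recompress-flate        re-deflate existing Flate streams
#   --object-streams=generate pack small indirect objects into object streams
qpdf --recompress-flate --object-streams=generate input.pdf optimized.pdf
```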

Practical Examples

Batch Compression

var options = new CompressionOptions { ImageQuality = 60 };

foreach (string file in Directory.GetFiles("invoices/", "*.pdf"))
{
    string outFile = Path.Combine("compressed/", Path.GetFileName(file));
    PdfCompressor.CompressFile(file, outFile, options);

    var original = new FileInfo(file).Length;
    var compressed = new FileInfo(outFile).Length;
    double ratio = 1.0 - (double)compressed / original;
    Console.WriteLine($"{Path.GetFileName(file)}: {original:N0} -> {compressed:N0} ({ratio:P0} reduction)");
}

Compress Before Email

byte[] pdfBytes = GenerateMonthlyReport();

// Aggressive compression for email attachment
byte[] small = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 45,
    RecompressStreams = true
});

Console.WriteLine($"Original: {pdfBytes.Length:N0} bytes");
Console.WriteLine($"Compressed: {small.Length:N0} bytes");
Console.WriteLine($"Reduction: {1.0 - (double)small.Length / pdfBytes.Length:P0}");

SendEmail("report@company.com", "Monthly Report", small);

Compress with Generate Pipeline

using FolioPDF;
using FolioPDF.Fluent;
using FolioPDF.Helpers;
using FolioPDF.Toolkit.Pdfium;

// Generate -> compress -> encrypt -> save
PdfEditor.Create(doc =>
{
    doc.Page(page =>
    {
        page.Size(PageSizes.A4);
        page.Margin(40);
        page.Content().Column(col =>
        {
            col.Item().Text("Annual Report 2026").FontSize(24).Bold();
            col.Item().Image(File.ReadAllBytes("hero-photo.jpg"));
            col.Item().Text("Lorem ipsum dolor sit amet...");
        });
    });
})
.Compress(new CompressionOptions
{
    ImageQuality = 70,
    RecompressStreams = true
})
.Linearize()
.Save("annual-report.pdf");

Stream-Only Optimization

When images are already well-compressed (e.g. from a professional design tool) and you only want structural optimization:

byte[] optimized = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 100,       // skip image recompression
    RecompressStreams = true   // only do stream optimization
});

Maximum Compression (Archival)

byte[] archived = PdfCompressor.Compress(pdfBytes, new CompressionOptions
{
    ImageQuality = 40,
    RecompressStreams = true,
    RemoveStructureTree = true  // WARNING: destroys accessibility
});

Accessibility warning: Setting RemoveStructureTree = true strips the document's semantic structure tree (/StructTreeRoot). This destroys screen reader support and makes the document non-compliant with PDF/UA. Only use this option for internal archival where accessibility is not required.

Size Comparison Example

Typical compression results for a 10-page document with embedded photographs:

| Configuration | Original | Compressed | Reduction |
| --- | --- | --- | --- |
| Streams only (quality 100) | 12.4 MB | 11.2 MB | 10% |
| Quality 85 | 12.4 MB | 5.8 MB | 53% |
| Quality 65 (default) | 12.4 MB | 3.1 MB | 75% |
| Quality 50 | 12.4 MB | 2.0 MB | 84% |
| Quality 40 | 12.4 MB | 1.5 MB | 88% |

Results vary based on image content, original compression, and the ratio of images to text. Text-heavy documents with few images see smaller improvements.
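The reduction column follows directly from `1 - compressed / original`; for the default row, 1 - 3.1/12.4 = 0.75, i.e. a 75% reduction:

```csharp
using System;

// Reduction is the fraction of the original size that was eliminated.
double Reduction(double originalMb, double compressedMb)
    => 1 - compressedMb / originalMb;

Console.WriteLine(Reduction(12.4, 3.1));   // quality 65 row: 0.75 (75%)
Console.WriteLine(Reduction(12.4, 11.2));  // streams-only row: roughly 0.10
```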

What Compression Does Not Do

  • Does not remove pages or content. Every page and every visible element is preserved.
  • Does not downsample images (yet). The DownsampleDpi option is accepted but has no effect in the current release. Images are recompressed at their native resolution.
  • Does not convert vector art to raster. Vector paths, text, and gradients pass through unchanged.
  • Does not strip fonts. Embedded fonts remain embedded.
  • Does not modify form fields or annotations. Interactive elements are preserved.
  • Never increases file size. Both phases compare output to input and use the smaller result. A no-op compression returns the original bytes.

Error Handling

| Exception | Cause |
| --- | --- |
| ArgumentNullException | Null PDF bytes |
| ArgumentException | Empty PDF bytes or empty/whitespace file paths |
| FileNotFoundException | Input file not found (file-based overload) |

If either phase encounters an internal error (corrupt page, unreadable image), it falls back to the uncompressed input rather than throwing. This ensures compression is always safe to call on arbitrary input.
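A defensive wrapper reflecting this contract might look like the following; CompressSafely is an illustrative name, not part of the library:

```csharp
// Only argument errors throw. Internal errors (corrupt pages, unreadable
// images) never do — the compressor falls back to returning the input bytes.
byte[] CompressSafely(byte[] pdf)
{
    // Validate up front: these are the only conditions that raise exceptions.
    if (pdf is null) throw new ArgumentNullException(nameof(pdf));
    if (pdf.Length == 0) throw new ArgumentException("Empty PDF bytes.", nameof(pdf));

    // Safe on arbitrary (even corrupt) content; never larger than the input.
    return PdfCompressor.Compress(pdf);
}
```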