The 8-stage pipeline

EggPdf is a pure data transformation pipeline. Each stage takes a well-defined input type and produces a well-defined output type. No stage holds state between renders.

| Stage | Input | Output | Project |
|---|---|---|---|
| 1. HTML Parse | HTML string | DOM tree (HtmlDocument) | EggPdf.Html |
| 2. CSS Parse | DOM + CSS text | Style sheets (CssStyleSheet[]) | EggPdf.Css |
| 3. Style Resolve | DOM + style sheets | Styled tree (element → computed style) | EggPdf.Style |
| 4. Box Generate | Styled tree | Box tree (formatting boxes) | EggPdf.Layout |
| 5. Layout | Box tree + page dimensions | Layout tree (boxes with x/y/w/h) | EggPdf.Layout |
| 6. Fragment | Layout tree + page size | Paged frames (one frame per page) | EggPdf.Fragmentation |
| 7. Paint | Paged frames | Paint command list (abstract drawing ops) | EggPdf.Paint |
| 8. PDF Write | Paint commands + fonts + images | byte[] or Stream (PDF 1.7) | EggPdf.Pdf |
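
The stage table can be read as a chain of pure functions in which each output type feeds the next input type. The following is a minimal sketch of that shape only, covering the first few stages. The record types (Dom, StyledTree, BoxTree) and the Then helper are hypothetical placeholders for illustration, not EggPdf's actual types.

```csharp
using System;

// Hypothetical placeholder types; the real EggPdf types carry far more data.
public sealed record Dom(string Html);
public sealed record StyledTree(Dom Dom);
public sealed record BoxTree(StyledTree Styled);

public static class PipelineSketch
{
    // Composes two stages. The compiler enforces that the output type of the
    // first stage matches the input type of the second.
    public static Func<TIn, TOut> Then<TIn, TMid, TOut>(
        this Func<TIn, TMid> first, Func<TMid, TOut> second)
        => input => second(first(input));

    // Builds a three-stage pipeline from stateless functions.
    public static Func<string, BoxTree> Build()
    {
        Func<string, Dom> parse = html => new Dom(html);
        Func<Dom, StyledTree> style = dom => new StyledTree(dom);
        Func<StyledTree, BoxTree> boxes = styled => new BoxTree(styled);
        return parse.Then(style).Then(boxes);
    }
}
```

Because every stage in the sketch is a stateless function, the composed pipeline can be called repeatedly without any render affecting the next.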

Project structure

The pipeline maps to independent C# projects with strict one-way dependencies. No circular dependencies are permitted; EggPdf.Core depends on nothing.

EggPdf (public API facade)
  ├── EggPdf.Html        (depends on: Core)
  ├── EggPdf.Css         (depends on: Core, Html)
  ├── EggPdf.Style       (depends on: Core, Css, Text)
  ├── EggPdf.Layout      (depends on: Core, Style, Text)
  ├── EggPdf.Fragmentation (depends on: Core, Layout)
  ├── EggPdf.Paint       (depends on: Core, Layout, Fragmentation, Text)
  ├── EggPdf.Pdf         (depends on: Core, Paint, Text)
  ├── EggPdf.Text        (depends on: Core)
  └── EggPdf.Core        (no dependencies — primitives only)
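
The one-way dependency rule can be checked mechanically. The sketch below copies the references from the listing above into a dictionary and runs a depth-first topological sort that throws if it finds a cycle. The DependencyCheck class is a hypothetical helper written for this illustration and is not part of EggPdf.

```csharp
using System;
using System.Collections.Generic;

public static class DependencyCheck
{
    // References copied from the project listing (names without the "EggPdf." prefix).
    public static readonly Dictionary<string, string[]> Deps = new Dictionary<string, string[]>
    {
        ["Core"] = new string[0],
        ["Text"] = new[] { "Core" },
        ["Html"] = new[] { "Core" },
        ["Css"] = new[] { "Core", "Html" },
        ["Style"] = new[] { "Core", "Css", "Text" },
        ["Layout"] = new[] { "Core", "Style", "Text" },
        ["Fragmentation"] = new[] { "Core", "Layout" },
        ["Paint"] = new[] { "Core", "Layout", "Fragmentation", "Text" },
        ["Pdf"] = new[] { "Core", "Paint", "Text" },
    };

    // Returns a build order with dependencies first, or throws if a cycle exists.
    public static List<string> BuildOrder(IReadOnlyDictionary<string, string[]> deps)
    {
        var order = new List<string>();
        var state = new Dictionary<string, int>(); // 1 = being visited, 2 = finished

        void Visit(string project)
        {
            if (state.TryGetValue(project, out var s))
            {
                // Reaching a project that is still being visited means a cycle.
                if (s == 1) throw new InvalidOperationException($"Cycle at {project}");
                return;
            }
            state[project] = 1;
            foreach (var dep in deps[project]) Visit(dep);
            state[project] = 2;
            order.Add(project);
        }

        foreach (var project in deps.Keys) Visit(project);
        return order;
    }
}
```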

Key design decisions

Infallible parsers

The HTML and CSS parsers never throw. The HTML parser produces error-recovery DOM nodes per the HTML5 specification — any string input, however malformed, produces a valid HtmlDocument. The CSS parser skips invalid declarations and continues; no CSS parse error will crash a render.
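
The skip-and-continue behavior for CSS can be illustrated with a deliberately simplified declaration parser. This is a sketch under strong assumptions: it splits on ';' and ':' only, whereas a real CSS tokenizer must also handle strings, comments, escapes, and nested blocks. The DeclarationSketch class and its warning text format are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DeclarationSketch
{
    // Parses the inside of a declaration block such as "color: red; margin: 0".
    // Malformed declarations produce a warning instead of an exception.
    public static (Dictionary<string, string> Declarations, List<string> Warnings) Parse(string block)
    {
        var declarations = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var warnings = new List<string>();

        foreach (var raw in (block ?? string.Empty).Split(';'))
        {
            var text = raw.Trim();
            if (text.Length == 0) continue; // empty declarations are not an error

            var colon = text.IndexOf(':');
            var name = colon > 0 ? text.Substring(0, colon).Trim() : string.Empty;
            var value = colon > 0 ? text.Substring(colon + 1).Trim() : string.Empty;

            if (name.Length == 0 || value.Length == 0)
            {
                // Record the problem and carry on with the next declaration.
                warnings.Add($"CSS_PARSE_ERROR: '{text}'");
                continue;
            }
            declarations[name] = value; // a later declaration overrides an earlier one
        }
        return (declarations, warnings);
    }
}
```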

Region-based pagination

Layout and pagination are not separate passes. Instead, each layout element receives a sequence of regions: the remaining space on the current page, followed by full subsequent pages. The element decides how to split itself across those regions, so new layout modes (flex, grid, table) paginate correctly without dedicated pagination logic.

Three-state layout response

Every layout element reports one of three outcomes after layout:

  • Fit — the element fits entirely in the current region.
  • Split — the element partially fits; render the part that fits now and continue the remainder on the next page.
  • Skip — the element does not fit at all; move it entirely to the next page.

This three-state contract drives automatic, lossless pagination across all layout modes.
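
A toy example may make the contract concrete. The sketch below lays out a paragraph of equal-height lines into a sequence of regions and reports Fit, Split, or Skip for each one. The LayoutStatus, LayoutResult, and ParagraphSketch names are hypothetical, and real elements would of course measure far more than a line count.

```csharp
using System;
using System.Collections.Generic;

public enum LayoutStatus { Fit, Split, Skip }

// Hypothetical result type: the outcome plus how many lines were placed.
public sealed record LayoutResult(LayoutStatus Status, int LinesPlaced);

public static class ParagraphSketch
{
    // Lays out lineCount lines of equal height into a single region.
    public static LayoutResult Layout(int lineCount, double lineHeight, double regionHeight)
    {
        int capacity = (int)Math.Floor(regionHeight / lineHeight);
        if (capacity >= lineCount) return new LayoutResult(LayoutStatus.Fit, lineCount);
        if (capacity <= 0) return new LayoutResult(LayoutStatus.Skip, 0);
        return new LayoutResult(LayoutStatus.Split, capacity);
    }

    // Walks the region sequence: the remaining space on the current page, then
    // full pages. Returns the number of lines placed in each region visited.
    public static List<int> Paginate(int lineCount, double lineHeight,
                                     double remainingOnPage, double fullPageHeight)
    {
        if (lineHeight <= 0 || lineHeight > fullPageHeight)
            throw new ArgumentException("A line must have positive height and fit on an empty page.");

        var placed = new List<int>();
        double region = remainingOnPage;
        int remaining = lineCount;
        while (remaining > 0)
        {
            var result = Layout(remaining, lineHeight, region);
            placed.Add(result.LinesPlaced); // 0 when the outcome is Skip
            remaining -= result.LinesPlaced;
            region = fullPageHeight; // every later region is a full page
        }
        return placed;
    }
}
```

In this sketch no line is dropped: the placed counts always sum to the original line count, which is the lossless property the contract is meant to provide.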

Pluggable paint backend

The paint layer emits abstract drawing commands: draw text, draw rectangle, draw image, draw border. The PDF backend consumes these commands and produces PDF operators. A raster backend also exists for visual regression testing: it renders the same commands to pixel-accurate output without any PDF round-trip, which keeps test diffs fast and deterministic.
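
The separation between commands and backends might look roughly like the sketch below. It is an illustration under assumptions, not EggPdf's actual API: the PaintCommand records, the IPaintBackend interface, and the fixed /F1 font resource are all hypothetical, and only two of the four command kinds are modeled. The emitted operators (re, f, BT, Tf, Td, Tj, ET) are standard PDF content-stream operators.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical command types covering two of the four commands named above.
public abstract record PaintCommand;
public sealed record DrawRect(double X, double Y, double Width, double Height) : PaintCommand;
public sealed record DrawText(double X, double Y, string Value) : PaintCommand;

// A backend only needs to understand the command list, not the layout tree.
public interface IPaintBackend
{
    void Execute(PaintCommand command);
}

// Toy backend that translates commands into PDF content-stream operators.
public sealed class PdfOperatorBackend : IPaintBackend
{
    private readonly StringBuilder _content = new StringBuilder();

    public string Content => _content.ToString();

    public void Execute(PaintCommand command)
    {
        switch (command)
        {
            case DrawRect r:
                // "x y w h re" appends a rectangle path; "f" fills it.
                _content.Append($"{Num(r.X)} {Num(r.Y)} {Num(r.Width)} {Num(r.Height)} re f\n");
                break;
            case DrawText t:
                // Assumes a font resource named /F1 is declared elsewhere; 12 pt is arbitrary.
                _content.Append($"BT /F1 12 Tf {Num(t.X)} {Num(t.Y)} Td ({Escape(t.Value)}) Tj ET\n");
                break;
            default:
                throw new NotSupportedException(command.GetType().Name);
        }
    }

    // PDF numbers always use '.' as the decimal separator.
    private static string Num(double value) =>
        value.ToString("0.###", CultureInfo.InvariantCulture);

    // Backslashes and parentheses must be escaped inside PDF literal strings.
    private static string Escape(string text) =>
        text.Replace("\\", "\\\\").Replace("(", "\\(").Replace(")", "\\)");
}
```

A raster or recording backend would implement the same interface, which is what lets tests bypass PDF generation entirely.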

Self-serializing PDF objects

Each PDF object writes itself to the output stream. A central reference table assigns object numbers and tracks byte offsets, which are then used to build the cross-reference table at the end of the file. This avoids buffering the entire document in memory.
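
The offset-tracking idea can be sketched in a few dozen lines. The writer below streams each object as soon as it is handed over, records the byte offset at which it started, and emits the cross-reference table and trailer at the end. PdfWriterSketch is a hypothetical class for illustration; it assigns object numbers at write time, so it does not cover forward references, streams, or compression. The 20-byte cross-reference entry layout follows the PDF specification.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public sealed class PdfWriterSketch
{
    private readonly Stream _output;
    private readonly List<long> _offsets = new List<long>(); // index i holds the offset of object i + 1
    private long _position;

    public PdfWriterSketch(Stream output)
    {
        _output = output;
        Write("%PDF-1.7\n");
    }

    // Writes ASCII text and advances the running byte position.
    private void Write(string text)
    {
        var bytes = Encoding.ASCII.GetBytes(text);
        _output.Write(bytes, 0, bytes.Length);
        _position += bytes.Length;
    }

    // Writes one indirect object immediately and records where it started.
    public int WriteObject(string body)
    {
        _offsets.Add(_position);
        int number = _offsets.Count;
        Write($"{number} 0 obj\n{body}\nendobj\n");
        return number;
    }

    // Emits the cross-reference table, trailer, and startxref pointer.
    public void Finish(int rootObjectNumber)
    {
        long xrefStart = _position;
        Write($"xref\n0 {_offsets.Count + 1}\n");
        Write("0000000000 65535 f \n"); // object 0 is always the head of the free list
        foreach (var offset in _offsets)
            Write($"{offset:D10} 00000 n \n"); // each entry is exactly 20 bytes
        Write($"trailer\n<< /Size {_offsets.Count + 1} /Root {rootObjectNumber} 0 R >>\n");
        Write($"startxref\n{xrefStart}\n%%EOF\n");
    }
}
```

Only the list of offsets is retained, so in this sketch the memory held by the writer grows with the number of objects rather than with their size.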

Thread and memory model

HtmlToPdf is safe to use from multiple threads simultaneously. Each RenderAsync() call creates its own pipeline instances — DOM tree, CSS cascade, layout tree — with no shared mutable state. The only shared state (font cache, UA stylesheet) is read-only after initialization and requires no locking.

Memory use grows with document size, but the per-page working set is bounded: only the current page's layout and paint data are held in memory at any time. Earlier pages have already been written to the output stream and their memory released.

Error handling strategy

| Error | Behavior | Warning code |
|---|---|---|
| Invalid HTML | Error-recovery DOM (HTML5 spec) | none |
| Invalid CSS declaration | Skip declaration, continue | CSS_PARSE_ERROR |
| Unknown CSS property | Silently ignore | CSS_UNSUPPORTED |
| Font not found | Fall back: next in stack → system → Helvetica | FONT_NOT_FOUND |
| Image load failed | Render alt text in placeholder box | IMAGE_LOAD_FAILED |
| Layout overflow | Clip to page bounds | LAYOUT_OVERFLOW |
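
The font fallback row can be sketched as a simple chain. The code below is an assumption-laden illustration: it reads "system" as a single system default family supplied by the caller, and the FontFallbackSketch class, its parameters, and the warning text format are hypothetical. Helvetica works as a last resort because it is one of the standard 14 PDF fonts.

```csharp
using System;
using System.Collections.Generic;

public static class FontFallbackSketch
{
    public const string LastResort = "Helvetica";

    // Tries each family in the stack, then the system default, then Helvetica.
    // Adds one warning for every requested family that could not be found.
    public static string Resolve(
        IReadOnlyList<string> stack,
        Func<string, bool> isAvailable,
        string systemDefault,
        List<string> warnings)
    {
        foreach (var family in stack)
        {
            if (isAvailable(family)) return family;
            warnings.Add($"FONT_NOT_FOUND: {family}");
        }

        if (!string.IsNullOrEmpty(systemDefault) && isAvailable(systemDefault))
            return systemDefault;

        return LastResort;
    }
}
```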