The 8-stage pipeline
EggPdf is a pure data transformation pipeline. Each stage takes a well-defined input type and produces a well-defined output type. No stage holds state between renders.
| Stage | Input | Output | Project |
|---|---|---|---|
| 1. HTML Parse | HTML string | DOM tree (HtmlDocument) | EggPdf.Html |
| 2. CSS Parse | DOM + CSS text | Style sheets (CssStyleSheet[]) | EggPdf.Css |
| 3. Style Resolve | DOM + style sheets | Styled tree (element → computed style) | EggPdf.Style |
| 4. Box Generate | Styled tree | Box tree (formatting boxes) | EggPdf.Layout |
| 5. Layout | Box tree + page dimensions | Layout tree (boxes with x/y/w/h) | EggPdf.Layout |
| 6. Fragment | Layout tree + page size | Paged frames (one frame per page) | EggPdf.Fragmentation |
| 7. Paint | Paged frames | Paint command list (abstract drawing ops) | EggPdf.Paint |
| 8. PDF Write | Paint commands + fonts + images | byte[] or Stream (PDF 1.7) | EggPdf.Pdf |
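The table reads naturally as a chain of pure functions, where each stage's output type is the next stage's input type. The following is a minimal sketch of that idea, not EggPdf's actual code: `DomTree`, `BoxTree`, `PagedFrames`, `StageComposition`, and `SketchPipeline` are hypothetical names, and only three toy stages are shown instead of all eight.

```csharp
using System;

// Placeholder data types, one per stage boundary. Hypothetical: they are not
// EggPdf's real types and carry just enough data to make the sketch run.
public sealed record DomTree(string Source);
public sealed record BoxTree(int BoxCount);
public sealed record PagedFrames(int PageCount);

public static class StageComposition
{
    // Chains two pure stages. The output type of the first must match the
    // input type of the second, which the compiler enforces.
    public static Func<TIn, TOut> Then<TIn, TMid, TOut>(
        this Func<TIn, TMid> first, Func<TMid, TOut> second)
        => input => second(first(input));
}

public static class SketchPipeline
{
    // Builds a three-stage pipeline. No stage stores anything between calls,
    // so the returned delegate can be reused for any number of renders.
    public static Func<string, PagedFrames> Build(int boxesPerPage)
    {
        if (boxesPerPage <= 0)
            throw new ArgumentOutOfRangeException(nameof(boxesPerPage));

        Func<string, DomTree> parse = html => new DomTree(html);

        // Toy rule: one box per line of input.
        Func<DomTree, BoxTree> generateBoxes =
            dom => new BoxTree(dom.Source.Split('\n').Length);

        // Ceiling division: how many pages the boxes occupy.
        Func<BoxTree, PagedFrames> fragment =
            boxes => new PagedFrames((boxes.BoxCount + boxesPerPage - 1) / boxesPerPage);

        return parse.Then(generateBoxes).Then(fragment);
    }
}
```

Because `Then` only compiles when adjacent stage types line up, swapping or skipping a stage becomes a compile-time error instead of a runtime surprise.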
Project structure
The pipeline maps to independent C# projects with strict one-way dependencies.
No circular dependencies are permitted; EggPdf.Core depends on nothing.
EggPdf (public API facade)
├── EggPdf.Html (depends on: Core)
├── EggPdf.Css (depends on: Core, Html)
├── EggPdf.Style (depends on: Core, Css, Text)
├── EggPdf.Layout (depends on: Core, Style, Text)
├── EggPdf.Fragmentation (depends on: Core, Layout)
├── EggPdf.Paint (depends on: Core, Layout, Fragmentation, Text)
├── EggPdf.Pdf (depends on: Core, Paint, Text)
├── EggPdf.Text (depends on: Core)
└── EggPdf.Core (no dependencies; primitives only)
Key design decisions
Infallible parsers
The HTML and CSS parsers never throw. The HTML parser produces error-recovery DOM nodes per the HTML5 specification, so any string input, however malformed, produces a valid HtmlDocument. The CSS parser skips invalid declarations and continues; no CSS parse error will crash a render.
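The skip-and-continue behavior for CSS can be sketched as follows. This is an illustration under stated assumptions, not the EggPdf.Css implementation: `LenientCss`, `Declaration`, and the `warnings` list are hypothetical, and splitting on `;` is a simplification that ignores semicolons inside strings or `url()`. The `CSS_PARSE_ERROR` code is the one listed in the error handling table.

```csharp
using System;
using System.Collections.Generic;

public sealed record Declaration(string Property, string Value);

public static class LenientCss
{
    // Parses the body of a declaration block, e.g. "color: red; margin: 0".
    // Malformed declarations are skipped and reported through `warnings`;
    // the method does not throw on malformed input.
    public static List<Declaration> ParseDeclarations(string text, List<string> warnings)
    {
        var result = new List<Declaration>();
        if (string.IsNullOrEmpty(text)) return result;

        foreach (string raw in text.Split(';'))
        {
            string chunk = raw.Trim();
            if (chunk.Length == 0) continue; // stray semicolon, not an error

            int colon = chunk.IndexOf(':');
            string name = colon > 0 ? chunk.Substring(0, colon).Trim() : "";
            string value = colon > 0 ? chunk.Substring(colon + 1).Trim() : "";

            if (name.Length == 0 || value.Length == 0)
            {
                // Record the problem and move on to the next declaration.
                warnings.Add("CSS_PARSE_ERROR: " + chunk);
                continue;
            }

            result.Add(new Declaration(name.ToLowerInvariant(), value));
        }

        return result;
    }
}
```

Given `"color: red; width 10px; margin: 0"`, the sketch keeps the `color` and `margin` declarations and records one warning for `width 10px`.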
Region-based pagination
Layout and pagination are not two separate passes. Instead, each layout element receives a sequence of regions: first the remaining space on the current page, then each subsequent full page. The element decides how to split itself across those regions, so a new layout mode (flex, grid, table) paginates correctly once it honors the region contract, with no separate pagination pass to update.
Three-state layout response
Every layout element reports one of three outcomes after layout:
- Fit — the element fits entirely in the current region.
- Split — the element partially fits; render the part that fits now and continue the remainder on the next page.
- Skip — the element does not fit at all; move it entirely to the next page.
This three-state contract drives automatic, lossless pagination across all layout modes.
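To make the region sequence and the three outcomes concrete, here is a minimal sketch for the simplest possible element: a block of equal-height lines that can break only between lines. All names (`LayoutStatus`, `LayoutResult`, `RegionSketch`) and the use of plain heights as regions are assumptions for illustration; the real layout types are not shown in this document.

```csharp
using System;
using System.Collections.Generic;

public enum LayoutStatus { Fit, Split, Skip }

// Result of laying out into one region: the outcome plus the lines placed there.
public sealed record LayoutResult(LayoutStatus Status, int LinesPlaced);

public static class RegionSketch
{
    // Region heights in the order an element sees them: the space left on
    // the current page first, then an unbounded run of full pages.
    public static IEnumerable<double> Regions(double remainingOnPage, double pageHeight)
    {
        yield return remainingOnPage;
        while (true) yield return pageHeight;
    }

    // A block of equal-height lines that may break only between lines.
    public static LayoutResult LayoutLines(int lineCount, double lineHeight, double regionHeight)
    {
        int capacity = (int)Math.Floor(regionHeight / lineHeight);
        if (capacity >= lineCount) return new LayoutResult(LayoutStatus.Fit, lineCount);
        if (capacity <= 0) return new LayoutResult(LayoutStatus.Skip, 0);
        return new LayoutResult(LayoutStatus.Split, capacity);
    }

    // Walks the regions until every line is placed; returns lines per region.
    public static List<int> Paginate(int lineCount, double lineHeight,
                                     double remainingOnPage, double pageHeight)
    {
        // Guard against an element that could never fit, which would loop forever.
        if (lineHeight <= 0 || lineHeight > pageHeight)
            throw new ArgumentException("A line must be taller than zero and fit on an empty page.");

        var linesPerRegion = new List<int>();
        int left = lineCount;
        foreach (double region in Regions(remainingOnPage, pageHeight))
        {
            if (left <= 0) break;
            LayoutResult result = LayoutLines(left, lineHeight, region);
            linesPerRegion.Add(result.LinesPlaced); // 0 when the outcome is Skip
            left -= result.LinesPlaced;
        }
        return linesPerRegion;
    }
}
```

With 10 lines of height 12, 30 units left on the current page, and a page height of 60, `Paginate(10, 12, 30, 60)` places 2 lines (Split), then 5 lines (Split), then the last 3 (Fit). If only 5 units remained on the current page, the first outcome would be Skip and nothing would be placed there.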
Pluggable paint backend
The paint layer emits abstract drawing commands: draw text, draw rectangle, draw image, draw border. The PDF backend consumes these commands and produces PDF operators. A raster backend exists for visual regression testing: it renders the same commands to pixels without any PDF round-trip, which keeps test diffs fast and deterministic.
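One way to picture the backend split is a command list replayed against an interface. The sketch below is an assumption about shape, not the EggPdf.Paint API: `PaintCommand`, `IPaintBackend`, `PdfOperatorBackend`, `RecordingBackend`, and `Painter` are hypothetical names, only two of the four command kinds are modeled, and a recording backend stands in for the raster one. The emitted strings use real PDF content-stream operators (`re`, `f`, `BT`, `Td`, `Tj`, `ET`) but omit font selection and string escaping.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Abstract drawing commands (a hypothetical subset).
public abstract record PaintCommand;
public sealed record DrawRect(double X, double Y, double W, double H) : PaintCommand;
public sealed record DrawText(double X, double Y, string Text) : PaintCommand;

public interface IPaintBackend
{
    void Execute(PaintCommand command);
}

// Translates commands into PDF content-stream operators.
public sealed class PdfOperatorBackend : IPaintBackend
{
    public List<string> Operators { get; } = new List<string>();

    public void Execute(PaintCommand command)
    {
        switch (command)
        {
            case DrawRect r:
                // "re" appends a rectangle to the path, "f" fills it.
                Operators.Add(string.Format(CultureInfo.InvariantCulture,
                    "{0} {1} {2} {3} re f", r.X, r.Y, r.W, r.H));
                break;
            case DrawText t:
                // Simplified: no font selection (Tf) and no string escaping.
                Operators.Add(string.Format(CultureInfo.InvariantCulture,
                    "BT {0} {1} Td ({2}) Tj ET", t.X, t.Y, t.Text));
                break;
        }
    }
}

// Test-oriented backend: keeps the commands it saw so they can be compared.
public sealed class RecordingBackend : IPaintBackend
{
    public List<PaintCommand> Seen { get; } = new List<PaintCommand>();
    public void Execute(PaintCommand command) => Seen.Add(command);
}

public static class Painter
{
    // The same command list can be replayed against any backend.
    public static void Replay(IEnumerable<PaintCommand> commands, IPaintBackend backend)
    {
        foreach (PaintCommand command in commands) backend.Execute(command);
    }
}
```

Since the command list is plain data, the same list can feed a PDF writer in production and a comparison backend in tests without re-running layout.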
Self-serializing PDF objects
Each PDF object writes itself to the output stream. A central reference table assigns object numbers and tracks byte offsets, which are then used to build the cross-reference table at the end of the file. This avoids buffering the entire document in memory.
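The offset bookkeeping can be sketched in a few dozen lines. This is a simplified illustration, not EggPdf.Pdf itself: `IPdfObject`, `PdfDictionaryObject`, and `PdfWriter` are hypothetical names, and the sketch assigns object numbers at write time, so it does not cover objects that must reference each other before they are written. The file layout (header, `obj`/`endobj`, 20-byte `xref` entries, `trailer`, `startxref`) follows the PDF file format.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public interface IPdfObject
{
    // Writes everything between "N 0 obj" and "endobj".
    void WriteBody(PdfWriter writer);
}

public sealed class PdfDictionaryObject : IPdfObject
{
    private readonly string _dictionary;
    public PdfDictionaryObject(string dictionary) => _dictionary = dictionary;
    public void WriteBody(PdfWriter writer) => writer.Write(_dictionary + "\n");
}

public sealed class PdfWriter
{
    private readonly Stream _output;
    private readonly List<long> _offsets = new List<long>(); // index i is object i + 1
    private long _position; // bytes written so far; works on non-seekable streams

    public PdfWriter(Stream output)
    {
        _output = output;
        Write("%PDF-1.7\n");
    }

    public void Write(string ascii)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(ascii);
        _output.Write(bytes, 0, bytes.Length);
        _position += bytes.Length;
    }

    // Assigns the next object number, records the byte offset, and lets the
    // object serialize itself straight to the output.
    public int Add(IPdfObject obj)
    {
        _offsets.Add(_position);
        int number = _offsets.Count;
        Write(number.ToString(CultureInfo.InvariantCulture) + " 0 obj\n");
        obj.WriteBody(this);
        Write("endobj\n");
        return number;
    }

    // Emits the cross-reference table and trailer from the recorded offsets.
    public void Finish(int rootObjectNumber)
    {
        long xrefStart = _position;
        int size = _offsets.Count + 1; // entry 0 is the free-list head
        Write("xref\n0 " + size.ToString(CultureInfo.InvariantCulture) + "\n");
        Write("0000000000 65535 f \n");
        foreach (long offset in _offsets)
            Write(offset.ToString("D10", CultureInfo.InvariantCulture) + " 00000 n \n");
        Write("trailer\n<< /Size " + size.ToString(CultureInfo.InvariantCulture)
              + " /Root " + rootObjectNumber.ToString(CultureInfo.InvariantCulture) + " 0 R >>\n");
        Write("startxref\n" + xrefStart.ToString(CultureInfo.InvariantCulture) + "\n%%EOF\n");
    }
}
```

Only the list of offsets is retained after an object is written, which is what lets the object bodies go straight to the stream.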
Thread and memory model
HtmlToPdf is safe to use from multiple threads simultaneously. Each RenderAsync() call creates its own pipeline instances (DOM tree, CSS cascade, layout tree) with no shared mutable state. The only shared state (font cache, UA stylesheet) is read-only after initialization and requires no locking.
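The pattern described here, shared read-only state plus per-call working state, can be sketched as follows. `SketchRenderer` and `RenderSummary` are hypothetical stand-ins and do not reflect the real HtmlToPdf or RenderAsync() signatures; the "render" is a toy word count so that the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed record RenderSummary(int WordCount, string Font);

// Hypothetical stand-in for a thread-safe renderer; not EggPdf's HtmlToPdf.
public sealed class SketchRenderer
{
    // Shared state: built once in the constructor and only read afterwards,
    // so concurrent calls need no lock.
    private readonly Dictionary<string, string> _fontCache;

    public SketchRenderer(IDictionary<string, string> fonts)
        => _fontCache = new Dictionary<string, string>(fonts); // defensive copy

    public Task<RenderSummary> RenderAsync(string html, string fontFamily)
        => Task.Run(() =>
        {
            // Per-call state: allocated inside the call, never stored on the instance.
            var words = new List<string>(
                html.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

            string font = _fontCache.TryGetValue(fontFamily, out var resolved)
                ? resolved
                : "Helvetica"; // last-resort fallback, as in the error handling table

            return new RenderSummary(words.Count, font);
        });
}
```

A single `SketchRenderer` can then serve many concurrent calls, for example by starting several `RenderAsync` tasks and awaiting them with `Task.WhenAll`. The design choice that makes this safe is that nothing written during a render is reachable from the shared instance.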
Memory scales with document size but is bounded: only the current page's layout and paint data are in memory at any time. Previous pages have already been written to the output stream and their memory released.
Error handling strategy
| Error | Behavior | Warning code |
|---|---|---|
| Invalid HTML | Error-recovery DOM (HTML5 spec) | none |
| Invalid CSS declaration | Skip declaration, continue | CSS_PARSE_ERROR |
| Unknown CSS property | Ignore declaration, continue | CSS_UNSUPPORTED |
| Font not found | Fall back: next in stack → system → Helvetica | FONT_NOT_FOUND |
| Image load failed | Render alt text in placeholder box | IMAGE_LOAD_FAILED |
| Layout overflow | Clip to page bounds | LAYOUT_OVERFLOW |