Postext Architecture

Updated: 2026-04-21 | 35 min

If you have ever worked with React, you already understand the core trick. React builds a virtual DOM in memory, diffs it, and only then touches the real browser DOM. Postext does the same thing -- but instead of UI components, it builds a tree of pages, columns, text blocks, and bounding boxes. The entire geometry of a multi-page, multi-column document, computed before a single pixel is rendered. Every paragraph, heading, image, footnote, and pull quote placed at exact coordinates, respecting centuries-old typographic rules that CSS simply cannot express.

All of this is possible thanks to @chenglou/pretext, a DOM-free text measurement library that is 300--600x faster than browser layout reflow. For the story behind the project (a decade of failed attempts, the bottleneck that blocked them all, and the library that finally removed it), see the Introduction.

#The Core Idea

Picture a 74-page, two-column document. A company annual report, maybe, or a densely illustrated textbook. You hand it to Postext, and the engine constructs the entire layout in memory -- every page, every column, every paragraph's exact position and pixel dimensions. Want to know what sits on page 72, column 2? The answer is already there. No rendering required. The engine has already decided where to break every paragraph, where to place every image, how to avoid widows and orphans, and how to align baselines across adjacent columns.

Now here is why that matters so much.

Typographic rules are deeply, maddeningly interdependent. You fix a widow on page 5 -- a single line of a paragraph left alone at the bottom of a column -- by pushing it forward to rejoin its paragraph in the next column. Great. But that change shortens page 5's column and shifts content forward, which might just create an orphan on page 6: a single line spilled onto the top of a new column, disconnected from the rest of its paragraph. To even detect that you have created a new problem, you need the entire document layout available for inspection. And to fix it without creating yet another problem somewhere else, you need to be able to adjust, re-measure, and re-check the whole thing.

That is the "compute everything first, render later" philosophy. It is not a performance trick. It is the only way to apply the dozens of interconnected typographic rules that professional typesetters have used for centuries.

#Core Concepts

A quick glossary. The rest of the document assumes these terms — come back here whenever one has gone fuzzy.

| Term | Definition |
| --- | --- |
| VDT | Virtual Document Tree. The mutable in-place data structure that represents the entire document: pages, columns, blocks, inline segments, and bounding boxes. Analogous to a virtual DOM but for document layout geometry. |
| Page | A fixed-size rectangular area. The engine is page-aware from the start. A document is an ordered sequence of pages. |
| Column | A vertical subdivision of a page. Columns have a fixed width and maximum height. Text flows from one column to the next, then to the next page. |
| Block | A content unit that occupies vertical space in a column: paragraph, heading, image, table, blockquote, pull quote, or footnote area. |
| Line | A measured line of text within a block, produced by Pretext. Each line has a bounding box and a baseline position. |
| Bounding Box | x, y, width, height in pixels, relative to the page origin. Every node in the VDT carries one. |
| Resource | A non-text element (image, table, figure, pull quote) referenced from the markdown. Defined by PostextResource. |
| Note | A footnote, endnote, or margin note. Defined by PostextNote. |
| Backend | A unified implementation of text measurement and output rendering for a specific target. Three are shipping today: canvas (bitmap preview), HTML (DOM-based screen reading), and PDF (print-ready output via postext-pdf). |
| Pass | One stage of the layout pipeline. Each pass reads and mutates the VDT with a single responsibility. |
| Convergence Loop | The outer loop that re-runs layout passes when later passes invalidate earlier decisions. Bounded to a maximum of 5 iterations. |

#System Architecture

[Figure: Postext system architecture. Enriched markdown and PostextConfig (columns, rules, spacing) enter a parser (Pass 1) that builds the Virtual Document Tree (mutable, in-place; pages > columns > blocks; every node has a bbox; dirty-flag tracking). Pretext measures text. Seven layout passes mutate the VDT inside a convergence loop (max 5): Pass 2 Text Measurement (via Pretext), Pass 3 Page & Column Placement, Pass 4 Resource Placement, Pass 5 Typographic Refinement, Pass 6 Column Balancing, Pass 7 Vertical Rhythm Alignment. A backend (unified interface: Canvas / Browser, PDF, Server-side (future)) renders the final VDT to HTML or PDF.]
Parser → VDT ↔ Pretext → seven layout passes → backend → output.

Here is the journey your content takes through the engine:

  1. The Parser reads your enriched markdown and config, building the initial VDT -- a tree of typed blocks with no positions yet, just content and structure
  2. Layout passes take over, mutating the VDT in sequence: measuring text via Pretext, flowing blocks into pages and columns, refining typography until it meets professional standards
  3. The Convergence loop watches for trouble -- when a later pass (say, fixing a widow) invalidates an earlier decision (say, column heights), the engine loops back and re-runs from the affected point. Up to 5 iterations, until everything settles
  4. The final VDT is the complete layout geometry: every element knows its page number, column assignment, position, and bounding box. The document is fully "typeset" before any rendering happens
  5. A backend walks the finished VDT and renders it to the target format -- a rasterized canvas bitmap, a DOM tree of positioned HTML elements, or a PDF document with embedded fonts. The same VDT feeds all three; picking a backend is purely an output decision

#Input Layer

#Content Model

The content model is a philosophy as much as it is a data structure. You describe what to say, not how to lay it out. The engine makes the layout decisions.

[Figure: Content model: markdown, resources, notes. The input to Postext separates markdown (reading order and semantic structure) from resources (images, tables, figures) and notes (footnotes, endnotes). The engine resolves markers by ID (![fig:…], [^1]) and produces the VDT.]
Markdown owns the reading order; resources and notes own the visual data.
// packages/postext/src/types.ts
 
interface PostextContent {
  markdown: string;              // enriched markdown with reference markers
  resources?: PostextResource[]; // images, tables, figures, pull quotes
  notes?: PostextNote[];         // footnotes, endnotes, margin notes
}

Resources carry visual metadata (dimensions, captions, alt text) and are referenced from the markdown by ID. Notes carry content and a marker style, referenced from inline positions in the markdown.

This separation is an opinionated design choice, and it matters more than it looks. The markdown owns the reading order and semantic structure -- what comes first, what is a heading, where a footnote is referenced. The resources and notes arrays own the visual data -- image dimensions, caption text, note content. By keeping them apart, the same markdown can be laid out in completely different ways just by changing the configuration. A two-column academic layout and a single-column blog post can share the same source content. And the engine can make placement decisions -- like deferring an image to the next column because it does not fit here -- without ever touching your source content.

// Example: a simple article with an image and a footnote
const content: PostextContent = {
  markdown: `
# The Art of Typography
 
The history of typography begins with Gutenberg's
movable type[^1]. His invention transformed the
production of books.
 
![fig:printing-press]
 
The technique spread rapidly across Europe, reaching
Italy by 1465 and France by 1470.
  `,
  resources: [
    {
      id: 'fig:printing-press',
      type: 'figure',
      src: '/images/gutenberg-press.jpg',
      alt: 'Reconstruction of Gutenberg\'s printing press',
      caption: 'A reconstruction of the original press.',
      width: 600,
      height: 400,
    },
  ],
  notes: [
    {
      id: '1',
      type: 'footnote',
      content: 'Johannes Gutenberg, c. 1400–1468, Mainz, Germany.',
    },
  ],
};

Notice how ![fig:printing-press] in the markdown is just a reference marker -- a name, nothing more. The engine resolves it against the resources array by ID, pulls the image dimensions, and decides where to place it based on the configured PlacementStrategy. Maybe it lands right there. Maybe the engine defers it to the top of the next column because the current one is almost full. Same story for [^1] -- the engine resolves it against the notes array and places the footnote at the bottom of the column, or the page, or the end of the section, depending on ReferenceConfig. The author never has to think about placement. The engine does.

[Figure: Reference resolution. A markdown source uses a figure marker ![fig:printing-press] and a footnote marker [^1]. Resources and notes arrays provide the actual content by ID. The engine resolves both markers to produce a laid-out output with the figure in place and the footnote at the bottom.]
Markers in the markdown are just names. The engine resolves them against resources and notes.
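The lookup described above can be sketched in a few lines. This is only an illustration, not the engine's parser: `resolveMarkers`, `collectIds`, the two regular expressions, and the trimmed-down `Resource` and `Note` shapes are all assumptions made for the example, with the marker syntax taken from the article sample above.

```typescript
// Hypothetical, simplified shapes; PostextResource and PostextNote carry more fields.
interface Resource { id: string; width: number; height: number }
interface Note { id: string; content: string }

interface ResolvedRefs {
  resources: Resource[]; // matched resources, in reading order
  notes: Note[];         // matched notes, in reading order
  missing: string[];     // marker IDs with no matching entry
}

// Assumed marker syntax: ![some-id] (not followed by "(url)") and [^note-id].
const RESOURCE_MARKER = /!\[([^\]]+)\](?!\()/;
const NOTE_MARKER = /\[\^([^\]]+)\]/;

// Collect the captured ID of every match, in document order.
function collectIds(text: string, pattern: RegExp): string[] {
  const re = new RegExp(pattern.source, 'g'); // fresh regex, lastIndex starts at 0
  const ids: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) ids.push(m[1] ?? '');
  return ids;
}

function resolveMarkers(markdown: string, resources: Resource[], notes: Note[]): ResolvedRefs {
  const resourceById = new Map<string, Resource>();
  for (const r of resources) resourceById.set(r.id, r);
  const noteById = new Map<string, Note>();
  for (const n of notes) noteById.set(n.id, n);

  const out: ResolvedRefs = { resources: [], notes: [], missing: [] };
  for (const id of collectIds(markdown, RESOURCE_MARKER)) {
    const hit = resourceById.get(id);
    if (hit) out.resources.push(hit); else out.missing.push(id);
  }
  for (const id of collectIds(markdown, NOTE_MARKER)) {
    const hit = noteById.get(id);
    if (hit) out.notes.push(hit); else out.missing.push(id);
  }
  return out;
}
```

Because the markdown only carries IDs, swapping an image's dimensions or a footnote's text never touches the markdown string; only the arrays change.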

#Configuration

Every aspect of the layout pipeline is controlled by PostextConfig:

| Config | Controls | Used in |
| --- | --- | --- |
| ColumnConfig | Column count, gutter width, column rules, balancing flag | Pass 3, Pass 6 |
| TypographyConfig | Legacy typographic controls (spacing around figures, rag optimization). Per-field widow/orphan/runt/keep-together controls now live on BodyTextConfig and HeadingsConfig; see the Configuration page. | Pass 5, Pass 7 |
| ResourcePlacementConfig | Default placement strategy, deferred placement, aspect ratio preservation | Pass 4 |
| ReferenceConfig | Footnote placement, marker style, figure/table numbering, margin notes | Pass 1, Pass 3 |
| PostextSectionOverride | Per-section rule overrides via selectors | Pass 1 |

#Parsing Strategy

Parsing is deliberately the simplest step in the pipeline. Markdown goes in, gets parsed into an AST, and each node becomes a VDTBlock. Resource and note references are resolved against the resources[] and notes[] arrays by ID. The output is a flat list of typed, content-filled blocks -- but with no page assignment, no column, no position.

Think of it as a manifest: "there is a heading, then a paragraph with 200 words, then a figure reference, then another paragraph." No measurements. No positioning. No layout decisions at all. The heavy work starts in Pass 2.

#Virtual Document Tree (VDT)

Imagine you asked a professional typesetter to lay out an entire book, but instead of handing you printed pages, they handed you a spreadsheet. Every row is an element. Every cell is a precise measurement: "the heading is at (40, 30), the first paragraph starts at (40, 78) and is 144px tall, the image goes at the top of column 2 on page 3..." That spreadsheet is the VDT.

The Virtual Document Tree is the central data structure of Postext -- a mutable, in-place tree representing every page, column, block, and line, each carrying a precise bounding box. Once the layout pipeline converges, the VDT is the answer. You can query "what is on page 72, column 2?" without rendering a single pixel.
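To make the "query without rendering" claim concrete, here is a small sketch of such a lookup. The `blocksAt` helper is hypothetical, and the interfaces are a trimmed-down subset of the VDT shapes, reduced to what a read-only query needs.

```typescript
// Trimmed-down subset of the VDT, just enough for a read-only query.
interface BoundingBox { x: number; y: number; width: number; height: number }
interface VDTBlock { id: string; type: string; bbox: BoundingBox }
interface VDTColumn { index: number; blocks: VDTBlock[] }
interface VDTPage { index: number; columns: VDTColumn[] }
interface VDTDocument { pages: VDTPage[] }

// Hypothetical helper: the caller counts pages and columns from 1,
// the tree stores them from 0. Unknown pages or columns yield an empty list.
function blocksAt(doc: VDTDocument, pageNumber: number, columnNumber: number): VDTBlock[] {
  const page = doc.pages[pageNumber - 1];
  const column = page?.columns[columnNumber - 1];
  return column ? column.blocks : [];
}
```

A call such as `blocksAt(doc, 72, 2)` is a plain array lookup: every answer it returns was computed during layout, so no backend is involved.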

#Why Mutable

This is the same approach used in game engine render pipelines, where a shared mutable world state is updated by successive systems on a tight loop. And for the same reason.

Immutable trees (like React's virtual DOM) allocate new objects on every change. That is fine for a UI with a few hundred components. But in a convergence loop that may re-run passes 3--7 up to 5 times, potentially touching thousands of blocks, allocation pressure and garbage collection pauses become very real. The VDT uses in-place mutation with a dirty flag pattern instead: passes mark nodes dirty, and subsequent passes know exactly which nodes to re-examine. The engine remembers what changed so it does not redo valid work.

#Structure

[Figure: Virtual Document Tree structure. Hierarchical VDT: document contains pages, each page contains columns, each column contains blocks (heading, paragraph, resource), each text block contains measured lines. Every node carries bbox: { x, y, w, h }, dirty: boolean, pageIndex: number, and columnIndex: number. Lines also carry baseline: number and hyphenated: boolean.]
Every node carries bbox, dirty flag, and page/column indices.

#Type Definitions

// The root of the Virtual Document Tree
interface VDTDocument {
  pages: VDTPage[];
  config: PostextConfig;
  baselineGrid: number;       // baseline increment in px (e.g. 24 for 16px/1.5)
  converged: boolean;
  iterationCount: number;
}
 
// A physical page
interface VDTPage {
  index: number;
  width: number;
  height: number;
  columns: VDTColumn[];
  header?: VDTBlock;          // running header
  footer?: VDTBlock;          // running footer / page number
  marginNotes: VDTBlock[];
  footnoteArea?: VDTFootnoteArea;
}
 
// A column within a page
interface VDTColumn {
  index: number;
  bbox: BoundingBox;          // position within the page
  blocks: VDTBlock[];
  availableHeight: number;    // remaining vertical space
  baselineOffset: number;     // current baseline y-position
}
 
// A content block (paragraph, heading, image, etc.)
interface VDTBlock {
  id: string;
  type: 'paragraph' | 'heading' | 'resource' | 'blockquote'
      | 'listItem' | 'footnoteRef';
  bbox: BoundingBox;
  lines?: VDTLine[];          // for text blocks (populated by Pass 2)
  resource?: PostextResource; // for resource blocks
  pageIndex: number;
  columnIndex: number;
  dirty: boolean;             // needs re-layout
  snappedToGrid: boolean;     // baseline aligned to grid
}
 
// A measured line of text
interface VDTLine {
  text: string;
  bbox: BoundingBox;
  baseline: number;           // y-position of the text baseline
  hyphenated: boolean;        // line ends with a hyphen
}
 
// Bounding box — all values in px, relative to page origin
interface BoundingBox {
  x: number;
  y: number;
  width: number;
  height: number;
}
 
// Footnote area at the bottom of a page
interface VDTFootnoteArea {
  bbox: BoundingBox;
  notes: VDTBlock[];
  separator: boolean;         // draw a rule above footnotes
}

#Dirty Tracking

Dirty tracking is how the engine avoids redoing work it has already done correctly. When a pass moves or resizes a block, it sets dirty = true on that block and on every downstream block in the same column -- because their positions all depend on the changed block. The convergence loop can then skip unchanged subtrees entirely.

Here is a concrete example. Pass 5 inserts a hyphen into a paragraph on page 12, causing it to lose one line of height. That paragraph gets marked dirty. So do all blocks below it in the same column -- they all need to shift up by one line. But the blocks on page 11 and earlier? Untouched. Passes skip them completely on the next iteration.

The dirty flag doubles as the convergence signal: if no blocks are dirty after passes 5--7, the layout has converged and the engine stops iterating. Done.
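The propagation rule and the convergence check can be sketched as follows. The names `markDirtyFrom` and `hasConverged` are illustrative rather than the engine's API, and the block and column shapes are cut down to the fields the example needs.

```typescript
// Minimal shapes for the example; the real VDT nodes carry far more.
interface Block { id: string; dirty: boolean }
interface Column { blocks: Block[] }

// Mark the changed block and every block after it in the same column.
// Returns how many blocks were flagged. Earlier blocks are left untouched.
function markDirtyFrom(column: Column, blockId: string): number {
  const start = column.blocks.findIndex((b) => b.id === blockId);
  if (start === -1) return 0;
  const affected = column.blocks.slice(start);
  for (const block of affected) block.dirty = true;
  return affected.length;
}

// The layout has settled once no block anywhere is still flagged.
function hasConverged(columns: Column[]): boolean {
  return columns.every((c) => c.blocks.every((b) => !b.dirty));
}
```

In this sketch a pass that re-places a block would reset its flag afterwards; how and when the engine clears flags is not specified here.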

#Layout Pipeline

Seven passes, each with one job. That is the entire layout pipeline.

The design borrows from game engine render pipelines -- shadow pass, lighting pass, post-processing pass -- where each system reads and mutates a shared world state and trusts that previous systems did their part. This makes individual passes easy to understand, test, and optimize in isolation. You can benchmark Pass 5 without thinking about Pass 3.

The key difference from a game engine is that a game renders each frame once and moves on. Postext cannot. Typographic decisions are deeply interdependent -- fixing a widow might change column heights, which affects balancing, which might create a new orphan -- so the pipeline may need to loop. Passes 3--7 run inside a convergence loop, iterating up to 5 times until the layout settles into a stable result.

#Pass 1: Content Structuring

  • Input: Raw PostextContent
  • Action: Parse markdown into an AST, resolve resource and note references against resources[] and notes[] by ID, create initial VDTBlock nodes
  • Output: Flat VDTBlock[] (typed and content-filled, but with no page or column assignment)
  • Runs once (not part of the convergence loop)

#Pass 2: Text Measurement

  • Input: VDTBlock[] with text content
  • Action: For each text block, call Pretext's prepare() to analyze the text, then layout() to compute height at the target column width. Store measured VDTLine[] and total height in each block
  • Key detail: Uses Pretext's layoutNextLine() for text flowing around obstacles (each line can have a different available width when a resource is floated alongside)
  • Output: Every text block has precise pixel dimensions
  • Re-runs when: Column widths change or text content changes (e.g., hyphenation inserted)

This is where Pretext earns its keep. The prepare() call is the expensive part -- it analyzes the text using the canvas font engine and caches the result. But the layout() call? Pure arithmetic, nearly free. That split is everything. Once text is prepared, the engine can re-layout at different widths -- trying column configurations, flowing text around an obstacle, testing what happens if a paragraph gains a hyphen -- all with negligible cost. Prepare once, layout as many times as you need.

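// Simplified: how Pass 2 uses pretext internally
import { prepare, layout } from '@chenglou/pretext';

// paragraphText and columnWidth come from the block being measured and its target column
const prepared = prepare(paragraphText, '16px/1.5 Inter'); // expensive, cached
const { height } = layout(prepared, columnWidth, 24);      // cheap; 24px line-height
// => "This paragraph is 168px tall at 320px column width: 7 lines of 24px."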

#Pass 3: Page and Column Placement

  • Input: Measured blocks
  • Action: Flow blocks into pages and columns sequentially. Create VDTPage and VDTColumn nodes. Track availableHeight per column. When a block does not fit, advance to the next column or page
  • Strategy: Greedy first-fit placement. Column and page breaks follow the simplest valid assignment
  • Output: Every block has pageIndex, columnIndex, and bbox assigned

This is the moment the VDT becomes a real document. Before this pass, blocks are just a flat list with dimensions but no address. Pass 3 walks through them and assigns each one to a page and column, like pouring water into a grid of containers: fill column 1 until it overflows, spill into column 2, when the page is full start a new one.

Before any content blocks land, the pass reserves space for structural elements -- footnote areas at the bottom of pages (based on ReferenceConfig.footnotes.placement), running headers, footers, and margin columns. These reservations reduce the availableHeight of each column, so when content blocks start flowing in, the engine already knows exactly how much room is available.
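A minimal sketch of the greedy first-fit flow, including an up-front reservation, is shown below. It is deliberately simplified: blocks are treated as unsplittable (the real pass can break paragraphs across columns), and `flowBlocks`, `FlowOptions`, and `reservedBottom` are names invented for the example.

```typescript
interface Measured { id: string; height: number }
interface Placement { id: string; pageIndex: number; columnIndex: number; y: number }

interface FlowOptions {
  columnsPerPage: number;
  columnHeight: number;   // full column height in px
  reservedBottom: number; // space reserved up front, e.g. for a footnote area
}

function flowBlocks(blocks: Measured[], opts: FlowOptions): Placement[] {
  const usable = opts.columnHeight - opts.reservedBottom;
  const placements: Placement[] = [];
  let page = 0;
  let column = 0;
  let y = 0;
  for (const block of blocks) {
    // Advance when the block does not fit. An empty column always accepts
    // the block, so an oversized block cannot cause an endless loop.
    if (y > 0 && y + block.height > usable) {
      column += 1;
      y = 0;
      if (column === opts.columnsPerPage) {
        column = 0;
        page += 1;
      }
    }
    placements.push({ id: block.id, pageIndex: page, columnIndex: column, y });
    y += block.height;
  }
  return placements;
}
```

With two 500px columns and 100px reserved, blocks of 250, 200, 300, and 150px each end up alone in their own column across two pages, which is exactly the kind of loose result the later passes exist to improve.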

#Pass 4: Resource Placement

  • Input: VDT with blocks placed in columns
  • Action: Place resources according to their PlacementStrategy:
| Strategy | Behavior |
| --- | --- |
| topOfColumn | Resource is placed at the top of the current or next column |
| inline | Resource appears in the text flow at the reference point |
| floatLeft | Resource floats to the left; text wraps around it using layoutNextLine() |
| floatRight | Resource floats to the right; text wraps around it using layoutNextLine() |
| fullWidthBreak | Resource spans the full page width, breaking the column flow |
| margin | Resource is placed in a margin column alongside the referencing paragraph |
  • Deferred placement: If a resource does not fit at its reference point, the engine finds the next viable position (controlled by ResourcePlacementConfig.deferPlacement)
  • Key complexity: Resources can displace text blocks, which may require re-measurement at different effective widths
  • Output: Resources positioned, surrounding text blocks adjusted

Resource placement is where things get interesting, because resources do not just occupy space -- they reshape the space around them. Here is the story of a floatRight image in a two-column layout. The paragraph that references the image now has to wrap around it. Some lines are shorter -- they share horizontal space with the image. Others are full-width -- they sit below the image. Pretext's layoutNextLine() API handles this elegantly, accepting a different available width for each line. But the consequence ripples outward: the paragraph's total height changes, which pushes subsequent blocks down, potentially spilling them onto the next column or the next page entirely.
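The per-line width arithmetic behind a float can be sketched like this. It assumes the float's top edge is level with the first line of the paragraph; `availableWidth`, `FloatObstacle`, and the `gap` field are illustrative names, not part of Postext or Pretext.

```typescript
// A floated resource, positioned relative to the top of the paragraph.
interface FloatObstacle {
  width: number;  // px
  height: number; // px
  gap: number;    // horizontal breathing room between text and float, px
}

// Width available to a line whose top edge sits at `lineTop`.
// Lines that overlap the float vertically are narrowed; lines below it are not.
function availableWidth(lineTop: number, columnWidth: number, float: FloatObstacle): number {
  const besideFloat = lineTop < float.height;
  return besideFloat ? columnWidth - float.width - float.gap : columnWidth;
}

// Widths for the first five 24px lines next to a 120x60 float in a 320px column.
const widths = [0, 24, 48, 72, 96].map((top) =>
  availableWidth(top, 320, { width: 120, height: 60, gap: 12 }),
);
```

Here `widths` comes out as three narrowed lines of 188px followed by two full 320px lines. Each of those values is the kind of per-line width that a line-by-line API such as layoutNextLine() is designed to accept.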

And then there is deferred placement. Say an image is referenced at a point where only 50px of column space remains, but the image is 300px tall. It cannot go there. So the engine defers it to the top of the next column (or the next page), continues placing text, and inserts the image at the deferred position. The reader sees the image near (but not exactly at) the point where it is mentioned in the text. This is standard practice in professional typesetting; books do it constantly.
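Deferral boils down to a forward search for the first column with enough room. The sketch below assumes a flat, reading-order list of columns; `findPlacement` and `ColumnSlot` are hypothetical names for the example.

```typescript
interface ColumnSlot { pageIndex: number; columnIndex: number; remaining: number }

// Index of the first column, at or after `from`, with room for the resource.
// The search only ever moves forward from the reference point.
function findPlacement(columns: ColumnSlot[], from: number, resourceHeight: number): number {
  for (let i = from; i < columns.length; i++) {
    const slot = columns[i];
    if (slot && slot.remaining >= resourceHeight) return i;
  }
  return -1; // nothing fits; a caller would have to append a new page
}
```

For the case in the text, a 300px image referenced in a column with 50px left skips that column and lands in the next one that has at least 300px free.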

Placement rules. Beyond the strategy dispatch, resource placement follows strict editorial constraints:

  • After-reference rule. A figure, image, or table must always appear after its reference in the text, never before. The reader encounters the reference first, then sees the resource. If there is not enough space in the current column, the resource is deferred forward, never pulled backward.
  • Proximity rule. The resource must appear as close to its reference as possible. The engine minimizes the distance between the reference point and the actual placement, within the constraints of available space and the other rules.
  • Top-and-bottom placement. In both single-column and multi-column layouts, figures, images, illustrations, and tables are placed at the top or bottom of the page (never floating in the middle of a text block). Each resource appears with its caption and is numbered dynamically during layout, not in the source content.
  • Dynamic numbering. Resources are not numbered in the markdown. Numbering (Figure 1, Figure 2, Table 1...) is assigned during layout, after placement. This means inserting a new figure in the middle of the document does not require renumbering all subsequent references in the source.
  • Full-width in multi-column layouts. In multi-column layouts, resources that require full-width placement (spanning all columns) are placed at the top or bottom of the page, starting from the first column. They break the column flow, occupy the full page width, and the text resumes in the first column below (or above) the resource.
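Dynamic numbering, as described in the rules above, amounts to a counter per resource kind that runs over the final placement order. The sketch below is illustrative; `assignLabels` and the two-kind `ResourceKind` union are assumptions for the example.

```typescript
type ResourceKind = 'figure' | 'table';
interface PlacedResource { id: string; kind: ResourceKind }

// `placed` must already be in final placement order.
// Returns a map from resource id to its display label.
function assignLabels(placed: PlacedResource[]): Map<string, string> {
  const counters: Record<ResourceKind, number> = { figure: 0, table: 0 };
  const prefix: Record<ResourceKind, string> = { figure: 'Figure', table: 'Table' };
  const labels = new Map<string, string>();
  for (const r of placed) {
    counters[r.kind] += 1;
    labels.set(r.id, `${prefix[r.kind]} ${counters[r.kind]}`);
  }
  return labels;
}
```

Inserting a new figure early in the document only changes the input order; every later label shifts on the next layout run without any edit to the source.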

#Pass 5: Typographic Refinement

This is the pass that separates a layout engine from a text dumper. It enforces the editorial quality rules that professional typesetters have applied by hand for centuries -- and that naive text rendering completely ignores.

Pass 5 operates at two levels: penalty-based line breaking inside each paragraph, and structural keep-together enforcement between blocks. They work together, but they are distinct mechanisms.

Penalty-based widow/orphan/runt avoidance

Widows and orphans are the most visible signs of amateur typesetting:

  • A widow is a single line of a paragraph left alone at the bottom of a column. The paragraph continues in the next column, but that lone line looks stranded (as if the column ended prematurely).
  • An orphan is a single line of a paragraph stranded at the top of a column. The bulk of the paragraph is in the previous column, but one line spilled over (it looks disconnected from its context).
  • A runt is a paragraph whose last line is a single short word (or two) -- visually far too short to feel like a proper line of text. Less structurally severe than a widow, but just as jarring to a careful reader.

All three are handled by injecting demerits into the Knuth-Plass line-breaking algorithm. Rather than laying out a paragraph and then trying to repair a bad break after the fact, the engine teaches the line breaker that certain break sets are more expensive than others. The algorithm then picks the globally optimal break set that naturally avoids widows, orphans, and runts whenever possible.

Concretely, for every candidate break node in a paragraph:

  • If choosing this break would leave fewer than orphanMinLines lines at the top of the next column, add orphanPenalty (default 1000) to the node's demerits.
  • If choosing this break would leave fewer than widowMinLines lines at the bottom of the current column, add widowPenalty (default 1000).
  • If the last line produced from this break would be shorter than runtMinCharacters × normalSpaceWidth, inject runtPenalty (default 1000) as equivalent badness into the squared demerit formula -- so it competes on the same scale as line badness (which saturates at 10000) rather than being dwarfed by it.

These penalties sit alongside the usual demerits -- badness (squared adjustment ratio), hyphenation cost, and fitness-class mismatch -- in a single global optimisation. The algorithm is free to accept one of them if the alternative is worse (a paragraph with no legal break that satisfies every rule), but it will almost always find a break set that avoids them. List items opt into the same protection via avoidOrphansInLists, avoidWidowsInLists, avoidRuntsInLists (all true by default).
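The three rules can be sketched as two small functions. This is a hedged illustration, not the engine's solver: the function names are invented, an `orphanMinLines` of 2 is an assumed value in the usage below, and `(1 + badness)` squared is the textbook Knuth-Plass demerit shape with a line penalty of 1, which may differ from the constants Postext uses.

```typescript
interface PenaltyConfig {
  orphanMinLines: number;
  orphanPenalty: number;
  widowMinLines: number;
  widowPenalty: number;
}

// linesBefore: lines of the paragraph left at the bottom of the current column.
// linesAfter: lines carried to the top of the next column.
// If either side is 0 the paragraph is not split, so nothing is added.
function columnBreakPenalty(linesBefore: number, linesAfter: number, cfg: PenaltyConfig): number {
  if (linesBefore === 0 || linesAfter === 0) return 0;
  let extra = 0;
  if (linesAfter < cfg.orphanMinLines) extra += cfg.orphanPenalty;
  if (linesBefore < cfg.widowMinLines) extra += cfg.widowPenalty;
  return extra;
}

// Runt handling: the penalty is added to badness before squaring,
// so it competes on the same scale as badness itself.
function lineDemerits(badness: number, isRunt: boolean, runtPenalty: number): number {
  const effective = 1 + badness + (isRunt ? runtPenalty : 0);
  return effective * effective;
}
```

With the default penalties of 1000, a line of badness 10 costs 121 demerits normally and 1,022,121 as a runt, which is why the solver accepts a runt only when every alternative is worse.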

A fourth soft pressure, slackWeight, weights a squared "unused column space" cost so the algorithm prefers break sets that fill columns tightly. Together these demerits make Pass 5 a line-breaking refinement: most widow/orphan/runt cases are resolved inside the Knuth-Plass solver, not by letter-spacing tweaks after the fact.

All of this is tunable on BodyTextConfig -- see Configuration → Orphans, widows, runts, and keep-together rules. Setting any *Penalty to 0 effectively disables that rule.

Structural keep-together rules

Some groupings are bigger than a single paragraph -- they span adjacent blocks and cannot be addressed by line-breaking alone. Pass 5 enforces these at the block-placement level, moving entire groups forward when they would otherwise split across a column or page break:

  • Heading with its first paragraph. A heading must never appear at the bottom of a column if the paragraph it introduces would start in the next column. Enforced by headings.keepWithNext (default true): if there is not room for the heading plus the body's widow minimum (bodyText.widowMinLines, default 2) of the next block -- or just one line when avoidWidows is off -- the heading is pushed forward to travel with its text.
  • Consecutive headings. When multiple headings appear in sequence (e.g., an h2 followed by an h3 followed by a paragraph), the entire group must stay together. None of the headings can be left stranded at the bottom of a column without the content they introduce.
  • Colon-introduced lists. When a paragraph ends with a colon that directly introduces a list, the colon-bearing line must stay with the start of the list. Enforced by bodyText.keepColonWithList (default true): if placing the paragraph would leave no room for the first list item, the colon-bearing last line (or the whole paragraph, if it is one line) moves forward together with the list. Whenever this rule has to push the whole paragraph and a run of headings immediately precedes it in the column, those headings are pulled forward too so keepWithNext is not silently violated; the only exception is when the column contains just the heading(s) that a previous iteration already moved forward, in which case the engine keeps the paragraph with the heading and accepts the softer colon/list separation to avoid looping.
  • Figure with its caption. A figure and its caption are an inseparable unit. They always move together.

When a keep-together violation is detected, the engine pushes the entire group to the next column or page. The vacated space is handled by the normal column-filling mechanism (the line breaker has already chosen a break set that fits; if the resulting column is a little short, Pass 7 redistributes vertical space around grid-breaking elements to keep the baseline grid honest).
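The heading rule reduces to one comparison: is there room for the heading plus the minimum number of body lines? The sketch below follows the description of headings.keepWithNext above; `shouldPushHeading` and the flat `KeepConfig` shape are simplifications invented for the example.

```typescript
// Flattened stand-in for the relevant headings/bodyText settings.
interface KeepConfig {
  keepWithNext: boolean;
  avoidWidows: boolean;
  widowMinLines: number;
}

// True when the heading should move to the next column with its paragraph.
function shouldPushHeading(
  remaining: number,     // free height left in the current column, px
  headingHeight: number, // px
  lineHeight: number,    // body line height, px
  cfg: KeepConfig,
): boolean {
  if (!cfg.keepWithNext) return false;
  const minLines = cfg.avoidWidows ? cfg.widowMinLines : 1;
  return remaining < headingHeight + minLines * lineHeight;
}
```

With 80px left, a 40px heading, and 24px body lines, two required lines need 88px, so the heading moves; with avoidWidows off only 64px is needed and it stays.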

Output

Blocks whose measurements or placements changed are marked dirty for the next iteration of the convergence loop. In practice, because the heavy lifting is done inside Knuth-Plass rather than by post-hoc adjustments, most documents stabilize quickly -- the line breaker picks a good set of breaks the first time and subsequent iterations only have to deal with downstream effects of block movement and column balancing.

These corrections are invisible when done well (a reader should never notice them). But their absence is immediately obvious to anyone who reads carefully: that awkward single line at the top of a column, those uneven gaps where the engine gave up trying to make text fit. Professional publishers have entire style guides about preventing exactly these problems. Postext automates them.

#Pass 6: Column Balancing

  • Input: VDT with refined typography
  • Action: If ColumnConfig.balancing is true, equalize column heights on each page by moving blocks between columns to minimize the height difference
  • Constraint: Must not violate widow/orphan rules established in Pass 5
  • Output: Blocks may have moved between columns, marked dirty

You notice unbalanced columns immediately, especially on the last page of a chapter. A full left column and a nearly empty right column looks unfinished -- like the layout gave up halfway through. Balancing redistributes content so both columns land at roughly the same height, giving the spread a polished, intentional appearance.

The algorithm computes the total content height for all blocks on a page, divides by the number of columns to find the target height, and searches for the best column-break point that gets each column closest to that target. But it is not a simple split. This is a constraint satisfaction problem: the algorithm must respect keepTogether rules (a heading must stay with its first paragraph), honor minimum line counts, and -- crucially -- not undo the widow and orphan fixes that Pass 5 just worked so hard to establish.
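For the two-column case, the search can be sketched as a single scan over candidate break points that skips any break a keep-together rule forbids. The sketch works on whole blocks only and ignores line-level constraints; `bestSplit` and the `keepWithNext` flag on each block are assumptions for the example.

```typescript
interface BalBlock { id: string; height: number; keepWithNext: boolean }

// Returns how many blocks stay in the first column (two-column case).
function bestSplit(blocks: BalBlock[]): number {
  const total = blocks.reduce((sum, b) => sum + b.height, 0);
  const target = total / 2;
  let best = blocks.length; // fallback: everything in the first column
  let bestDistance = Infinity;
  let running = 0;
  blocks.forEach((block, i) => {
    running += block.height;
    const isLast = i === blocks.length - 1;
    // Breaking right after a keep-with-next block would strand it, so skip it.
    if (block.keepWithNext && !isLast) return;
    const distance = Math.abs(running - target);
    if (distance < bestDistance) {
      bestDistance = distance;
      best = i + 1;
    }
  });
  return best;
}
```

When the mathematically ideal break falls right after a heading, the scan settles for the nearest legal break instead, trading a little balance for an intact heading-paragraph pair.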

#Pass 7: Vertical Rhythm Alignment

  • Input: VDT with balanced columns
  • Action: Snap baselines to the baseline grid by distributing spacing adjustments around headings, images, and other grid-breaking elements
  • Output: Adjusted spacing values; baselines aligned across columns
  • See: Vertical Rhythm System for the full algorithm
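The core arithmetic of grid snapping is small enough to sketch here; the full algorithm lives on the Vertical Rhythm System page. `gridPadding`, `distributePadding`, and the `ratioAbove` knob are assumptions made for this example.

```typescript
// Extra space needed so that content following an element of `height` px,
// which itself starts on the grid, lands back on the grid.
function gridPadding(height: number, grid: number): number {
  const remainder = height % grid;
  return remainder === 0 ? 0 : grid - remainder;
}

// Split the padding above and below the element.
// `ratioAbove` is an assumed knob: 0.5 centres the element in its slot.
function distributePadding(padding: number, ratioAbove = 0.5): { above: number; below: number } {
  const above = Math.round(padding * ratioAbove);
  return { above, below: padding - above };
}
```

On a 24px grid, a 130px image needs 14px of extra space, split 7px above and 7px below, so the next line of text starts 144px after the image's slot began, which is exactly six grid steps.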

#Convergence Loop

[Figure: Convergence loop. Pass 1 parses, Pass 2 measures, then passes 3 through 7 (place, resources, typography, balance, rhythm) run inside a convergence loop (max 5 iterations). If any block remains dirty and the iteration count is under five, the loop re-runs from Pass 3.]
The engine loops back to Pass 3 until no dirty blocks remain (max 5 iterations).

Think of the convergence loop as the engine arguing with itself. Pass 5 picks a break set that avoids a widow inside paragraph A -- but doing so shortens paragraph A by a line, which leaves a gap at the bottom of column 2. Pass 6 re-balances the columns to compensate, which separates a heading from the paragraph it introduces, which triggers keepWithNext and pushes the heading forward into the next column. Pass 7 adjusts vertical rhythm, which might just create a new runt where the heading used to sit. So the engine loops back to Pass 3, re-places blocks with the updated measurements, and runs through the whole sequence again. Each iteration resolves more problems than it creates -- until eventually, nothing is dirty anymore.

Because most widow/orphan/runt cases are resolved inside the Knuth-Plass solver in a single pass of line breaking, typical documents now converge in 1--2 iterations. The loop is still needed when block-level events (a heading pushed forward by keepWithNext, a figure deferred by placement, or column balancing equalising heights) shift the column boundaries that Pass 5 measured against. When that happens, Pass 3 re-places, Pass 5 re-breaks with the new constraints, and the loop settles.

After passes 5--7 complete, the engine checks whether any blocks are marked dirty. If dirty blocks exist and the iteration count is below 5, the pipeline re-runs from Pass 3.

Convergence criteria:

  • No dirty blocks after passes 5--7, or
  • Maximum of 5 iterations reached (accept the best result so far)
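The control flow behind these criteria can be sketched as a bounded do/while loop. Everything here (`Pass`, `runConvergenceLoop`, the toy state with an `unresolved` counter) is a hypothetical stand-in chosen for the example. The real engine tracks dirtiness per block rather than with a single flag.

```typescript
// Hypothetical pass signature: each pass mutates shared layout state
// and may mark it dirty as a side effect.
type Pass<State> = (state: State) => void;

const MAX_ITERATIONS = 5;

// Runs the looped passes until nothing is dirty or the cap is reached.
// Returns the number of iterations that were executed.
function runConvergenceLoop<State>(
  state: State,
  loopPasses: Pass<State>[],
  hasDirtyBlocks: (state: State) => boolean,
  clearDirty: (state: State) => void,
): number {
  let iterations = 0;
  do {
    clearDirty(state); // start each iteration with a clean slate
    for (const pass of loopPasses) pass(state);
    iterations++;
  } while (hasDirtyBlocks(state) && iterations < MAX_ITERATIONS);
  return iterations;
}

// Toy state: each iteration fixes one outstanding problem and flags a re-run.
interface ToyState { unresolved: number; dirty: boolean; }

const fixOne: Pass<ToyState> = (s) => {
  if (s.unresolved > 0) {
    s.unresolved--;
    s.dirty = true;
  }
};

const toy: ToyState = { unresolved: 2, dirty: false };
const iterationsUsed = runConvergenceLoop(
  toy,
  [fixOne],
  (s) => s.dirty,
  (s) => { s.dirty = false; },
);
```

Because the cap is checked in the loop condition itself, the worst case is always exactly five trips through the passes, no matter how stubborn the document is.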

The engine tracks a typographic violation score at each iteration -- a weighted sum of remaining problems: widows, orphans, unbalanced columns, baseline grid misalignment. Each type of violation carries a weight reflecting its visual severity (a widow is far more noticeable than a 2px grid drift). If the 5-iteration limit is reached without full convergence, the engine picks the iteration that produced the lowest violation score. Not necessarily the last one -- later iterations can sometimes overcorrect, fixing one problem while creating another.
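To make the scoring idea concrete, here is a small sketch of a weighted sum and a best-iteration pick. The field names and the weight values are invented for illustration; the source does not state the actual weights, only that widows count for much more than small grid drift.

```typescript
// Hypothetical violation counts recorded at the end of one iteration.
interface ViolationCounts {
  widows: number;
  orphans: number;
  unbalancedColumns: number;
  gridDriftPx: number;
}

// Illustrative weights only: a widow costs far more than a pixel of drift.
const WEIGHTS: ViolationCounts = {
  widows: 10,
  orphans: 10,
  unbalancedColumns: 3,
  gridDriftPx: 0.5,
};

function violationScore(v: ViolationCounts): number {
  return (
    v.widows * WEIGHTS.widows +
    v.orphans * WEIGHTS.orphans +
    v.unbalancedColumns * WEIGHTS.unbalancedColumns +
    v.gridDriftPx * WEIGHTS.gridDriftPx
  );
}

// Returns the index of the iteration with the lowest score
// (the earliest one wins ties), or -1 for an empty history.
function pickBestIteration(history: ViolationCounts[]): number {
  let bestIndex = -1;
  let bestScore = Infinity;
  history.forEach((counts, index) => {
    const score = violationScore(counts);
    if (score < bestScore) {
      bestScore = score;
      bestIndex = index;
    }
  });
  return bestIndex;
}

// Five iterations where the later ones overcorrect and reintroduce a widow.
const history: ViolationCounts[] = [
  { widows: 2, orphans: 1, unbalancedColumns: 1, gridDriftPx: 4 },
  { widows: 0, orphans: 1, unbalancedColumns: 1, gridDriftPx: 2 },
  { widows: 0, orphans: 0, unbalancedColumns: 1, gridDriftPx: 2 },
  { widows: 1, orphans: 0, unbalancedColumns: 0, gridDriftPx: 0 },
  { widows: 1, orphans: 0, unbalancedColumns: 1, gridDriftPx: 0 },
];
const bestIteration = pickBestIteration(history);
```

In this made-up history the third iteration wins even though two more ran after it, which is the "not necessarily the last one" behaviour described above.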

In practice, most documents converge in 1--2 iterations thanks to penalty-based line breaking -- the Knuth-Plass solver resolves widows, orphans, and runts in a single pass of Pass 5. Further iterations are only needed when block-level events (keep-with-next, colon-with-list, figure deferral, column balancing) shift the column boundaries the line breaker measured against. The 5-iteration cap is a pragmatic safety valve: perfection is the enemy of done. Some pathological cases -- a page where every paragraph is exactly the wrong length to create widows no matter how you balance the columns -- will never fully converge. The engine accepts "best effort" and moves on.

#Vertical Rhythm System

Hold a well-typeset book up to the light. The lines on the left page align with the lines on the right. The baseline of line 5 in column 1 sits at exactly the same vertical position as the baseline of line 5 in column 2. That is vertical rhythm, and it is one of the first things a trained eye checks when evaluating typographic quality. It is also one of the key differentiators of Postext.

When both columns contain only body text at the same size, alignment is trivial -- every line is the same height, so baselines naturally match. The challenge arrives the moment one column contains a heading with a larger font size, an image with an arbitrary pixel height, or extra spacing around a block quote. These elements "break" the grid: the content below them shifts by an amount that is not a multiple of the baseline increment, and suddenly the baselines in that column fall out of sync with the adjacent column. The visual harmony is gone.

The goal is to get it back: baselines of body text in adjacent columns must align horizontally, even when headings, images, or other non-standard-height elements appear in one column but not the other.

#Baseline Grid

Everything anchors to a single number. The document defines a baselineGrid value derived from the body text's line height -- for example, body text set at 16px with a line-height of 1.5 produces a baseline grid of 24px. Every body text baseline should fall on a multiple of this value. That is the contract.
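The arithmetic is small enough to show directly. This is a sketch with hypothetical helper names (`baselineGridFor`, `isOnGrid`), not functions exported by Postext.

```typescript
// Baseline grid in px, derived from body font size and unitless line-height.
function baselineGridFor(fontSizePx: number, lineHeight: number): number {
  return fontSizePx * lineHeight;
}

// True when a vertical position lies on a grid line. A small tolerance
// absorbs floating-point noise from fractional line heights.
function isOnGrid(y: number, grid: number, tolerancePx: number = 0.01): boolean {
  const nearest = Math.round(y / grid) * grid;
  return Math.abs(y - nearest) < tolerancePx;
}

const bodyGrid = baselineGridFor(16, 1.5); // 24
const thirdLineOk = isOnGrid(72, bodyGrid);  // 3 x 24, on the grid
const driftedLine = isOnGrid(84, bodyGrid);  // 12px past a grid line
```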

#Grid-Breaking Elements

Some elements inevitably break the grid because their height is not a multiple of baselineGrid:

  • Headings (larger font size, different line-height)
  • Images (arbitrary pixel height)
  • Tables (variable height)
  • Block quotes (may use different font size or padding)
  • Footnote separators (fixed height rule)

#Spacing Adjustment Algorithm

The spacing adjustment algorithm:

  1. Walk each column top to bottom
  2. Track gridDrift = actualY - nearestGridLine(actualY)
  3. At each adjustable gap (heading space, figure space), compute the correction needed to zero the drift
  4. Distribute the correction, preferring expansion over compression
  5. Clamp to the min/max values from TypographyConfig.spacing

[Figure: Vertical rhythm alignment. Column 1 contains a heading that breaks the baseline grid by 12 pixels. The engine adds 12 pixels of spacing after the heading so the next body line falls back on the grid. Column 2 remains aligned throughout.]
Spacing is adjusted after grid-breaking elements so baselines across columns stay in sync.

After adjusting spacing in each column independently, the engine verifies cross-column alignment: baselines at the same vertical position across columns should match. If they diverge -- because different columns have different grid-breaking elements -- a second alignment pass adjusts gaps in both columns to find a common rhythm.

Here is a concrete example. Column 1 has a 36px heading (1.5x the 24px grid). Column 2 has no heading. After the heading, column 1 has drifted 12px off the grid. The algorithm adds 12px of extra space after the heading -- bumping "space after heading" from 16px to 28px. Now the next body text line in column 1 falls on a grid line again, and its baseline matches the corresponding line in column 2. Harmony restored.
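The drift-and-correct steps of this section can be sketched for a single column as follows. The `ColumnItem` shape and `alignColumnToGrid` are hypothetical names. The sketch only ever expands gaps, applies just the upper clamp, and leaves out both error distribution across several gaps and the cross-column second pass. It also models the 36px heading as a 20px line plus its default 16px gap, which is an assumption made so that the numbers line up with the example above.

```typescript
// Hypothetical column item: a block followed by a gap that may be adjustable.
interface ColumnItem {
  height: number;      // block height in px
  gapAfter: number;    // default spacing after the block in px
  adjustable: boolean; // true for heading/figure gaps, false for body lines
  maxGap: number;      // upper clamp for the adjusted gap in px
}

// Walks the column top to bottom and returns the adjusted gap for each item.
function alignColumnToGrid(items: ColumnItem[], grid: number): number[] {
  const gaps: number[] = [];
  let y = 0; // running vertical position after each item and its gap

  for (const item of items) {
    y += item.height + item.gapAfter;
    let gap = item.gapAfter;
    if (item.adjustable) {
      const drift = y % grid; // distance past the last grid line
      const correction = drift === 0 ? 0 : grid - drift; // expand to the next line
      const adjusted = Math.min(gap + correction, item.maxGap); // clamp
      y += adjusted - gap;
      gap = adjusted;
    }
    gaps.push(gap);
  }
  return gaps;
}

const GRID = 24;
const columnOne: ColumnItem[] = [
  { height: 24, gapAfter: 0, adjustable: false, maxGap: 0 },  // body line
  { height: 24, gapAfter: 0, adjustable: false, maxGap: 0 },  // body line
  { height: 20, gapAfter: 16, adjustable: true, maxGap: 32 }, // heading, 36px in total
  { height: 24, gapAfter: 0, adjustable: false, maxGap: 0 },  // body line
];
// The heading gap grows from 16px to 28px, putting the next line at y = 96.
const adjustedGaps = alignColumnToGrid(columnOne, GRID);
```

If `maxGap` were lower than 28, the clamp would win and the column would stay partially misaligned, which is the first edge case listed below.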

Edge cases:

  • A column with more grid-breaking elements than adjustable gaps accepts partial alignment (the algorithm does its best but cannot guarantee perfect grid alignment if there are too many disruptions and too few places to absorb the error)
  • An image taller than the column spans columns or pages (handled separately in Pass 4)
  • When the adjustment required would create visibly awkward spacing (e.g., 40px of space after a heading when the norm is 16px), the algorithm distributes the error across multiple gaps rather than concentrating it in one place

#Backend Interface

The backend is a single interface that handles both text measurement and output rendering for a specific target. Not two interfaces. One.

That is a deliberate choice, and it exists for a critical reason: the way you measure text must exactly match the way you render it. Imagine the measurement backend uses canvas font metrics, but the rendering backend uses a PDF library with slightly different kerning tables. The layout will not match the output. Lines that the engine measured as fitting in 320px might overflow or underflow when rendered. Every pixel of drift is a lie. By bundling measurement and rendering into one interface, each backend guarantees internal consistency: whatever font metrics it uses to measure are the same metrics it uses to draw.

This is why the PDF backend, for example, does not re-measure text: it consumes an already-converged VDT produced by the canvas measurement backend and translates its pixel coordinates into PDF points. The canvas metrics are the source of truth; PDF is a transport. Users of renderToPdf (from the postext-pdf package) pass the same VDT they would hand to renderToCanvas or renderToHtml, and the three outputs are guaranteed to agree on line breaks, column heights, and resource placement.
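A coordinate translation of this kind might look like the sketch below. It assumes CSS reference pixels (96 per inch) and PDF points (72 per inch), so 1px = 0.75pt, and it assumes the VDT uses a top-left origin while PDF user space uses a bottom-left origin. The `PxBox` and `PtBox` types and `pxBoxToPdfPoints` are hypothetical; the source does not show how postext-pdf performs the conversion.

```typescript
// Assumption: 96 CSS px per inch, 72 PDF points per inch.
const PX_TO_PT = 72 / 96;

interface PxBox { x: number; y: number; width: number; height: number; } // top-left origin
interface PtBox { x: number; y: number; width: number; height: number; } // bottom-left origin

// Converts a pixel-space box into PDF points and flips the vertical axis.
function pxBoxToPdfPoints(box: PxBox, pageHeightPx: number): PtBox {
  return {
    x: box.x * PX_TO_PT,
    // PDF y is measured from the bottom of the page to the bottom edge of the box.
    y: (pageHeightPx - box.y - box.height) * PX_TO_PT,
    width: box.width * PX_TO_PT,
    height: box.height * PX_TO_PT,
  };
}

// A 320px-wide line box, one inch in from the top-left of an 11-inch-tall page.
const lineBoxPt = pxBoxToPdfPoints({ x: 96, y: 96, width: 320, height: 24 }, 1056);
```

Nothing in this conversion re-measures text. Widths and heights are scaled uniformly, so line breaks decided in pixel space carry over unchanged.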

#Interface

interface PostextBackend {
  // Lifecycle
  initialize(config: PostextConfig): Promise<void>;
  dispose(): void;
 
  // Measurement
  measureText(text: string, style: TextStyle): MeasuredText;
  measureImage(resource: PostextResource): { width: number; height: number };
 
  // Rendering
  renderPage(page: VDTPage): void;
  renderBlock(block: VDTBlock): void;
  renderLine(line: VDTLine, style: TextStyle): void;
}
 
interface TextStyle {
  font: string;       // CSS font shorthand (e.g. '16px/1.5 Inter')
  tracking?: number;  // letter-spacing adjustment in px
  hyphenate?: boolean;
}
 
interface MeasuredText {
  lines: VDTLine[];
  height: number;
  width: number;
}

#Backends

| Backend | Measurement | Rendering | Status |
| --- | --- | --- | --- |
| Canvas | Pretext (canvas font metrics) | Bitmap drawing on an HTMLCanvasElement (renderToCanvas, renderPage, renderPageToCanvas) | Shipping |
| HTML | Pretext (same metrics as canvas) | Absolutely-positioned DOM nodes with editorial CSS (renderToHtml, renderToHtmlIndexed) | Shipping |
| PDF | Consumes the VDT already measured with Pretext | PDF page construction via pdf-lib with per-weight font embedding (renderToPdf in postext-pdf) | Shipping |
| Server-side | Pretext + node-canvas | Headless rendering for SSR / batch generation | Future |

All three shipping backends consume the same VDTDocument. The split between postext (which exports the canvas and HTML backends) and postext-pdf (which exports the PDF backend) is purely about dependencies: the PDF path pulls in pdf-lib and @pdf-lib/fontkit, and most web integrations do not need them. Install postext-pdf only when you actually want to emit PDF bytes.

Browser-only constraint: In Phase 1, all layout computation happens client-side in the browser. The pipeline can run either on the main thread (buildDocument) or inside a dedicated Web Worker (createLayoutWorker from postext/worker) -- the worker path is the recommended integration for UI-driven apps because it keeps measurement and the convergence loop off the main thread, supports last-wins cancellation via AbortSignal, and owns its own measurement cache and math raster cache so successive rebuilds stay cheap. See Configuration -> Running layout in a Web Worker for the full integration pattern. Server-side rendering remains a deliberate scope decision for later -- nail the browser experience first, expand to other targets later.
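The last-wins idea can be illustrated without touching the worker API at all. The sketch below uses only the standard AbortController; `createLatestOnly`, `layoutBlocks`, and `SketchCancelledError` are hypothetical names for this example and do not reflect the signatures of createLayoutWorker or buildDocument.

```typescript
// Hypothetical error type standing in for the engine's cancellation error.
class SketchCancelledError extends Error {
  constructor() {
    super("build superseded by a newer request");
    this.name = "SketchCancelledError";
  }
}

// Hands out one AbortSignal per build and aborts the previous one.
function createLatestOnly(): () => AbortSignal {
  let current: AbortController | null = null;
  return function begin(): AbortSignal {
    if (current !== null) current.abort(); // the older build loses
    current = new AbortController();
    return current.signal;
  };
}

// Cooperative cancellation: the loop checks the signal once per block.
function layoutBlocks(blockCount: number, signal: AbortSignal): number {
  let laidOut = 0;
  for (let i = 0; i < blockCount; i++) {
    if (signal.aborted) throw new SketchCancelledError();
    laidOut++; // stand-in for measuring and placing one block
  }
  return laidOut;
}

const beginBuild = createLatestOnly();
const firstSignal = beginBuild();  // build for keystroke 1
const secondSignal = beginBuild(); // keystroke 2 arrives and aborts the first build
const blocksDone = layoutBlocks(100, secondSignal);
```

Checking the signal per block keeps the cost of cancellation bounded: a superseded build stops after at most one more block rather than running to completion.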

#Performance Strategy

The difference between a sluggish tool and a magical one is about 10x. A 500ms layout means the user sees a visible stutter every time they resize the window. A 50ms layout feels instantaneous -- like the document was always there. That factor cannot be patched in later. It has to be designed in from day one.

Consider what the engine is up against: thousands of text blocks across hundreds of pages, with the entire layout potentially recomputed on every viewport resize. This is the same class of problem that game engines face -- processing thousands of objects (geometry, physics, lighting, AI) 60 times per second. They solve it with a pipeline architecture (multiple passes over shared mutable state, each pass doing one thing fast) and aggressive avoidance of unnecessary work (culling, dirty flags, spatial partitioning). Postext borrows every one of these ideas.

#Principles

  1. In-memory computation. The entire VDT fits in memory. No DOM reads during layout. The DOM is only touched at the very end, during rendering.

  2. Dirty tracking. Blocks carry a dirty flag. Passes skip clean subtrees. The convergence loop only re-runs from the earliest dirty point.

  3. Bounded convergence. Maximum 5 iterations is a hard guarantee. Worst-case performance is predictable and measurable.

  4. Pretext speed. Text measurement at 300–600x DOM speed means the engine can afford to re-measure text speculatively (trying different column widths, hyphenation points, tracking adjustments) without blocking the main thread.

  5. Off-main-thread builds. The postext/worker entry point runs the entire pipeline inside a dedicated Web Worker. The main thread posts { content, config } and an AbortSignal; the worker registers fonts (transferred as ArrayBuffers), runs the convergence loop, and posts back the finished VDTDocument. A newer build() call cancels the previous one cooperatively — the worker checks a per-block cancellation hook inside buildDocument and throws BuildCancelledError, so a user typing into an editor never waits on a superseded layout. The worker also maintains its own persistent measurement cache and a content-keyed math raster cache so that structured-cloned MathRender objects survive across rebuilds without re-rasterising.

  6. Flat numeric fields. Bounding boxes are stored as flat x, y, width, height fields on each node, not as nested objects. This avoids pointer chasing and is more cache-friendly.

  7. Dual-access VDT. The tree (pages > columns > blocks) gives hierarchical access for passes that need to work page-by-page or column-by-column (like Pass 6, column balancing). A parallel flat blocks[] array gives O(1) indexed access for passes that need to iterate all blocks regardless of their location (like Pass 5, widow/orphan detection). Both views reference the same block objects (there is no duplication, just two ways to traverse the same data).
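Principles 2, 6, and 7 fit together in a few lines. The node shapes below are trimmed-down, hypothetical versions of the VDT types (the real ones carry more fields), and `indexBlocks` is an invented helper name.

```typescript
// Hypothetical, simplified node shapes with flat numeric geometry fields.
interface SketchBlock {
  id: string;
  x: number;
  y: number;
  width: number;
  height: number;
  dirty: boolean;
}
interface SketchColumn { blocks: SketchBlock[]; }
interface SketchPage { columns: SketchColumn[]; }
interface SketchDocument {
  pages: SketchPage[];   // hierarchical view
  blocks: SketchBlock[]; // flat view over the same objects
}

// Builds the flat array by reference, so both views share every block.
function indexBlocks(pages: SketchPage[]): SketchDocument {
  const blocks: SketchBlock[] = [];
  for (const page of pages) {
    for (const column of page.columns) {
      for (const block of column.blocks) blocks.push(block);
    }
  }
  return { pages, blocks };
}

const makeBlock = (id: string, x: number, y: number): SketchBlock =>
  ({ id, x, y, width: 320, height: 48, dirty: false });

const sketchDoc = indexBlocks([
  { columns: [
    { blocks: [makeBlock("a", 0, 0), makeBlock("b", 0, 48)] },
    { blocks: [makeBlock("c", 344, 0)] },
  ] },
]);

// A pass iterating the flat array marks a block dirty...
for (const block of sketchDoc.blocks) {
  if (block.id === "c") block.dirty = true;
}
// ...and a pass walking the tree sees the same flag without any copying.
const dirtyInTree = sketchDoc.pages
  .map((page) => page.columns.map((col) => col.blocks.filter((b) => b.dirty).length))
  .reduce((sum, counts) => sum + counts.reduce((a, b) => a + b, 0), 0);
```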

#Resize Handling

When the user resizes the viewport, the engine does not rebuild from scratch. It updates column widths in the VDT, marks all text blocks as dirty, and re-runs the pipeline from Pass 2. Page and column structures are reused.

This is the mutable VDT paying dividends. Instead of discarding the entire layout and starting from zero, the engine reuses as much work as possible. The Pretext prepare() results are still valid -- they depend on font and text content, not width -- so only the cheap layout() calls need to re-run. A 50-page document can be fully re-laid-out by re-measuring all text blocks (fast, because prepare() is cached) and re-running passes 3--7, without re-parsing the markdown or re-resolving references. The user drags the window edge and the layout follows in real time.
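The split between a cached, width-independent step and a cheap, width-dependent step can be sketched like this. The functions below are stand-ins written for this example and are not the Pretext API: `prepareCached` fakes measurement with a fixed character width, and `layoutLineCount` does a simple greedy wrap.

```typescript
// Stand-in for the expensive step: depends on text and font, never on width.
interface Prepared { wordWidths: number[]; }

let prepareCalls = 0; // counts cache misses, for demonstration only
const prepareCache = new Map<string, Prepared>();

function prepareCached(text: string, font: string, charWidthPx: number): Prepared {
  const key = font + "\u0000" + text;
  let prepared = prepareCache.get(key);
  if (prepared === undefined) {
    prepareCalls++;
    prepared = { wordWidths: text.split(" ").map((word) => word.length * charWidthPx) };
    prepareCache.set(key, prepared);
  }
  return prepared;
}

// Stand-in for the cheap step: greedy wrapping at a given column width.
function layoutLineCount(prepared: Prepared, maxWidth: number, spaceWidth: number): number {
  let lines = 1;
  let lineWidth = 0;
  for (const wordWidth of prepared.wordWidths) {
    const needed = lineWidth === 0 ? wordWidth : lineWidth + spaceWidth + wordWidth;
    if (needed > maxWidth && lineWidth > 0) {
      lines++;               // the word does not fit, start a new line
      lineWidth = wordWidth;
    } else {
      lineWidth = needed;
    }
  }
  return lines;
}

const sentence = "the quick brown fox jumps over the lazy dog";
// Initial layout at 160px, then a resize to 80px reuses the prepared data.
const wideLines = layoutLineCount(prepareCached(sentence, "16px Inter", 8), 160, 8);
const narrowLines = layoutLineCount(prepareCached(sentence, "16px Inter", 8), 80, 8);
```

The cache key deliberately omits the column width, which is why a resize never invalidates it.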

#Benchmarking from Day One

Every pass is independently benchmarkable. Tests assert on performance, not just correctness:

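// Example benchmark test
import { bench } from 'vitest';

bench('layout 50-page document', () => {
  // Build a fresh VDT on every sample so each run measures the full layout
  const vdt = createVDT(fiftyPageContent, config);
  runPipeline(vdt);
}, { time: 100 }); // sample for 100ms; `time` is a sampling window, not a pass/fail budget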

Performance tests run alongside unit tests in vitest, and regressions are caught in CI. If a refactor makes Pass 5 twice as slow, the build breaks. Performance is a feature, not a hope.
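One way to turn a timing into a hard pass/fail check is a small budget helper that can be called from an ordinary test. This is a sketch under stated assumptions: `medianDurationMs` and `assertWithinBudget` are hypothetical helpers, not part of vitest or Postext, and wall-clock budgets like this need generous margins on shared CI machines.

```typescript
// Runs the task several times and returns the median duration in ms.
// `runs` should be at least 1.
function medianDurationMs(task: () => void, runs: number): number {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    task();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Throws when the median run exceeds the budget; returns the median otherwise.
function assertWithinBudget(
  name: string,
  task: () => void,
  budgetMs: number,
  runs: number = 5,
): number {
  const median = medianDurationMs(task, runs);
  if (median > budgetMs) {
    throw new Error(name + " took " + median.toFixed(1) + "ms, budget is " + budgetMs + "ms");
  }
  return median;
}

// Stand-in workload; a real test would build and lay out a document here.
const medianMs = assertWithinBudget("sum 10k numbers", () => {
  let total = 0;
  for (let i = 0; i < 10000; i++) total += i;
}, 1000);
```

Using the median rather than a single run makes the check less sensitive to one-off pauses such as garbage collection.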

#Data Flow

[Figure: Data flow through the engine. Content and config enter Pass 1 (parse) and Pass 2 (measure). Passes 3 through 7 (place, resources, typography, balance, rhythm) run inside the convergence loop (max 5 iterations), repeating while dirty blocks exist. Once converged, the VDT is handed to the backend, which renders HTML or PDF.]
End-to-end data flow: parse, measure, converge, render.

#Non-Goals

Each of these limits is a deliberate choice — the engine is complex enough on its own, and taking on responsibilities that belong elsewhere would be the fastest way to never finish.

  • Server-side rendering. All layout runs in the browser. The engine depends on canvas font metrics (via Pretext), which require a browser environment. A server-side backend using node-canvas may come later, but it is not part of the initial design. Browser first.
  • WYSIWYG editing. Postext is a layout engine, not an editor. Content in, geometry out. Building an interactive editing surface -- cursor management, selection, undo/redo, input handling -- is an entirely separate problem. Postext can serve as the rendering backend for an editor, but it does not provide editing capabilities itself.
  • CSS column-count wrapper. Postext replaces CSS multi-column layout; it does not wrap it. It computes precise positioned geometry from scratch, because the browser's column layout algorithm lacks control over resource placement, widow/orphan prevention, and cross-column typographic rules. Those are the whole point.
  • Responsive breakpoint management. Postext computes layout at a given page size. The consumer decides when to re-layout (on viewport resize, on orientation change). Postext does not manage breakpoints, media queries, or responsive design decisions. That is your job.
  • Real-time collaborative editing. Postext is a stateless layout computation -- content in, geometry out -- not a collaborative document system with conflict resolution, operational transforms, or multi-user awareness.
  • Font loading or management. Postext assumes fonts are already loaded and available for measurement. Font loading, font fallback chains, and font subsetting are the consumer's responsibility. If a font is not loaded when Postext measures text, the measurements will use the browser's fallback font, and the layout will be wrong once the real font loads. Load your fonts first.

#Appendix: Relationship to Existing Types

Here is how each type already defined in packages/postext/src/types.ts maps to the architecture described above:

| Type | Architectural Role |
| --- | --- |
| PostextContent | Entry point: the input to the engine (Pass 1) |
| PostextConfig | Controls all pipeline behavior across every pass |
| PostextResource | Becomes a VDTBlock of type 'resource' in Pass 1 |
| PostextNote | Becomes a footnote, endnote, or margin note block in Pass 1 |
| PlacementStrategy | Dispatches resource placement behavior in Pass 4 |
| ColumnConfig | Drives page/column creation (Pass 3) and balancing (Pass 6) |
| TypographyConfig | Drives typographic refinement (Pass 5) and vertical rhythm (Pass 7) |
| ResourcePlacementConfig | Controls resource placement strategy and deferral in Pass 4 |
| ReferenceConfig | Controls reference resolution (Pass 1) and footnote area reservation (Pass 3) |
| PostextSectionOverride | Creates zone-specific config overrides during Pass 1 parsing |

#New Types Introduced by This Architecture

The VDT types (VDTDocument, VDTPage, VDTColumn, VDTBlock, VDTLine, BoundingBox, VDTFootnoteArea) and the backend types (PostextBackend, TextStyle, MeasuredText) are new to this architecture. They will live in dedicated files alongside the existing types.ts:

  • packages/postext/src/vdt.ts (Virtual Document Tree types)
  • packages/postext/src/backend.ts (Backend interface types)

These extend the existing type definitions without modifying them -- the current types remain untouched.