The difference between amateur and professional typesetting lives in the spaces between words.
Open any paperback novel. The text is justified: both edges of every paragraph are perfectly aligned. But look closer. The spaces between words are nearly uniform, line after line. No rivers of white space running down the page. No lines where two words sit marooned with a vast ocean between them. Achieving this is far harder than it looks, and it has arguably consumed more typographic engineering effort than any other single problem.
Postext solves it with the same algorithm that TeX has used since 1981: Knuth-Plass optimal line breaking. Combined with TeX-quality hyphenation patterns, configurable word-spacing bounds, and a visual debug system for identifying problem lines, the engine produces justified text that meets publication standards.
#The Problem
When text is set with textAlign: 'justify', every line (except the last) must be stretched or compressed to fill the exact column width. The engine distributes the difference between the natural content width and the column width across the inter-word spaces on that line.
If a line has many words, each space absorbs a tiny adjustment — invisible to the reader. But if a line has few words (because one long word forced an early break), each space must stretch dramatically. The result is a loose line: a line where the word spacing is so wide that it disrupts reading rhythm and creates ugly visual gaps.
The opposite problem exists too. If the engine packs too many words onto a line, the spaces shrink below their natural width, producing a tight line where words feel cramped together.
A naive line-breaking algorithm, the kind browsers apply to CSS text by default, makes decisions one line at a time. It fills the current line with as many words as possible, breaks, and moves on. This greedy first-fit approach has a fundamental weakness: it cannot see the future. A decision that looks optimal for line 5 might force line 6 into a terrible break. By the time the algorithm reaches line 6, it is too late, because line 5 is already committed.
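A minimal sketch of greedy first-fit makes the weakness concrete. This is not browser or Pretext code; `wordWidths`, `spaceWidth`, and `lineWidth` are hypothetical inputs assumed to come from a text-measurement step.

```typescript
// Greedy first-fit: keep adding words while the natural width still fits.
// Returns one array of word indices per line.
function greedyBreaks(wordWidths: number[], spaceWidth: number, lineWidth: number): number[][] {
  const lines: number[][] = [];
  let current: number[] = [];
  let used = 0; // natural width of the words and spaces already on the current line
  for (let i = 0; i < wordWidths.length; i++) {
    const needed = current.length === 0 ? wordWidths[i] : used + spaceWidth + wordWidths[i];
    if (current.length > 0 && needed > lineWidth) {
      lines.push(current); // commit the line; this decision is never revisited
      current = [i];
      used = wordWidths[i];
    } else {
      current.push(i);
      used = needed;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}
```

With widths `[4, 2, 3, 6, 7]`, a space of 1, and a line width of 12, this yields `[[0, 1, 2], [3], [4]]`: the first line takes three words, and the fourth word is then stranded alone on a line with no spaces left to absorb the slack.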
#Knuth-Plass: Seeing the Whole Paragraph
The Knuth-Plass algorithm, published by Donald Knuth and Michael Plass in 1981, takes a radically different approach. Instead of breaking one line at a time, it evaluates breakpoints across the entire paragraph and picks the combination that minimizes the total demerits, a cost derived from each line's "badness" plus penalties for undesirable breaks. It is the algorithm that powers TeX, and it is why TeX-typeset documents have been the gold standard for justified text for over four decades.
#The Box-Glue-Penalty Model
Knuth-Plass does not think in terms of words and spaces. It models text as a sequence of three primitives:
| Primitive | Represents | Behavior |
|---|---|---|
| Box | A word or text fragment | Has a fixed width. Cannot be stretched or compressed. Cannot be broken. |
| Glue | Inter-word space | Has a natural width, a stretch capacity, and a shrink capacity. The engine can adjust glue within these bounds to fill the line. |
| Penalty | A potential break point | Has a cost. Low penalty = cheap break. High penalty = expensive break. A flagged penalty means a hyphenation point (adds a visible hyphen if used). |
A paragraph becomes a sequence like:
```
[box "The"] [glue] [box "quick"] [glue] [box "brown"] [glue] [box "fox"]
[penalty -∞]   ← forced break at paragraph end
```
When hyphenation is active, long words are split into fragments separated by penalties:
```
[box "ty"] [penalty 50, flagged] [box "pog"] [penalty 50, flagged] [box "raphy"]
```
Each flagged penalty carries a cost of 50 — not free, but cheaper than producing a loose line.
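The three primitives map naturally onto a discriminated union. The sketch below is an illustration only: the `Item` type, the `paragraphToItems` helper, and the `measure` callback are hypothetical names, not Postext's actual types.

```typescript
// One possible encoding of the box-glue-penalty model.
type Item =
  | { kind: "box"; text: string; width: number }
  | { kind: "glue"; width: number; stretch: number; shrink: number }
  | { kind: "penalty"; cost: number; flagged: boolean };

interface SpaceMetrics {
  natural: number;
  stretch: number;
  shrink: number;
}

// Turns whitespace-separated text into boxes joined by glue,
// ending with a forced break (penalty of minus infinity).
function paragraphToItems(text: string, measure: (s: string) => number, space: SpaceMetrics): Item[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const items: Item[] = [];
  words.forEach((word, i) => {
    if (i > 0) {
      items.push({ kind: "glue", width: space.natural, stretch: space.stretch, shrink: space.shrink });
    }
    items.push({ kind: "box", text: word, width: measure(word) });
  });
  items.push({ kind: "penalty", cost: -Infinity, flagged: false });
  return items;
}
```

Calling it on "The quick brown fox" produces four boxes, three glue items, and the final forced penalty, matching the sequence shown earlier.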
#How It Finds the Optimum
The algorithm uses dynamic programming. It maintains a set of active nodes — potential breakpoints that could start new lines — and evaluates every feasible break from each active node. For each candidate break, it computes:
- Adjustment ratio (r): how much the glue on this line needs to stretch or shrink. r = 0 means the line fits perfectly. r > 0 means stretching (loose). r < 0 means shrinking (tight).
- Badness: a measure of how uneven the spacing is, computed as 100 × |r|³. The cubic growth means that a slightly loose line is tolerable, but a very loose line is severely penalized. A line with r = 2 has badness 800; a line with r = 0.5 has badness 12.5.
- Demerits: the total cost of breaking here, combining badness, any penalty at the break point, and two additional penalties:
  - Consecutive hyphen demerit (default 3000): penalizes two hyphenated lines in a row, because stacked hyphens are visually distracting.
  - Fitness class demerit (default 100): penalizes adjacent lines of very different tightness. If a tight line sits next to a very loose line, the contrast is jarring.
- Fitness class: each line is classified as tight (r < -0.5), normal (-0.5 ≤ r < 0.5), loose (0.5 ≤ r < 1.0), or very loose (r ≥ 1.0). Adjacent lines with fitness classes more than one step apart incur the fitness demerit.
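The per-line quantities in this list are small enough to write out directly. The following sketch uses the formulas and thresholds stated above; the function names and the `FitnessClass` type are illustrative and are not taken from knuthPlass.ts.

```typescript
type FitnessClass = "tight" | "normal" | "loose" | "veryLoose";

// r: how far the glue must stretch (r > 0) or shrink (r < 0) to fill the line.
function adjustmentRatio(naturalWidth: number, lineWidth: number, totalStretch: number, totalShrink: number): number {
  const diff = lineWidth - naturalWidth;
  if (diff === 0) return 0;
  const capacity = diff > 0 ? totalStretch : totalShrink;
  if (capacity <= 0) return diff > 0 ? Infinity : -Infinity; // no glue available to adjust
  return diff / capacity;
}

// Badness grows with the cube of |r|.
function badness(r: number): number {
  return 100 * Math.pow(Math.abs(r), 3);
}

function fitnessClass(r: number): FitnessClass {
  if (r < -0.5) return "tight";
  if (r < 0.5) return "normal";
  if (r < 1.0) return "loose";
  return "veryLoose";
}

const FITNESS_ORDER: FitnessClass[] = ["tight", "normal", "loose", "veryLoose"];

// Extra demerits when neighbouring lines are more than one class apart.
function fitnessDemerit(previous: FitnessClass, current: FitnessClass, demerit: number = 100): number {
  const distance = Math.abs(FITNESS_ORDER.indexOf(previous) - FITNESS_ORDER.indexOf(current));
  return distance > 1 ? demerit : 0;
}
```

For example, `badness(2)` is 800 and `badness(0.5)` is 12.5, and a tight line followed by a loose one incurs the fitness demerit while a normal line followed by a loose one does not.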
Badness, the break penalty, and the two demerit terms combine into a single demerits value per candidate break. The algorithm then traces back through the active nodes to find the path with the lowest total demerits: the globally optimal set of breakpoints for the entire paragraph. That global view is the entire advantage over greedy breaking.
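To show the shape of the optimisation, here is a deliberately simplified total-fit sketch. It is not Postext's implementation and not the full active-node algorithm: it ignores hyphenation, penalties, and fitness classes, treats every space identically, leaves the last line ragged, and assumes per-line demerits of (1 + badness)². All names are hypothetical.

```typescript
interface Space {
  natural: number;
  stretch: number;
  shrink: number;
}

// Demerits for one candidate line; Infinity marks an infeasible line.
function lineDemerits(natural: number, gaps: number, space: Space, lineWidth: number, isLast: boolean): number {
  const diff = lineWidth - natural;
  const capacity = gaps * (diff > 0 ? space.stretch : space.shrink);
  let r = diff === 0 ? 0 : capacity > 0 ? diff / capacity : diff > 0 ? Infinity : -Infinity;
  if (isLast && r > 0) r = 0; // the final line does not need to be stretched
  if (r < -1 || r === Infinity) return Infinity; // overfull, or nothing to stretch
  const badness = 100 * Math.pow(Math.abs(r), 3);
  return Math.pow(1 + badness, 2);
}

// Returns break positions (a line ends just before each returned word index),
// or null when no feasible set of breaks exists.
function optimalBreaks(wordWidths: number[], space: Space, lineWidth: number): number[] | null {
  const n = wordWidths.length;
  const best: number[] = [0]; // best[j]: lowest total demerits with a break before word j
  const prev: number[] = [-1]; // prev[j]: start index of the line that ends before word j
  for (let j = 1; j <= n; j++) {
    best.push(Infinity);
    prev.push(-1);
    let natural = 0;
    for (let i = j - 1; i >= 0; i--) {
      natural += wordWidths[i] + (i < j - 1 ? space.natural : 0);
      const gaps = j - 1 - i;
      if (natural - gaps * space.shrink > lineWidth) break; // longer lines overflow too
      const total = best[i] + lineDemerits(natural, gaps, space, lineWidth, j === n);
      if (total < best[j]) {
        best[j] = total;
        prev[j] = i;
      }
    }
  }
  if (best[n] === Infinity) return null;
  const breaks: number[] = [];
  for (let j = n; j > 0; j = prev[j]) breaks.unshift(j);
  return breaks;
}
```

With word widths `[4, 2, 3, 6, 7]`, a space of natural width 1 (stretch 1, shrink 0.5), and a line width of 12, this returns `[2, 4, 5]`: it accepts two loose lines rather than leaving a single word alone on a justified line. A `null` result corresponds to the case where no valid breaks exist.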
#Orphan, Widow, and Runt Penalties
Postext extends the standard demerit set with three editorial penalties that push the line breaker away from break sets that would leave visually poor paragraph ends:
- Orphan penalty applies when choosing this break would leave fewer than orphanMinLines lines at the top of the next column. Default orphanPenalty is 1000.
- Widow penalty applies when the break would leave fewer than widowMinLines lines at the bottom of the current column. Default widowPenalty is 1000.
- Runt penalty applies when the paragraph's final line would be shorter than roughly runtMinCharacters × normalSpaceWidth pixels. Default runtPenalty is 1000. Unlike orphan/widow (which add linearly to the split demerit), runt is injected as equivalent badness inside the Knuth-Plass squared formula so it competes on the same scale as line badness (which saturates at 10000) rather than being dwarfed by it.
All three are added to the candidate node's demerits before the algorithm selects the globally optimal path. Because they live inside the same optimisation, the solver is free to trade a slightly looser line for the elimination of a widow, and it will naturally prefer break sets that avoid all three whenever possible. Tune the trade-off by adjusting orphanPenalty, widowPenalty, or runtPenalty; set any of them to 0 to disable that rule entirely. List items opt in via avoidOrphansInLists, avoidWidowsInLists, and avoidRuntsInLists (all true by default).
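The difference between adding a penalty linearly and injecting it as badness is mostly one of scale. The sketch below assumes per-line demerits of the form (1 + badness)²; both function names and both formula shapes are assumptions for illustration, not Postext's actual code.

```typescript
// Penalty added after squaring: its contribution stays constant.
function demeritsWithLinearPenalty(badness: number, penalty: number): number {
  return Math.pow(1 + badness, 2) + penalty;
}

// Penalty added before squaring: its contribution scales like badness does.
function demeritsWithBadnessPenalty(badness: number, penalty: number): number {
  return Math.pow(1 + badness + penalty, 2);
}
```

For a line with badness 100 and a penalty of 1000, the linear form gives 11201 while the squared form gives 1212201, which is why a penalty placed inside the square is not dwarfed by a badly spaced line elsewhere in the paragraph.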
#Why It Matters
The practical difference is visible. In a greedy layout, you will find paragraphs where one line is noticeably looser than its neighbors — and if you look carefully, you will see it happened because the previous line grabbed one word too many. Knuth-Plass avoids this by trading a slightly worse current line for a much better next one, because it can see the consequences.
#Postext's Implementation
Postext implements the full Knuth-Plass algorithm in knuthPlass.ts (approximately 860 lines). Two adapter paths convert text into the box-glue-penalty model:
- Plain text path: uses @chenglou/pretext for DOM-free text measurement. Pretext provides segment widths and discretionary-hyphen widths; Postext converts them to KP items.
- Rich text path: handles bold and italic spans using Canvas-based measurement. Each styled token becomes one or more boxes, with hyphenation break points inserted as penalties.
Both paths compute per-line justifiedSpaceRatio — the actual space width divided by the natural space width — which feeds into the loose-line debug system described below.
Fallback behavior: if Knuth-Plass produces no valid breaks (which can happen with extremely narrow columns or very long words), the engine falls back to Pretext's greedy layoutNextLine(). This ensures the layout always completes.
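The fallback amounts to a thin wrapper around the two breakers. The sketch below shows the control flow only; `OptimalBreaker`, `GreedyBreaker`, and `withGreedyFallback` are hypothetical names, and neither signature is meant to reflect Postext's or Pretext's real API.

```typescript
type Lines = string[];
type OptimalBreaker = (text: string, columnWidth: number) => Lines | null;
type GreedyBreaker = (text: string, columnWidth: number) => Lines;

// Try the optimal breaker first; use greedy output when it finds no valid breaks.
function withGreedyFallback(optimal: OptimalBreaker, greedy: GreedyBreaker): GreedyBreaker {
  return (text, columnWidth) => {
    const lines = optimal(text, columnWidth);
    return lines !== null && lines.length > 0 ? lines : greedy(text, columnWidth);
  };
}
```

Because the greedy path always returns something, the combined breaker always completes, at the cost of lower quality for that one paragraph.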
#Hyphenation
Hyphenation and justification are inseparable. Without hyphenation, the engine's only way to avoid a loose line is to move a word to the next line — which often just shifts the problem. Hyphenation gives the engine a much larger set of break points to consider, dramatically improving the quality of justified text.
#TeX-Quality Patterns
Postext uses Hypher (hypher v0.2.5) for hyphenation, powered by TeX/Liang hyphenation patterns. These are the same patterns that TeX has used since 1983 — a compact representation of syllable-boundary rules derived by Frank Liang's pattern-generation algorithm from large word corpora.
The patterns encode a set of numbered rules that, when overlaid on a word, indicate where breaks are allowed (odd numbers) and forbidden (even numbers). The leftmin and rightmin parameters in each language's pattern file ensure a minimum number of characters before and after any break point. For English (en-us), these are typically 2 and 3 respectively — so a word must have at least 2 characters before the hyphen and 3 after.
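The overlay rule can be sketched in a few lines. This is a plain linear scan rather than the trie Hypher uses, and it does not call Hypher at all; the nine patterns are the often-quoted set for the word "hyphenation" and should be treated as illustrative rather than as the contents of any shipped pattern file. `parsePattern` and `hyphenateWord` are hypothetical names.

```typescript
// A parsed pattern: the letters to match plus one level per gap.
// levels[i] is the digit in front of letters[i]; the last entry follows the final letter.
interface ParsedPattern {
  letters: string;
  levels: number[];
}

function parsePattern(pattern: string): ParsedPattern {
  let letters = "";
  const levels: number[] = [0];
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch >= "0" && ch <= "9") {
      levels[levels.length - 1] = Number(ch);
    } else {
      letters += ch;
      levels.push(0);
    }
  }
  return { letters, levels };
}

function hyphenateWord(word: string, patterns: string[], leftMin: number = 2, rightMin: number = 3): string[] {
  const padded = "." + word.toLowerCase() + "."; // dots mark the word boundaries
  const scores: number[] = [];
  for (let k = 0; k <= padded.length; k++) scores.push(0);

  for (let p = 0; p < patterns.length; p++) {
    const parsed = parsePattern(patterns[p]);
    if (parsed.letters.length === 0) continue;
    let at = padded.indexOf(parsed.letters);
    while (at !== -1) {
      // Overlay the pattern: the highest level seen at each gap wins.
      for (let i = 0; i < parsed.levels.length; i++) {
        scores[at + i] = Math.max(scores[at + i], parsed.levels[i]);
      }
      at = padded.indexOf(parsed.letters, at + 1);
    }
  }

  const parts: string[] = [];
  let start = 0;
  for (let j = leftMin; j <= word.length - rightMin; j++) {
    if (scores[j + 1] % 2 === 1) { // odd level at the gap before word[j]: break allowed
      parts.push(word.slice(start, j));
      start = j;
    }
  }
  parts.push(word.slice(start));
  return parts;
}

const EXAMPLE_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"];
```

`hyphenateWord("hyphenation", EXAMPLE_PATTERNS)` returns `["hy", "phen", "ation"]`: the odd levels 3 and 5 allow the two breaks, while the even levels veto every other gap.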
#Supported Locales
| Locale code | Language |
|---|---|
| 'en-us' | English (US) |
| 'es' | Spanish |
| 'fr' | French |
| 'de' | German |
| 'it' | Italian |
| 'pt' | Portuguese |
| 'ca' | Catalan |
| 'nl' | Dutch |
Each locale loads its own pattern set. Hypher instances are lazily created and cached — the first call for a given locale pays the initialization cost; subsequent calls are instant. If an unknown locale is passed, the engine falls back to en-us.
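Lazy creation with a fallback locale is a small caching pattern. The sketch below is generic and does not use Hypher's API: the `Hyphenator` interface, the factory map, and `createHyphenatorCache` are all hypothetical, and the fallback locale is assumed to be present in the factory map.

```typescript
interface Hyphenator {
  hyphenate(word: string): string[];
}

type HyphenatorFactory = () => Hyphenator;

function createHyphenatorCache(
  factories: { [locale: string]: HyphenatorFactory },
  fallbackLocale: string
): (locale: string) => Hyphenator {
  const cache: { [locale: string]: Hyphenator } = {};
  return function getHyphenator(locale: string): Hyphenator {
    // Unknown locales resolve to the fallback (assumed to exist in factories).
    const key = Object.prototype.hasOwnProperty.call(factories, locale) ? locale : fallbackLocale;
    if (!Object.prototype.hasOwnProperty.call(cache, key)) {
      cache[key] = factories[key](); // first request pays the construction cost
    }
    return cache[key];
  };
}
```

Repeated requests for the same locale, and requests for unknown locales that resolve to the same fallback, reuse a single instance.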
#Why Hypher (and Not a Custom Algorithm)
The original Postext hyphenation system used a custom vowel-based heuristic: it detected syllable boundaries by finding vowel clusters, common prefixes (over-, under-, inter-), and common suffixes (-tion, -ment, -sion). It was simple and fast, but fundamentally limited:
| Aspect | Custom heuristic | Hypher (TeX patterns) |
|---|---|---|
| Accuracy | Good for common words, unreliable for unusual ones. Vowel detection misses many valid break points and creates invalid ones. | Near-perfect. Patterns are generated from large corpora and have been refined for over 40 years. |
| Language coverage | Had to manually define vowel sets, prefixes, and suffixes for each language — tedious and error-prone. | Pattern files exist for 50+ languages, maintained by the TeX community. Adding a language means adding one import. |
| Industry standard | Not a recognized standard. No tooling or community support. | The same family of Liang patterns that TeX uses, also adopted by LibreOffice, Firefox, and Chrome for their hyphenation. |
| Maintenance | Every edge case is a bug to fix manually. | Community-maintained pattern files. Bug fixes come from upstream. |
| Bundle size | ~170 lines, no dependencies. | Hypher core is ~3 KB. Each language pattern file adds 20–80 KB (gzipped: 5–20 KB). All eight bundled locales add roughly 300 KB (gzipped: ~80 KB). |
| Performance | Very fast (simple string scanning). | Fast (trie lookup per character). Negligible in practice — hyphenation is never the bottleneck. |
The trade-off is clear: a larger bundle in exchange for dramatically better correctness and zero maintenance burden. For a layout engine targeting publication-grade output, correctness wins. A single bad hyphenation in a printed book is more expensive than a few extra kilobytes of patterns.
#How Hyphenation Integrates with Knuth-Plass
Before text enters the Knuth-Plass algorithm, the engine pre-processes it with Hypher, inserting soft hyphens (Unicode \u00AD) at every legal break point. These invisible characters are then mapped to KP penalties with a cost of 50 and flagged: true.
The algorithm treats hyphenation breaks as just another option to evaluate alongside natural word boundaries (where glue allows a break). If using a hyphen produces a lower total demerits than leaving the line loose, the algorithm takes the hyphen. If not, it leaves the word intact.
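The mapping from soft hyphens to penalties can be sketched as a split on U+00AD. This is an illustration under assumptions: `KPItem`, `wordToItems`, and the `measure` callback are hypothetical names, and the real adapters also handle glue, styling, and hyphen widths.

```typescript
type KPItem =
  | { kind: "box"; text: string; width: number }
  | { kind: "penalty"; cost: number; flagged: boolean };

const SOFT_HYPHEN = "\u00AD";

// Splits a pre-hyphenated word into boxes separated by flagged penalties.
function wordToItems(word: string, measure: (s: string) => number, hyphenCost: number = 50): KPItem[] {
  const items: KPItem[] = [];
  const fragments = word.split(SOFT_HYPHEN);
  for (let i = 0; i < fragments.length; i++) {
    if (fragments[i].length === 0) continue; // ignore leading, trailing, or doubled soft hyphens
    if (items.length > 0) {
      items.push({ kind: "penalty", cost: hyphenCost, flagged: true });
    }
    items.push({ kind: "box", text: fragments[i], width: measure(fragments[i]) });
  }
  return items;
}
```

A word without soft hyphens comes back as a single box, so the same helper covers words that Hypher leaves unbroken.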
The consecutive-hyphen demerit (default 3000) ensures the algorithm strongly avoids placing hyphens on two adjacent lines, in keeping with the common style-guide advice to limit runs of consecutive hyphenated line ends.
Hyphenation is only applied when both bodyText.hyphenation.enabled is true and bodyText.textAlign is 'justify'. Left-aligned text does not benefit from hyphenation because the right edge is intentionally ragged.
#Word Spacing Bounds
The glue model gives the engine explicit bounds on how much inter-word spaces can stretch or shrink. These bounds are controlled by two configuration properties on bodyText:
| Property | Default | Description |
|---|---|---|
| maxWordSpacing | 2 | Upper bound for word spacing, as a multiplier of the normal space width. At the default value, spaces can stretch up to 200% of their natural width. |
| minWordSpacing | 0.6 | Lower bound for word spacing, as a multiplier of the normal space width. At the default value, spaces can shrink down to 60% of their natural width. |
These multipliers translate directly to the glue's stretch and shrink values in the Knuth-Plass model:
```
stretchPerSpace = normalSpaceWidth × (maxWordSpacing - 1)
shrinkPerSpace  = normalSpaceWidth × (1 - minWordSpacing)
```
At the defaults (2 / 0.6), if the normal space width is 4 px:
- Each space can stretch by 4 px (from 4 px to 8 px)
- Each space can shrink by 1.6 px (from 4 px to 2.4 px)
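The same arithmetic as a small helper, for readers who want to experiment with other bounds. `glueBounds` is a hypothetical name; only the two formulas and the default multipliers come from the text.

```typescript
interface GlueBounds {
  natural: number;
  stretch: number;
  shrink: number;
}

// Converts the word-spacing multipliers into per-space glue values.
function glueBounds(normalSpaceWidth: number, maxWordSpacing: number = 2, minWordSpacing: number = 0.6): GlueBounds {
  return {
    natural: normalSpaceWidth,
    stretch: normalSpaceWidth * (maxWordSpacing - 1),
    shrink: normalSpaceWidth * (1 - minWordSpacing),
  };
}
```

`glueBounds(4)` reproduces the numbers above: a stretch of 4 px and a shrink of about 1.6 px (subject to floating-point rounding).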
Tighter bounds (e.g., maxWordSpacing: 1.2) produce more uniform spacing but give the algorithm less room to maneuver, which may result in more hyphenation or, in extreme cases, overflow. Looser bounds (e.g., maxWordSpacing: 2.5) give the algorithm more flexibility but allow visibly uneven spacing on some lines.
The defaults of 2 and 0.6 favor the algorithm's flexibility — giving Knuth-Plass enough room to avoid overflow, hyphenation, and runts across narrow columns, while still being well within ranges typographic literature considers acceptable.
#Optimal vs. Greedy Line Breaking
The bodyText.optimalLineBreaking property (default: true) controls which line-breaking algorithm the engine uses:
- true: Knuth-Plass dynamic-programming algorithm. Evaluates all possible break sets and picks the globally optimal one. This is the recommended setting for any justified text.
- false: Greedy first-fit, powered by Pretext's layoutNextLine(). Faster but produces lower-quality results. Use this only when performance matters more than typographic quality (e.g., real-time preview at very high character counts).
When Knuth-Plass is active and produces no valid breaks (which can happen with extremely narrow columns or words longer than the column width), the engine automatically falls back to greedy breaking for that paragraph.
#Loose-Line Debugging
Even with Knuth-Plass and hyphenation, some lines will be looser than ideal — especially in narrow columns with long words, or in languages with few hyphenation opportunities. The loose-line highlight debug feature helps you find these problem lines instantly.
#How It Works
Every line in the Virtual Document Tree carries a justifiedSpaceRatio: the ratio of the actual justified space width to the font's natural space width. A value of 1.0 means the spaces are at their natural width. A value of 2.5 means the spaces are 2.5 times their natural width.
When debug.looseLineHighlight.enabled is true, the renderer paints a semi-transparent overlay on every line whose justifiedSpaceRatio exceeds the configured threshold. The default threshold is 3.0, meaning only lines whose spaces are more than three times their natural width are highlighted. This is a deliberately high bar; lines this loose are genuine typographic problems.
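A sketch of how such a ratio could be derived and checked. The function names and the inputs (`wordsWidth` as the total width of the line's words without spaces, `spaceCount`) are assumptions for illustration; the engine's own bookkeeping may differ.

```typescript
// Actual space width after justification, divided by the natural space width.
function justifiedSpaceRatio(wordsWidth: number, spaceCount: number, columnWidth: number, normalSpaceWidth: number): number {
  if (spaceCount === 0) return 1; // no spaces to stretch, so report the natural ratio
  const actualSpaceWidth = (columnWidth - wordsWidth) / spaceCount;
  return actualSpaceWidth / normalSpaceWidth;
}

// A line is flagged only when its ratio exceeds the threshold.
function isLooseLine(ratio: number, threshold: number = 3): boolean {
  return ratio > threshold;
}
```

A line whose words total 80 px in a 100 px column, with two spaces and a 4 px natural space, has a ratio of 2.5: not flagged at the default threshold of 3, but flagged if the threshold is lowered to 2.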
#Configuration
The loose-line highlight is part of the debug section of PostextConfig:
| Property | Type | Default | Description |
|---|---|---|---|
| looseLineHighlight.enabled | boolean | false | Whether to highlight loose lines. |
| looseLineHighlight.color | ColorValue | #ff000040 | Color of the highlight overlay. The default is a semi-transparent red. |
| looseLineHighlight.threshold | number | 3 | Multiplier of the normal space width above which a line is considered loose. Lower values catch more lines; higher values highlight only the worst offenders. |
```typescript
debug: {
  looseLineHighlight: {
    enabled: true,
    threshold: 2.5,
    color: { hex: '#ff660040', model: 'hex' },
  },
}
```
#Interpreting the Results
When you enable loose-line highlighting and see red bands across certain lines, it means the engine could not find a way to set those lines without excessive word spacing. Start at the top: causes are ordered from most to least likely.
| Cause | Solution |
|---|---|
| Column is too narrow for the font size | Increase column width, decrease font size, or switch to a single-column layout. |
| Long words with few hyphenation points | Verify hyphenation is enabled and the correct locale is set. Some technical terms or proper nouns have no valid break points. |
| Hyphenation is disabled | Enable bodyText.hyphenation.enabled. Justification without hyphenation is almost always worse. |
| Word spacing bounds are too tight | Increase maxWordSpacing slightly (e.g., from 2 to 2.5). This gives the algorithm more room. |
| Language has long compound words (e.g., German) | Ensure the correct locale is set. German hyphenation patterns handle compound words well, but only if the engine knows it is German. |
#Complete Example
A full configuration showcasing all hyphenation and justification settings:
```typescript
import { buildDocument } from 'postext';

const vdt = buildDocument(content, {
  bodyText: {
    fontFamily: 'EB Garamond',
    fontSize: { value: 9, unit: 'pt' },
    textAlign: 'justify',
    // Knuth-Plass optimal line breaking (default: true)
    optimalLineBreaking: true,
    // Hyphenation
    hyphenation: {
      enabled: true,
      locale: 'es',
    },
    // Word spacing bounds (multipliers of normal space width)
    maxWordSpacing: 2,   // spaces stretch up to 200%
    minWordSpacing: 0.6, // spaces shrink down to 60%
  },
  // Debug: highlight lines with excessive spacing
  debug: {
    looseLineHighlight: {
      enabled: true,
      threshold: 2.5,
      color: { hex: '#ff000040', model: 'hex' },
    },
  },
});
```
For the complete list of body text configuration options, see the Configuration page. For how the layout pipeline uses these settings during text measurement, see the Architecture page.