Hyphenation & Justification

Updated: 2026-04-21 | 20 min

The difference between amateur and professional typesetting lives in the spaces between words.

Open any paperback novel. The text is justified — both edges of every paragraph are perfectly aligned. But look closer. The spaces between words are nearly uniform, line after line. No rivers of white space running down the page. No lines where two words sit marooned with a vast ocean between them. Achieving this is far harder than it looks, and it is the single problem that has consumed more typographic engineering effort than any other.

Postext solves it with the same algorithm that TeX has used since 1981: Knuth-Plass optimal line breaking. Combined with TeX-quality hyphenation patterns, configurable word-spacing bounds, and a visual debug system for identifying problem lines, the engine produces justified text that meets publication standards.

#The Problem

When text is set with textAlign: 'justify', every line (except the last) must be stretched or compressed to fill the exact column width. The engine distributes the difference between the natural content width and the column width across the inter-word spaces on that line.

If a line has many words, each space absorbs a tiny adjustment — invisible to the reader. But if a line has few words (because one long word forced an early break), each space must stretch dramatically. The result is a loose line: a line where the word spacing is so wide that it disrupts reading rhythm and creates ugly visual gaps.

The opposite problem exists too. If the engine packs too many words onto a line, the spaces shrink below their natural width, producing a tight line where words feel cramped together.
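The arithmetic behind loose and tight lines can be sketched in a few lines. This is only an illustration: the function name and the sample widths are made up, and Postext's real measurement code is not shown in this document.

```typescript
// Width of each inter-word space after justification.
// wordWidths: measured widths of the words on one line (any unit, e.g. px).
function justifiedSpaceWidth(
  wordWidths: number[],
  columnWidth: number,
  naturalSpace: number,
): number {
  const gaps = wordWidths.length - 1;
  if (gaps <= 0) return naturalSpace; // a single word has no space to adjust
  let natural = gaps * naturalSpace;
  for (let i = 0; i < wordWidths.length; i++) natural += wordWidths[i];
  // The whole difference is shared equally by the gaps on this line.
  return naturalSpace + (columnWidth - natural) / gaps;
}

// Ten short words: each space barely changes.
const manyWords = justifiedSpaceWidth([26, 26, 26, 26, 26, 26, 26, 26, 26, 26], 300, 4);
// Three long words: each space must absorb a large share of the difference.
const fewWords = justifiedSpaceWidth([80, 80, 80], 300, 4);
```

With these hypothetical widths, the ten-word line ends up with spaces of roughly 4.4 px, while the three-word line needs 30 px per space: the same column, but a very different result.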

A naive line-breaking algorithm, the kind browsers typically apply to CSS text, makes decisions one line at a time. It fills the current line with as many words as possible, breaks, and moves on. This greedy first-fit approach has a fundamental weakness: it cannot see the future. A decision that looks optimal for line 5 might force line 6 into a terrible break. By the time the algorithm reaches line 6, it is too late: line 5 is already committed.
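A minimal sketch of greedy first-fit makes the "no look-ahead" property concrete. It approximates word widths by character counts and uses a one-character space, both assumptions for illustration only; it is not Pretext's layoutNextLine().

```typescript
// Greedy first-fit: fill the current line, break, never look back.
// Widths are approximated as character counts (an assumption for illustration).
function greedyBreaks(words: string[], columnWidth: number): string[][] {
  const lines: string[][] = [];
  let current: string[] = [];
  let currentWidth = 0;
  for (let i = 0; i < words.length; i++) {
    const word = words[i];
    // Width of the line if this word were appended (one unit per space).
    const needed = current.length === 0 ? word.length : currentWidth + 1 + word.length;
    if (current.length > 0 && needed > columnWidth) {
      lines.push(current); // commit the line; it is never revisited
      current = [word];
      currentWidth = word.length;
    } else {
      current.push(word);
      currentWidth = needed;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}
```

Once a line is pushed, nothing later in the loop can change it, which is exactly the weakness described above.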

#Knuth-Plass: Seeing the Whole Paragraph

The Knuth-Plass algorithm, published by Donald Knuth and Michael Plass in 1981, takes a radically different approach. Instead of breaking one line at a time, it evaluates the feasible ways to break the entire paragraph and picks the combination that minimizes the total cost (the "demerits" defined below) across all lines. It is the algorithm that powers TeX, and it is why TeX-typeset documents have been the gold standard for justified text for over four decades.

#The Box-Glue-Penalty Model

Knuth-Plass does not think in terms of words and spaces. It models text as a sequence of three primitives:

| Primitive | Represents | Behavior |
| --- | --- | --- |
| Box | A word or text fragment | Has a fixed width. Cannot be stretched or compressed. Cannot be broken. |
| Glue | Inter-word space | Has a natural width, a stretch capacity, and a shrink capacity. The engine can adjust glue within these bounds to fill the line. |
| Penalty | A potential break point | Has a cost. Low penalty = cheap break. High penalty = expensive break. A flagged penalty means a hyphenation point (adds a visible hyphen if used). |

A paragraph becomes a sequence like:

[box "The"] [glue] [box "quick"] [glue] [box "brown"] [glue] [box "fox"]
[penalty -∞]  ← forced break at paragraph end

When hyphenation is active, long words are split into fragments separated by penalties:

[box "ty"] [penalty 50, flagged] [box "pog"] [penalty 50, flagged] [box "raphy"]

Each flagged penalty carries a cost of 50 — not free, but cheaper than producing a loose line.
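One way to represent the three primitives in TypeScript is a discriminated union. The type and function names below are hypothetical and are not taken from knuthPlass.ts; the measure callback stands in for whatever text measurement is available.

```typescript
// Hypothetical item types for the box-glue-penalty model.
type Item =
  | { type: "box"; text: string; width: number }
  | { type: "glue"; width: number; stretch: number; shrink: number }
  | { type: "penalty"; cost: number; flagged: boolean };

// Convert plain text into items: one box per word, glue between words,
// and a forced break (penalty of -Infinity) at the paragraph end.
function wordsToItems(
  text: string,
  measure: (s: string) => number,
  space: { width: number; stretch: number; shrink: number },
): Item[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const items: Item[] = [];
  words.forEach((word, index) => {
    if (index > 0) {
      items.push({ type: "glue", width: space.width, stretch: space.stretch, shrink: space.shrink });
    }
    items.push({ type: "box", text: word, width: measure(word) });
  });
  items.push({ type: "penalty", cost: -Infinity, flagged: false });
  return items;
}

// Made-up metrics: 5 units per character, 4-unit space.
const items = wordsToItems("The quick brown fox", (s) => s.length * 5, {
  width: 4,
  stretch: 4,
  shrink: 1.6,
});
```

The resulting array mirrors the sequence shown above: four boxes separated by three glue items, closed by the forced break.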

Figure: Box-glue-penalty primitives. A box is a fixed word fragment, glue is a stretchable space, and a penalty is a possible break. In the example "The art of typography is old.", the hyphenation point in "typog-raphy" becomes a flagged penalty. Every paragraph becomes a sequence of boxes, glues, and penalties.

#How It Finds the Optimum

The algorithm uses dynamic programming. It maintains a set of active nodes — potential breakpoints that could start new lines — and evaluates every feasible break from each active node. For each candidate break, it computes:

  1. Adjustment ratio (r): how much the glue on this line needs to stretch or shrink. r = 0 means the line fits perfectly. r > 0 means stretching (loose). r < 0 means shrinking (tight).

  2. Badness: a measure of how uneven the spacing is, computed as 100 × |r|³. The cubic growth means that a slightly loose line is tolerable, but a very loose line is severely penalized. A line with r = 2 has badness 800; a line with r = 0.5 has badness 12.5.

  3. Demerits: the total cost of breaking here, combining badness, any penalty at the break point, and two additional penalties:

    • Consecutive hyphen demerit (default 3000): penalizes two hyphenated lines in a row, because stacked hyphens are visually distracting.
    • Fitness class demerit (default 100): penalizes adjacent lines of very different tightness. If a tight line sits next to a very loose line, the contrast is jarring.
  4. Fitness class: each line is classified as tight (r < -0.5), normal (-0.5 ≤ r < 0.5), loose (0.5 ≤ r < 1.0), or very loose (r ≥ 1.0). Adjacent lines with fitness classes more than one step apart incur the fitness demerit.

Badness, the break-point penalty, and the two extra demerits are combined into a single demerit value per candidate break, and the algorithm picks the sequence of breaks with the lowest total across the whole paragraph. That global view is the entire advantage over greedy.
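The per-line quantities above can be sketched as small pure functions. The thresholds and defaults come from the text; the way they are combined in demerits() follows the classic (1 + badness + penalty)² form and ignores negative penalties, which is an assumption about Postext rather than a description of its code.

```typescript
type Fitness = 0 | 1 | 2 | 3; // tight, normal, loose, very loose

// r = 0: perfect fit, r > 0: glue must stretch, r < 0: glue must shrink.
function adjustmentRatio(natural: number, target: number, stretch: number, shrink: number): number {
  if (natural === target) return 0;
  if (natural < target) return stretch > 0 ? (target - natural) / stretch : Infinity;
  return shrink > 0 ? (target - natural) / shrink : -Infinity;
}

function badness(r: number): number {
  return 100 * Math.pow(Math.abs(r), 3);
}

function fitnessClass(r: number): Fitness {
  if (r < -0.5) return 0;
  if (r < 0.5) return 1;
  if (r < 1) return 2;
  return 3;
}

// Assumed combination: squared base cost plus the two extra demerits.
function demerits(
  b: number,
  penalty: number,
  bothFlagged: boolean, // this break and the previous one are both hyphenation points
  previous: Fitness,
  current: Fitness,
  hyphenDemerit = 3000,
  fitnessDemerit = 100,
): number {
  let d = Math.pow(1 + b + Math.max(penalty, 0), 2);
  if (bothFlagged) d += hyphenDemerit;
  if (Math.abs(previous - current) > 1) d += fitnessDemerit;
  return d;
}
```

For instance, badness(2) evaluates to 800 and badness(0.5) to 12.5, which shows how quickly the cubic term punishes very loose lines.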

Figure: Badness as a function of adjustment ratio, badness(r) = 100 · |r|³, with tight lines at r < 0 and loose lines at r > 0. Badness grows cubically: slightly uneven is tolerable, very uneven is punished hard.

Figure: Fitness classes: tight (r < -0.5), normal (-0.5 ≤ r < 0.5), loose (0.5 ≤ r < 1.0), very loose (r ≥ 1.0). Adjacent lines more than one class apart incur the fitness demerit.

The algorithm traces back through the active nodes to find the path with the lowest total demerits — the globally optimal set of breakpoints for the entire paragraph.
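The global search can be illustrated with a deliberately simplified total-fit dynamic program. It is a sketch under strong assumptions: widths are character counts, only whole-word breaks are considered, there are no hyphen or fitness demerits, and every earlier breakpoint is tried instead of maintaining a pruned active-node list. The constants and function names are hypothetical.

```typescript
// Hypothetical metrics, in character units.
const SPACE = 1; // natural inter-word space
const STRETCH = 1; // stretch per space
const SHRINK = 0.4; // shrink per space
const TOLERANCE = 2; // lines needing r above this are rejected

// Demerits for setting words[i..j) as one line; Infinity means "not feasible".
function lineDemerits(words: string[], i: number, j: number, width: number): number {
  const spaces = j - i - 1;
  let natural = spaces * SPACE;
  for (let k = i; k < j; k++) natural += words[k].length;
  const diff = width - natural;
  if (diff === 0) return 1;
  if (diff > 0 && j === words.length) return 1; // final line is left unstretched
  if (spaces === 0) return Infinity; // nothing to stretch or shrink
  const r = diff > 0 ? diff / (spaces * STRETCH) : diff / (spaces * SHRINK);
  if (r < -1 || r > TOLERANCE) return Infinity;
  return Math.pow(1 + 100 * Math.pow(Math.abs(r), 3), 2);
}

// best[j] holds the lowest total demerits for setting words[0..j).
function optimalBreaks(
  words: string[],
  width: number,
): { lines: string[][]; demerits: number } | null {
  const n = words.length;
  const best: number[] = [0];
  const from: number[] = [-1];
  for (let j = 1; j <= n; j++) {
    best.push(Infinity);
    from.push(-1);
    for (let i = 0; i < j; i++) {
      const total = best[i] + lineDemerits(words, i, j, width);
      if (total < best[j]) {
        best[j] = total;
        from[j] = i;
      }
    }
  }
  if (best[n] === Infinity) return null; // no feasible set of breaks
  // Trace back from the paragraph end to recover the chosen lines.
  const lines: string[][] = [];
  for (let j = n; j > 0; j = from[j]) lines.unshift(words.slice(from[j], j));
  return { lines, demerits: best[n] };
}
```

A null result corresponds to the situation where no valid set of breaks exists, the case in which a real engine would need some fallback strategy.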

Orphan, widow, and runt penalties

Postext extends the standard demerit set with three editorial penalties that push the line breaker away from break sets that would leave visually poor paragraph ends:

  • Orphan penalty applies when choosing this break would leave fewer than orphanMinLines lines at the top of the next column. Default orphanPenalty is 1000.
  • Widow penalty applies when the break would leave fewer than widowMinLines lines at the bottom of the current column. Default widowPenalty is 1000.
  • Runt penalty applies when the paragraph's final line would be shorter than roughly runtMinCharacters × normalSpaceWidth pixels. Default runtPenalty is 1000. Unlike orphan/widow (which add linearly to the split demerit), runt is injected as equivalent badness inside the Knuth–Plass squared formula so it competes on the same scale as line badness (which saturates at 10000) rather than being dwarfed by it.

All three are added to the candidate node's demerits before the algorithm selects the globally optimal path. Because they live inside the same optimisation, the solver is free to trade a slightly looser line for the elimination of a widow, and it will naturally prefer break sets that avoid all three whenever possible. Tune the trade-off by adjusting orphanPenalty, widowPenalty, or runtPenalty; set any of them to 0 to disable that rule entirely. List items opt in via avoidOrphansInLists, avoidWidowsInLists, and avoidRuntsInLists (all true by default).
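A rough sketch of how the three editorial penalties could enter a candidate's demerits, based only on the description above. The function names, the flags object, and the exact formula are assumptions; in particular, runtMinCharacters has no default stated here, so it is a required parameter.

```typescript
interface EditorialPenalties {
  orphanPenalty: number;
  widowPenalty: number;
  runtPenalty: number;
}

// Defaults as stated in the text.
const DEFAULT_PENALTIES: EditorialPenalties = {
  orphanPenalty: 1000,
  widowPenalty: 1000,
  runtPenalty: 1000,
};

// A final line counts as a runt when it is shorter than the threshold width.
function isRunt(lastLineWidth: number, runtMinCharacters: number, normalSpaceWidth: number): boolean {
  return lastLineWidth < runtMinCharacters * normalSpaceWidth;
}

// Assumed combination: runt is added as badness inside the squared term,
// orphan and widow are added linearly afterwards.
function candidateDemerits(
  lineBadness: number,
  flags: { orphan: boolean; widow: boolean; runt: boolean },
  penalties: EditorialPenalties = DEFAULT_PENALTIES,
): number {
  const effectiveBadness = lineBadness + (flags.runt ? penalties.runtPenalty : 0);
  let d = Math.pow(1 + effectiveBadness, 2);
  if (flags.orphan) d += penalties.orphanPenalty;
  if (flags.widow) d += penalties.widowPenalty;
  return d;
}
```

Setting any penalty to 0 in this sketch removes its contribution entirely, matching the "set to 0 to disable" behavior described above.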

#Why It Matters

Figure: Greedy first-fit vs Knuth-Plass optimal.

  • Greedy first-fit (what CSS does): decides line by line with no look-ahead; one bad break poisons the next line; uneven spacing across the paragraph; rivers of white space; no cost model for adjacent lines.
  • Knuth-Plass optimal (what Postext does): evaluates every possible break set; globally minimal demerits; even spacing throughout; consecutive-hyphen penalty; fitness-class smoothness.

A local-greedy decision loses where a global-optimal plan wins.

The practical difference is visible. In a greedy layout, you will find paragraphs where one line is noticeably looser than its neighbors — and if you look carefully, you will see it happened because the previous line grabbed one word too many. Knuth-Plass avoids this by trading a slightly worse current line for a much better next one, because it can see the consequences.

#Postext's Implementation

Postext implements the full Knuth-Plass algorithm in knuthPlass.ts (approximately 860 lines). Two adapter paths convert text into the box-glue-penalty model:

  • Plain text path: uses @chenglou/pretext for DOM-free text measurement. Pretext provides segment widths and discretionary-hyphen widths; Postext converts them to KP items.
  • Rich text path: handles bold and italic spans using Canvas-based measurement. Each styled token becomes one or more boxes, with hyphenation break points inserted as penalties.

Both paths compute per-line justifiedSpaceRatio — the actual space width divided by the natural space width — which feeds into the loose-line debug system described below.

Fallback behavior: if Knuth-Plass produces no valid breaks (which can happen with extremely narrow columns or very long words), the engine falls back to Pretext's greedy layoutNextLine(). This ensures the layout always completes.

#Hyphenation

Hyphenation and justification are inseparable. Without hyphenation, the engine's only way to avoid a loose line is to move a word to the next line — which often just shifts the problem. Hyphenation gives the engine a much larger set of break points to consider, dramatically improving the quality of justified text.

Figure: Justified text with and without hyphenation. Without hyphenation the engine can only move whole words, so lines are uneven, with long gaps and visible rivers. With hyphenation the lines are nearly uniform, and a single hyphen absorbs the variance. Hyphenation dramatically reduces spacing variance in justified text.

#TeX-Quality Patterns

Postext uses Hypher (hypher v0.2.5) for hyphenation, powered by TeX/Liang hyphenation patterns. These are the same patterns that TeX has used since 1983 — a compact representation of syllable-boundary rules derived by Frank Liang's pattern-generation algorithm from large word corpora.

The patterns encode a set of numbered rules that, when overlaid on a word, indicate where breaks are allowed (odd numbers) and forbidden (even numbers). The leftmin and rightmin parameters in each language's pattern file ensure a minimum number of characters before and after any break point. For English (en-us), these are typically 2 and 3 respectively — so a word must have at least 2 characters before the hyphen and 3 after.
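The overlay rule can be shown with a tiny stand-alone implementation of Liang-style pattern matching. This is not Hypher's code (Hypher uses a trie); it scans with indexOf for clarity, and the nine patterns are the small subset commonly used to illustrate the word "hyphenation", not a full en-us pattern file.

```typescript
// Split "hen5at" into its letters ("henat") and the level at each gap.
function parsePattern(pattern: string): { letters: string; levels: number[] } {
  let letters = "";
  const levels: number[] = [0];
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch >= "0" && ch <= "9") {
      levels[levels.length - 1] = Number(ch);
    } else {
      letters += ch;
      levels.push(0);
    }
  }
  return { letters, levels };
}

function hyphenate(word: string, patterns: string[], leftMin = 2, rightMin = 3): string[] {
  const padded = "." + word.toLowerCase() + "."; // dots mark the word boundaries
  const gaps: number[] = [];
  for (let i = 0; i <= padded.length; i++) gaps.push(0);
  for (let p = 0; p < patterns.length; p++) {
    const { letters, levels } = parsePattern(patterns[p]);
    let at = padded.indexOf(letters);
    while (at !== -1) {
      // Keep the highest level seen at every gap.
      for (let k = 0; k < levels.length; k++) {
        gaps[at + k] = Math.max(gaps[at + k], levels[k]);
      }
      at = padded.indexOf(letters, at + 1);
    }
  }
  const parts: string[] = [];
  let start = 0;
  for (let i = leftMin; i <= word.length - rightMin; i++) {
    // gaps[i + 1] is the gap before word[i]; odd means a break is allowed.
    if (gaps[i + 1] % 2 === 1) {
      parts.push(word.slice(start, i));
      start = i;
    }
  }
  parts.push(word.slice(start));
  return parts;
}

const samplePatterns = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"];
const syllables = hyphenate("hyphenation", samplePatterns);
```

With this subset the odd levels land after "hy" and after "hyphen", so the word splits as hy-phen-ation; the even levels (2 and 4) suppress breaks such as "hyphe-nation".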

#Supported Locales

| Locale code | Language |
| --- | --- |
| 'en-us' | English (US) |
| 'es' | Spanish |
| 'fr' | French |
| 'de' | German |
| 'it' | Italian |
| 'pt' | Portuguese |
| 'ca' | Catalan |
| 'nl' | Dutch |

Each locale loads its own pattern set. Hypher instances are lazily created and cached — the first call for a given locale pays the initialization cost; subsequent calls are instant. If an unknown locale is passed, the engine falls back to en-us.
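The lazy-create, cache, and fall-back behavior can be sketched generically. The Hyphenator type and the loader table are hypothetical stand-ins; this sketch does not call Hypher and does not reflect how Postext actually wires its pattern files.

```typescript
// A hyphenator turns a word into its fragments.
type Hyphenator = (word: string) => string[];

// loaders maps a locale code to a function that builds its hyphenator.
// fallbackLocale must be one of the keys in loaders.
function createHyphenatorCache(
  loaders: { [locale: string]: () => Hyphenator },
  fallbackLocale: string,
): (locale: string) => Hyphenator {
  const cache: { [locale: string]: Hyphenator } = {};
  const has = (obj: object, key: string) => Object.prototype.hasOwnProperty.call(obj, key);
  return function getHyphenator(locale: string): Hyphenator {
    // Unknown locales resolve to the fallback.
    const key = has(loaders, locale) ? locale : fallbackLocale;
    if (!has(cache, key)) {
      cache[key] = loaders[key](); // first call pays the initialization cost
    }
    return cache[key];
  };
}
```

Because the cache is keyed by the resolved locale, an unknown locale and 'en-us' share the same instance when 'en-us' is the fallback.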

#Why Hypher (and Not a Custom Algorithm)

The original Postext hyphenation system used a custom vowel-based heuristic: it detected syllable boundaries by finding vowel clusters, common prefixes (over-, under-, inter-), and common suffixes (-tion, -ment, -sion). It was simple and fast, but fundamentally limited:

| Aspect | Custom heuristic | Hypher (TeX patterns) |
| --- | --- | --- |
| Accuracy | Good for common words, unreliable for unusual ones. Vowel detection misses many valid break points and creates invalid ones. | Near-perfect. Patterns are generated from large corpora and have been refined for over 40 years. |
| Language coverage | Had to manually define vowel sets, prefixes, and suffixes for each language, which is tedious and error-prone. | Pattern files exist for 50+ languages, maintained by the TeX community. Adding a language means adding one import. |
| Industry standard | Not a recognized standard. No tooling or community support. | The same patterns used by TeX, LibreOffice, Firefox, Chrome, and virtually every professional typesetting system. |
| Maintenance | Every edge case is a bug to fix manually. | Community-maintained pattern files. Bug fixes come from upstream. |
| Bundle size | ~170 lines, no dependencies. | Hypher core is ~3 KB. Each language pattern file adds 20–80 KB (gzipped: 5–20 KB). All eight bundled locales add roughly 300 KB (gzipped: ~80 KB). |
| Performance | Very fast (simple string scanning). | Fast (trie lookup per character). Negligible in practice: hyphenation is never the bottleneck. |

The trade-off is clear: a larger bundle in exchange for dramatically better correctness and zero maintenance burden. For a layout engine targeting publication-grade output, correctness wins. A single bad hyphenation in a printed book is more expensive than a few extra kilobytes of patterns.

#How Hyphenation Integrates with Knuth-Plass

Before text enters the Knuth-Plass algorithm, the engine pre-processes it with Hypher, inserting soft hyphens (Unicode \u00AD) at every legal break point. These invisible characters are then mapped to KP penalties with a cost of 50 and flagged: true.

The algorithm treats hyphenation breaks as just another option to evaluate alongside natural word boundaries (where glue allows a break). If using a hyphen produces a lower total demerits than leaving the line loose, the algorithm takes the hyphen. If not, it leaves the word intact.
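The mapping from soft hyphens to flagged penalties might look like the sketch below. The item shape and function name are hypothetical; the cost of 50 and the flagged marker come from the text, and the penalty's width (the visible hyphen that appears only if the break is taken) is an assumption modeled on the usual Knuth-Plass formulation.

```typescript
type KpItem =
  | { type: "box"; text: string; width: number }
  | { type: "penalty"; cost: number; flagged: boolean; width: number };

const SOFT_HYPHEN = "\u00AD";

// Turn one pre-hyphenated word into boxes separated by flagged penalties.
function softHyphensToItems(
  word: string,
  measure: (s: string) => number,
  hyphenCost = 50,
): KpItem[] {
  const fragments = word.split(SOFT_HYPHEN).filter((f) => f.length > 0);
  const out: KpItem[] = [];
  fragments.forEach((fragment, index) => {
    if (index > 0) {
      // The hyphen's width only counts if the line actually breaks here.
      out.push({ type: "penalty", cost: hyphenCost, flagged: true, width: measure("-") });
    }
    out.push({ type: "box", text: fragment, width: measure(fragment) });
  });
  return out;
}

// "ty", "pog", "raphy" joined by soft hyphens; widths are character counts here.
const typographyItems = softHyphensToItems("ty\u00ADpog\u00ADraphy", (s) => s.length);
```

The output has the same shape as the earlier example: three boxes with a flagged penalty between each pair.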

The consecutive-hyphen demerit (default 3000) ensures the algorithm strongly avoids placing hyphens on two adjacent lines — a typographic convention that virtually all style guides enforce.

Hyphenation is only applied when both bodyText.hyphenation.enabled is true and bodyText.textAlign is 'justify'. Left-aligned text does not benefit from hyphenation because the right edge is intentionally ragged.

#Word Spacing Bounds

The glue model gives the engine explicit bounds on how much inter-word spaces can stretch or shrink. These bounds are controlled by two configuration properties on bodyText:

| Property | Default | Description |
| --- | --- | --- |
| maxWordSpacing | 2 | Upper bound for word spacing, as a multiplier of the normal space width. At the default value, spaces can stretch up to 200% of their natural width. |
| minWordSpacing | 0.6 | Lower bound for word spacing, as a multiplier of the normal space width. At the default value, spaces can shrink down to 60% of their natural width. |

These multipliers translate directly to the glue's stretch and shrink values in the Knuth-Plass model:

stretchPerSpace = normalSpaceWidth × (maxWordSpacing - 1)
shrinkPerSpace  = normalSpaceWidth × (1 - minWordSpacing)

At the defaults (2 / 0.6), if the normal space width is 4 px:

  • Each space can stretch by 4 px (from 4 px to 8 px)
  • Each space can shrink by 1.6 px (from 4 px to 2.4 px)
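The two formulas translate directly into a helper. The function name and return shape are hypothetical; only the arithmetic and the default multipliers come from the text.

```typescript
// Derive glue parameters from the word-spacing multipliers.
function glueBounds(normalSpaceWidth: number, maxWordSpacing = 2, minWordSpacing = 0.6) {
  return {
    width: normalSpaceWidth,
    stretch: normalSpaceWidth * (maxWordSpacing - 1),
    shrink: normalSpaceWidth * (1 - minWordSpacing),
  };
}

// Defaults with a 4 px space: stretch of 4 px, shrink of about 1.6 px.
const defaultGlue = glueBounds(4);
// A tighter upper bound leaves far less stretch: about 0.8 px per space.
const tighterGlue = glueBounds(4, 1.2);
```

Comparing the two results shows why tighter bounds push the algorithm toward more hyphenation: the same line has only a fifth of the stretch available.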

Tighter bounds (e.g., maxWordSpacing: 1.2) produce more uniform spacing but give the algorithm less room to maneuver, which may result in more hyphenation or, in extreme cases, overflow. Looser bounds (e.g., maxWordSpacing: 2.5) give the algorithm more flexibility but allow visibly uneven spacing on some lines.

The defaults of 2 and 0.6 favor the algorithm's flexibility — giving Knuth-Plass enough room to avoid overflow, hyphenation, and runts across narrow columns, while still being well within ranges typographic literature considers acceptable.

#Optimal vs. Greedy Line Breaking

The bodyText.optimalLineBreaking property (default: true) controls which line-breaking algorithm the engine uses:

  • true: Knuth-Plass dynamic-programming algorithm. Evaluates all possible break sets and picks the globally optimal one. This is the recommended setting for any justified text.
  • false: Greedy first-fit, powered by Pretext's layoutNextLine(). Faster but produces lower-quality results. Use this only when performance matters more than typographic quality (e.g., real-time preview at very high character counts).

When Knuth-Plass is active and produces no valid breaks (which can happen with extremely narrow columns or words longer than the column width), the engine automatically falls back to greedy breaking for that paragraph.

#Loose-Line Debugging

Even with Knuth-Plass and hyphenation, some lines will be looser than ideal — especially in narrow columns with long words, or in languages with few hyphenation opportunities. The loose-line highlight debug feature helps you find these problem lines instantly.

#How It Works

Every line in the Virtual Document Tree carries a justifiedSpaceRatio — the ratio of the actual justified space width to the font's natural space width. A value of 1.0 means the spaces are at their natural width. A value of 2.5 means the spaces are 2.5 times wider than normal.

When debug.looseLineHighlight.enabled is true, the renderer paints a semi-transparent overlay on every line whose justifiedSpaceRatio exceeds the configured threshold. The default threshold is 3.0 — meaning only lines with spaces three times wider than normal are highlighted. This is a deliberately high bar; lines this loose are genuine typographic problems.
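The selection rule is a simple threshold test. The LineInfo shape below is a hypothetical stand-in for a line in the Virtual Document Tree, whose real structure is not shown here; only the justifiedSpaceRatio field and the default threshold of 3 come from the text.

```typescript
// Hypothetical minimal view of a laid-out line.
interface LineInfo {
  text: string;
  justifiedSpaceRatio: number; // actual space width / natural space width
}

// Lines whose ratio exceeds the threshold are the ones that would be highlighted.
function findLooseLines(lines: LineInfo[], threshold = 3): LineInfo[] {
  return lines.filter((line) => line.justifiedSpaceRatio > threshold);
}

// Made-up ratios for illustration.
const sampleLines: LineInfo[] = [
  { text: "a comfortably full line of text", justifiedSpaceRatio: 1.1 },
  { text: "slightly   airy   line", justifiedSpaceRatio: 2.6 },
  { text: "two        words", justifiedSpaceRatio: 3.4 },
];

const looseAtDefault = findLooseLines(sampleLines); // only the last line
const looseAtLower = findLooseLines(sampleLines, 2.5); // the last two lines
```

Lowering the threshold catches the merely airy line as well, which is useful when reviewing a layout more strictly than the default allows.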

#Configuration

The loose-line highlight is part of the debug section of PostextConfig:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| looseLineHighlight.enabled | boolean | false | Whether to highlight loose lines. |
| looseLineHighlight.color | ColorValue | #ff000040 | Color of the highlight overlay. The default is a semi-transparent red. |
| looseLineHighlight.threshold | number | 3 | Multiplier of the normal space width above which a line is considered loose. Lower values catch more lines; higher values highlight only the worst offenders. |

```ts
// Fragment of a PostextConfig object
debug: {
  looseLineHighlight: {
    enabled: true,
    threshold: 2.5,
    color: { hex: '#ff660040', model: 'hex' },
  },
}
```

#Interpreting the Results

When you enable loose-line highlighting and see red bands across certain lines, it means the engine could not find a way to set those lines without excessive word spacing. Start at the top: causes are ordered from most to least likely.

| Cause | Solution |
| --- | --- |
| Column is too narrow for the font size | Increase column width, decrease font size, or switch to a single-column layout. |
| Long words with few hyphenation points | Verify hyphenation is enabled and the correct locale is set. Some technical terms or proper nouns have no valid break points. |
| Hyphenation is disabled | Enable bodyText.hyphenation.enabled. Justification without hyphenation is almost always worse. |
| Word spacing bounds are too tight | Increase maxWordSpacing slightly (e.g., from 2 to 2.5). This gives the algorithm more room. |
| Language has long compound words (e.g., German) | Ensure the correct locale is set. German hyphenation patterns handle compound words well, but only if the engine knows it is German. |

#Complete Example

A full configuration showcasing all hyphenation and justification settings:

```ts
import { buildDocument } from 'postext';

// `content` is assumed to be defined elsewhere (the document source to lay out).
const vdt = buildDocument(content, {
  bodyText: {
    fontFamily: 'EB Garamond',
    fontSize: { value: 9, unit: 'pt' },
    textAlign: 'justify',

    // Knuth-Plass optimal line breaking (default: true)
    optimalLineBreaking: true,

    // Hyphenation
    hyphenation: {
      enabled: true,
      locale: 'es',
    },

    // Word spacing bounds (multipliers of normal space width)
    maxWordSpacing: 2,     // spaces stretch up to 200%
    minWordSpacing: 0.6,   // spaces shrink down to 60%
  },

  // Debug: highlight lines with excessive spacing
  debug: {
    looseLineHighlight: {
      enabled: true,
      threshold: 2.5,
      color: { hex: '#ff000040', model: 'hex' },
    },
  },
});
```

For the complete list of body text configuration options, see the Configuration page. For how the layout pipeline uses these settings during text measurement, see the Architecture page.