Garsia–Wachs algorithm
The Garsia–Wachs algorithm is a fast method for building optimal alphabetic binary trees, which can be used for both efficient binary search trees and alphabetical Huffman codes. It runs in about n log n time.
What goes in and what comes out
- Input: A sequence of n + 1 non-negative weights w0, w1, ..., wn.
- Output: A full binary tree with n internal nodes and n + 1 leaves, where leaves are in the given order. The path length from the root to a leaf, weighted by that leaf’s weight, is minimized.
Why this matters
- When the leaves are in a fixed order, the resulting tree gives the smallest possible weighted external path length. This makes searches faster on average and yields alphabetical Huffman codes: the code for each value is the sequence of left (0) and right (1) steps from the root to its leaf, with the codes preserving the original order of the values.
How the algorithm works (high level)
- Phase 1: Add two sentinel values (∞) at the start and end of the sequence. Think of starting with a forest where each weight is its own one-leaf tree.
- Phase 2: Repeatedly merge adjacent trees to form larger trees, keeping a running sequence of their weights. The goal is to arrange merges so that the final single tree minimizes the weighted path lengths. The algorithm uses a data structure (often a self-balancing binary search tree) to manage the sequence and to perform merges efficiently in logarithmic time.
- A key detail is that, at each step, the even-positioned weights form one decreasing sequence and the odd-positioned weights form another decreasing sequence. This structure lets the algorithm locate where to insert the new merged weight (the sum x + y) using two binary searches in logarithmic time.
- Phase 3: After enough merges, you obtain a single tree that is optimal for the given order of leaves.
Efficiency and complexity
- The overall running time is O(n log n), thanks to the use of the balanced tree and the logarithmic searches for each merge.
Historical notes
- The algorithm was published in 1977 by Adriano Garsia and Michelle L. Wachs. It provides a streamlined approach to the problem that is related to, but distinct from, the Hu–Tucker algorithm. The original correctness proof was later simplified by Kingston (1988) and by Karpinski, Larmore, and Rytter (1997).
This page was last edited on 2 February 2026, at 20:24 (CET).