BK-tree

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

A BK-tree (Burkhard-Keller tree) is a data structure designed for fast approximate matching in discrete spaces, such as words in a dictionary. It helps you find close matches to a query word using a distance measure like the Levenshtein (edit) distance.

How it works
- Structure: Each node stores a word. Each child is labeled with a distance: the distance between the node’s word and the child’s subtree words. A child’s subtree is itself a BK-tree.
- Inserting a word: Start at the root and compute the distance between the current node’s word and the new word. If there’s already a child at that distance, move to that child and repeat. If not, create a new child at that distance containing the new word.
- Searching (finding close matches): Decide a maximum allowed distance D. Starting at the root, compute the distance between the query word and the node’s word. For every child, let delta be the edge’s distance label. You only explore a child if abs(dist - delta) ≤ D. This pruning uses the triangle inequality to skip parts of the tree that cannot contain a match within D. You continue until you collect all words within distance D.

Key points
- The distance metric matters (common choice is Levenshtein distance).
- BK-trees distribute words so that similar words are near each other, which makes searches fast on average.
- Insertion and lookup are typically O(log n) on average if the tree stays balanced, but can degrade to O(n) in the worst case.

Uses
- BK-trees are widely used for approximate string matching, spell checking, and fuzzy searching in dictionaries, especially when exact matches aren’t required and you want nearby alternatives quickly.

This page was last edited on 2 February 2026, at 13:51 (CET).