B-Trees
Summary: The most widely used indexing structure, B-trees break the database down into fixed-size pages and update them in-place, mimicking the block-based nature of hardware disks.
Sources: chapter3
Last updated: 2026-04-15
Design Philosophy
Unlike log-structured indexes that append to files, B-trees break the database into fixed-size blocks or pages (traditionally 4 KB). They read or write one page at a time (source: chapter3).
Pages are organized into a tree structure. Each page contains keys and references to child pages, each responsible for a continuous range of keys. This ensures the tree remains balanced with a depth of (source: chapter3).
Reliability and Concurrency
To make B-trees resilient to crashes, they often use a write-ahead-log (WAL). Every modification is written to this append-only log before being applied to the tree pages (source: chapter3).
Concurrency control is typically handled using latches (lightweight locks) to protect the tree’s data structures from inconsistent states (source: chapter3).
Comparison with LSM-Trees
- Reads: Generally thought to be faster in B-trees because each key exists in exactly one place (source: chapter3).
- Writes: Typically slower than lsm-trees because they must write data at least twice (to the WAL and the page) and may involve page splits (source: chapter3).