B-Trees

Summary: The most widely used indexing structure, B-trees break the database down into fixed-size pages and update them in-place, mimicking the block-based nature of hardware disks.

Sources: chapter3

Last updated: 2026-04-15


Design Philosophy

Unlike log-structured indexes that append to files, B-trees break the database into fixed-size blocks or pages (traditionally 4 KB). They read or write one page at a time (source: chapter3).

Pages are organized into a tree structure. Each page contains keys and references to child pages, each responsible for a continuous range of keys. This ensures the tree remains balanced with a depth of (source: chapter3).

Reliability and Concurrency

To make B-trees resilient to crashes, they often use a write-ahead-log (WAL). Every modification is written to this append-only log before being applied to the tree pages (source: chapter3).

Concurrency control is typically handled using latches (lightweight locks) to protect the tree’s data structures from inconsistent states (source: chapter3).

Comparison with LSM-Trees

  • Reads: Generally thought to be faster in B-trees because each key exists in exactly one place (source: chapter3).
  • Writes: Typically slower than lsm-trees because they must write data at least twice (to the WAL and the page) and may involve page splits (source: chapter3).