LSM-Trees

Summary: Log-Structured Merge-Trees (LSM-Trees) are storage engines that build upon the principle of merging and compacting sorted files (SSTables).

Sources: chapter3

Last updated: 2026-04-15


Algorithm

LSM-trees operate by:

  1. Writing incoming data to an in-memory memtable (source: chapter3).
  2. Periodically flushing the memtable to disk as an sstables segment (source: chapter3).
  3. Serving read requests by checking the memtable first, then the most recent on-disk segments (source: chapter3).
  4. Merging and compacting segments in the background to reclaim space and maintain read performance (source: chapter3).

To prevent data loss during a crash, writes are also appended to a write-ahead-log on disk before being added to the memtable (source: chapter3).

Performance Characteristics

LSM-trees are typically faster for writes because they turn random-access writes into sequential writes on disk (source: chapter3). However, reads can be slower as they may need to check multiple segments (source: chapter3). Techniques like bloom-filters are used to optimize these reads.

Implementations

Popular implementations include LevelDB, RocksDB, Cassandra, and HBase (source: chapter3).