SSTables

Summary: Sorted String Tables (SSTables) are a storage format where the sequence of key-value pairs is sorted by key, allowing for efficient merging and sparse indexing.

Sources: chapter3

Last updated: 2026-04-15


Advantages over Hash Indexes

SSTables offer several advantages over simple log segments with hash indexes:

  1. Efficient Merging: Merging segments is simple and efficient, even if files are larger than available memory, using a merge-sort-like algorithm (source: chapter3).
  2. Sparse Indexing: You no longer need an index of all keys in memory. A sparse index can point to the offsets of some keys, allowing you to scan a small block to find the exact key (source: chapter3).
  3. Block Compression: Records can be grouped into blocks and compressed before writing to disk, saving space and reducing I/O bandwidth (source: chapter3).

Construction

SSTables are typically maintained by using an in-memory balanced tree (like a red-black tree), often called a memtable. When the memtable exceeds a size threshold, it is written to disk as a new SSTable segment (source: chapter3).