SSTables
Summary: Sorted String Tables (SSTables) are a storage format where the sequence of key-value pairs is sorted by key, allowing for efficient merging and sparse indexing.
Sources: chapter3
Last updated: 2026-04-15
Advantages over Hash Indexes
SSTables offer several advantages over simple log segments with hash indexes:
- Efficient Merging: Merging segments is simple and efficient, even if files are larger than available memory, using a merge-sort-like algorithm (source: chapter3).
- Sparse Indexing: You no longer need an index of all keys in memory. A sparse index can point to the offsets of some keys, allowing you to scan a small block to find the exact key (source: chapter3).
- Block Compression: Records can be grouped into blocks and compressed before writing to disk, saving space and reducing I/O bandwidth (source: chapter3).
Construction
SSTables are typically maintained by using an in-memory balanced tree (like a red-black tree), often called a memtable. When the memtable exceeds a size threshold, it is written to disk as a new SSTable segment (source: chapter3).