Hash Indexes

Summary: A simple indexing strategy where an in-memory hash map stores the byte offsets of keys in an append-only log file on disk.

Sources: chapter3

Last updated: 2026-04-15


Mechanism

In a key-value store, the simplest indexing strategy is to maintain an in-memory hash map where every key maps to a byte offset in a data file. When a value is updated, the new data is appended to the log, and the hash map is updated with the new offset (source: chapter3).

This approach is used by storage engines like Bitcask. It provides high-performance reads and writes, provided all keys fit in the available RAM (source: chapter3).

Compaction

To prevent the log from growing indefinitely, the log is broken into segments. A background process performs compaction, throwing away duplicate keys and keeping only the most recent update (source: chapter3).

Limitations

  • Memory Constraints: All keys must fit in RAM. Maintaining the hash map on disk is difficult to make performant due to random access I/O and collision handling (source: chapter3).
  • Range Queries: Range queries are inefficient because you must look up each key individually in the hash map (source: chapter3).