Hash Partitioning
Summary: A partitioning strategy that uses a hash function to determine the partition for a given key, aiming to distribute data and load evenly across nodes.
Sources: chapter6
Last updated: 2026-04-15
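To reduce the risk of skew and hot spots, many distributed datastores use a hash function to determine the partition for a given key. Each partition is then assigned a range of hash values rather than a range of keys, and a key is stored in the partition whose range contains its hash. (source: chapter6)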
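The mapping from key to partition can be sketched as follows. This is a minimal illustration, not any particular datastore's implementation: the function name `partition_for`, the `user:NNNN` key format, and the choice of four equal hash ranges are all assumptions made for the example. MD5 is used only because it gives the same result on every run, unlike Python's built-in `hash()` for strings.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Return the partition index for a key (hypothetical helper).

    The 32-bit hash space is split into num_partitions equal ranges,
    and the key goes to the range that contains its hash.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")  # 32-bit value in [0, 2**32 - 1]
    return h * num_partitions // 2**32     # index in [0, num_partitions - 1]


# Sequential (highly ordered) keys, an assumed input for illustration.
keys = [f"user:{i:04d}" for i in range(1000)]

counts = [0] * 4
for k in keys:
    counts[partition_for(k, 4)] += 1

print(counts)  # keys per partition; roughly even for a well-behaved hash
```

Because the hash is a pure function of the key, any node can compute which partition owns a key without consulting the others.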
Characteristics
- Load Balancing: A good hash function distributes keys evenly among partitions even if the input data is skewed. (source: chapter6)
- Range Queries: Inefficient, because keys that are adjacent in sort order are scattered across partitions once hashed. Depending on the system, a range query on the key must either be sent to all partitions (scatter/gather) or is not supported at all. (source: chapter6)
- Consistent Hashing: A specific technique often associated with hash partitioning that aims to minimize data movement during rebalancing. The term is used loosely, and not every hash-partitioned system implements it. (source: chapter6)
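The rebalancing point in the list above can be illustrated with a small comparison. The sketch below is one common textbook form of consistent hashing (a hash ring with virtual nodes), written for illustration only; it is not taken from the source or from any listed system. The names `HashRing`, `h32`, `vnodes`, and the node labels `n0` to `n4` are hypothetical. It counts how many keys change owner when a fifth node is added, first on the ring and then under naive `hash mod N` placement.

```python
import bisect
import hashlib


def h32(value: str) -> int:
    """Deterministic 32-bit hash of a string (first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Illustrative hash ring: each node owns several points on the ring."""

    def __init__(self, nodes, vnodes=100):
        # Place vnodes points per node on the ring, sorted by hash value.
        self._points = sorted(
            (h32(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [p for p, _ in self._points]

    def node_for(self, key: str) -> str:
        # A key belongs to the first point clockwise from its hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._hashes, h32(key)) % len(self._hashes)
        return self._points[idx][1]


keys = [f"user:{i:04d}" for i in range(2000)]

before = HashRing(["n0", "n1", "n2", "n3"])
after = HashRing(["n0", "n1", "n2", "n3", "n4"])

# Keys whose owner changes when n4 joins the ring.
moved_ring = sum(before.node_for(k) != after.node_for(k) for k in keys)
# Keys whose owner changes when going from hash mod 4 to hash mod 5.
moved_mod = sum(h32(k) % 4 != h32(k) % 5 for k in keys)

print(f"ring: {moved_ring} of {len(keys)} keys moved")
print(f"mod N: {moved_mod} of {len(keys)} keys moved")
```

On the ring, a key only changes owner if one of the new node's points lands between the key's hash and its previous owner, so every moved key moves to the new node. Under `hash mod N`, changing N reassigns most keys, including between nodes that were already present.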
Systems Using This
- MongoDB (when hash-based sharding is enabled), Cassandra, Riak, Voldemort. (source: chapter6)