Rebalancing

Summary: The process of moving data and requests from one node in a cluster to another to accommodate changes in load, dataset size, or cluster membership.

Sources: chapter6

Last updated: 2026-04-15


As time passes, things change in a database: query throughput increases, dataset size grows, or machines fail and are replaced. Rebalancing is necessary to ensure the load is shared fairly between nodes. (source: chapter6)

Rebalancing Strategies

Fixed Number of Partitions

Create many more partitions than there are nodes and assign several partitions to each node. When a node is added, it “steals” a few partitions from every existing node until the load is balanced again. Only entire partitions are moved. (source: chapter6)

Dynamic Partitioning

Used by databases that use key-range-partitioning (e.g., HBase, RethinkDB). When a partition grows to exceed a configured size, it is split into two. Conversely, if a lot of data is deleted, a partition can be merged with an adjacent one. (source: chapter6)

Partitioning Proportionally to Nodes

The number of partitions is proportional to the number of nodes (e.g., Cassandra). Each node has a fixed number of partitions. When a new node joins, it randomly chooses a fixed number of existing partitions to split, taking ownership of one half of each split partition. (source: chapter6)

Automatic vs. Manual Rebalancing

Fully automated rebalancing can be convenient but unpredictable. It can overload the network or nodes and harm the performance of other requests. Many systems (e.g., Couchbase, Riak) generate a suggested partition assignment but require a human to commit it. (source: chapter6)