Partitioning

Summary: Partitioning, also known as sharding, is the process of breaking a large dataset into smaller subsets so that data and query load can be distributed across multiple nodes.

Sources: chapter6

Last updated: 2026-04-15

Partitioning is a fundamental technique for achieving scalability in data-intensive applications. It is often used in conjunction with replication for fault tolerance. (source: chapter6)

Key Concepts

Sharding: Another term for partitioning, commonly used in MongoDB, Elasticsearch, and SolrCloud. (source: chapter6)
hot-spots: Partitions with disproportionately high load due to skewed data or access patterns. (source: chapter6)
rebalancing: The process of moving load from one node to another in the cluster. (source: chapter6)

Strategies

key-range-partitioning

Assigns a continuous range of keys to each partition.

Pros: Efficient range queries.
Cons: Risk of skew and hot spots (e.g., if keys are timestamps, all writes for the current period land on the same partition). (source: chapter6)
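
A minimal sketch of key-range lookup, assuming string keys and a hand-picked list of boundaries. The names BOUNDARIES, key_range_partition, and partitions_for_key_range are hypothetical and used for illustration only; real systems choose and adjust boundaries differently.

```python
from bisect import bisect_right

# Hypothetical partition boundaries (assumption, not from the source):
#   partition 0: key < "g"
#   partition 1: "g" <= key < "n"
#   partition 2: "n" <= key < "t"
#   partition 3: key >= "t"
BOUNDARIES = ["g", "n", "t"]


def key_range_partition(key: str) -> int:
    """Return the index of the partition whose key range contains `key`."""
    return bisect_right(BOUNDARIES, key)


def partitions_for_key_range(start: str, end: str) -> list[int]:
    """Return the partitions that can hold keys in the inclusive range [start, end]."""
    return list(range(key_range_partition(start), key_range_partition(end) + 1))
```

Because keys are kept in sorted order, a range query only needs to contact the partitions whose ranges overlap the query, which is what makes range scans efficient here.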

hash-partitioning

Uses a hash function on the key to determine the partition.

Pros: Good at distributing load evenly and avoiding hot spots.
Cons: Range queries require searching all partitions (scatter/gather). (source: chapter6)
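
A minimal sketch of hash-based placement, assuming a fixed number of partitions and using MD5 only as a stable, well-distributed hash (Python's built-in hash() for strings is randomized per process, so it is not suitable here). NUM_PARTITIONS, hash_partition, and partitions_for_hashed_range are hypothetical names, and the simple modulo step is for illustration only.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed fixed partition count, for illustration


def hash_partition(key: str) -> int:
    """Map a key to a partition using a stable hash of its bytes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce it
    # to a partition index.
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def partitions_for_hashed_range(start: str, end: str) -> list[int]:
    """Adjacent keys hash to unrelated partitions, so a range query
    cannot be narrowed down and has to ask every partition."""
    return list(range(NUM_PARTITIONS))
```

The same key always maps to the same partition, but the sort order of keys is lost, which is why the range helper returns every partition.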

Secondary Indexes in Partitioned Databases

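local-index: Each partition maintains its own secondary index, covering only the documents stored in that partition (document-partitioned). Writes touch a single partition, but a query on the index must be sent to all partitions (scatter/gather).
global-index: A single index covering the data in all partitions, itself partitioned by the indexed term (term-partitioned). A query for one term reads a single index partition, but a write may have to update several. (source: chapter6)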
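
The difference between the two index layouts can be sketched with in-memory dictionaries. Everything here is an assumption made for illustration: the sample documents, the two-partition setup, and the helper names doc_partition, term_partition, query_local, and query_global.

```python
from collections import defaultdict

NUM_PARTITIONS = 2  # assumed, for illustration


def doc_partition(doc_id: int) -> int:
    """Hypothetical rule placing a document on a partition by its ID."""
    return doc_id % NUM_PARTITIONS


def term_partition(term: str) -> int:
    """Hypothetical rule placing an index term: a-m on 0, the rest on 1."""
    return 0 if term[:1].lower() <= "m" else 1


# Made-up sample data: document ID -> color
docs = {191: "red", 306: "red", 515: "black", 768: "red"}

# One index per partition in both layouts; only the placement rule differs.
local_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]
global_index = [defaultdict(set) for _ in range(NUM_PARTITIONS)]

for doc_id, color in docs.items():
    # Local: the entry lives on the same partition as the document.
    local_index[doc_partition(doc_id)][color].add(doc_id)
    # Global: the entry lives on the partition that owns the term.
    global_index[term_partition(color)][color].add(doc_id)


def query_local(color: str) -> set[int]:
    """Scatter/gather: ask every partition's index and merge the results."""
    result: set[int] = set()
    for index in local_index:
        result |= index.get(color, set())
    return result


def query_global(color: str) -> set[int]:
    """Read only the single index partition that owns the term."""
    return set(global_index[term_partition(color)].get(color, set()))
```

Both queries return the same document IDs; the local layout reaches every partition on reads, while the global layout reads one index partition at the cost of writes that may span partitions.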
