Replication

Summary: Replication means keeping a copy of the same data on several nodes, potentially in different locations, to provide redundancy, reduce latency, and increase read throughput.

Sources: chapter5

Last updated: 2026-04-15


Replication is a fundamental technique for keeping copies of the same data on multiple machines connected by a network (source: chapter5, p. 151). It serves three primary purposes:

  1. High Availability: Keeping the system running even if one or more machines fail (source: chapter5, p. 151).
  2. Latency Reduction: Placing data geographically closer to users to reduce travel time for network packets (source: chapter5, p. 151).
  3. Read Scalability: Increasing the volume of read queries the system can handle by serving them from multiple replicas (source: chapter5, p. 151).

Replication Architectures

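There are three main algorithms for managing changes to replicated data:

  1. Single-leader replication: All writes go to one designated node (the leader), which sends the resulting changes to the other replicas (the followers).
  2. Multi-leader replication: Several nodes accept writes, and each forwards its changes to the others.
  3. Leaderless replication: Clients send writes directly to several replicas, with no designated leader.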

Key Challenges

The main difficulty in replication lies in handling changes to the data. If the data is immutable, replication is straightforward: just copy it to every node once. However, for data that changes over time, all replicas must eventually reflect the same updates.
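One way to picture replicas eventually reflecting the same updates is an ordered change log that every replica applies in the same order. The Python sketch below is only an illustration under that assumption; the `Replica` class, its `apply_from` method, and the shared `log` list are hypothetical names invented here, not something taken from the source.

```python
class Replica:
    """A node holding a copy of the data plus its position in the change log."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def apply_from(self, log):
        """Apply every log entry this replica has not yet seen, in log order."""
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


# Hypothetical ordered change log: a list of (key, value) updates.
log = []
replicas = [Replica("a"), Replica("b"), Replica("c")]

log.append(("user:1", "Alice"))
log.append(("user:1", "Alicia"))  # a later update to the same key
log.append(("user:2", "Bob"))

# Each replica replays the same changes in the same order.
for replica in replicas:
    replica.apply_from(log)

# True once every replica has applied the whole log.
converged = all(r.data == replicas[0].data for r in replicas)
```

Because every replica applies the same entries in the same order, they end in the same state regardless of when each one catches up. A real system also has to handle node failures and network faults, which this sketch ignores.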

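In systems with asynchronous replication, followers may fall behind the leader; the delay before a write appears on a follower is called replication lag. Reads served from a lagging follower can exhibit anomalies, each of which is addressed by a specific consistency guarantee:

  1. Read-after-write consistency: A user who has just written data sees that write when reading it back.
  2. Monotonic reads: A user making successive reads never sees data move backward in time.
  3. Consistent prefix reads: Writes that happened in a certain order are seen in that same order by any reader.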
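As a rough illustration of replication lag, the Python sketch below models a leader that acknowledges a write before a follower has applied it, so a read from the follower returns a stale value. The `Leader` and `Follower` classes, their methods, and the choice to measure lag in unapplied log entries are assumptions made for this example, not an API or definition from the source.

```python
class Leader:
    """Accepts writes and records them in an ordered replication log."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))  # sent to followers some time later


class Follower:
    """Applies the leader's log asynchronously, so it may be behind."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # count of log entries applied so far

    def lag(self, leader):
        """Replication lag, measured here in unapplied log entries."""
        return len(leader.log) - self.applied

    def catch_up(self, leader):
        """Apply all outstanding log entries in order."""
        for key, value in leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(leader.log)


leader, follower = Leader(), Follower()
leader.write("bio", "v1")
follower.catch_up(leader)

leader.write("bio", "v2")            # the follower has not applied this yet
lag_before = follower.lag(leader)    # follower is one entry behind
stale = follower.data.get("bio")     # read served by the lagging follower
follower.catch_up(leader)
fresh = follower.data.get("bio")     # read after the follower caught up
```

A user who wrote `"v2"` and then read from the follower before it caught up would see their own update apparently vanish, which is the kind of anomaly a read-after-write guarantee is meant to rule out.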