Chapter 8: The Trouble with Distributed Systems

Summary: This chapter explores the fundamental challenges of building reliable systems on top of unreliable components, focusing on partial failures, network delays, clock drift, and process pauses.

Sources: chapter8

Last updated: 2026-04-17


Key Themes

Faults and Partial Failures

In a single-node system, software usually either works or crashes. In a distributed system, partial failures are common: some parts of the system are broken while others keep working. These failures are nondeterministic, so the same operation may succeed on one attempt and fail on the next, and they can be difficult to detect (source: chapter8, p. 275).

Unreliable Networks

Nodes in a distributed system communicate over asynchronous networks, which are unreliable: packets can be lost, delayed, reordered, or duplicated. The only way for a sender to detect a failure is a timeout, and a timeout cannot tell it whether the remote node crashed, the network dropped the request or the reply, or the response is merely slow (source: chapter8, p. 278).
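
The ambiguity of a timeout can be sketched in a few lines of Python. This is a toy simulation, not a real network client: `call_with_timeout` and the three handlers are hypothetical names introduced here, and a background thread stands in for the remote node.

```python
import queue
import threading
import time


def call_with_timeout(handler, timeout_s):
    """Run handler in a background thread; wait at most timeout_s for a reply."""
    replies = queue.Queue(maxsize=1)

    def worker():
        result = handler()
        if result is not None:  # None models "no reply was ever sent"
            replies.put(result)

    threading.Thread(target=worker, daemon=True).start()
    try:
        return replies.get(timeout=timeout_s)
    except queue.Empty:
        return None  # no reply in time; the caller cannot tell why


def healthy():
    return "ok"


def crashed_or_lost():
    return None  # crashed node or dropped packet: no reply at all


def slow():
    time.sleep(0.5)  # the reply exists, but arrives after the deadline
    return "ok"


for name, handler in [("healthy", healthy),
                      ("crashed_or_lost", crashed_or_lost),
                      ("slow", slow)]:
    print(name, call_with_timeout(handler, timeout_s=0.2))
```

From the caller's side, the last two cases return the same value, which is the point of the paragraph above: a timeout reports that no reply arrived, not why.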

Unreliable Clocks

Each node in a distributed system has its own local clock, typically driven by a quartz oscillator, and these clocks are unreliable: they drift at different rates. Time-of-day clocks can jump backward (e.g., due to NTP synchronization), making them dangerous for ordering events across nodes. Logical clocks are often a safer alternative for ordering (source: chapter8, p. 291).
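
As an illustration of a logical clock, here is a minimal sketch of a Lamport clock, one well-known scheme. The summary above does not name a specific scheme, so the choice of Lamport clocks and the class and method names are assumptions made for this example.

```python
class LamportClock:
    """A counter that orders events without reading any physical clock."""

    def __init__(self):
        self.counter = 0

    def tick(self):
        """Record a local event."""
        self.counter += 1
        return self.counter

    def send(self):
        """Return the timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, message_timestamp):
        """Merge the sender's timestamp, then count the receive event."""
        self.counter = max(self.counter, message_timestamp) + 1
        return self.counter


node_a = LamportClock()
node_b = LamportClock()

node_a.tick()                          # local event on A, counter is 1
sent_at = node_a.send()                # A sends a message stamped 2
received_at = node_b.receive(sent_at)  # B jumps ahead to 3

print(sent_at, received_at)
```

The receive event always gets a larger timestamp than the send event that caused it, regardless of how far the two nodes' wall clocks have drifted apart.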

Knowledge, Truth, and Lies

In a distributed system, a node cannot know anything for sure; it can only make inferences based on the messages it receives. Truth is often defined by a quorum: a majority of nodes must agree on a fact (source: chapter8, p. 300).
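
The majority rule comes down to simple arithmetic. The helper names below are hypothetical, and the sketch assumes the common case of a strict majority (more than half of the nodes).

```python
def majority(cluster_size):
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def has_quorum(votes, cluster_size):
    """True if enough nodes agree for the decision to count."""
    return votes >= majority(cluster_size)


print(majority(5))       # 3 of 5 nodes are needed
print(has_quorum(2, 5))  # two votes are not enough
```

Because any two majorities of the same cluster must share at least one node, two conflicting decisions cannot both reach a quorum.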

  • Fencing tokens: Numbers that increase each time a lock or lease is granted. The protected resource rejects any request carrying an older token, so a node whose lease has expired cannot perform actions that interfere with its successor (source: chapter8, p. 303).
  • Byzantine faults: Faults in which nodes may lie or act maliciously, as opposed to simply crashing (source: chapter8, p. 304).
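
The fencing-token check from the list above can be sketched as follows. This is a minimal in-memory model under stated assumptions: `LockService`, `FencedStorage`, and `StaleTokenError` are hypothetical names, lease expiry is not modeled, and a real system would perform the token check inside the storage service itself.

```python
class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class LockService:
    """Hands out a strictly increasing token with every lease it grants."""

    def __init__(self):
        self._next_token = 0

    def grant_lease(self):
        self._next_token += 1
        return self._next_token


class FencedStorage:
    """Remembers the highest token seen and rejects writes with older ones."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, key, value, token):
        if token < self.highest_token:
            raise StaleTokenError(
                f"token {token} is older than {self.highest_token}")
        self.highest_token = token
        self.data[key] = value


lock = LockService()
storage = FencedStorage()

old_token = lock.grant_lease()  # client 1 gets the lease, then pauses
new_token = lock.grant_lease()  # lease expires; client 2 takes over
storage.write("file", "written by client 2", new_token)

try:
    # Client 1 wakes up, still believing it holds the lease.
    storage.write("file", "written by client 1", old_token)
except StaleTokenError as err:
    print("rejected:", err)
```

The design choice that matters is where the check lives: the paused client cannot be trusted to notice that its lease expired, so the resource being protected enforces the ordering.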

System Models

To reason about distributed algorithms, we use system models that formalize assumptions about timing (synchronous, partially synchronous, asynchronous) and about node failures (crash-stop, crash-recovery, Byzantine) (source: chapter8, p. 306).