System Models
Summary: Abstract frameworks that formalize the assumptions we make about the timing and failure characteristics of a distributed system.
Sources: chapter8
Last updated: 2026-04-17
Algorithms need to be written in a way that doesn’t depend too heavily on the details of the hardware and software. We use system models to specify what an algorithm may assume about the environment (source: chapter8, p. 306).
Timing Assumptions
- Synchronous model: Assumes bounded network delay, bounded process pauses, and bounded clock drift. This is not realistic for most practical systems.
- Partially synchronous model: Assumes that the system behaves like a synchronous model most of the time, but sometimes exceeds the bounds. This is the most useful model for reasoning about real-world systems (source: chapter8, p. 307).
- Asynchronous model: An algorithm is not allowed to make any timing assumptions at all (not even a clock). This is very restrictive.
(source: chapter8, p. 307)
Node Failure Assumptions
- Crash-stop faults: A node fails only by crashing, and it never comes back.
- Crash-recovery faults: Nodes may crash at any time and may resume responding after some unknown time. Durable storage is preserved across crashes, but in-memory state is lost. This is the most common model for real systems (source: chapter8, p. 307).
- Byzantine (arbitrary) faults: Nodes may do absolutely anything, including trying to trick other nodes.
(source: chapter8, p. 307)