Pregel
Summary: A specialized distributed batch processing framework for iterative graph algorithms, using a “think like a vertex” model.
Sources: raw/chapter10
Last updated: 2026-04-18
Many graph algorithms (e.g., PageRank, shortest paths) are iterative: they perform a computation on each vertex and its neighbors, then repeat until a condition is met. While this can be done in MapReduce, it is highly inefficient because it requires a full pass over the dataset for each iteration (source: chapter10, p. 424).
The Bulk Synchronous Parallel (BSP) Model
Pregel implements the BSP model. In each iteration (superstep), a function is called for every vertex. This function can:
- Receive messages sent to the vertex in the previous superstep.
- Update the vertex’s state.
- Send messages to other vertices for the next superstep (source: chapter10, p. 425).
Fault Tolerance
Pregel achieves fault tolerance by periodically checkpointing the state of all vertices at the end of a superstep. If a node fails, the entire graph computation can be rolled back to the last successful checkpoint (source: chapter10, p. 426).