Pregel

Summary: A specialized distributed batch processing framework for iterative graph algorithms, using a “think like a vertex” model.

Sources: raw/chapter10

Last updated: 2026-04-18


Many graph algorithms (e.g., PageRank, shortest paths) are iterative: they perform a computation on each vertex and its neighbors, then repeat until a condition is met. While this can be done in MapReduce, it is highly inefficient because it requires a full pass over the dataset for each iteration (source: chapter10, p. 424).

The Bulk Synchronous Parallel (BSP) Model

Pregel implements the BSP model. In each iteration (superstep), a function is called for every vertex. This function can:

  1. Receive messages sent to the vertex in the previous superstep.
  2. Update the vertex’s state.
  3. Send messages to other vertices for the next superstep (source: chapter10, p. 425).

Fault Tolerance

Pregel achieves fault tolerance by periodically checkpointing the state of all vertices at the end of a superstep. If a node fails, the entire graph computation can be rolled back to the last successful checkpoint (source: chapter10, p. 426).