Chapter 11: Stream Processing

Summary: This chapter explores techniques for processing unbounded data that arrives continuously over time, contrasting log-based message brokers with traditional messaging systems and discussing how to maintain state and handle time in a streaming context.

Sources: chapter11

Last updated: 2026-04-18


Key Concepts

Transmitting Event Streams

In stream processing, data is represented as an event-streams—a small, immutable object containing details of something that happened. Events are generated by producers and processed by consumers.

  • Messaging Systems: Used to notify consumers about new events.
    • Direct Messaging: Producers and consumers communicate directly (e.g., UDP multicast, ZeroMQ, Webhooks).
    • Message Brokers: A centralized store where producers send messages and consumers receive them.
      • message-brokers (AMQP/JMS style): Focused on individual message delivery and acknowledgment; messages are deleted once processed.
      • log-based-message-brokers: Uses an append-only append-only-log on disk (e.g., Apache Kafka); allows replaying messages and provides higher throughput.

Databases and Streams

The chapter explores the relationship between databases and event streams, treating state as a derivative of an immutable event log.

  • change-data-capture: The process of observing all data changes written to a database and extracting them in a form that can be replicated to other systems.
  • event-sourcing: An application architecture where all changes to application state are stored as a sequence of events.
  • State and Immutability: State is an integration of an event stream over time, while a stream is the derivative of state.

Processing Streams

Stream processors (e.g., Apache Flink, Kafka Streams) perform continuous computation on streams.

  • Uses: Fraud detection, trading systems, manufacturing monitoring, and complex-event-processing.
  • windowing: Dividing an unbounded stream into finite buckets of time (Tumbling, Hopping, Sliding, and Session windows).
  • stream-joins: Joining streams with other streams or with database tables.
    • Stream-stream joins (windowed).
    • Stream-table joins (enrichment).
    • Table-table joins (materialized view maintenance).

Fault Tolerance

Ensuring correctness in the face of failures is harder in streaming than in batch processing.

  • exactly-once-semantics: The goal of ensuring that each event is processed as if it happened exactly once, even if failures occur.
  • Techniques: Microbatching, checkpointing, and idempotence.