Chapter 11: Stream Processing
Summary: This chapter explores techniques for processing unbounded data that arrives continuously over time, contrasting log-based message brokers with traditional messaging systems and discussing how to maintain state and handle time in a streaming context.
Sources: chapter11
Last updated: 2026-04-18
Key Concepts
Transmitting Event Streams
In stream processing, data is represented as an event-streams—a small, immutable object containing details of something that happened. Events are generated by producers and processed by consumers.
- Messaging Systems: Used to notify consumers about new events.
- Direct Messaging: Producers and consumers communicate directly (e.g., UDP multicast, ZeroMQ, Webhooks).
- Message Brokers: A centralized store where producers send messages and consumers receive them.
- message-brokers (AMQP/JMS style): Focused on individual message delivery and acknowledgment; messages are deleted once processed.
- log-based-message-brokers: Uses an append-only append-only-log on disk (e.g., Apache Kafka); allows replaying messages and provides higher throughput.
Databases and Streams
The chapter explores the relationship between databases and event streams, treating state as a derivative of an immutable event log.
- change-data-capture: The process of observing all data changes written to a database and extracting them in a form that can be replicated to other systems.
- event-sourcing: An application architecture where all changes to application state are stored as a sequence of events.
- State and Immutability: State is an integration of an event stream over time, while a stream is the derivative of state.
Processing Streams
Stream processors (e.g., Apache Flink, Kafka Streams) perform continuous computation on streams.
- Uses: Fraud detection, trading systems, manufacturing monitoring, and complex-event-processing.
- windowing: Dividing an unbounded stream into finite buckets of time (Tumbling, Hopping, Sliding, and Session windows).
- stream-joins: Joining streams with other streams or with database tables.
- Stream-stream joins (windowed).
- Stream-table joins (enrichment).
- Table-table joins (materialized view maintenance).
Fault Tolerance
Ensuring correctness in the face of failures is harder in streaming than in batch processing.
- exactly-once-semantics: The goal of ensuring that each event is processed as if it happened exactly once, even if failures occur.
- Techniques: Microbatching, checkpointing, and idempotence.