Stream Processing

Summary: The continuous processing of event streams as the events occur, in contrast to batch processing, which operates on a bounded set of historical data.

Sources: chapter11

Last updated: 2026-04-18


Comparison with Batch Processing

  • Latency: Stream processing aims for low latency (seconds or milliseconds), whereas batch processing often has latencies of hours or days.
  • Boundedness: Batch processing works on bounded datasets; stream processing works on unbounded data.
  • Sorting: Sorting is common in batch processing but cannot be applied to an unbounded stream as a whole, because a sort cannot emit its first result until it has read all of its input.
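
The boundedness and sorting points can be illustrated with a minimal Python sketch. The function names batch_sorted and running_max are hypothetical, chosen only for this illustration. The batch function reads its whole input before producing anything, while the streaming function emits a result after every event and keeps only constant state, so it also works on an input that never ends.

```python
import itertools
from typing import Iterable, Iterator, List, Optional


def batch_sorted(values: List[int]) -> List[int]:
    # Batch style: the input is bounded, so we can read all of it and sort.
    return sorted(values)


def running_max(stream: Iterable[int]) -> Iterator[int]:
    # Stream style: emit an updated result after each event.
    # Only one value of state is kept, regardless of stream length.
    current: Optional[int] = None
    for value in stream:
        if current is None or value > current:
            current = value
        yield current


# Bounded input: both styles work.
print(batch_sorted([3, 1, 4, 1, 5]))       # [1, 1, 3, 4, 5]
print(list(running_max([3, 1, 4, 1, 5])))  # [3, 3, 4, 4, 5]

# Unbounded input: itertools.count() never ends, so only the streaming
# function can make progress. We take the first three results.
print(list(itertools.islice(running_max(itertools.count()), 3)))  # [0, 1, 2]
```

Calling sorted() on itertools.count() would never return, which is the practical meaning of the sorting limitation above.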

Applications

  • Fraud Detection: Identifying unusual patterns in credit card transactions in real time.
  • Monitoring: Tracking system metrics or manufacturing sensor data for anomalies.
  • Real-time Analytics: Computing rolling averages or 99th percentile response times.
  • Materialized View Maintenance: Keeping caches and search indexes in sync with a primary database.
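
As an illustration of the real-time analytics case, a rolling average over the most recent events might be sketched as follows. The function name rolling_average and the count-based window are assumptions made for this example; a production system would more likely define the window by time and use a stream processing framework.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def rolling_average(stream: Iterable[float], size: int) -> Iterator[float]:
    # Keep only the last `size` values plus their running total,
    # so each new event is handled in constant time.
    window: Deque[float] = deque()
    total = 0.0
    for value in stream:
        window.append(value)
        total += value
        if len(window) > size:
            # Evict the oldest value once the window is full.
            total -= window.popleft()
        yield total / len(window)


# Example: response times in milliseconds, window of the last 2 events.
print(list(rolling_average([10, 20, 30, 40], size=2)))
# [10.0, 15.0, 25.0, 35.0]
```

The running total avoids re-summing the window on every event. With non-integer inputs it can accumulate floating-point error over a long stream, so a real implementation may need to recompute the sum periodically.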

Key Challenges

  • Time Handling: Dealing with the difference between event time (when the event occurred) and processing time (when the event was processed by the system).
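  • Fault Tolerance: Achieving exactly-once semantics (each event's effect appears in the output exactly once, even if the event is reprocessed after a failure) despite node failures or network issues.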
  • State Management: Maintaining counts, windows, or join states across multiple events.
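
The three challenges above can be seen together in a small Python sketch. Everything named here (Event, TumblingWindowCounter, the event_id field) is hypothetical and exists only for illustration. The counter assigns each event to a fixed-size window by its event time rather than its arrival order, keeps per-window counts as state, and skips events whose ID it has already seen so that a redelivered event does not change the result.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Event:
    event_id: str    # assumed unique per event; used for deduplication
    event_time: int  # seconds since epoch, recorded where the event occurred


@dataclass
class TumblingWindowCounter:
    window_seconds: int
    # State: event count per window, keyed by the window's start time.
    counts: Dict[int, int] = field(default_factory=lambda: defaultdict(int))
    # State: IDs already processed, so redelivery is a no-op.
    seen_ids: Set[str] = field(default_factory=set)

    def process(self, event: Event) -> None:
        if event.event_id in self.seen_ids:
            return  # duplicate delivery, ignore
        self.seen_ids.add(event.event_id)
        # Assign by event time, not by the order in which events arrive.
        window_start = event.event_time - (event.event_time % self.window_seconds)
        self.counts[window_start] += 1


counter = TumblingWindowCounter(window_seconds=60)
arrivals = [
    Event("a", 5),
    Event("b", 59),
    Event("c", 61),
    Event("a", 5),   # redelivered after a simulated failure
    Event("d", 30),  # arrives late, but belongs to the first window
]
for event in arrivals:
    counter.process(event)

print(dict(counter.counts))  # {0: 3, 60: 1}
```

This sketch leaves out several things a real system needs: seen_ids grows without bound, the state lives only in memory and would be lost on a crash, and nothing decides when a window is complete and can be emitted.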