Stream Processing
Summary: The continuous processing of event streams as the events occur, in contrast to batch processing, which operates on a finite set of historical data.
Sources: chapter11
Last updated: 2026-04-18
Comparison with Batch Processing
- Latency: Stream processing aims for low latency (seconds or milliseconds), whereas batch processing often has latencies of hours or days.
- Boundedness: Batch processing works on bounded datasets; stream processing works on unbounded data.
- Sorting: Sorting is a common batch operation, but a sort must read its entire input before it can produce output, so it cannot be applied to an unbounded stream as a whole.
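The contrast can be illustrated with a minimal Python sketch. The function names `batch_median` and `running_mean` are hypothetical and chosen only for illustration: the batch function can sort because its input is bounded, while the streaming function emits an updated result after every event and never waits for the input to end.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def batch_median(values: list[float]) -> float:
    """Batch style: the input is bounded, so it can be sorted in full."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def running_mean(events: Iterable[float]) -> Iterator[float]:
    """Stream style: yield an updated mean after each event arrives."""
    seen = 0
    total = 0.0
    for value in events:
        seen += 1
        total += value
        yield total / seen


if __name__ == "__main__":
    print(batch_median([3.0, 1.0, 2.0]))
    # count(1) never ends; islice takes only the first three results.
    print(list(islice(running_mean(count(1)), 3)))
```

Calling `batch_median` on `count(1)` would never return, because `sorted` has to consume the whole input first; `running_mean` works on the same unbounded input because it only keeps a counter and a running total.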
Applications
- Fraud Detection: Identifying unusual patterns in credit card transactions in real time.
- Monitoring: Tracking system metrics or manufacturing sensor data for anomalies.
- Real-time Analytics: Computing rolling averages or 99th percentile response times.
- Materialized View Maintenance: Keeping caches and search indexes in sync with a primary database.
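As an illustration of the real-time analytics case, a rolling average over the most recent events can be sketched as follows. The class name `RollingAverage` and the count-based window size are assumptions made for this example; a production system would more likely window by time and use a stream processing framework.

```python
from collections import deque


class RollingAverage:
    """Average of the most recent `size` values seen on a stream."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self._window: deque[float] = deque(maxlen=size)
        self._total = 0.0

    def add(self, value: float) -> float:
        # When the window is full, the oldest value is about to be
        # evicted by append(), so remove it from the running total first.
        if len(self._window) == self._window.maxlen:
            self._total -= self._window[0]
        self._window.append(value)
        self._total += value
        return self._total / len(self._window)


if __name__ == "__main__":
    avg = RollingAverage(size=3)
    for response_ms in [10.0, 20.0, 30.0, 40.0]:
        print(avg.add(response_ms))
```

Keeping a running total makes each update constant-time, at the cost of possible floating-point drift over very long streams; recomputing the sum from the window on each update avoids the drift but costs time proportional to the window size.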
Key Challenges
- Time Handling: Dealing with the difference between event time (when the event occurred) and processing time (when the event was processed by the system).
- Fault Tolerance: Achieving exactly-once semantics despite node failures or network issues.
- State Management: Maintaining counts, windows, or join states across multiple events.
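The three challenges above can be seen together in a small sketch. Everything here is an assumption made for illustration: the `Event` fields, the `TumblingWindowCounter` class, fixed-size (tumbling) windows, and the use of a producer-assigned `event_id` for deduplication. Events are assigned to windows by event time rather than arrival order, per-window counts are the operator's state, and redelivered events are ignored so that a retry after a failure does not double-count.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str      # hypothetical unique ID assigned by the producer
    key: str           # e.g. a user or sensor identifier
    event_time: float  # seconds since epoch, when the event occurred


class TumblingWindowCounter:
    """Counts events per key in fixed-size windows based on event time."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        # State: (window start, key) -> number of events in that window.
        self.counts: dict[tuple[int, str], int] = defaultdict(int)
        # IDs already applied, so redelivery has no further effect.
        self.seen_ids: set[str] = set()

    def process(self, event: Event) -> bool:
        """Apply one event; return False if it was a duplicate."""
        if event.event_id in self.seen_ids:
            return False
        self.seen_ids.add(event.event_id)
        # The window is chosen from the event's own timestamp,
        # not from the time at which this method happens to run.
        window_start = (
            int(event.event_time // self.window_seconds) * self.window_seconds
        )
        self.counts[(window_start, event.key)] += 1
        return True


if __name__ == "__main__":
    counter = TumblingWindowCounter(window_seconds=60)
    counter.process(Event("e1", "sensor-a", 5.0))
    counter.process(Event("e2", "sensor-a", 65.0))
    counter.process(Event("e3", "sensor-a", 59.0))  # arrives late
    counter.process(Event("e1", "sensor-a", 5.0))   # redelivered duplicate
    print(dict(counter.counts))
```

This sketch keeps its state in memory only, so it would lose both the counts and the set of seen IDs if the process crashed; it also never closes a window or discards old IDs. A real system would need durable state and some policy for deciding when a window is complete.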