Stream Joins
Summary: Combining data from multiple event streams, or from a stream and a database.
Sources: chapter11
Last updated: 2026-04-18
Types of Joins
- Stream-Stream Join: Joining two unbounded streams. Since both sides are infinite, the join must be restricted to a window (e.g., joining a click on a search result with the original search event if the click occurs within 1 hour of the search). The processor must maintain state for all events in the current window on both sides.
- Stream-Table Join (Enrichment): Joining a stream with a database table. For every event in the stream, the processor looks up information in the table (e.g., user profile info for a user ID). This can be done via remote database queries or by keeping a local copy of the table in the stream processor.
- Table-Table Join (Materialized View Maintenance): Joining two database changelogs to maintain a materialized view of the join results. For example, maintaining a Twitter timeline cache by joining a tweets stream with a follows stream.
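The stream-stream case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production design: the class name WindowedJoin, the session_id join key, and the on_search/on_click methods are hypothetical, timestamps are plain seconds, and events for a given key are assumed to arrive in roughly non-decreasing timestamp order. It shows the two points made above: state is kept for both sides, and events older than the window are dropped.

```python
from collections import defaultdict, deque


class WindowedJoin:
    """Hypothetical sketch: join searches and clicks that share a session ID
    and lie within `window_seconds` of each other."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        # State for both sides, keyed by session ID: deque of (timestamp, value).
        self.searches = defaultdict(deque)
        self.clicks = defaultdict(deque)

    def _expire(self, buffer, now):
        # Assumes in-order arrival: the oldest event is at the left end.
        while buffer and now - buffer[0][0] > self.window:
            buffer.popleft()

    def on_search(self, session_id, ts, query):
        self._expire(self.searches[session_id], ts)
        self._expire(self.clicks[session_id], ts)
        self.searches[session_id].append((ts, query))
        # Emit one joined pair per buffered click still inside the window.
        return [(query, url) for _, url in self.clicks[session_id]]

    def on_click(self, session_id, ts, url):
        self._expire(self.searches[session_id], ts)
        self._expire(self.clicks[session_id], ts)
        self.clicks[session_id].append((ts, url))
        # Emit one joined pair per buffered search still inside the window.
        return [(query, url) for _, query in self.searches[session_id]]


join = WindowedJoin(window_seconds=3600)
join.on_search("s1", 0, "cats")
matched = join.on_click("s1", 1800, "a.example")   # 30 minutes after the search
too_late = join.on_click("s1", 4000, "b.example")  # search has left the window
```

Here matched contains the pair for the first click, while too_late is empty because the search event was expired before the second click arrived. A real processor would also need to handle out-of-order events, which this sketch does not.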
Implementation Challenges
- State Management: Joins require the processor to maintain state, which must be partitioned and replicated for fault tolerance.
- Time-dependence: The relative order of events across different streams matters. If a user updates their profile right before an event for that user arrives on the other stream, the join result depends on which of the two is processed first.
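The time-dependence issue can be made concrete with a small stream-table join that keeps a local copy of the table. The sketch below is hypothetical: ProfileEnricher, run, and the event tuples are names invented for illustration, and the local table is a plain in-memory dict with no partitioning or replication. It replays the same three events in two different orders and shows that the enriched output differs.

```python
class ProfileEnricher:
    """Hypothetical stream-table join backed by a local copy of the profile table."""

    def __init__(self):
        self.profiles = {}  # local table copy, keyed by user ID

    def on_profile_change(self, user_id, profile):
        # Changelog event: overwrite the local copy of this user's row.
        self.profiles[user_id] = profile

    def on_activity(self, user_id, action):
        # Stream event: enrich with whatever profile is known right now.
        return {"user_id": user_id, "action": action,
                "profile": self.profiles.get(user_id)}


def run(events):
    """Feed (kind, user_id, value) tuples through a fresh enricher, in order."""
    enricher = ProfileEnricher()
    output = []
    for kind, user_id, value in events:
        if kind == "profile":
            enricher.on_profile_change(user_id, value)
        else:
            output.append(enricher.on_activity(user_id, value))
    return output


initial = ("profile", "u1", {"city": "Oslo"})
update = ("profile", "u1", {"city": "Lima"})
activity = ("activity", "u1", "login")

update_first = run([initial, update, activity])
activity_first = run([initial, activity, update])
```

With the update processed first, the activity event is enriched with the new profile; with the activity processed first, it is enriched with the old one. The inputs are identical in both runs, so the difference comes only from interleaving.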