Chapter 12: The Future of Data Systems
Summary: This chapter synthesizes the book’s themes, proposing a future where data systems are “unbundled” and recomposed via dataflow, while emphasizing end-to-end correctness and the ethical responsibilities of engineers.
Sources: chapter12
Last updated: 2026-04-18
Data Integration and Unbundling
A central theme is that no single tool can satisfy every requirement of a complex application. Instead, we combine specialized tools (OLTP databases, search indexes, analytics systems) and keep them consistent by deriving data from a system of record.
- Unbundling databases: Traditional database features such as secondary indexes and materialized views are themselves derived data that the database keeps up to date through internal dataflow. Unbundling means taking these components apart and composing them across different machines and technologies.
- Reasoning about Dataflows: Rather than relying on distributed transactions, use change data capture or event sourcing so that every derived system applies the same changes in the same order.
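The log-based approach above can be sketched as follows. This is a minimal illustration rather than a real CDC pipeline: `ChangeEvent`, `DerivedStore`, and the sample keys are hypothetical names, and an in-memory list stands in for the change log. Each derived system applies events in log order and remembers the last offset it applied, so redelivered events do not change the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int            # position in the totally ordered change log
    key: str
    value: Optional[str]   # None marks a deletion


class DerivedStore:
    """Hypothetical derived system (think cache or search index)."""

    def __init__(self):
        self.data = {}
        self.last_offset = -1  # highest log offset applied so far

    def apply(self, event):
        # Ignore events at or below the checkpoint, so that redelivery
        # after a crash or restart does not change the result.
        if event.offset <= self.last_offset:
            return
        if event.value is None:
            self.data.pop(event.key, None)
        else:
            self.data[event.key] = event.value
        self.last_offset = event.offset


# Changes captured from the system of record, in commit order.
change_log = [
    ChangeEvent(0, "user:1", "Alice"),
    ChangeEvent(1, "user:2", "Bob"),
    ChangeEvent(2, "user:1", "Alice Smith"),
    ChangeEvent(3, "user:2", None),
]

cache = DerivedStore()
search_index = DerivedStore()
for event in change_log:
    cache.apply(event)
for event in change_log + change_log:  # simulate redelivery of the whole log
    search_index.apply(event)
```

Because both stores consume the same ordered log, they converge on the same state without any transaction spanning the two of them.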
Designing Applications Around Dataflow
The “database inside-out” approach treats application code as a derivation function that turns one dataset into another.
- Application code as derivation: When one dataset is derived from another, the application logic acts as the transformation function (e.g., updating a cache or training an ML model).
- Separation of state and code: Application servers can be stateless, with state maintained in specialized durable systems and updated via event streams.
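- Observing derived state: The write path (precomputing results when data is written) and the read path (computing results when a query arrives) meet at the derived dataset. Caches, indexes, and materialized views shift that boundary by doing more work on the write path to save work on reads.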
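As a rough sketch of these ideas, the example below keeps a follower-count view up to date from a stream of events. The event shape and the names `apply_event`, `derive_from_scratch`, and `FollowerCountView` are assumptions made for illustration. The same derivation function serves both incremental maintenance (the write path) and a full rebuild from the event history, while the read path is a plain lookup.

```python
def apply_event(counts, event):
    """Derivation step: fold one follow/unfollow event into the view."""
    kind, _follower, followee = event
    delta = 1 if kind == "follow" else -1
    counts[followee] = counts.get(followee, 0) + delta
    return counts


def derive_from_scratch(events):
    """Batch derivation: rebuild the whole view from the event history."""
    counts = {}
    for event in events:
        apply_event(counts, event)
    return counts


class FollowerCountView:
    """Derived dataset where the write path and the read path meet."""

    def __init__(self):
        self.counts = {}

    def on_event(self, event):
        # Write path: precompute as each event arrives.
        apply_event(self.counts, event)

    def follower_count(self, user):
        # Read path: a cheap lookup of the precomputed result.
        return self.counts.get(user, 0)


events = [
    ("follow", "alice", "carol"),
    ("follow", "bob", "carol"),
    ("unfollow", "alice", "carol"),
    ("follow", "carol", "bob"),
]

view = FollowerCountView()
for event in events:
    view.on_event(event)
```

Since the view holds only derived state, it can be discarded and rebuilt with `derive_from_scratch(events)` at any time, which is what lets the code that maintains it stay stateless.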
Aiming for Correctness
Strong database guarantees (like ACID) are often insufficient for application-level correctness.
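- End-to-end argument: Functions such as duplicate suppression cannot be left entirely to the database or the network. The application must implement them end to end, for example by passing a unique request ID from the client through to the database so that retried operations are idempotent.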
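- Timeliness and integrity: A violation of timeliness (temporarily stale reads, as under eventual consistency) is often acceptable because it resolves itself, but a violation of integrity (data loss or corruption) persists until it is explicitly repaired, so preserving integrity is essential.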
- Auditing: Rather than blindly trusting database transactions, systems should be designed for auditability and verification.
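The duplicate suppression and auditing points above can be illustrated with a small sketch. It assumes a hypothetical in-memory `Ledger` standing in for a database with a uniqueness constraint on request IDs. The client generates the request ID once and reuses it on retry, and `audit` re-derives the balances from the record of applied requests to check integrity.

```python
import uuid


class Ledger:
    """Hypothetical stand-in for a database that enforces unique request IDs."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.applied = {}  # request_id -> (payer, payee, amount)

    def transfer(self, request_id, payer, payee, amount):
        # Duplicate suppression: a request ID that was already applied
        # is acknowledged but has no further effect.
        if request_id in self.applied:
            return False
        self.balances[payer] -= amount
        self.balances[payee] += amount
        self.applied[request_id] = (payer, payee, amount)
        return True


def audit(initial_balances, ledger):
    """Re-derive balances from the applied requests and compare."""
    expected = dict(initial_balances)
    for payer, payee, amount in ledger.applied.values():
        expected[payer] -= amount
        expected[payee] += amount
    return expected == ledger.balances


initial = {"alice": 100, "bob": 50}
ledger = Ledger(initial)

# The client picks the ID once, before the first attempt.
request_id = str(uuid.uuid4())
first = ledger.transfer(request_id, "alice", "bob", 30)
# Simulated retry after a lost acknowledgement: same ID, no second debit.
retry = ledger.transfer(request_id, "alice", "bob", 30)
```

Generating the ID on the client rather than on the server is the key design choice here: it is the only place that knows the two attempts are the same logical operation.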
Doing the Right Thing
The final section addresses the ethical impact of data systems on society.
- Predictive analytics: Algorithms trained on historical data can learn and reinforce the bias and discrimination embedded in that data.
- Privacy and Surveillance: The shift toward “data-driven” organizations can lead to pervasive surveillance if not balanced with user agency.