Chapter 12: The Future of Data Systems

Summary: This chapter synthesizes the book’s themes, proposing a future where data systems are “unbundled” and recomposed via dataflow, while emphasizing end-to-end correctness and the ethical responsibilities of engineers.

Sources: chapter12

Last updated: 2026-04-18


Data Integration and Unbundling

A central theme is that no single tool can satisfy all requirements for a complex application. Instead, we must combine specialized tools (for example OLTP databases, search indexes, and analytics systems) by deriving data from a system of record.

  • unbundling-databases: Traditional database features like indexes and materialized views can be viewed as implementations of dataflow. Unbundling means taking these components and composing them across different machines and technologies.
  • Reasoning about Dataflows: Instead of relying on distributed transactions, use change-data-capture or event-sourcing to keep derived systems consistent. Each derived system consumes the same ordered log of changes from the system of record.
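
The log-based approach above can be illustrated with a minimal in-memory sketch. The names ChangeLog, ChangeEvent, and SearchIndex are hypothetical and do not come from the chapter or from any real library; a real deployment would use a durable log and a real search engine. The point is only that the derived system tracks its own offset and applies changes in log order.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int            # position in the log; defines a total order
    key: str
    value: Optional[str]   # None marks a deletion


@dataclass
class ChangeLog:
    """Append-only log standing in for a change-data-capture stream."""
    events: list = field(default_factory=list)

    def append(self, key, value):
        event = ChangeEvent(len(self.events), key, value)
        self.events.append(event)
        return event


class SearchIndex:
    """Derived system: rebuilt purely from the log, never written directly."""

    def __init__(self):
        self.docs = {}
        self.next_offset = 0   # how far into the log this consumer has read

    def catch_up(self, log):
        # Apply unseen events in log order, then remember the new position.
        for event in log.events[self.next_offset:]:
            if event.value is None:
                self.docs.pop(event.key, None)
            else:
                self.docs[event.key] = event.value
            self.next_offset = event.offset + 1

    def search(self, word):
        return sorted(k for k, v in self.docs.items()
                      if word in v.lower().split())


log = ChangeLog()
log.append("doc1", "Stream processing basics")
log.append("doc2", "Batch processing basics")
log.append("doc1", None)   # doc1 is deleted in the system of record

index = SearchIndex()
index.catch_up(log)
matches = index.search("processing")
```

Because the index stores only its offset and the state derived from the log, it can be dropped and rebuilt at any time by replaying the log from offset 0.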

Designing Applications Around Dataflow

The “database-inside-out” approach treats application code as a derivation function.

  • Application code as derivation: When one dataset is derived from another, the application logic acts as the transformation function (e.g., updating a cache or training an ML model).
  • Separation of state and code: Application servers can be stateless, with state maintained in specialized durable systems and updated via event streams.
  • observing-derived-state: The write path (precomputing data) and read path (querying) meet at the derived dataset.
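
As a rough illustration of these three points, the sketch below treats application code as a function that folds events into derived state. The event shapes and the names derive, write_path, and read_path are assumptions made for this example, not terminology from a specific framework.

```python
from collections import defaultdict


def derive(view, event):
    """Derivation function: fold one event into the derived state."""
    if event["type"] == "item_added":
        view[event["cart_id"]] += event["quantity"]
    elif event["type"] == "item_removed":
        view[event["cart_id"]] -= event["quantity"]
    return view


def write_path(events):
    """Precompute the derived dataset (item count per cart) from events."""
    view = defaultdict(int)
    for event in events:
        derive(view, event)
    return view


def read_path(view, cart_id):
    """Serve a query from the precomputed dataset; no event replay needed."""
    return view.get(cart_id, 0)


events = [
    {"type": "item_added", "cart_id": "c1", "quantity": 3},
    {"type": "item_removed", "cart_id": "c1", "quantity": 1},
    {"type": "item_added", "cart_id": "c2", "quantity": 5},
]

view = write_path(events)
c1_count = read_path(view, "c1")
```

The derived dataset (view) is where the two paths meet: doing more work in write_path makes read_path cheaper, and the reverse. The functions themselves hold no state, which is what lets the servers running them stay stateless.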

Aiming for Correctness

Strong database guarantees (like ACID) are often insufficient for application-level correctness.

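  • end-to-end-argument: Functions like duplicate suppression (idempotence) must be implemented end to end, from the client through to the database, for example by passing a client-generated request ID with every operation. Database transactions alone cannot detect a request that the client retries after a failure between the client and the database.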
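  • timeliness-and-integrity: Timeliness means users observe up-to-date state; integrity means no data is lost or corrupted. Violations of timeliness (temporary staleness, as in eventual consistency) are often acceptable, but violations of integrity are not.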
  • Auditing: Rather than blindly trusting database transactions, systems should be designed for auditability and verification.
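
One way to sketch end-to-end duplicate suppression is to record a client-generated request ID in the same transaction as the change it guards. The example below uses Python's standard sqlite3 module; the table layout and the transfer function are assumptions made for illustration, not a prescribed schema.

```python
import sqlite3
import uuid


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE requests (request_id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    db.executemany("INSERT INTO accounts VALUES (?, ?)",
                   [("alice", 100), ("bob", 0)])
    db.commit()
    return db


def transfer(db, request_id, src, dst, amount):
    """Apply the transfer at most once per request_id; True if applied."""
    try:
        with db:  # commits on success, rolls back if an exception is raised
            # The primary key rejects a request ID that was already processed.
            db.execute("INSERT INTO requests VALUES (?)", (request_id,))
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate request: nothing was changed


db = open_db()
request_id = str(uuid.uuid4())  # generated once by the client, reused on retry

first = transfer(db, request_id, "alice", "bob", 30)
retry = transfer(db, request_id, "alice", "bob", 30)  # e.g. after a timeout
balances = dict(db.execute("SELECT name, balance FROM accounts"))
```

The retry is safe because the request ID travels with the operation from the client to the database, so the duplicate is rejected at the point where the change would be applied.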

Doing the Right Thing

The final section addresses the ethical impact of data systems on society.

  • predictive-analytics: Algorithms trained on historical data can reinforce the bias and discrimination that the data reflects.
  • Privacy and Surveillance: The shift toward “data-driven” organizations can lead to pervasive surveillance if not balanced with user agency.