Schema Evolution

Summary: The process of changing a data schema (e.g., adding, removing, or renaming fields) over time as an application evolves.

Sources: chapter4

Last updated: 2026-04-18


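Schema evolution is inevitable as application requirements change. To keep a system running while old and new versions of code and data coexist (for example, during a rolling upgrade, or when old records remain in storage), the data format must support both backward compatibility (newer code can read data written by older code) and forward compatibility (older code can read data written by newer code).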
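As a rough illustration of the two directions, the sketch below models records as a toy tag-based encoding: each field is stored as a (tag, value) pair and field names are never written. This is loosely in the spirit of Thrift and Protocol Buffers but is not their wire format. The schemas, field names, and the `encode`/`decode` helpers are hypothetical.

```python
from typing import Any

# Hypothetical schemas: tag number -> (field name, default value).
Schema = dict[int, tuple[str, Any]]

SCHEMA_V1: Schema = {
    1: ("user_name", None),
    2: ("favorite_number", None),
}
# Version 2 adds an optional field under a new tag, with a default value.
SCHEMA_V2: Schema = {**SCHEMA_V1, 3: ("email", "")}


def encode(record: dict[str, Any], schema: Schema) -> list[tuple[int, Any]]:
    """Toy encoding: a list of (tag, value) pairs; field names are not stored."""
    tag_by_name = {name: tag for tag, (name, _) in schema.items()}
    return [(tag_by_name[name], value) for name, value in record.items() if name in tag_by_name]


def decode(pairs: list[tuple[int, Any]], schema: Schema) -> dict[str, Any]:
    """Decode with the reader's schema: fill missing fields with defaults, skip unknown tags."""
    result = {name: default for name, default in schema.values()}
    for tag, value in pairs:
        if tag in schema:  # tags the reader does not know (from newer writers) are ignored
            result[schema[tag][0]] = value
    return result


old_data = encode({"user_name": "Martin", "favorite_number": 1337}, SCHEMA_V1)
new_data = encode({"user_name": "Ada", "favorite_number": 7, "email": "ada@example.com"}, SCHEMA_V2)

# Backward compatibility: new code reads old data; the missing field gets its default.
print(decode(old_data, SCHEMA_V2))  # {'user_name': 'Martin', 'favorite_number': 1337, 'email': ''}
# Forward compatibility: old code reads new data; the unknown tag 3 is skipped.
print(decode(new_data, SCHEMA_V1))  # {'user_name': 'Ada', 'favorite_number': 7}
```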

Approaches

  • SQL Schema Migrations: Using ALTER TABLE to modify relational tables, often requiring manual maintenance of migration scripts.
  • NoSQL (Schema-on-read): Implicit schemas that are interpreted at read time, allowing arbitrary fields to be added (source: chapter2, p. 39).
  • Binary Encoding: Formats like Avro, Thrift, and Protocol Buffers define explicit schemas that can be changed according to specific compatibility rules (source: chapter4).
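A minimal sketch contrasting the first two approaches, using Python's built-in `sqlite3` module for the migration and plain dicts standing in for schemaless documents. The table, column, and function names are hypothetical; the read-time name split mirrors the common "derive a new field from an old one" pattern.

```python
import sqlite3
from typing import Any

# Schema-on-write: the database enforces the schema, so a change needs a migration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Martin')")

# Migration: add a column with a default so that existing rows remain valid.
# (SQLite requires a non-NULL default when adding a NOT NULL column.)
conn.execute("ALTER TABLE users ADD COLUMN email TEXT NOT NULL DEFAULT ''")
conn.execute("INSERT INTO users (name, email) VALUES ('Ada', 'ada@example.com')")
rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)  # [('Martin', ''), ('Ada', 'ada@example.com')]

# Schema-on-read: documents keep whatever fields they were written with,
# and the reading code interprets them.
documents: list[dict[str, Any]] = [
    {"name": "Martin Kleppmann"},                   # written by old code
    {"first_name": "Ada", "last_name": "Lovelace"},  # written by new code
]


def read_first_name(doc: dict[str, Any]) -> str:
    """Return first_name, deriving it at read time for documents that predate it."""
    if "first_name" in doc:
        return doc["first_name"]
    return doc["name"].split(" ")[0]


print([read_first_name(d) for d in documents])  # ['Martin', 'Ada']
```

The trade-off is where the cost lands: the migration rewrites the schema once up front, while schema-on-read leaves old documents untouched and pushes the handling of every historical shape into the reading code.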

Rules for Evolution

To maintain compatibility while evolving a schema:

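  • Adding Fields: Each new field needs a new tag number (in Thrift and Protocol Buffers) and must be optional or have a default value, so that new code can still read old records that lack it (backward compatibility). Old code ignores fields it does not recognize (forward compatibility).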
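  • Removing Fields: Only optional fields (or, in Avro, fields with a default value) can be removed, so that old code that still expects the field can read new data (forward compatibility). In Thrift and Protocol Buffers, the removed field's tag number must never be reused.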
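  • Renaming Fields: Renaming breaks formats that store field names in the data, such as JSON. Avro matches fields by name, but the reader's schema can declare aliases for old names, so a rename is backward compatible but not forward compatible. Thrift and Protocol Buffers never store field names in the encoded data, so a field can be renamed freely as long as its tag number stays the same.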
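Avro resolves differences between the writer's and the reader's schema by field name. The sketch below is a simplified, hypothetical model of that resolution step, using plain dicts instead of Avro's binary encoding; `Field`, `resolve`, and `_NO_DEFAULT` are illustrative names, not the Avro library's API. Fields only the writer knows are dropped, fields only the reader knows take the reader's default, and a reader field's aliases let it match an old writer's name.

```python
from dataclasses import dataclass
from typing import Any

_NO_DEFAULT = object()  # hypothetical sentinel: the field has no default value


@dataclass
class Field:
    """Hypothetical reader-side field definition (not the Avro library's API)."""
    name: str
    default: Any = _NO_DEFAULT
    aliases: tuple[str, ...] = ()


def resolve(record: dict[str, Any], reader_fields: list[Field]) -> dict[str, Any]:
    """Translate a record, keyed by the writer's field names, into the reader's schema."""
    result: dict[str, Any] = {}
    for f in reader_fields:
        for candidate in (f.name, *f.aliases):
            if candidate in record:
                result[f.name] = record[candidate]
                break
        else:  # no name or alias matched the writer's fields
            if f.default is _NO_DEFAULT:
                raise ValueError(f"writer has no field {f.name!r} and reader has no default")
            result[f.name] = f.default
    return result  # fields known only to the writer are silently ignored


# New reader: renamed field (alias matches the old name) plus an added field with a default.
old_record = {"user_name": "Martin", "favorite_number": 1337}
new_reader = [
    Field("username", aliases=("user_name",)),
    Field("favorite_number", default=None),
    Field("email", default=""),
]
print(resolve(old_record, new_reader))
# {'username': 'Martin', 'favorite_number': 1337, 'email': ''}

# Old reader: no alias for the new name, so the rename is not forward compatible.
new_record = {"username": "Ada", "favorite_number": 7, "email": "ada@example.com"}
old_reader = [Field("user_name"), Field("favorite_number", default=None)]
try:
    resolve(new_record, old_reader)
except ValueError as e:
    print(e)  # writer has no field 'user_name' and reader has no default
```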