Schema Evolution
Summary: The process of changing a data schema (e.g., adding, removing, or renaming fields) over time as an application evolves.
Sources: chapter4
Last updated: 2026-04-18
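Schema evolution is inevitable as application requirements change. To keep the system reliable while old and new versions of code and data coexist, the data format must support both backward compatibility (newer code can read data written by older code) and forward compatibility (older code can read data written by newer code).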
Approaches
- SQL Schema Migrations: Using ALTER TABLE to modify relational tables, often requiring manual maintenance of migration scripts.
- NoSQL (Schema-on-read): Implicit schemas that are interpreted at read time, allowing arbitrary fields to be added (source: chapter2, p. 39).
- Binary Encoding: Formats like Avro, Thrift, and Protocol Buffers provide formal schemas that can be updated according to specific rules (source: chapter4).
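The first two approaches can be sketched side by side. This is a minimal illustration, not a recommended migration workflow: it uses Python's built-in sqlite3 module with an in-memory database, and the users table, its columns, and the read_user helper are all hypothetical names chosen for the example.

```python
import json
import sqlite3

# --- SQL schema migration (schema-on-write) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# The migration step: add a nullable column. Rows written before the
# migration read back as NULL (None in Python) for the new column.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("INSERT INTO users (name, email) VALUES ('bob', 'bob@example.com')")

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()


# --- Schema-on-read ---
def read_user(raw: str) -> dict:
    """Interpret a stored JSON document at read time.

    The 'schema' lives in this function: documents written before the
    email field existed are handled by falling back to None.
    """
    doc = json.loads(raw)
    return {"name": doc["name"], "email": doc.get("email")}


old_doc = read_user('{"name": "alice"}')
new_doc = read_user('{"name": "bob", "email": "bob@example.com"}')
```

In the SQL case the database enforces the new structure for every row once the migration runs; in the schema-on-read case the stored documents are left untouched and the reading code has to cope with both shapes.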
Rules for Evolution
To maintain compatibility while evolving a schema:
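- Adding Fields: New fields must be optional or have a default value, so that new code can still read old data that lacks them (backward compatibility). Old code that skips fields it does not recognize can still read the new data (forward compatibility).
- Removing Fields: Only optional fields (or fields with a default value) can be removed; otherwise old readers that still expect the field will fail on new data, breaking forward compatibility. In tag-based formats, the tag number of a removed field must never be reused.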
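- Renaming Fields: Changing a field name breaks formats that identify fields by name (like JSON or Avro), but not those that identify fields by numeric tag (like Thrift or Protobuf). Avro softens this with aliases in the reader's schema, which make a rename backward compatible but not forward compatible.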
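The rules above can be illustrated with a toy tag-based encoding. This is a simplified sketch, not the actual Thrift or Protobuf wire format: the schema representation (tag number mapped to a field name and default), the encode and decode helpers, and the field names are all assumptions made for the example.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema shape: tag number -> (field name, default value).
Schema = Dict[int, Tuple[str, Any]]

SCHEMA_V1: Schema = {
    1: ("user_name", None),
    2: ("favorite_number", None),
}

# v2 renames tag 1 and adds tag 3 with a default value.
SCHEMA_V2: Schema = {
    1: ("name", None),
    2: ("favorite_number", None),
    3: ("email", ""),
}


def encode(record: Dict[str, Any], schema: Schema) -> Dict[int, Any]:
    """Replace field names with tag numbers; names never reach the wire."""
    name_to_tag = {name: tag for tag, (name, _) in schema.items()}
    return {name_to_tag[field]: value for field, value in record.items()}


def decode(wire: Dict[int, Any], schema: Schema) -> Dict[str, Any]:
    """Read only the tags the reader knows about.

    Missing tags fall back to the schema default; unknown tags are ignored.
    """
    return {
        name: wire.get(tag, default)
        for tag, (name, default) in schema.items()
    }


# Backward compatibility: new code (v2) reads data written by old code (v1).
old_data = encode({"user_name": "alice", "favorite_number": 7}, SCHEMA_V1)
read_by_new = decode(old_data, SCHEMA_V2)

# Forward compatibility: old code (v1) reads data written by new code (v2).
new_data = encode(
    {"name": "bob", "favorite_number": 3, "email": "bob@example.com"},
    SCHEMA_V2,
)
read_by_old = decode(new_data, SCHEMA_V1)
```

Here read_by_new picks up the default for the added email field, and read_by_old silently drops tag 3. The rename of tag 1 is invisible on the wire because only the tag number is encoded, which is why tag-based formats tolerate renames while name-based formats do not.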