Protocol Buffers (protobuf)
Summary: A popular binary encoding format developed by Google that uses field tags for compact data representation and schema evolution.
Sources: chapter4
Last updated: 2026-04-15
Protocol Buffers (protobuf) is a binary encoding library originally developed at Google. It is conceptually similar to Thrift: both require a schema and identify fields by numeric tags, but they differ in their bit-packing details.
Binary Format
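Protobuf encodes data using field tags (numerical IDs) instead of field names. Each encoded field begins with a key that packs the field tag together with a wire type, followed by the value; integers are stored as variable-length integers (varints). Because field names never appear in the encoded bytes, the result is much more compact than JSON or XML, at the cost of needing the schema to interpret it.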
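The key-plus-value layout can be sketched in a few lines of Python. This is a minimal illustration of the wire format, not the protobuf library itself: the helper names are made up for this note, and the record (field 1 = "Martin", field 2 = 1337) is an assumed example.

```python
WIRE_VARINT = 0  # wire type for integers, booleans and enums
WIRE_LEN = 2     # wire type for strings, bytes and nested messages


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a varint, low 7 bits first."""
    if n < 0:
        raise ValueError("this sketch handles only non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Key (field tag + wire type) followed by the varint value."""
    key = (field_number << 3) | WIRE_VARINT
    return encode_varint(key) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Key, then the payload length, then the UTF-8 payload."""
    payload = text.encode("utf-8")
    key = (field_number << 3) | WIRE_LEN
    return encode_varint(key) + encode_varint(len(payload)) + payload


# Hypothetical record: field 1 is a name, field 2 is a number.
record = encode_string_field(1, "Martin") + encode_int_field(2, 1337)
print(record.hex())  # 0a064d617274696e10b90a
```

The only trace of the schema in the output is the two key bytes (0a and 10); the field names themselves are never written.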
Schema Evolution
Protobuf uses the same principles as Thrift for achieving compatibility:
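- Forward Compatibility: Older code reading data written by newer code ignores field tags it does not recognize. The wire type stored with each tag tells the parser how many bytes to skip.
- Backward Compatibility: Newer code can read old records as long as existing field tags are never changed and every field added after the initial deployment is optional or has a default value.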
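Both directions can be illustrated with a small hand-written decoder. This is a sketch under assumptions: it handles only the varint and length-delimited wire types, and the schemas and field names (user_name, favorite_number, email) are hypothetical.

```python
from typing import Dict, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known_fields: Dict[int, str]) -> Dict[str, object]:
    """Decode a record, keeping only the tags listed in known_fields."""
    record = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:      # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise NotImplementedError("wire type %d not handled in this sketch" % wire_type)
        name = known_fields.get(field_number)
        if name is not None:
            record[name] = value
        # Unknown tag: its bytes were consumed above and are simply dropped.
    return record


OLD_SCHEMA = {1: "user_name", 2: "favorite_number"}
NEW_SCHEMA = {1: "user_name", 2: "favorite_number", 3: "email"}

OLD_RECORD = bytes.fromhex("0a064d617274696e10b90a")
NEW_RECORD = bytes.fromhex("0a064d617274696e10b90a1a03614062")  # adds field 3 = "a@b"

# Forward compatibility: old code reads a new record and skips tag 3.
print(decode(NEW_RECORD, OLD_SCHEMA))
# Backward compatibility: new code reads an old record; "email" is just absent.
print(decode(OLD_RECORD, NEW_SCHEMA))
```

Real protobuf implementations go further than this sketch, for example by handling the remaining wire types.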
Required and Optional Fields
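Protobuf originally had required and optional markers. However, required was found to be problematic for evolvability: a field added after the initial deployment cannot be required, because new code would reject old records that lack it, and a required field cannot be removed safely, because old code would reject new records without it. Protobuf version 3 (proto3) removed the required marker, making all fields essentially optional: a field that is absent from a record is read as its type's default value (zero, empty string, false). The optional keyword was later reintroduced in proto3, but only to track whether a field was explicitly set.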
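The difference between the two policies can be sketched on already-decoded records. This is an illustration only: the function names, the exception and the email field are hypothetical, and real protobuf libraries perform these checks inside generated code.

```python
class MissingRequiredField(Exception):
    """Raised by the strict reader when a required field is absent."""


def read_with_required(decoded: dict, required: set) -> dict:
    """proto2-style strictness: reject records missing a required field."""
    missing = required - set(decoded)
    if missing:
        raise MissingRequiredField(sorted(missing))
    return dict(decoded)


def read_with_defaults(decoded: dict, defaults: dict) -> dict:
    """proto3-style leniency: absent fields take their default value."""
    return {**defaults, **decoded}


# A record written before the hypothetical "email" field existed.
old_record = {"user_name": "Martin"}

try:
    read_with_required(old_record, {"user_name", "email"})
except MissingRequiredField as exc:
    print("rejected, missing:", exc)

print(read_with_defaults(old_record, {"user_name": "", "email": ""}))
```

Under the strict policy, every old record becomes unreadable the moment email is declared required; under the default-value policy, old and new records coexist.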
Repeated Fields
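Protobuf uses a special “repeated” marker for list/array data. In the basic encoding, this is represented simply by having the same field tag appear multiple times in the record, once per element. For scalar numeric types, proto3 by default uses a packed form instead, in which all elements are stored together under a single tag.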
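The basic (unpacked) form of a repeated string field can be sketched as follows. The helper names and the field number 3 are assumptions made for illustration, and the decoder handles only length-delimited fields.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_repeated_strings(field_number: int, items) -> bytes:
    """Write one key/length/payload entry per element, all with the same tag."""
    out = bytearray()
    for item in items:
        payload = item.encode("utf-8")
        out += encode_varint((field_number << 3) | 2)  # 2 = length-delimited
        out += encode_varint(len(payload))
        out += payload
    return bytes(out)


def collect_repeated(buf: bytes, field_number: int) -> list:
    """Gather every occurrence of field_number into a list, in order."""
    values = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        if key & 0x07 != 2:
            raise NotImplementedError("sketch handles only length-delimited fields")
        length, pos = read_varint(buf, pos)
        if key >> 3 == field_number:
            values.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return values


encoded = encode_repeated_strings(3, ["daydreaming", "hacking"])
print(encoded.hex())                  # the key byte 1a appears twice
print(collect_repeated(encoded, 3))
```

An empty list produces no bytes at all, so an absent repeated field and an empty one are indistinguishable on the wire.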