Encoding

Summary: The process of translating in-memory data structures into a byte sequence for storage or network transmission.

Sources: chapter4

Last updated: 2026-04-15

Programs work with data in two representations:

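In-Memory: Objects, structs, hash tables, and trees, laid out for efficient access and manipulation by the CPU (typically using pointers).
On-Disk/On-Wire: Self-contained byte sequences (e.g., JSON documents) for storage or transmission. Pointers are meaningless outside the process that created them, so this representation cannot rely on them.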

Translating from the in-memory representation to a byte sequence is called encoding (also known as serialization or marshalling). The reverse is called decoding (also known as parsing, deserialization, or unmarshalling).
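To make the two directions concrete, here is a minimal sketch using Python's standard json module. The sample record and its field names are made up for illustration and do not come from the source.

```python
import json

# In-memory representation: a dict that references nested Python objects.
# (The record itself is a hypothetical example.)
user = {"name": "Martin", "interests": ["daydreaming", "hacking"], "id": 1337}

# Encoding (serialization): in-memory structure -> self-contained byte sequence.
encoded = json.dumps(user).encode("utf-8")

# Decoding (deserialization): byte sequence -> a new in-memory structure.
decoded = json.loads(encoded.decode("utf-8"))

print(type(encoded).__name__)  # bytes
print(decoded == user)         # True: same content
print(decoded is user)         # False: a separate object was rebuilt
```

The decoded value is equal to the original but is a distinct object: only the content travels through the byte sequence, not the in-memory layout.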

Types of Encoding

Language-Specific Formats

Many languages have built-in encoding support (e.g., Java’s java.io.Serializable, Python’s pickle). These formats are convenient, but the encoded data is tied to one language, they often perform poorly, and they offer little or no guarantee of compatibility between versions.
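As a rough illustration of the language tie-in, the sketch below pickles a record containing Python-specific types. The record is a made-up example; the point is that the byte stream names Python modules and classes, which a reader in another language cannot easily interpret.

```python
import datetime
import pickle

# Hypothetical record using types that only make sense inside Python.
record = {"tags": {"db", "encoding"}, "created": datetime.date(2024, 1, 1)}

blob = pickle.dumps(record)    # encode with Python's built-in format
restored = pickle.loads(blob)  # decode; requires a Python runtime

print(restored == record)                # True
print(type(restored["tags"]).__name__)   # set
print(b"datetime" in blob)               # True: the bytes name a Python module
```

The round trip is convenient within Python, but the encoded bytes refer to Python's own datetime module, so they are not a neutral interchange format.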

Textual Formats (JSON, XML, CSV)

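These formats are widely supported and human-readable, but they are verbose and ambiguous about data types: XML and CSV cannot distinguish a number from a string that happens to consist of digits, and JSON distinguishes strings from numbers but not integers from floating-point values. This makes them less efficient and less precise for large datasets.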
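The type ambiguity can be seen with a few lines of standard-library Python. This is only a sketch with made-up values; other languages' parsers differ in the details.

```python
import csv
import io
import json

# CSV carries no type information: every field is read back as a string.
buffer = io.StringIO()
csv.writer(buffer).writerow(["alice", 42, 3.5])
row = next(csv.reader(io.StringIO(buffer.getvalue())))
print(row)  # ['alice', '42', '3.5']

# JSON object keys are always strings, so an integer key changes type
# during a round trip.
round_tripped = json.loads(json.dumps({1: "one"}))
print(round_tripped)  # {'1': 'one'}

# JSON has one number type. Decoders that map it to IEEE 754 doubles
# cannot represent every integer above 2**53 exactly.
big = 2**53 + 1
print(float(big) == float(2**53))  # True: the two values collapse
```

In each case the reader has to know the intended type from somewhere outside the data, for example from documentation or an external schema.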

Binary Formats (Thrift, Protocol Buffers, Avro)

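These formats are more compact and faster to parse than textual formats. They rely on a schema to describe the data, and they support schema evolution: the schema can change over time while old and new code can still read each other's data.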
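The sketch below shows why a schema helps with compactness. It does not use Thrift, Protocol Buffers, or Avro; it is a hand-rolled stand-in built on Python's struct module, with a hypothetical three-field schema that writer and reader are assumed to share. Because both sides know the field order and types, the field names never need to be written into the bytes.

```python
import json
import struct

# Hypothetical schema shared by writer and reader:
#   user_id: unsigned 32-bit int, age: unsigned 8-bit int, score: 64-bit float
RECORD_FORMAT = "<IBd"  # "<" = little-endian with no padding

record = {"user_id": 1337, "age": 30, "score": 9.5}

# Textual encoding: field names and punctuation are repeated in every record.
text_bytes = json.dumps(record).encode("utf-8")

# Binary encoding: only the values are written, in schema order.
binary_bytes = struct.pack(
    RECORD_FORMAT, record["user_id"], record["age"], record["score"]
)

# Decoding needs the same schema to know where each field starts and ends.
user_id, age, score = struct.unpack(RECORD_FORMAT, binary_bytes)

print(len(binary_bytes))                    # 13
print(len(binary_bytes) < len(text_bytes))  # True
print((user_id, age, score))                # (1337, 30, 9.5)
```

Real schema-based formats add what this sketch lacks, such as field tags or writer/reader schema resolution, and that is what makes schema evolution possible.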
