Column-Oriented Storage

Summary: A storage architecture that stores all values from each column together on disk, enabling efficient analytical queries that only need to read a few columns from a wide table.

Sources: chapter3

Last updated: 2026-04-15


Mechanism

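In contrast to row-oriented storage, where all values from one row are stored next to each other, column-oriented storage stores all the values of each column together, in a separate file (or sequence of blocks). Every column file keeps the rows in the same order, so the k-th entry of each column belongs to the k-th row, and a full row can be reassembled when needed. A query therefore needs to read and parse only the columns it actually references (source: chapter3).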
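A minimal sketch of the layout difference, assuming an in-memory Python list per column stands in for a column file. The table contents and the names `to_columns` and `total_qty_for` are hypothetical. The query touches only two of the four columns, and it relies on every column keeping rows in the same order.

```python
from typing import Any

# Hypothetical example table: row-oriented records with four columns.
rows = [
    {"date": "2024-01-01", "product": "apple", "store": 1, "qty": 3},
    {"date": "2024-01-01", "product": "pear", "store": 2, "qty": 1},
    {"date": "2024-01-02", "product": "apple", "store": 1, "qty": 5},
]

def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column.

    Every column list keeps rows in the same order, so index i in each
    list belongs to row i. Assumes all rows share the same keys.
    """
    columns: dict[str, list[Any]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def total_qty_for(columns: dict[str, list[Any]], product: str) -> int:
    """SUM(qty) WHERE product = ?, reading only the 'product' and 'qty' columns."""
    products = columns["product"]
    qtys = columns["qty"]
    # zip pairs entries by position, which is valid because row order is shared.
    return sum(q for p, q in zip(products, qtys) if p == product)

columns = to_columns(rows)
print(total_qty_for(columns, "apple"))  # → 8
```

In a real column store each list would be a separate file, so the `date` and `store` data would never be loaded for this query.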

Advantages

  • I/O Efficiency: An analytical query that references only a few columns of a wide table loads only those columns from disk instead of every full row, which greatly reduces the amount of data read (source: chapter3).
  • Compression: A column often contains many repeated values (the number of distinct values is usually small compared to the number of rows), so it compresses well with techniques such as bitmap encoding and run-length encoding (source: chapter3).
  • Vectorized Processing: Query engines can process chunks of compressed column data in tight loops, making efficient use of CPU caches and SIMD instructions (source: chapter3).
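The compression and processing points above can be sketched with plain Python lists. This is an illustration under stated assumptions, not any particular engine's format: each bitmap is a list of 0/1 values (real systems pack bits), the run-length scheme alternates runs of zeros and ones starting with zeros, and the function names and sample column are hypothetical. The last function shows how a `column IN (...)` filter becomes a bitwise OR over bitmaps, the kind of simple loop that vectorized engines run over packed data.

```python
from typing import Any

def bitmap_encode(column: list[Any]) -> dict[Any, list[int]]:
    """One bitmap per distinct value; bit i is 1 if row i holds that value."""
    bitmaps = {value: [0] * len(column) for value in dict.fromkeys(column)}
    for i, value in enumerate(column):
        bitmaps[value][i] = 1
    return bitmaps

def run_length_encode(bits: list[int]) -> list[int]:
    """Encode a bitmap as alternating run lengths: zeros, ones, zeros, ...

    The first run always counts zeros, so a bitmap that starts with 1
    begins with a run of length 0.
    """
    runs: list[int] = []
    current, count = 0, 0
    for bit in bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs

def rows_matching_any(bitmaps: dict[Any, list[int]], values: list[Any], n_rows: int) -> list[int]:
    """Evaluate `column IN (values)` as a bitwise OR of the values' bitmaps."""
    result = [0] * n_rows
    for value in values:
        for i, bit in enumerate(bitmaps.get(value, [])):
            result[i] |= bit
    return result

product = ["apple", "apple", "pear", "pear", "pear", "apple", "fig"]
bitmaps = bitmap_encode(product)
print(bitmaps["pear"])                      # → [0, 0, 1, 1, 1, 0, 0]
print(run_length_encode(bitmaps["pear"]))   # → [2, 3, 2]
print(run_length_encode(bitmaps["apple"]))  # → [0, 2, 3, 1, 1]
print(rows_matching_any(bitmaps, ["pear", "fig"], len(product)))  # → [0, 0, 1, 1, 1, 0, 1]
```

Sparse bitmaps (values that appear in few rows) collapse to a handful of run lengths, which is why the two encodings are often combined.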

Writing to Column Stores

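Updating compressed columns in place is difficult: inserting a row into the middle of a column would mean rewriting the column files. Many column stores therefore use an approach similar to LSM-trees: writes go to an in-memory store (row- or column-oriented) and are periodically merged with the column files on disk, which are written out as new files in bulk. Queries must examine both the in-memory writes and the column data on disk (source: chapter3).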
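A toy sketch of that write path, assuming Python lists stand in for column files and that the on-disk data is kept sorted by a chosen sort key (the sort key, the flush threshold, and the `ColumnStore` class are all hypothetical). Instead of editing columns in place, a flush merges the buffered rows with the existing columns and swaps in freshly built lists, mimicking a bulk rewrite of the files.

```python
import heapq
from typing import Any, Iterator

class ColumnStore:
    """Hypothetical LSM-style column store for illustration only."""

    def __init__(self, column_names: list[str], sort_key: str, flush_threshold: int = 2):
        self.column_names = column_names
        self.sort_key = sort_key
        self.flush_threshold = flush_threshold
        # Stand-ins for immutable column files, sorted by sort_key.
        self.columns: dict[str, list[Any]] = {name: [] for name in column_names}
        # In-memory, row-oriented write buffer.
        self.buffer: list[dict[str, Any]] = []

    def insert(self, row: dict[str, Any]) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def _disk_rows(self) -> Iterator[dict[str, Any]]:
        """Reassemble on-disk rows by position across the column lists."""
        for values in zip(*(self.columns[n] for n in self.column_names)):
            yield dict(zip(self.column_names, values))

    def flush(self) -> None:
        key = lambda r: r[self.sort_key]
        new_rows = sorted(self.buffer, key=key)
        # Merge two sorted streams: existing column data and buffered rows.
        merged = heapq.merge(self._disk_rows(), new_rows, key=key)
        rebuilt: dict[str, list[Any]] = {name: [] for name in self.column_names}
        for row in merged:
            for name in self.column_names:
                rebuilt[name].append(row[name])
        self.columns = rebuilt  # swap in the new "files" in one step
        self.buffer = []

    def scan(self, column: str) -> list[Any]:
        """Reads consult the column data plus unflushed rows (buffer order, not merged)."""
        return self.columns[column] + [row[column] for row in self.buffer]

store = ColumnStore(["ts", "product"], sort_key="ts", flush_threshold=2)
store.insert({"ts": 3, "product": "pear"})
store.insert({"ts": 1, "product": "apple"})  # buffer full: first flush
print(store.columns["ts"])                   # → [1, 3]
store.insert({"ts": 2, "product": "fig"})    # stays in the buffer
print(store.scan("product"))                 # → ['apple', 'pear', 'fig']
store.insert({"ts": 0, "product": "kiwi"})   # second flush merges with existing columns
print(store.columns["ts"])                   # → [0, 1, 2, 3]
print(store.columns["product"])              # → ['kiwi', 'apple', 'fig', 'pear']
```

Batching writes this way keeps the column data immutable and compressible, at the cost of queries having to combine two sources until the next flush.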