Column-Oriented Storage
Summary: A storage architecture that stores all values from each column together on disk, enabling efficient analytical queries that only need to read a few columns from a wide table.
Sources: chapter3
Last updated: 2026-04-15
Mechanism
In contrast to row-oriented storage, where all values from a row are stored together, column-oriented storage stores each column in a separate file (or sequence of blocks). A query can then read and parse only the columns it needs. The layout relies on every column file holding its rows in the same order, so that the k-th entry of each column belongs to the same row (source: chapter3).
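The difference between the two layouts can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular engine lays out bytes on disk. The table, its column names (date_key, product_sk, quantity), and the helper functions are hypothetical, and the sketch assumes every column keeps its values in the same row order.

```python
# Hypothetical fact-table rows; the column names are made up for illustration.
rows = [
    {"date_key": 140102, "product_sk": 69, "quantity": 1},
    {"date_key": 140102, "product_sk": 74, "quantity": 3},
    {"date_key": 140103, "product_sk": 69, "quantity": 2},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column, preserving row order."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def reassemble_row(columns, k):
    """Rebuild row k by taking the k-th entry of every column."""
    return {name: values[k] for name, values in columns.items()}


columns = to_columns(rows)

# An aggregate over one column touches only that column's values;
# date_key and product_sk are never read.
total_quantity = sum(columns["quantity"])
```

Here the sum over quantity never looks at the other two lists, which is the in-memory analogue of skipping the other column files on disk.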
Advantages
- I/O Efficiency: An analytical query loads only the columns it references, so it reads far less data from disk than a scan over whole rows would (source: chapter3).
- Compression: Values in a column are often repetitive, lending themselves well to compression techniques like bitmap encoding and run-length encoding (source: chapter3).
- Vectorized Processing: Query engines can process chunks of compressed column data in tight loops, making efficient use of CPU caches and SIMD instructions (source: chapter3).
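As a rough illustration of the compression point above, the following sketch applies bitmap encoding and then run-length encoding to a single column. The column values and the function names are hypothetical, and real column stores operate on packed bits rather than Python lists.

```python
from itertools import groupby


def bitmap_encode(values):
    """One bitmap per distinct value: bit k is 1 if row k holds that value."""
    return {v: [1 if x == v else 0 for x in values] for v in sorted(set(values))}


def run_length_encode(bits):
    """Collapse a list of bits into (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


# Hypothetical column with few distinct values and long runs.
product_sk = [69, 69, 69, 74, 74, 31, 31, 31, 31, 69]

bitmaps = bitmap_encode(product_sk)
encoded = {v: run_length_encode(bits) for v, bits in bitmaps.items()}
```

With few distinct values and long runs, each bitmap collapses to a handful of pairs. The benefit shrinks as the number of distinct values grows or as runs get shorter.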
Writing to Column Stores
Updating compressed columns in place is difficult. Many column stores therefore take an LSM-tree approach: writes first go to an in-memory store (row-oriented or column-oriented) and are periodically merged with the column files on disk (source: chapter3).
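The write path described above might look roughly like the toy model below. Everything in it is an assumption made for illustration: the class name TinyColumnStore, the flush_threshold parameter, the choice of a sort key, and the use of plain Python lists in place of compressed column files.

```python
class TinyColumnStore:
    """Toy model: an in-memory row buffer plus sorted 'on-disk' column lists."""

    def __init__(self, column_names, sort_key, flush_threshold=3):
        self.column_names = column_names
        self.sort_key = sort_key                # assumed sort order for the column data
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.buffer = []                        # recent writes, kept row-oriented
        self.disk = {name: [] for name in column_names}  # stands in for column files

    def insert(self, row):
        """Writes never touch the column data directly; they go to the buffer."""
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_threshold:
            self.merge()

    def merge(self):
        """Merge buffered rows with existing column data and rewrite every column."""
        n = len(self.disk[self.sort_key])
        old_rows = [{c: self.disk[c][k] for c in self.column_names} for k in range(n)]
        merged = sorted(old_rows + self.buffer, key=lambda r: r[self.sort_key])
        self.disk = {c: [r[c] for r in merged] for c in self.column_names}
        self.buffer = []

    def scan(self, column):
        """Reads combine the merged column data with writes not yet merged."""
        return self.disk[column] + [r[column] for r in self.buffer]


store = TinyColumnStore(["date_key", "quantity"], sort_key="date_key")
for date_key, quantity in [(3, 10), (1, 20), (2, 30), (0, 40)]:
    store.insert({"date_key": date_key, "quantity": quantity})
```

After the four inserts, the first three rows have been merged into the column lists in sorted order and the fourth is still in the buffer, so a scan has to look at both places.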