Chapter 3: Storage and Retrieval

Summary: An exploration of how databases store and retrieve data, focusing on the trade-offs between different storage engine architectures and the divergence between transactional (OLTP) and analytical (OLAP) workloads.

Sources: chapter3

Last updated: 2026-04-15


Overview

At its most fundamental level, a database performs two tasks: storing data and retrieving it when requested. Choosing an appropriate storage engine requires understanding how it handles these tasks under the hood (source: chapter3).

Key Storage Engine Families

The chapter identifies two main families of storage engines:

  1. Log-Structured Engines: These engines only permit appending to files and deleting obsolete files. They never update a file that has been written. Examples include hash-indexes, sstables, and lsm-trees. They rely on compaction to reclaim space (source: chapter3).
  2. Page-Oriented Engines: These engines treat the disk as a set of fixed-size pages that can be overwritten. The most common example is the b-trees architecture (source: chapter3).

OLTP vs. OLAP

There is a significant difference between storage engines optimized for transactional workloads and those optimized for analytics:

  • oltp (Online Transaction Processing): User-facing, low-latency, small number of records per query. Disk seek time is often the bottleneck (source: chapter3).
  • olap (Online Analytic Processing): Used by analysts, high-volume aggregates, demanding many millions of records to be scanned. Disk bandwidth is often the bottleneck, often employing column-oriented-storage (source: chapter3).

Other Concepts

  • bloom-filters: Used to optimize lookups for non-existent keys in log-structured engines (source: chapter3).
  • write-ahead-log: A mechanism for crash recovery in update-in-place engines (source: chapter3).
  • data-warehousing: A separate database for analytics that extracts data from OLTP systems (source: chapter3).
  • materialized-views: Cached aggregates used to speed up common analytical queries (source: chapter3).