Data Warehousing

Summary: A data warehouse is a separate database that contains a read-only copy of data from various transactional systems, optimized for analytical queries without affecting OLTP operations.

Sources: chapter3

Last updated: 2026-04-15


Purpose

Large enterprises often have many different oltp systems. Running expensive analytical queries directly on these databases can harm the performance of concurrent transactions. A data warehouse provides a separate environment for analysts to run queries (source: chapter3).

ETL Process

Data is moved from OLTP systems to the warehouse through a process called Extract-Transform-Load (ETL):

  1. Extract: Periodic data dumps or continuous updates from OLTP databases (source: chapter3).
  2. Transform: Data is cleaned and transformed into an analysis-friendly schema (source: chapter3).
  3. Load: The transformed data is loaded into the warehouse (source: chapter3).

Schemas for Analytics

Analytical databases often use formulaic schemas like the star schema (dimensional modeling), consisting of a large fact table at the center surrounded by smaller dimension tables (source: chapter3).