Scalability
Summary: A system’s ability to cope with increased load.
Sources: chapter1
Last updated: 2026-04-15
Scalability is not a one-dimensional label like “X is scalable.” Instead, discussing scalability means considering questions like “If the system grows in a particular way, what are our options for coping with the growth?” and “How can we add computing resources to handle the additional load?” (source: chapter1).
Key Concepts
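- load-parameters: Numbers that succinctly describe the current load on the system, such as requests per second, the ratio of reads to writes, or the number of simultaneously active users.
- latency-and-response-time: Measuring how performance is affected when load increases. Response time is what the client sees, including network and queueing delays; latency is the time a request spends waiting to be handled.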
- percentiles: Using p50 (median), p95, p99, and p99.9 to understand the distribution of response times.
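As a rough illustration of the percentile idea, the sketch below computes p50, p95, p99, and p99.9 from a list of response times using the nearest-rank method. The sample values, the `percentile` helper, and the choice of nearest-rank (rather than an interpolating method) are assumptions made for this example, not something taken from the source.

```python
import math

def percentile(sorted_times, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_times:
        raise ValueError("need at least one sample")
    rank = math.ceil(p * len(sorted_times) / 100)
    return sorted_times[max(rank, 1) - 1]

# Hypothetical response times in milliseconds, with two slow outliers.
response_times_ms = sorted([12, 15, 11, 14, 250, 13, 16, 12, 18, 900])

summary = {
    name: percentile(response_times_ms, p)
    for name, p in [("p50", 50), ("p95", 95), ("p99", 99), ("p99.9", 99.9)]
}
mean_ms = sum(response_times_ms) / len(response_times_ms)

print(summary)
print("mean:", mean_ms)
```

With these ten samples the median is 14 ms while the mean is 126.1 ms, which shows why the median describes the typical request better than the average does. The higher percentiles all land on the slowest request here only because the sample is so small.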
Strategies for Coping with Load
- Scaling Up (Vertical Scaling): Moving to a more powerful machine.
- Scaling Out (Horizontal Scaling): Distributing the load across multiple smaller machines, an approach also known as a shared-nothing architecture.
- replication: Increasing read throughput by serving queries from multiple read replicas (source: chapter5, p. 151).
- Elastic Systems: Automatically adding computing resources when a load increase is detected (source: chapter1).
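To make the elastic approach more concrete, here is a minimal sketch of the kind of sizing rule such a system might apply: given one load parameter (requests per second), it estimates how many machines are needed. The function name `desired_machines`, the per-machine capacity figure, and the 25% headroom are all hypothetical values chosen for illustration; a real autoscaler would also need to smooth its measurements and limit how often it resizes.

```python
import math

def desired_machines(requests_per_sec, capacity_per_machine,
                     headroom=0.25, min_machines=1):
    """Return a machine count that covers the observed load plus headroom.

    All parameters are illustrative assumptions:
    - capacity_per_machine: requests/sec one machine is assumed to handle
    - headroom: fraction of spare capacity to keep for sudden spikes
    """
    if capacity_per_machine <= 0:
        raise ValueError("capacity_per_machine must be positive")
    needed = requests_per_sec * (1 + headroom) / capacity_per_machine
    return max(min_machines, math.ceil(needed))

# Load grows from 900 to 2400 requests/sec; each machine is assumed
# to handle 500 requests/sec.
before = desired_machines(900, 500)
after = desired_machines(2400, 500)
idle = desired_machines(0, 500)
print(before, after, idle)
```

Under these assumptions the rule asks for 3 machines at 900 requests per second and 6 at 2400, and never drops below the configured minimum when the system is idle.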