Percentiles

Summary: Statistics used to understand the distribution of response times.

Sources: chapter1

Last updated: 2026-04-15


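Reporting response time as an average (arithmetic mean) is common but often misleading: the mean does not tell you how many users actually experienced a given delay, and a few very slow requests can pull it far from what most users saw. Percentiles describe the typical user experience better (source: chapter1).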

Key Percentiles

  • p50 (Median): Half of the requests return in less than this time, and half take longer. It represents the typical response time (source: chapter1).
  • p95, p99, p99.9: High percentiles, also called tail latencies. They describe the experience of users who hit the slowest responses. For example, p99 is the threshold that 99% of requests beat and 1% of requests exceed (source: chapter1).
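
To make the definitions above concrete, here is a minimal sketch that computes a percentile by sorting the response times and picking the value at the matching rank (the nearest-rank method, one of several common conventions). The function name `percentile` and the sample data are hypothetical and chosen only for illustration; production monitoring systems typically use approximation algorithms rather than sorting every sample.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)                    # fastest to slowest
    rank = math.ceil(p * len(ordered) / 100)     # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: nine fast requests, one outlier.
response_times_ms = [12, 15, 14, 13, 16, 15, 14, 13, 12, 950]

mean_ms = sum(response_times_ms) / len(response_times_ms)
p50_ms = percentile(response_times_ms, 50)
p99_ms = percentile(response_times_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

With this made-up data the mean is 107.4 ms, yet the median is 14 ms: no request took anywhere near 107 ms, which is why the mean is a poor description of the typical experience. The p99 of 950 ms exposes the single slow outlier.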

Why Tail Latencies Matter

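High percentiles matter because they directly affect user experience. The users who see the slowest responses are often those who have made many requests, which tends to make them your most valuable customers (source: chapter1). High percentiles also come up in the following contexts: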

  • Service Level Objectives (SLOs): Targets that define the expected performance and availability of a service, often stated in terms of percentiles (source: chapter1).
  • Service Level Agreements (SLAs): Contracts that make such targets legally binding and may include financial penalties for failure to meet them (source: chapter1).
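  • Tail Latency Amplification: When serving a single end-user request requires multiple backend calls, just one slow backend call makes the entire end-user request slow. Even if only a small percentage of backend calls are slow, the chance of hitting at least one grows with the number of calls, so a larger share of end-user requests ends up slow (source: chapter1).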
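
The effect of tail latency amplification can be estimated with a short calculation. The sketch below assumes, for simplicity, that backend calls are independent and that each one has the same probability of being slow (for example 1%, meaning it exceeds the backend's p99). Real systems rarely satisfy the independence assumption exactly, so treat the numbers as a rough illustration. The function name `prob_request_is_slow` is hypothetical.

```python
def prob_request_is_slow(n_backend_calls, slow_fraction=0.01):
    """Probability that at least one of n independent backend calls is slow,
    given that each call is slow with probability slow_fraction."""
    if n_backend_calls < 0:
        raise ValueError("n_backend_calls must be non-negative")
    if not 0 <= slow_fraction <= 1:
        raise ValueError("slow_fraction must be between 0 and 1")
    # P(at least one slow) = 1 - P(all calls fast)
    return 1 - (1 - slow_fraction) ** n_backend_calls


for n in (1, 10, 100):
    share = prob_request_is_slow(n)
    print(f"{n:>3} backend calls -> {share:.1%} of end-user requests are slow")
```

Under these assumptions, a request that fans out to 100 backends is slower than the backend p99 roughly 63% of the time, even though each individual backend call is slow only 1% of the time.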