Latency Percentiles and Targets

First published by Atif Alam

Latency is often expressed as percentiles so we can target “most users” and “tail” separately.

This page explains what those numbers mean and how to set and read latency targets in requirements and SLOs.

  • p50 (median) — Half of requests complete within this time.
  • p95 — 95% of requests complete within this time; 5% are slower.
  • p99 — 99% complete within this time; 1% are slower.

These come from your metrics—for example, histograms in Prometheus or APM—over a time window.
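As a rough illustration of the definitions above, the sketch below computes p50, p95, and p99 from raw samples using the nearest-rank method. The `percentile` helper and the sample latencies are hypothetical. Real metrics systems often estimate percentiles from histogram buckets instead, so their numbers may differ slightly from an exact calculation like this one.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies (ms) collected over one time window.
latencies_ms = [
    95, 98, 100, 102, 104, 105, 107, 110, 112, 115,
    118, 120, 125, 130, 140, 160, 210, 320, 480, 1500,
]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"p50={p50} ms, p95={p95} ms, p99={p99} ms")
# prints: p50=115 ms, p95=480 ms, p99=1500 ms
```

With only 20 samples, p99 is simply the slowest request. Tail percentiles need a reasonably large window to be meaningful.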

A few very slow requests can skew the average. Percentiles separate “typical” from “tail”:

p50 is the median (half of requests are faster), so it reflects the typical experience.

p95 and p99 are tail metrics: they mark the latency that 95% or 99% of requests stay within, so only the slowest 5% or 1% of requests exceed them.
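To make the skew concrete, here is a small sketch with made-up numbers: 19 requests at 100 ms and one at 5,000 ms. It uses only Python's standard `statistics` module; the data is hypothetical.

```python
import statistics

# Hypothetical window: 19 fast requests plus one very slow outlier (ms).
latencies_ms = [100] * 19 + [5000]

mean_ms = statistics.mean(latencies_ms)      # pulled upward by the outlier
median_ms = statistics.median(latencies_ms)  # p50, unaffected by the outlier

print(f"mean={mean_ms} ms, median={median_ms} ms")
```

The mean works out to 345 ms even though 95% of requests took 100 ms, while the median stays at 100 ms. The single slow request shows up only in the tail percentiles.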

For user-facing latency targets, p95 is a common choice because it constrains the tail (you care that most requests are under a limit) without focusing on the extreme 1% that p99 targets.

A p95 target is a latency SLO of the form “95% of requests should complete in under X ms,” where X is the number your team picks (e.g. 300, 400, or 500). Example: “Feed load p95 < 500 ms” means 95% of feed loads finish in under 500 ms.
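One way to check such a target against measured data is to count the share of requests under the threshold. The sketch below assumes raw per-request latencies are available. The function name `meets_latency_slo`, its parameters, and the sample data are all hypothetical, and a production setup would more likely run this check in the metrics system itself.

```python
def meets_latency_slo(samples_ms, threshold_ms, target_fraction=0.95):
    """Return (ok, fraction), where fraction is the share of requests that
    finished in under threshold_ms and ok says whether it meets the target."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    under = sum(1 for s in samples_ms if s < threshold_ms)
    fraction = under / len(samples_ms)
    return fraction >= target_fraction, fraction


# Hypothetical feed-load latencies (ms): 96 fast requests, 4 slow ones.
feed_load_ms = [180] * 96 + [650] * 4

ok, fraction = meets_latency_slo(feed_load_ms, threshold_ms=500)
print(f"under 500 ms: {fraction:.0%}, SLO met: {ok}")
# prints: under 500 ms: 96%, SLO met: True
```

If six of the 100 requests were slow instead of four, the fraction would drop to 94% and the “p95 < 500 ms” target would be missed.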

The 300–500 ms range is often used as a “feels fast” target for interactive UIs (e.g. feed load, search). Teams should set targets from product needs and their SLO process.

This is a latency SLI (the metric you measure) with an SLO (the target). For the full framework—SLI, SLO, error budget—see SLOs, SLIs & SLAs.

  • Pick one or two latency SLIs per critical path (e.g. “feed load,” “search”).
  • Set p95 (and optionally p99) targets; put them on dashboards and alerts.
  • Revisit targets when product or SLOs change.