Latency Percentiles and Targets

First published Feb 10, 2026, by Atif Alam

Latency is often expressed as percentiles so that targets can cover the typical request and the slow tail separately.

This page explains what those numbers mean and how to set and read latency targets in requirements and SLOs.

What percentiles mean

p50 (median) — Half of requests complete within this time.
p95 — 95% of requests complete within this time; 5% are slower.
p99 — 99% complete within this time; 1% are slower.
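As a rough illustration of these definitions, the sketch below computes p50, p95, and p99 from raw samples using the nearest-rank method. The `percentile` helper and the sample latencies are hypothetical, and monitoring systems may use a different method (for example, interpolation), so their numbers can differ slightly.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds (20 requests, two slow outliers).
latencies_ms = [120, 95, 110, 130, 105, 2400, 115, 100, 125, 90,
                140, 135, 98, 102, 108, 112, 118, 122, 128, 900]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50} ms, p95={p95} ms, p99={p99} ms")
```

With only 20 samples, p95 and p99 each land on a single slow request, which is one reason tail percentiles are usually read over windows with many requests.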

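These values are computed from your metrics over a time window, for example from latency histograms in Prometheus or an APM tool. Percentiles derived from histograms are estimates whose precision depends on the bucket boundaries.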
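When latencies are stored as histogram buckets rather than raw samples, a percentile has to be estimated. The sketch below assumes cumulative buckets and observations spread evenly within each bucket, then interpolates linearly. It is similar in spirit to how histogram-based tools estimate quantiles, but it is a simplified illustration, not any tool's actual implementation. The `estimate_percentile` helper and the bucket counts are hypothetical.

```python
def estimate_percentile(buckets, p):
    """Estimate the p-th percentile from cumulative histogram buckets.

    buckets: list of (upper_bound_ms, cumulative_count), sorted by upper bound.
    Assumes observations are spread evenly within each bucket (an assumption
    of this sketch, which is why the result is only an estimate).
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram is empty")
    target = p * total / 100  # number of observations at or below the percentile
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= target:
            in_bucket = cumulative - lower_count
            fraction = (target - lower_count) / in_bucket
            return lower_bound + fraction * (upper_bound - lower_bound)
        lower_bound, lower_count = upper_bound, cumulative
    return float(buckets[-1][0])

# Hypothetical cumulative buckets: 600 requests took <= 100 ms, 900 took <= 250 ms, ...
buckets = [(100, 600), (250, 900), (500, 980), (1000, 1000)]

p95_estimate = estimate_percentile(buckets, 95)
p99_estimate = estimate_percentile(buckets, 99)
print(f"p95 ~ {p95_estimate} ms, p99 ~ {p99_estimate} ms")
```

The estimate can only be as precise as the bucket it falls into, so it helps to place bucket boundaries near the latency targets you care about.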

Why percentiles instead of average

A few very slow requests can skew the average. Percentiles separate “typical” from “tail”:

p50 is the median (half of requests are faster), so it reflects the typical experience.

p95 and p99 are tail metrics: they mark the latency that the slowest 5% or 1% of requests exceed. They show where the tail begins, not how slow the worst requests get.
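A small, made-up example shows the skew. Here 98 requests take 100 ms and 2 take 5 seconds; the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import statistics

# Hypothetical sample: 98 fast requests and 2 very slow ones (milliseconds).
latencies_ms = [100] * 98 + [5000] * 2

mean_ms = statistics.mean(latencies_ms)      # pulled upward by the two slow requests
median_ms = statistics.median(latencies_ms)  # p50, unaffected by the two slow requests
slowest_ms = max(latencies_ms)

print(f"mean={mean_ms} ms, p50={median_ms} ms, max={slowest_ms} ms")
```

The mean comes out at nearly twice the median even though 98% of requests took exactly 100 ms, so it describes neither the typical request nor the slow ones.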

For user-facing latency targets, p95 is a common choice: it constrains the tail (most requests must finish under the limit) without being driven by the slowest 1% of requests, as p99 is.

How to read “p95 < 300–500 ms”

It’s a latency target (SLO): “95% of requests should complete in under X ms,” where X is the number your team picks (e.g. 300, 400, or 500). Example: “Feed load p95 < 500 ms” means 95% of feed loads finish in under 500 ms.
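One way to check such a target over a window is to count the share of requests that finished under the threshold and compare it with 95%. The sketch below does that; the `meets_latency_slo` helper and the feed-load samples are hypothetical, and a real setup would usually compute this from metrics rather than raw lists.

```python
def meets_latency_slo(latencies_ms, threshold_ms, target_fraction=0.95):
    """Return (ok, fraction), where fraction is the share of requests faster than threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests in window")
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    fraction = fast / len(latencies_ms)
    return fraction >= target_fraction, fraction

# Hypothetical window of 200 feed loads (milliseconds).
feed_load_ms = [180] * 150 + [420] * 42 + [650] * 8

ok_500, fraction_500 = meets_latency_slo(feed_load_ms, threshold_ms=500)
ok_400, fraction_400 = meets_latency_slo(feed_load_ms, threshold_ms=400)

print(f"p95 < 500 ms met: {ok_500} ({fraction_500:.0%} under 500 ms)")
print(f"p95 < 400 ms met: {ok_400} ({fraction_400:.0%} under 400 ms)")
```

The same traffic meets the 500 ms target but misses a 400 ms one, which is why the choice of X matters as much as the choice of percentile.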

The 300–500 ms range is often used as a “feels fast” target for interactive UIs (e.g. feed load, search). Treat it as a starting point: set the actual target from product needs and your SLO process.

Connection to SLOs

This is a latency SLI (the metric you measure) with an SLO (the target). For the full framework—SLI, SLO, error budget—see SLOs, SLIs & SLAs.

In practice

Pick one or two latency SLIs per critical path (e.g. “feed load,” “search”).
Set p95 (and optionally p99) targets; put them on dashboards and alerts.
Revisit targets when product or SLOs change.