Availability and The Nines

First Published By Atif Alam

Availability is how much of the time the system is working for users. It’s often expressed as “nines” (e.g. 99.9%).

This page explains what that means and how it’s measured.

  • 99% — Two nines; up to ~3.65 days of downtime per year.
  • 99.9% — Three nines; up to ~8.76 hours per year, ~43.2 minutes per month.
  • 99.99% — Four nines; up to ~52.6 minutes per year, ~4.32 minutes per month.

When you set an availability SLO, the “nines” tell you how much downtime you can afford in a given window. The monthly figures above assume a 30-day month.
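The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `allowed_downtime` is hypothetical, and the 365-day year and 30-day month are assumptions chosen to match the figures in the list.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, window: timedelta) -> timedelta:
    """Return the maximum downtime an availability target permits over a window."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the window is (1 - availability).
    return window * (1 - availability_pct / 100)


YEAR = timedelta(days=365)   # assumption: non-leap year
MONTH = timedelta(days=30)   # assumption: 30-day month

for target in (99.0, 99.9, 99.99):
    per_year = allowed_downtime(target, YEAR)
    per_month = allowed_downtime(target, MONTH)
    print(
        f"{target}%: {per_year.total_seconds() / 3600:.2f} hours/year, "
        f"{per_month.total_seconds() / 60:.2f} minutes/month"
    )
```

Passing a different `window` (for example, a rolling 7-day or 28-day period) gives the budget for whatever period your SLO actually uses.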

Two common approaches to measuring availability:

  1. Success rate — Successful requests / total requests over a window. This is request-based: each request is either a success or a failure (e.g. 5xx, timeout).
  2. Probe-based uptime — Percentage of time health checks pass. A probe hits an endpoint (or set of endpoints) on a schedule; availability = % of probes that succeed.

State which definition you’re using when you set an SLO, because the two can give different numbers for the same period. Success rate reflects real traffic; probe-based uptime can miss issues that only appear under load.
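To make the two definitions concrete, here is a small Python sketch that computes each from raw observations. The `Request` type, the `is_success` rule (no timeout and a status below 500), and the function names are assumptions for illustration; your own definition of "success" may differ.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int              # HTTP status code returned to the client
    timed_out: bool = False  # True if the client gave up waiting


def is_success(req: Request) -> bool:
    # Assumption: a request fails on timeout or on any 5xx response.
    return not req.timed_out and req.status < 500


def success_rate(requests: list[Request]) -> float:
    """Request-based availability: successful requests / total requests."""
    if not requests:
        raise ValueError("no requests in the window")
    return sum(is_success(r) for r in requests) / len(requests)


def probe_uptime(probe_results: list[bool]) -> float:
    """Probe-based availability: fraction of scheduled health checks that passed."""
    if not probe_results:
        raise ValueError("no probe results in the window")
    return sum(probe_results) / len(probe_results)


# Example window: 1,000 real requests and 10 scheduled probes.
requests = [Request(200)] * 997 + [Request(503)] * 2 + [Request(200, timed_out=True)]
probes = [True] * 10

print(f"success rate: {success_rate(requests):.3%}")
print(f"probe uptime: {probe_uptime(probes):.3%}")
```

In this made-up window every probe passes while three real requests fail, which is the kind of gap between the two measurements described above.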

When availability is defined as success rate, failed requests reduce availability. Error rate (failed / total) and availability (successful / total) are two sides of the same coin: error rate = 1 − availability (when both use the same window and definition of “success”).
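A quick check of this relationship, as a sketch with made-up counts (the function name `availability_and_error_rate` is hypothetical):

```python
def availability_and_error_rate(successful: int, failed: int) -> tuple[float, float]:
    """Compute both ratios from the same window and the same definition of success."""
    total = successful + failed
    if total == 0:
        raise ValueError("no requests in the window")
    return successful / total, failed / total


# Illustrative counts only: 100 failures out of 100,000 requests.
availability, error_rate = availability_and_error_rate(successful=99_900, failed=100)

print(f"availability: {availability:.4%}")
print(f"error rate:   {error_rate:.4%}")
print(f"sum:          {availability + error_rate:.4f}")
```

The two values only sum to 1 because they share a denominator. If error rate is counted over a different window, or with a different notion of failure, the identity no longer holds.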

Availability is a common SLI with an SLO target (e.g. “99.9% availability”). For the full framework, see SLOs, SLIs & SLAs.