Workload and Modeling
Before you set thresholds and headroom policies, you need to understand your workload and how capacity behaves under load. This page covers workload characterization, simple capacity modeling ideas, and the different kinds of headroom you need.
Workload Characterization
Not all traffic is the same. How you plan capacity depends on the shape and type of workload.
- Peak-to-average ratio — If peak traffic is 3x your average, planning for average will fail at peak. Know your ratio and use it in forecasting and headroom sizing. See Planning and Operations for how this feeds into policies.
- Workload type — Online (request/response, latency-sensitive) needs headroom for tail latency and spikes. Batch (scheduled jobs, ETL) is often throughput-bound and can be sized for a time window. Streaming (events, pipelines) is sustained throughput with burst buffers; size for steady rate plus burst absorption.
- Latency vs throughput — User-facing APIs care about latency (p95, p99); back-office or batch cares about throughput (items/sec). Your capacity model and scaling metrics should match which one matters.
- Critical journeys — Identify the paths that drive revenue or reliability (e.g. checkout, login, core API). Size and headroom for these first; other paths can follow.
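The peak-to-average ratio above is easy to compute from request-rate samples. A minimal sketch, with illustrative hourly RPS numbers that are assumptions, not figures from this page:

```python
# Derive a peak-to-average ratio from request-rate samples and use it
# to see how far average-based sizing would fall short at peak.

def peak_to_average(samples: list[float]) -> float:
    """Ratio of peak request rate to average request rate."""
    return max(samples) / (sum(samples) / len(samples))

# Hypothetical hourly RPS for one day: quiet overnight, midday peak.
rps = [40, 35, 30, 30, 35, 60, 120, 200, 260, 300, 320, 340,
       330, 310, 280, 250, 220, 200, 180, 150, 110, 80, 60, 50]

avg = sum(rps) / len(rps)
ratio = peak_to_average(rps)

# Sizing for the average would leave you short at peak by this factor.
print(f"average: {avg:.0f} RPS, peak: {max(rps)} RPS, ratio: {ratio:.2f}x")
```

A fleet sized for the ~166 RPS average here would need roughly 2x that capacity to survive the 340 RPS peak, which is why the ratio feeds directly into headroom policies.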
Capacity Modeling Intuition
- Queueing — When arrival rate exceeds service rate, work queues up. Utilization close to 100% means long queues and high latency. Keeping utilization below a threshold (e.g. 70%) leaves room for bursts without blowing out latency. See Latency Percentiles for why tail latency matters.
- Tail latency — p99 and p999 often spike when the system is near capacity (e.g. GC, contention, cold caches). Headroom isn’t just for “not falling over”—it keeps tail latency acceptable.
- Load testing — Models are only as good as your data. Use Load and Stress Testing to validate utilization vs latency and find where the system bends.
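The queueing point can be made concrete with the simplest textbook model. A sketch using the M/M/1 mean-time-in-system formula, W = S / (1 − ρ), where S is mean service time and ρ is utilization; the 20 ms service time is an assumed example value:

```python
# M/M/1 intuition: mean time in system W = S / (1 - rho), where S is
# the mean service time and rho the utilization. A single-server toy
# model, not a fleet simulation -- but the shape of the curve is why
# utilization thresholds exist.

def mean_latency(service_time_s: float, utilization: float) -> float:
    """M/M/1 mean time in system at a given utilization (0 <= rho < 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1 - utilization)

service = 0.020  # assumed 20 ms mean service time
for rho in (0.50, 0.70, 0.90, 0.95, 0.99):
    w_ms = mean_latency(service, rho) * 1000
    print(f"utilization {rho:.0%}: mean latency {w_ms:.0f} ms")
```

Latency roughly doubles between 50% and 70% utilization, then explodes past 90% (2,000 ms at 99%): a 70% threshold is buying you the flat part of that curve.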
Headroom Taxonomy
Headroom is the buffer between current utilization and the point where the system degrades or fails. Different risks need different headroom:
- N+1 / deployment headroom — Rolling deploys run extra instances; canary or blue/green may double traffic to a subset. Ensure you have headroom for the extra load during releases.
- Failure-domain headroom — If one AZ or region fails, the rest must absorb its traffic. Capacity is often sized so that N-1 AZs (or N-1 regions) can carry full load. See Planning and Operations and Failover and Failback.
- Surge headroom — Unexpected spikes (viral events, marketing, DDoS). Policies often specify a percentage (e.g. 30% headroom) or “X months of growth” so you don’t have to add capacity the day traffic jumps.
A headroom policy formalizes these (e.g. “30% CPU headroom, 3 months growth headroom, N+1 for failover”). Details and examples are in Planning and Operations.
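A policy like that can be turned into a sizing calculation. A minimal sketch combining the three headroom types above; the simple multiplicative model and all input figures (3,000 RPS peak, 100 RPS per instance, 5% monthly growth, 3 AZs) are assumptions for illustration:

```python
import math

def required_instances(peak_rps: float,
                       rps_per_instance: float,
                       surge_headroom: float,   # e.g. 0.30 -> 30% buffer
                       monthly_growth: float,   # e.g. 0.05 -> 5%/month
                       growth_months: int,
                       azs: int) -> int:
    """Instances needed so N-1 AZs can absorb peak + growth + surge."""
    # Growth headroom: project the peak forward by N months.
    projected = peak_rps * (1 + monthly_growth) ** growth_months
    # Surge headroom: add a fixed percentage buffer on top.
    with_surge = projected * (1 + surge_headroom)
    base = math.ceil(with_surge / rps_per_instance)
    # Failure-domain headroom: the fleet minus one AZ must still carry
    # the full load, so spread `base` over azs - 1 and round up.
    per_az = math.ceil(base / (azs - 1))
    return per_az * azs

n = required_instances(peak_rps=3000, rps_per_instance=100,
                       surge_headroom=0.30, monthly_growth=0.05,
                       growth_months=3, azs=3)
print(f"instances needed: {n}")  # 69: 23 per AZ, any 2 AZs carry peak
```

Note the ordering: growth and surge buffers apply to the load, while the failure-domain buffer applies to the fleet layout, which is why it is a rounding step over AZs rather than another percentage.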
See Also
- Planning and Operations — Scaling thresholds, policies, autoscaling, forecasting, and operational processes.
- Load and Stress Testing — How to get the data that feeds workload and capacity models.