Measuring QA and Testing Success

By Atif Alam

Measuring QA and testing success means combining what happened in production (outcomes), how healthy your test system is (process), and how delivery behaves (org-level signals)—without pretending one number tells the whole story.

Remember: favor trends over one-off snapshots, mix lagging and leading signals, and revisit definitions when the product or risk profile changes.

Map of the sections below:

  1. Outcomes — Escaped defects and other “we missed it” signals (mostly lagging).
  2. Production gates — SLOs and error budgets (mostly lagging; user-visible after release).
  3. Process health — CI, flakes, coverage discipline (mostly leading).
  4. Delivery context — DORA-style change and deploy signals (often lagging for failure; context for QA).

For a copy-paste starter dashboard (five panels), jump to Minimal dashboard template (v1).

Read the rest of this page through this lens.

Lagging indicators describe what already happened to users or the business: escaped defects, customer-visible quality issues, SLO misses or exhausted error budgets, and—in the sense of “this change caused pain”—change failure rate (rollback, hotfix, or incident tied to a deploy). They answer: did we hurt quality?

Leading indicators describe whether your detection machinery is healthy before the next escape: flaky test rate, default-branch green rate, time-to-green in CI, and explicit risk review before high-risk changes. They answer: are we set up to catch problems early?

No single KPI proves QA is successful. Use a small, complementary set; argue from trends and context; avoid blaming individuals when lagging numbers move.

How the next sections are grouped: outcomes and production gates are both production-visible quality; process is your test-and-CI system; delivery adds org-level context (it often correlates with escapes, but is not the same thing).

Mostly lagging: misses that already reached users or production.

Escaped defects — bugs found in production or by customers that you believe should have been caught earlier—are the most direct lagging signal of gaps in testing or process.

  • Rate and severity — Counts per release or per time window; weight by severity (for example customer impact, data loss).
  • Source — Customer-reported vs internal discovery. Customer-only counts understate total escapes; internal-only counts may miss perception of quality.
  • Caveats — Attribution is messy: a “missed” bug might have been an accepted risk, unclear requirements, or a rare environment. Triangulate with post-incident reviews and incident management when escapes drive incidents.
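The rate-and-severity bullet above can be sketched as a small computation. This is a minimal illustration, not a standard: the severity weights and the dict fields (`release`, `severity`, `source`) are assumptions you would replace with your own definitions.

```python
# Sketch: severity-weighted escaped-defect rate per release.
# Severity weights (P0-P3) are illustrative assumptions; tune to your org.
from collections import defaultdict

SEVERITY_WEIGHTS = {"P0": 8, "P1": 4, "P2": 2, "P3": 1}

def weighted_escape_rate(escapes, releases):
    """escapes: list of dicts with 'release' and 'severity' keys;
    releases: iterable of release names (so zero-escape releases count too)."""
    totals = defaultdict(float)
    for e in escapes:
        totals[e["release"]] += SEVERITY_WEIGHTS[e["severity"]]
    return {r: totals[r] for r in releases}

escapes = [
    {"release": "v1.2", "severity": "P1", "source": "customer"},
    {"release": "v1.2", "severity": "P3", "source": "internal"},
    {"release": "v1.3", "severity": "P2", "source": "customer"},
]
print(weighted_escape_rate(escapes, ["v1.2", "v1.3", "v1.4"]))
# {'v1.2': 5.0, 'v1.3': 2.0, 'v1.4': 0.0}
```

Iterating over a fixed release list (rather than only over escapes) keeps zero-escape releases visible on the trend line, which is exactly the signal you want to see improve.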

Use escapes to steer strategy—which layers or domains need more investment—not to blame individuals.

Production Quality Gates: Error Budgets and SLOs

Mostly lagging: user-visible reliability after release.

Pre-ship testing is not the end of quality. Error budgets and SLOs encode how much user-visible unreliability you can tolerate in production. When budgets burn too fast, you may freeze changes or refocus on reliability—including improving tests and release discipline.

Treat SLOs and error budgets as runtime complements to QA metrics: same product, different timeframe (after release vs before). Together with escaped defects, they form the “what customers experience” side of your dashboard; process metrics form the “are we equipped to prevent the next one” side.
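The error-budget arithmetic is simple enough to show directly. A minimal sketch, assuming an availability-style SLO over a request count (the 99.9% target and request numbers are illustrative, not recommendations):

```python
# Sketch: remaining error budget for an availability SLO.
# An SLO of 0.999 over 1,000,000 requests allows ~1,000 failed requests
# before the budget for the window is spent.

def error_budget_remaining(slo, total_requests, failed_requests):
    """slo: target success ratio, e.g. 0.999.
    Returns the fraction of the error budget still unspent (can go negative)."""
    budget = (1.0 - slo) * total_requests  # allowed failures this window
    return (budget - failed_requests) / budget

print(round(error_budget_remaining(0.999, 1_000_000, 400), 3))  # 0.6
```

A result of 0.6 means 60% of the budget remains; a negative result means the budget is exhausted, which is the point where freezing changes or refocusing on reliability becomes the conversation.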

Mostly leading: trust and speed of your test and CI system.

These describe whether your testing machinery is trustworthy:

  • Flaky test rate — Tests that fail non-deterministically erode trust; teams stop believing red builds. Track and fix or quarantine flakes aggressively.
  • CI duration and signal-to-noise — Long pipelines delay feedback; noisy failures hide real regressions. Optimize for fast, reliable signal on the default path.
  • Coverage trends — Line or branch coverage can show untouched code, but coverage is not quality: you can cover code with weak assertions. Use coverage as one input, not a target to game.
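One common way to operationalize the flaky-test bullet is to flag a test as flaky when the same commit produces both pass and fail results. A minimal sketch; the input shape `(test_name, commit_sha, passed)` is an assumption you would adapt to your CI provider's data:

```python
# Sketch: classify a test as flaky when it both passed and failed
# on the same commit, then compute a flake rate across the suite.

def flake_rate(runs):
    """runs: list of (test_name, commit_sha, passed) tuples."""
    outcomes = {}
    for test, sha, passed in runs:
        outcomes.setdefault((test, sha), set()).add(passed)
    tests = {test for test, _ in outcomes}
    flaky = {test for (test, _), seen in outcomes.items() if len(seen) == 2}
    return len(flaky) / len(tests) if tests else 0.0

runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # pass + fail on same commit: flaky
    ("test_search", "abc123", True),
    ("test_search", "abc123", True),   # consistently green: fine
    ("test_pay", "abc123", False),     # consistent failure: real regression
]
print(flake_rate(runs))  # 1 of 3 tests is flaky -> ~0.33
```

Note the distinction the example encodes: a test that fails consistently is a regression, not a flake, and should stay red until fixed rather than being quarantined.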

Context: system and release behavior; often lagging when something goes wrong.

Broader delivery metrics appear in Reliability metrics—for example change failure rate and deployment frequency in the DORA sense. They reflect system and process health, not only testing:

  • A rising change failure rate may point to test gaps, but also to operational or architecture issues.
  • Lead time and restore time interact with how fast you can fix after a bad escape.

Change failure rate and escaped defects are often correlated (bad deploys and missed bugs show up together) but not the same metric: one is deploy-centric, the other is defect-centric. Use both for triangulation in postmortems and planning.
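Change failure rate itself is a straightforward ratio once your org has fixed a definition of "failed." A minimal sketch, assuming each deploy record carries an `outcome` field (the field name and outcome labels are assumptions):

```python
# Sketch: change failure rate over a window of deployments.
# A deploy counts as failed if it led to an incident, rollback, or hotfix;
# the exact definition should match your org's.

FAILURE_OUTCOMES = {"incident", "rollback", "hotfix"}

def change_failure_rate(deploys):
    """deploys: list of dicts with an 'outcome' field."""
    if not deploys:
        return 0.0
    failed = sum(1 for d in deploys if d["outcome"] in FAILURE_OUTCOMES)
    return failed / len(deploys)

deploys = [
    {"sha": "a1", "outcome": "ok"},
    {"sha": "b2", "outcome": "rollback"},
    {"sha": "c3", "outcome": "ok"},
    {"sha": "d4", "outcome": "hotfix"},
]
print(change_failure_rate(deploys))  # 0.5
```

Because the denominator is deploys rather than defects, this number moves for reasons testing cannot control (deployment cadence, operational issues), which is exactly why it pairs with, rather than replaces, escaped defects.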

Cross-link to reliability metrics instead of duplicating definitions; align narratives in postmortems and planning.

Metrics distort behavior. Examples: chasing coverage with empty tests, disabling flaky tests instead of fixing them, or shipping to meet a deployment frequency target while skipping meaningful gates. Name these failure modes explicitly in reviews and tie incentives to sustainable quality, not local optima.

The five panels mirror the story above: (1) outcomes; (2–3) process/CI health; (4) delivery signal; (5) one production gate—enough to tell a coherent story without drowning in charts.

Use this as a starting layout for a Google Doc, spreadsheet, or BI dashboard. Copy the table into your tool of choice and replace placeholders with your data sources and definitions.

| Panel | What to show | Notes |
| --- | --- | --- |
| 1. Escaped defects (severity-weighted) | Trend over time; total or rate per release / month; weight by severity (e.g. P0–P3). | Define “escape” and severity once; split customer vs internal if useful. |
| 2. Flake rate | % of CI runs with flaky failures, or count of tests under quarantine; trend. | Goal is down; pair with action (fix vs quarantine with owner). |
| 3. Main green / time-to-green | Green rate for default branch (or trunk), and median minutes from commit/merge to green check on main. | Captures reliability and speed of feedback on the path everyone trusts. |
| 4. Change failure rate | % of deployments (or changes) that caused an incident, rollback, or hotfix—per your org’s definition. | Context for QA: not testing alone; see reliability metrics. |
| 5. One SLO or error-budget panel | Remaining error budget for a critical SLI, or SLO attainment % for the period. | Pick one user-visible slice (e.g. availability or error rate); aligns pre-ship work with error budgets. |
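Panel 3's two numbers (green rate and median time-to-green on main) can be derived from per-build records. A minimal sketch; the input shape with `green` and `minutes_to_result` fields is an assumption to adapt to your CI provider:

```python
# Sketch: green rate and median time-to-green for the default branch.
from statistics import median

def main_branch_health(builds):
    """builds: list of dicts with 'green' (bool) and
    'minutes_to_result' (commit/merge-to-status-check duration)."""
    green = [b for b in builds if b["green"]]
    green_rate = len(green) / len(builds)
    time_to_green = median(b["minutes_to_result"] for b in green)
    return green_rate, time_to_green

builds = [
    {"green": True, "minutes_to_result": 12},
    {"green": True, "minutes_to_result": 9},
    {"green": False, "minutes_to_result": 30},
    {"green": True, "minutes_to_result": 15},
]
rate, ttg = main_branch_health(builds)
print(rate, ttg)  # 0.75 12
```

Median (not mean) keeps one pathological build from dominating the feedback-speed number; only green builds are counted for time-to-green, since a red build never delivered the signal people were waiting on.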

Review trends weekly or monthly, annotate releases and incidents, and adjust definitions when the product or risk profile changes—this template is intentionally minimal, not exhaustive.