Feature Flags and Rollback

First Published By Atif Alam

A deploy puts new code on production servers. A release exposes that code to users. These don’t have to be the same event.

Feature flags let you deploy code without releasing it.

Rollback automation lets you undo a bad release in seconds instead of minutes or hours. Together, they give you a safety net that makes deploying less risky.

A feature flag (or feature toggle) is a conditional check in code that controls whether a feature is active.

The flag value is managed externally—a config service, a database, or a feature flag platform—so you can turn features on or off without redeploying.
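As a rough illustration of that conditional check, here is a minimal Python sketch. The `FlagStore` class, the `new_discount_engine` flag name, and the `checkout_total` function are all hypothetical; a real setup would read the value from a config service, a database, or a flag platform rather than an in-memory dict.

```python
from typing import Dict, List, Optional


class FlagStore:
    """Hypothetical in-memory stand-in for an external flag source."""

    def __init__(self, values: Optional[Dict[str, bool]] = None) -> None:
        self._values: Dict[str, bool] = dict(values or {})

    def set(self, name: str, enabled: bool) -> None:
        # In a real system this change would happen in the external service,
        # not in application code.
        self._values[name] = enabled

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Unknown flags fall back to the default, so a missing flag
        # keeps the old behavior.
        return self._values.get(name, default)


def checkout_total(prices: List[float], flags: FlagStore) -> float:
    subtotal = sum(prices)
    # The feature flag is just a conditional around the new code path.
    if flags.is_enabled("new_discount_engine"):
        return round(subtotal * 0.9, 2)  # new behavior: 10% discount (made up)
    return subtotal  # old behavior


flags = FlagStore()
print(checkout_total([10.0, 20.0], flags))  # old path
flags.set("new_discount_engine", True)      # "release" without redeploying
print(checkout_total([10.0, 20.0], flags))  # new path
```

Defaulting unknown flags to off is a design choice in this sketch: if the flag source is unreachable or the flag has been deleted, the code falls back to the old path.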

Common uses:

  • Gradual rollout — Enable for 1% of users, then 5%, then 50%, then 100%. Similar to canary, but at the feature level rather than the deployment level.
  • Dark launch — Deploy a new code path and execute it in production, but don’t show results to users. Useful for validating performance or correctness under real load.
  • Kill switch — If a feature causes problems, disable it instantly without a rollback deploy.
  • A/B testing — Show different experiences to different user segments and measure outcomes.
  • Operational flags — Control system behavior: enable/disable a cache layer, switch between data sources, toggle a rate limiter.
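One common way to implement a percentage-based gradual rollout is to hash the user ID into a stable bucket and compare it to the current rollout percentage. The sketch below assumes string user IDs; the function names and the `new_checkout` flag are hypothetical, and flag platforms may use a different bucketing scheme.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    # Including the flag name means a user does not land in the same
    # bucket for every flag.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled_for(flag_name: str, user_id: str, rollout_percent: float) -> bool:
    """Return True if this user falls inside the current rollout percentage."""
    # Buckets are hundredths of a percent, so 1% covers buckets 0..99.
    return bucket(flag_name, user_id) < rollout_percent * 100


users = [f"user-{i}" for i in range(10_000)]
for percent in (1, 5, 50, 100):
    enabled = sum(is_enabled_for("new_checkout", u, percent) for u in users)
    print(f"{percent}% rollout -> {enabled} of {len(users)} users enabled")
```

Because the bucket depends only on the flag name and user ID, a user who sees the feature at 5% keeps seeing it at 50%, which avoids the feature flickering on and off as the rollout widens.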

Flags should not live forever. A long-lived, forgotten flag becomes tech debt and a source of unexpected behavior.

  • Short-lived (release flags) — Used during rollout. Remove once the feature is fully launched and stable (typically days to weeks).
  • Long-lived (operational flags) — Kill switches, A/B tests, or config-driven behavior. Review periodically; document why each exists.
  • Cleanup — Track flag age. Set a policy: e.g. “release flags must be removed within 30 days of full rollout.” Stale flags increase code complexity and risk.
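A cleanup policy like the 30-day example above can be checked mechanically. The following sketch assumes you keep a small registry of flags with their type and full-rollout date; the `FlagRecord` fields and the flag names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class FlagRecord:
    """Hypothetical registry entry for one flag."""
    name: str
    kind: str  # "release" or "operational"
    fully_rolled_out_on: Optional[date] = None  # None while still rolling out


def stale_release_flags(
    flags: List[FlagRecord], today: date, max_age_days: int = 30
) -> List[str]:
    """Return names of release flags past the removal deadline."""
    limit = timedelta(days=max_age_days)
    return [
        f.name
        for f in flags
        if f.kind == "release"                    # operational flags are reviewed, not expired
        and f.fully_rolled_out_on is not None     # still rolling out: not stale yet
        and today - f.fully_rolled_out_on > limit
    ]


registry = [
    FlagRecord("new_checkout", "release", date(2024, 4, 1)),
    FlagRecord("new_search", "release", date(2024, 5, 20)),
    FlagRecord("cache_kill_switch", "operational", date(2023, 1, 1)),
    FlagRecord("beta_banner", "release", None),
]
print(stale_release_flags(registry, today=date(2024, 6, 1)))
```

A check like this could run in CI or on a schedule and open a ticket for each stale flag; how you surface the result is up to your team's process.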

Rollback means reverting to the previous known-good state. The faster you can roll back, the shorter your incidents.

Rollback strategies:

| Strategy | How It Works | Speed |
| --- | --- | --- |
| Redeploy previous version | Build and deploy the last known-good artifact | Minutes (depends on pipeline) |
| Traffic switch (blue/green) | Point traffic back to the previous environment | Seconds |
| Feature flag disable | Turn off the flag; code is still deployed but feature is inactive | Seconds |
| Database rollback | Revert a migration; much harder and riskier than code rollback | Minutes to hours |
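To show why a traffic switch is fast, here is a toy Python model of blue/green routing. It is only a sketch: `BlueGreenRouter` is a hypothetical class, and in practice the "pointer" would be a load balancer, DNS record, or service mesh setting rather than an attribute on an object.

```python
from typing import Optional


class BlueGreenRouter:
    """Toy model: two environments and a pointer to the active one."""

    def __init__(self, blue_version: str, green_version: str, active: str = "blue") -> None:
        self._envs = {"blue": blue_version, "green": green_version}
        if active not in self._envs:
            raise ValueError(f"unknown environment: {active}")
        self.active = active
        self.previous: Optional[str] = None

    def switch_to(self, env: str) -> None:
        """Release: point traffic at the other environment."""
        if env not in self._envs:
            raise ValueError(f"unknown environment: {env}")
        if env != self.active:
            self.previous, self.active = self.active, env

    def rollback(self) -> None:
        """Rollback: point traffic back at the previous environment."""
        if self.previous is None:
            raise RuntimeError("no previous environment to roll back to")
        self.active, self.previous = self.previous, self.active

    def route(self) -> str:
        """Return the version that would serve the next request."""
        return self._envs[self.active]


router = BlueGreenRouter(blue_version="v1", green_version="v2")
router.switch_to("green")   # release v2
print(router.route())
router.rollback()           # v2 misbehaves: flip the pointer back
print(router.route())
```

Rollback here is a pointer flip, not a build or deploy, which is why it takes seconds. It only works while the previous environment is still running and compatible with the current data.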

Best practices:

  • Keep the previous artifact available. Don’t overwrite or garbage-collect the last successful build immediately.
  • Test rollback regularly. A rollback that’s never been tested may not work when you need it. Include rollback in your game-day drills.
  • Automate the decision when possible. If SLIs breach a threshold after deploy, trigger rollback automatically. See Error Budgets for how error budget burn rate can drive this.
  • Make changes backward compatible. Design database and API changes so the previous version can still run alongside or after the new version. That way, rolling the code back does not break against the newer schema or API.
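The automated rollback decision can be as simple as comparing a post-deploy SLI against a threshold. The sketch below uses error rate as the SLI; the 1% threshold, the 100-request minimum, and all names are assumptions for illustration, not recommended values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DeployWindow:
    """Hypothetical request counts observed since the deploy."""
    requests: int
    errors: int


def should_roll_back(
    window: DeployWindow, max_error_rate: float = 0.01, min_requests: int = 100
) -> bool:
    """Return True if the error rate since the deploy breaches the threshold."""
    # With too little traffic, a single error would dominate the rate,
    # so wait for more data before deciding.
    if window.requests < min_requests:
        return False
    return window.errors / window.requests > max_error_rate


def evaluate_deploy(window: DeployWindow, rollback: Callable[[], None]) -> bool:
    """Trigger the supplied rollback action if needed; report whether it ran."""
    if should_roll_back(window):
        rollback()
        return True
    return False


actions = []
evaluate_deploy(DeployWindow(requests=1000, errors=50), lambda: actions.append("rollback"))
evaluate_deploy(DeployWindow(requests=1000, errors=5), lambda: actions.append("rollback"))
print(actions)
```

The `rollback` callable is deliberately injected: it could disable a flag, switch traffic, or redeploy the previous artifact, depending on which strategy from the table applies.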

A dark launch deploys new functionality and runs it in production—but hides the results from users.

The new code path executes alongside the old one; you compare outputs, measure performance, and validate correctness before exposing it.

  • When to use — Risky changes (new algorithms, new data pipelines, new integrations) where you want real-traffic validation without user impact.
  • How — Feature flag routes traffic to the new path. Results are logged or compared but not returned to the user. Monitor latency, error rate, and resource consumption of the new path.
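The dark-launch flow described above might look roughly like this in Python. The `dark_launch` helper and its parameters are hypothetical; the sketch runs the new path inline for simplicity, whereas a production system would more likely run it asynchronously so it adds no user-facing latency.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

logger = logging.getLogger("dark_launch")

Req = TypeVar("Req")
Res = TypeVar("Res")


def dark_launch(
    request: Req,
    old_path: Callable[[Req], Res],
    new_path: Callable[[Req], Res],
    flag_enabled: bool,
    on_mismatch: Optional[Callable[[Req, Res, Res], None]] = None,
) -> Res:
    """Serve the old path's result; run the new path only for comparison."""
    result = old_path(request)
    if not flag_enabled:
        return result
    try:
        started = time.perf_counter()
        shadow = new_path(request)
        elapsed_ms = (time.perf_counter() - started) * 1000
        logger.info("dark path took %.2f ms", elapsed_ms)
        if shadow != result:
            logger.warning("dark path mismatch for request %r", request)
            if on_mismatch is not None:
                on_mismatch(request, result, shadow)
    except Exception:
        # A failure in the new path must never affect the user.
        logger.exception("dark path failed; user response unaffected")
    return result  # the user always gets the old path's result


mismatches = []
answer = dark_launch(
    3,
    old_path=lambda x: x * 2,
    new_path=lambda x: x * 3,  # deliberately different, to show a mismatch
    flag_enabled=True,
    on_mismatch=lambda req, old, new: mismatches.append((req, old, new)),
)
print(answer, mismatches)
```

Wrapping the new path in a broad `try`/`except` is intentional here: the point of a dark launch is that bugs in the new code show up in logs and metrics, not in user responses.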