IaC Best Practices
Here you’ll find the main ideas and practices that make Infrastructure as Code work well—without getting into code or tool-specific details.
Each section is kept short: we cover the idea, why we care about it, and what to do in practice.
1. Staying in Sync
Declarative tools ask you to describe the desired end state; imperative tools ask for the steps to get there.
State is the record of what currently exists; drift is when reality and that record (or your config) no longer match.
Best practice:
- Treat the desired state (your config) as the single source of truth.
- Run drift checks on a schedule.
- Correct drift through the same IaC process, not ad-hoc fixes.
Why we care about this:
- Desired state as source of truth makes it clear what “correct” looks like.
- Drift leads to surprises—manual changes, forgotten resources, or config that no longer matches production.
- Checking for drift regularly keeps environments predictable.
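As a tool-agnostic illustration of a drift check, the sketch below compares a desired configuration with what was observed in the environment. The function name `detect_drift`, the resource names, and the dictionary layout are assumptions made for this example; real tools have their own formats and commands.

```python
from typing import Any


def detect_drift(desired: dict[str, dict[str, Any]],
                 actual: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Compare desired config with observed resources, keyed by resource name."""
    report: dict[str, list[str]] = {"missing": [], "unmanaged": [], "changed": []}
    for name, spec in desired.items():
        if name not in actual:
            report["missing"].append(name)      # declared but not deployed
        elif actual[name] != spec:
            report["changed"].append(name)      # deployed but differs from config
    for name in actual:
        if name not in desired:
            report["unmanaged"].append(name)    # deployed but never declared
    return report


# Hypothetical data: the config is the source of truth, "actual" is what exists.
desired = {
    "web": {"size": "small", "count": 2},
    "db": {"size": "medium", "count": 1},
}
actual = {
    "web": {"size": "small", "count": 3},     # scaled by hand
    "cache": {"size": "small", "count": 1},   # created outside the IaC process
}

report = detect_drift(desired, actual)
print(report)
```

A scheduled job could run a check like this and open a ticket when any list is non-empty, so that the fix goes back through the normal IaC process.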
2. Environments and Promotion
Dev, staging, and production are separate environments. Promotion is the process of moving the same (or appropriately parameterized) definitions from one environment to the next.
Best practice:
- Promote the same definitions across environments where possible.
- Parameterize only what must differ (for example, scale and secrets).
- Roll back by re-applying a previous, known-good version of the config rather than by making manual fixes.
Why we care about this:
- Parity between environments reduces “works in dev, breaks in prod” problems.
- Controlled promotion ensures that only tested, reviewed changes reach production.
- What typically changes between environments is scale, secrets, and sometimes feature flags, not the structure of the infrastructure itself.
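One way to picture "same definition, different parameters" is a shared base definition plus a small set of per-environment overrides. The sketch below is illustrative only: `BASE`, `OVERRIDES`, `render`, and all keys and values are hypothetical and not tied to any tool.

```python
# Shared definition: the structure is identical in every environment.
BASE = {
    "service": "api",
    "replicas": 1,
    "instance_size": "small",
    "db_secret_ref": None,  # a reference to a secret, never the secret value
}

# Only the values that must differ per environment.
OVERRIDES = {
    "dev": {"db_secret_ref": "dev/api/db"},
    "staging": {"replicas": 2, "db_secret_ref": "staging/api/db"},
    "production": {
        "replicas": 4,
        "instance_size": "large",
        "db_secret_ref": "production/api/db",
    },
}


def render(environment: str) -> dict:
    """Merge the shared definition with one environment's overrides."""
    overrides = OVERRIDES[environment]
    unknown = set(overrides) - set(BASE)
    if unknown:
        # Overrides may change values but must not change the structure.
        raise ValueError(f"overrides may not add new keys: {sorted(unknown)}")
    return {**BASE, **overrides}


for env in ("dev", "staging", "production"):
    print(env, render(env))
```

Rejecting unknown keys is a deliberate choice in this sketch: it keeps the structure the same everywhere, so promotion changes values rather than shape.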
3. Secrets and Security
Secrets (passwords, keys, tokens) must not live in code or in plain text in config.
Use vaults, environment variables, or provider-native secret management. State files (the record of what exists) must also be protected—they can contain sensitive data and need access control.
Best practice:
- Never commit secrets to the repo.
- Use a secrets manager or provider-native mechanism, and reference secrets at apply time.
- Restrict who can read and write state.
- Lock state when applying so only one process changes it at a time.
Why we care about this:
- Secrets in code leak via version history, sharing, or breaches.
- State files can expose resource details and sometimes credentials.
- Least-privilege access to state and to apply pipelines reduces risk.
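To make "reference secrets at apply time" concrete, here is a minimal sketch in which the config stores only the name of an environment variable and the value is resolved when the apply runs. The names `resolve_secret`, `build_settings`, `DB_PASSWORD`, and `db.internal` are assumptions for illustration; a real setup might call a vault or a provider-native secrets service instead.

```python
import os
from typing import Mapping


class MissingSecretError(RuntimeError):
    """Raised when a referenced secret is not available at apply time."""


def resolve_secret(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Look up a secret by reference; fail loudly instead of using a default."""
    value = environ.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set in the environment")
    return value


def build_settings(config: dict, environ: Mapping[str, str] = os.environ) -> dict:
    """Turn committed config (references only) into runtime settings."""
    return {
        "host": config["host"],
        "password": resolve_secret(config["password_env"], environ),
    }


# The committed config holds a reference, not the secret itself.
config = {"host": "db.internal", "password_env": "DB_PASSWORD"}

# Stand-in for the real environment so the example runs anywhere.
fake_environ = {"DB_PASSWORD": "example-only"}
settings = build_settings(config, fake_environ)

# Never print the secret itself; log only that it was resolved.
print("password resolved:", bool(settings["password"]))
```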
4. Testing and Validation
Before applying, you can preview the plan, run policy checks, and (where available) estimate cost or impact.
Some teams add automated tests that run against the plan or a sandbox.
Best practice:
- Validate and review every change before applying to production.
- Use plan preview and policy checks in CI.
- Require human approval for production applies.
Why we care about this:
- Catching errors before apply reduces the chance of breaking production.
- Policy checks enforce tags, naming, and compliance.
- Cost visibility avoids surprise bills.
- Validation gives reviewers and approvers confidence.
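A policy check can be as simple as a function that inspects the planned changes and returns a list of violations. The sketch below assumes a hypothetical plan format (a list of dictionaries with `action`, `resource`, and `tags`) and a hypothetical rule that every created or updated resource carries `owner` and `environment` tags.

```python
REQUIRED_TAGS = {"owner", "environment"}


def check_plan(plan: list[dict]) -> list[str]:
    """Return one message per planned change that violates the tagging policy."""
    violations = []
    for change in plan:
        if change["action"] == "delete":
            continue  # nothing to tag on a resource that is going away
        missing = REQUIRED_TAGS - set(change.get("tags", {}))
        if missing:
            violations.append(
                f"{change['resource']}: missing tags {sorted(missing)}"
            )
    return violations


# Hypothetical plan produced by a preview step.
plan = [
    {"action": "create", "resource": "web",
     "tags": {"owner": "team-a", "environment": "staging"}},
    {"action": "create", "resource": "cache",
     "tags": {"environment": "staging"}},
    {"action": "delete", "resource": "old-queue"},
]

violations = check_plan(plan)
for message in violations:
    print(message)
```

In CI, a non-empty result would fail the job, so the change never reaches the approval step with a known policy violation.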
5. State and Collaboration
State is the stored record of current infrastructure that the tool uses to compute “what to change.”
When multiple people or pipelines touch the same infrastructure, they need shared state (e.g. remote state) and locking so two applies do not run at once.
Best practice:
- Use one source of truth for state (e.g. remote backend).
- Use locking so only one apply runs at a time.
- Define clear ownership—who can run apply for which environments.
- Avoid concurrent apply on the same scope.
Why we care about this:
- Without shared state, each run might assume it owns the world and overwrite or conflict with others.
- Without locking, concurrent applies can corrupt the state record or leave the infrastructure itself inconsistent.
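The locking idea can be sketched with an exclusive lock file: whoever creates the file holds the lock, and anyone else fails fast instead of applying at the same time. This is a simplified local illustration under assumed names (`state_lock`, `StateLockedError`); remote backends typically provide their own locking, and this sketch does not handle stale locks left by a crashed process.

```python
import os
import tempfile
from contextlib import contextmanager


class StateLockedError(RuntimeError):
    """Raised when another process already holds the state lock."""


@contextmanager
def state_lock(lock_path: str):
    """Hold an exclusive lock for the duration of an apply."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise StateLockedError(f"state is locked: {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record who holds the lock
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock


lock_path = os.path.join(tempfile.mkdtemp(), "state.lock")

with state_lock(lock_path):
    # A second apply on the same scope is refused while the first is running.
    try:
        with state_lock(lock_path):
            second_apply_ran = True
    except StateLockedError:
        second_apply_ran = False

lock_released = not os.path.exists(lock_path)
print("second apply ran:", second_apply_ran, "| lock released:", lock_released)
```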
6. CI/CD for Infrastructure
Pipelines automate plan and apply: on code change, run plan (and policy checks); on approval, run apply.
Approval gates determine who can approve production; rollback is typically “revert the config and re-apply.”
Best practice:
- Automate plan and apply through a pipeline.
- Gate production with approvals (e.g. from a lead or a checklist).
- Track which commit or ticket triggered each apply so you can tie changes to incidents and roll back when needed.
Why we care about this:
- Automation reduces human error and makes every change auditable.
- Approval gates prevent unauthorized production changes.
- Rollback by re-applying a previous version is faster and more reliable than manual remediation.
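The plan, approve, apply flow and its audit trail can be modeled in a few lines. The sketch below is a toy model, not a real pipeline definition: `Change`, `Pipeline`, the outcome strings, and the commit IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    commit: str                        # commit that triggered the run
    environment: str
    plan_ok: bool                      # result of plan and policy checks
    approved_by: Optional[str] = None


@dataclass
class Pipeline:
    audit_log: list = field(default_factory=list)

    def run(self, change: Change) -> str:
        """Decide whether a change may be applied and record the decision."""
        if not change.plan_ok:
            outcome = "rejected: plan or policy check failed"
        elif change.environment == "production" and change.approved_by is None:
            outcome = "blocked: awaiting approval"
        else:
            outcome = "applied"
        # Every run is recorded so an apply can be tied back to a commit.
        self.audit_log.append({
            "commit": change.commit,
            "environment": change.environment,
            "approved_by": change.approved_by,
            "outcome": outcome,
        })
        return outcome


pipeline = Pipeline()
first = pipeline.run(Change("abc123", "production", plan_ok=True))
second = pipeline.run(
    Change("abc123", "production", plan_ok=True, approved_by="lead")
)
print(first)
print(second)
```

In this model a rollback is just another run: point the pipeline at the previous known-good commit and let it pass through the same checks and approval gate.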
For application deployment pipelines (as opposed to infrastructure), see CI/CD for Applications.