Best Practices for IaC Security
Secure Infrastructure as Code from authoring to deployment: scanning, policy as code, state security, and drift detection across Terraform, Bicep, and CloudFormation.
Infrastructure as Code transformed how teams provision cloud resources. Terraform, Pulumi, AWS CloudFormation, Azure Bicep, Google Cloud Deployment Manager, Crossplane, and the AWS CDK all express cloud infrastructure as version-controlled, reviewable, repeatable code. The benefits are concrete: consistency, auditability, faster provisioning, and the ability to spin up identical environments on demand.
The same properties that make IaC powerful also make it dangerous. A typo in a Terraform module can expose a database to the public internet. A copy-pasted CloudFormation stack can grant blanket IAM permissions across an entire organization. Stored state files can leak credentials. CI pipelines that apply IaC hold keys to your entire cloud estate. IaC security has become a discipline of its own.
This guide compiles IaC security best practices for intermediate practitioners building, scanning, deploying, and operating Infrastructure as Code in production.
Core Concepts
IaC security covers the protection of the code itself, the pipelines that apply it, the state and credentials involved, and the resulting infrastructure. It blends DevSecOps practices with cloud-specific knowledge and policy enforcement.
The lifecycle has three main phases. Authoring: writing modules, manifests, or templates. Pipeline: validating and applying changes through CI/CD. Operations: managing state, secrets, drift, and remediation. Each phase has distinct controls.
Several frameworks structure the discipline. The CIS benchmarks for AWS, Azure, and GCP provide the security baselines that IaC should encode. SLSA addresses supply chain integrity for the modules and providers IaC depends on. NIST SP 800-218 and similar guidance emphasize provenance and verification. Cloud-provider-specific guides (AWS Well-Architected, Azure Well-Architected, Google Cloud Architecture Framework) add platform context.
A few principles drive the discipline. Encode the secure baseline in modules. Validate changes in pull requests. Apply via least-privileged pipelines. Manage state as sensitive data. Detect and remediate drift continuously.
Authoring Securely
Use community-trusted modules and verify them. HashiCorp's Terraform Registry, the AWS CDK Construct Hub, and similar registries host many useful modules. Audit before use: read the source, check the maintainer's reputation, review issues and PRs, and look at recent commits. Pin to specific versions or commit SHAs.
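The pinning advice above can be checked mechanically. The following is a minimal sketch, assuming modules are pulled from git with Terraform's ?ref= source syntax; the function name, the regexes, and the example.com URLs are illustrative, and a regex pass over HCL text is a heuristic, not a parser.

```python
import re

# Matches a Terraform module "source" line that uses a git URL.
GIT_SOURCE = re.compile(r'source\s*=\s*"(git::[^"]+)"')
# A full 40-character commit SHA in the ref query parameter.
SHA_REF = re.compile(r"[?&]ref=[0-9a-f]{40}\b")


def unpinned_git_sources(hcl_text):
    """Return git module sources that are not pinned to a full commit SHA."""
    return [
        src for src in GIT_SOURCE.findall(hcl_text)
        if not SHA_REF.search(src)
    ]


# Hypothetical module blocks: one tracks a branch, one is SHA-pinned.
EXAMPLE = """
module "vpc" {
  source = "git::https://example.com/acme/vpc.git?ref=main"
}
module "bucket" {
  source = "git::https://example.com/acme/bucket.git?ref=0123456789abcdef0123456789abcdef01234567"
}
"""

if __name__ == "__main__":
    for src in unpinned_git_sources(EXAMPLE):
        print("not pinned to a commit SHA:", src)
```

Branch and tag references are mutable, which is why this sketch accepts only full SHAs; a team that trusts signed tags might relax the rule.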
Write reusable internal modules that bake in secure defaults. Wrap raw resources with opinionated modules: for example, a "private S3 bucket" module that defaults to blocking public access, encrypting with a customer-managed KMS key, enabling access logging, and turning on versioning. Developers using the module inherit the security; developers using raw resources often forget.
Encode least privilege in every resource. IAM policies should be specific. Security groups should reference other security groups, not 0.0.0.0/0. Storage should be private by default. Databases should disable public access. Make the secure path the default path; require explicit overrides for exceptions.
Avoid plain-text secrets. Never put credentials, API keys, or sensitive configuration in IaC. Reference secret managers (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, HashiCorp Vault) instead. For Terraform, set sensitive = true on variables and outputs that must hold sensitive values, but note that this only redacts the value from CLI output such as plan and apply; the value is still written to state.
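To illustrate what a plain-text secret check looks for, here is a deliberately naive sketch. The two patterns (the AKIA prefix used by AWS access key IDs, and a quoted literal assigned to a secret-like name) are assumptions chosen for illustration; dedicated secret scanners cover far more cases and should be preferred in practice.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal (not a ${...} interpolation) assigned to a secret-like name.
    "hardcoded_assignment": re.compile(
        r'(?i)\b(password|secret|api_key)\s*=\s*"[^"$]+"'
    ),
}


def find_plaintext_secrets(text):
    """Return (line_number, pattern_name) pairs; never returns the value itself."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Hypothetical Terraform snippet: line 3 hardcodes a value, line 4 references a variable.
SAMPLE = "\n".join([
    'resource "aws_db_instance" "db" {',
    '  username = "app"',
    '  password = "hunter2-placeholder"',
    "  api_key  = var.api_key",
    "}",
])

if __name__ == "__main__":
    for lineno, name in find_plaintext_secrets(SAMPLE):
        print(f"line {lineno}: possible {name}")
```

The function reports only the location and rule name so that the scanner's own output does not become another place where the secret is exposed.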
Tag everything. Apply mandatory tags through module conventions: owner, environment, data classification, cost center, compliance scope. Tags drive policy, monitoring, cost allocation, and incident response.
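A mandatory-tag rule can be sketched against the JSON form of a Terraform plan (the output of terraform show -json). The required tag names below are assumptions based on the list above, and the sketch assumes resources expose their tags under change.after.tags, which holds for many AWS resource types but not for every provider.

```python
# Hypothetical organization standard; adjust to your own tag names.
REQUIRED_TAGS = {"owner", "environment", "data_classification", "cost_center"}


def missing_tags(plan):
    """Map resource address -> set of required tags absent from the planned state."""
    result = {}
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if "tags" not in after:
            continue  # resource type has no tags attribute, or it is being deleted
        missing = REQUIRED_TAGS - set(after["tags"] or {})
        if missing:
            result[rc["address"]] = missing
    return result


# Hand-written stand-in for a parsed plan file.
PLAN = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket.data",
            "change": {"after": {"tags": {
                "owner": "team-a", "environment": "prod",
                "data_classification": "internal", "cost_center": "42",
            }}},
        },
        {
            "address": "aws_s3_bucket.logs",
            "change": {"after": {"tags": {"owner": "team-a"}}},
        },
        {
            "address": "random_id.suffix",
            "change": {"after": {"byte_length": 4}},
        },
    ]
}

if __name__ == "__main__":
    for address, tags in missing_tags(PLAN).items():
        print(address, "is missing", sorted(tags))
```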
Use small, composable modules. Large monolithic modules are hard to review, hard to test, and hard to change safely. Compose small, focused modules with explicit interfaces.
Scanning and Policy
Run IaC scanners on every change. Checkov, Trivy (which has absorbed tfsec), KICS, Snyk IaC, cfn-lint for CloudFormation, and Prisma Cloud (which incorporates Bridgecrew) all detect common misconfigurations: open security groups, unencrypted storage, missing logging, public databases, overly permissive IAM.
Integrate scanning into pull request review. Findings appear directly in the PR. Critical findings block merge; lower-severity findings warn. Suppression should require justification.
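The merge rule described above (critical findings block, lower severities warn, suppressions need a reason) can be sketched as a small gate. The finding records here use a hypothetical normalised shape with id and severity keys; each scanner has its own output format that would need mapping into it first.

```python
def gate(findings, suppressions):
    """Decide whether a pull request may merge.

    findings:     list of {"id": str, "severity": str} records (hypothetical shape)
    suppressions: dict mapping finding id -> written justification
    """
    blocking, warnings = [], []
    for finding in findings:
        justification = suppressions.get(finding["id"], "").strip()
        if justification:
            continue  # suppressed with a documented reason
        if finding["severity"] == "CRITICAL":
            blocking.append(finding["id"])
        else:
            warnings.append(finding["id"])
    return {"allow_merge": not blocking, "blocking": blocking, "warnings": warnings}


if __name__ == "__main__":
    findings = [
        {"id": "SG-OPEN-22", "severity": "CRITICAL"},
        {"id": "S3-NO-LOGGING", "severity": "MEDIUM"},
        {"id": "RDS-PUBLIC", "severity": "CRITICAL"},
    ]
    # An empty justification does not count as a suppression.
    suppressions = {"RDS-PUBLIC": "Sandbox account, no customer data", "SG-OPEN-22": ""}
    print(gate(findings, suppressions))
```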
Adopt policy as code. Open Policy Agent (OPA) and Rego, HashiCorp Sentinel, Checkov custom policies, and cloud-native policy engines let you express organizational standards as code. Common policies: deny resources in disallowed regions, require encryption with customer-managed keys, deny IAM wildcards, require specific tags. Reuse the same policies in CI, admission control, and runtime where possible.
Use plan-stage policy enforcement. Tools like Terraform Cloud's run tasks, Atlantis with custom hooks, and Conftest (which is built on OPA) evaluate planned changes against policies before apply. Catching a violation in the plan is cheaper and safer than remediating it after deployment.
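Production setups usually express such rules in Rego or Sentinel, but the idea of plan-stage evaluation can be sketched in a few lines. The example below walks the resource_changes list of a plan in Terraform's JSON form and applies two deny rules. The rule functions are hypothetical, and the attribute shapes (ingress with cidr_blocks on aws_security_group, a JSON policy string on aws_iam_policy) are assumptions about the AWS provider's schema.

```python
import json


def deny_open_ingress(rc):
    """Flag security groups whose planned ingress allows the whole internet."""
    if rc["type"] != "aws_security_group":
        return None
    after = rc["change"].get("after") or {}
    for rule in after.get("ingress") or []:
        if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
            return "ingress open to 0.0.0.0/0"
    return None


def deny_iam_wildcard(rc):
    """Flag IAM policies with an Allow statement on Action "*"."""
    if rc["type"] != "aws_iam_policy":
        return None
    after = rc["change"].get("after") or {}
    doc = json.loads(after.get("policy") or "{}")
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for st in statements:
        actions = st.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if st.get("Effect") == "Allow" and "*" in actions:
            return 'IAM policy allows Action "*"'
    return None


RULES = [deny_open_ingress, deny_iam_wildcard]


def evaluate(plan):
    """Return one message per violated rule per planned resource change."""
    violations = []
    for rc in plan.get("resource_changes", []):
        for rule in RULES:
            message = rule(rc)
            if message:
                violations.append(f'{rc["address"]}: {message}')
    return violations


# Hand-written stand-in for a parsed plan file.
PLAN = {
    "resource_changes": [
        {"address": "aws_security_group.web", "type": "aws_security_group",
         "change": {"actions": ["create"],
                    "after": {"ingress": [{"cidr_blocks": ["0.0.0.0/0"]}]}}},
        {"address": "aws_iam_policy.admin", "type": "aws_iam_policy",
         "change": {"actions": ["create"], "after": {"policy": json.dumps(
             {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]})}}},
        {"address": "aws_security_group.old", "type": "aws_security_group",
         "change": {"actions": ["delete"], "after": None}},
    ]
}

if __name__ == "__main__":
    for violation in evaluate(PLAN):
        print("DENY", violation)
```

In a pipeline, a non-empty result would fail the job before apply is ever reached.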
Maintain a baseline policy library. Start with vendor-provided defaults (Checkov rule packs, Snyk policies) and add organization-specific rules over time. Document each policy's rationale and link to remediation guidance.
Be pragmatic about findings. IaC scanners can be noisy on legacy codebases. Triage realistically. Fix critical findings now; create backlog items for medium findings with clear owners; suppress non-applicable findings with documented justification.
Pipeline Security
Apply IaC through a pipeline, not from developer laptops. Local applies bypass review, audit, and access controls. Make pipelines the only path to production changes.
Use OIDC federation for cloud credentials. GitHub Actions, GitLab CI, and other modern CI platforms can issue short-lived cloud credentials via OIDC. Eliminate long-lived service account keys from your pipeline.
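What makes OIDC federation safe is the trust condition the cloud side places on the token's claims. The sketch below mirrors that condition for a GitHub Actions token using the issuer URL and the repo:OWNER/REPO:environment:NAME subject format GitHub documents. It deliberately omits signature verification, which the cloud provider performs; the repository and environment names are hypothetical.

```python
GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def is_trusted(claims, *, repo, environment, audience):
    """Check already-verified token claims against a narrow trust condition.

    This does NOT verify the token signature; it only shows which claims a
    role's trust policy would typically pin.
    """
    return (
        claims.get("iss") == GITHUB_ISSUER
        and claims.get("aud") == audience
        and claims.get("sub") == f"repo:{repo}:environment:{environment}"
    )


if __name__ == "__main__":
    # Hypothetical claims from a deployment job in a protected environment.
    claims = {
        "iss": GITHUB_ISSUER,
        "aud": "sts.amazonaws.com",
        "sub": "repo:acme/infra:environment:production",
    }
    print(is_trusted(claims, repo="acme/infra", environment="production",
                     audience="sts.amazonaws.com"))
```

Matching the full subject string, not a wildcard on the organization, keeps a workflow in an unrelated repository or branch from assuming the deployment role.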
Scope deployment credentials tightly. Per-environment, per-account credentials with only the permissions needed for that scope. Production credentials should be different from staging; security-critical resources may warrant separate roles with extra approval.
Require pull request review. Posting the plan output as a PR comment helps reviewers understand the impact. CODEOWNERS entries on sensitive paths (IAM, networking, KMS, security tooling) should require security team review.
Use environment protection. Production environments should require manual approval, time delays, and restricted approvers. Most CI platforms support this directly.
Separate plan and apply. Plans run on every PR. Apply runs only after merge, from the main branch. Some teams split the post-merge step in two: a fresh plan that receives stricter scrutiny, then an apply of that plan on explicit confirmation.
Audit every change. Every plan and apply should produce a record with author, change summary, and impact. Stream to a SIEM with sufficient retention.
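An audit record of this kind can be derived from the plan itself. The following sketch summarizes a plan in Terraform's JSON form into one JSON-serializable record; the field names of the record and the author, commit, and phase inputs are assumptions, and shipping the record to a SIEM is left out.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(plan, *, author, commit, phase):
    """Build one audit event from a parsed plan (phase is "plan" or "apply")."""
    counts = Counter()
    changed = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"]:
            continue
        changed.append(rc["address"])
        counts.update(actions)  # a replacement counts as one delete and one create
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "phase": phase,
        "author": author,
        "commit": commit,
        "summary": dict(counts),
        "addresses": sorted(changed),
    }


# Hand-written stand-in for a parsed plan file.
PLAN = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
        {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
}

if __name__ == "__main__":
    record = audit_record(PLAN, author="alice", commit="abc1234", phase="plan")
    print(json.dumps(record))  # one line per event suits most log shippers
```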
State and Secret Management
Treat state files as sensitive. Terraform state, in particular, can contain plaintext credentials, certificates, and infrastructure details that map your entire estate. Store remote state in encrypted backends (S3 with KMS, Azure Storage with CMEK, GCS with CMEK, Terraform Cloud, Spacelift).
Restrict state access. State files should be readable only by the pipeline and a small set of administrators. Lock state during applies to prevent corruption from concurrent changes.
Use workspace separation. Separate state per environment and per blast radius. A single state file managing your entire production estate is a single point of failure; many small state files are safer and more maintainable.
Avoid storing secrets in state. Where unavoidable (some providers return secrets on creation), retrieve them via outputs only when needed and rely on secret managers for ongoing storage.
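To see how much sensitive material a state file already holds, a scan over attribute names is a reasonable first pass. This sketch assumes the version 4 state layout (a top-level resources list whose entries carry instances with attributes) and a hypothetical list of secret-like key names; it reports paths only and never prints values.

```python
import re

# Hypothetical list of attribute names that suggest secret material.
SENSITIVE_KEY = re.compile(r"password|secret|token|private_key", re.IGNORECASE)


def sensitive_paths(state):
    """Return 'type.name.attribute' paths whose names look secret and hold a value."""
    hits = []
    for res in state.get("resources", []):
        address = f'{res["type"]}.{res["name"]}'
        for inst in res.get("instances", []):
            for key, value in (inst.get("attributes") or {}).items():
                if SENSITIVE_KEY.search(key) and isinstance(value, str) and value:
                    hits.append(f"{address}.{key}")
    return hits


# Hand-written stand-in for a parsed state file.
STATE = {
    "version": 4,
    "resources": [
        {"type": "aws_db_instance", "name": "main", "instances": [
            {"attributes": {"username": "app", "password": "placeholder-value"}}]},
        {"type": "aws_s3_bucket", "name": "logs", "instances": [
            {"attributes": {"bucket": "acme-logs"}}]},
    ],
}

if __name__ == "__main__":
    for path in sensitive_paths(STATE):
        print("secret-like value stored in state at", path)
```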
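Rotate credentials regularly. Any long-lived credentials that remain should rotate on a schedule. Short-lived OIDC tokens expire on their own, but their use still benefits from monitoring for unusual patterns. Audit credential issuance and resource changes for patterns that suggest compromise.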
Drift and Reconciliation
Drift happens. Console clicks, emergency manual changes, and out-of-band automation modify cloud resources without going through IaC. Over time, drift accumulates and the gap between code and reality grows.
Detect drift continuously. Terraform's refresh-only plans (terraform plan -refresh-only), AWS Config, Azure Resource Graph, GCP Asset Inventory, and tools like driftctl identify resources whose actual state differs from the intended state. Schedule regular drift scans.
Reconcile drift. For each drift finding, decide: was the change legitimate and should it be codified, or unauthorized and should it be reverted? Codify by updating the IaC; revert by re-applying.
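The codify-or-revert decision can be sketched as a first-pass triage over drift findings. This example assumes the findings come from the resource_drift list in a Terraform JSON plan and that approved out-of-band changes are tracked somewhere as a set of resource addresses; that approval set is hypothetical, and a human should still confirm each decision.

```python
def triage_drift(plan, approved_addresses):
    """Split drifted resources into 'codify' (approved) and 'revert' (not approved)."""
    decisions = {"codify": [], "revert": []}
    for drift in plan.get("resource_drift", []):
        address = drift["address"]
        bucket = "codify" if address in approved_addresses else "revert"
        decisions[bucket].append(address)
    return decisions


# Hand-written stand-in for a parsed plan that detected two drifted resources.
PLAN = {
    "resource_drift": [
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_autoscaling_group.api", "change": {"actions": ["update"]}},
    ]
}

if __name__ == "__main__":
    # Hypothetical: the scaling change was made during an approved incident response.
    approved = {"aws_autoscaling_group.api"}
    print(triage_drift(PLAN, approved))
```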
Prevent drift through access control. Lock down direct access to production cloud accounts for humans. Use just-in-time elevation for emergencies, with audit and follow-up to codify any out-of-band changes.
Use GitOps where it fits. Argo CD, Flux, Crossplane, and other GitOps tools continuously reconcile actual state with declared state. Drift is corrected automatically. Pair with strong policy to ensure GitOps does not enforce something insecure.
Operational Best Practices
Maintain a module library. Curated, versioned, documented internal modules accelerate teams and bake in security. Treat modules as products with owners, versioning, and changelogs.
Plan for the inevitable incident. Define playbooks for leaked credentials, compromised pipelines, malicious modules, state corruption, and drift indicating compromise. Practice through tabletop exercises.
Inventory your IaC. Know which repositories own which infrastructure, who maintains them, and which versions are deployed. Tools like Sourcegraph, Backstage, and internal catalogs help.
Train developers on IaC security. The patterns are learnable, but many developers come to IaC from application development without cloud security background. Pair training with code review and paved-road modules.
Measure outcomes. Track time to remediate critical IaC findings, percentage of resources managed by IaC, drift rate, and policy violation trends. Use metrics to drive investment.
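Two of these metrics are simple enough to sketch directly. The finding records below use hypothetical field names (severity, opened_at, fixed_at), and the resource counts would come from whatever inventory the team already maintains.

```python
from datetime import datetime
from statistics import mean


def mean_hours_to_remediate(findings):
    """Mean hours from open to fix for remediated critical findings, or None."""
    durations = [
        (f["fixed_at"] - f["opened_at"]).total_seconds() / 3600
        for f in findings
        if f["severity"] == "CRITICAL" and f.get("fixed_at")
    ]
    return mean(durations) if durations else None


def iac_coverage(managed, total):
    """Percentage of cloud resources that are managed through IaC."""
    return 100.0 * managed / total if total else 0.0


# Hypothetical finding history: two fixed criticals, one open, one medium.
FINDINGS = [
    {"severity": "CRITICAL", "opened_at": datetime(2024, 1, 1, 9), "fixed_at": datetime(2024, 1, 2, 9)},
    {"severity": "CRITICAL", "opened_at": datetime(2024, 1, 3, 9), "fixed_at": datetime(2024, 1, 5, 9)},
    {"severity": "CRITICAL", "opened_at": datetime(2024, 1, 6, 9), "fixed_at": None},
    {"severity": "MEDIUM", "opened_at": datetime(2024, 1, 1, 9), "fixed_at": datetime(2024, 1, 1, 10)},
]

if __name__ == "__main__":
    print("mean hours to remediate criticals:", mean_hours_to_remediate(FINDINGS))
    print("IaC coverage (%):", iac_coverage(180, 200))
```

Open findings are excluded from the mean here, so the number of unresolved criticals should be tracked alongside it to avoid a misleadingly low figure.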
Real-world Examples
The 2017 series of public S3 bucket disclosures, including defense contractors and major brands, drove home that misconfigured storage is one of the highest-impact mistakes that IaC can either introduce or prevent. Modern S3 modules default to block public access; this pattern of "module defaults" prevents an enormous amount of risk.
In 2019, an exposed Terraform state file at a healthcare startup leaked database credentials and infrastructure details. Remote state with strict access controls and credential rotation are the standard mitigations.
The 2024 GitHub Actions self-hosted runner compromises showed how pipelines applying IaC can become attack vectors. OIDC, ephemeral runners, and signed plans are the operational responses.
Multiple incidents have involved Terraform modules from the public registry that exfiltrated credentials or planted malicious resources. SHA-pinning, internal forking, and vetted module catalogs are standard mitigations for security-conscious teams.
IaC security is essential to cloud security at scale. Encode secure defaults in modules. Scan changes in pull requests. Adopt policy as code. Apply through OIDC-authenticated pipelines with tight credential scope. Treat state files as sensitive data. Detect drift and reconcile. Train developers and measure outcomes.
For intermediate practitioners, the highest-leverage moves are usually module-level defaults and PR-stage scanning with policy. Each prevents a wide class of misconfiguration before it reaches production. Build from there with state management, drift detection, and pipeline hardening. IaC done well is a force multiplier for cloud security; done poorly it is a force multiplier for breach.

