Terraform is excellent for declaring and reproducing infrastructure. The problem starts when it is credited with a property it does not have: real-time security control. In corporate practice, control breaks when someone “touches the console” (or makes a change through the CLI) and that change is not governed by the IaC flow.
The result is usually not an immediate incident, but something more dangerous: silent exposure, inconsistent evidence, and a false sense of compliance because “in the repo it’s fine”. That is real drift: what runs in the cloud diverges from the code and, by extension, from what the team believes is deployed.
What went wrong: the day the urgent change left a door open
A recurring pattern: there is a partial outage or a service degradation, someone with broad permissions enters the console to “unblock” things quickly, and modifies a security rule, an IAM role, or a network route. Sometimes it’s with good intent (restore connectivity), other times due to business pressure (enable access from a vendor). The change is made “temporarily”… and it stays.
The problem is not the adjustment itself: it’s that it was done outside Terraform, without review, without traceability in the repo, and without an automatic mechanism to detect or revert it. Weeks later, the team runs a terraform apply for another reason and, depending on how the resource is modeled, one of two things happens: Terraform does not see the change (because it does not manage that attribute, or because the resource was created or changed outside its control), or it tries to “fix it” and causes an unexpected disruption.
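One way to surface that divergence before an apply is to read the machine-readable plan. The following is a minimal sketch, assuming the plan was exported as JSON and that the Terraform version in use reports out-of-band changes under the `resource_drift` key; the function name and the sample data are hypothetical.

```python
def summarize_drift(plan: dict) -> list:
    """Return one entry per resource that the plan reports as changed outside Terraform."""
    findings = []
    for item in plan.get("resource_drift", []):
        change = item.get("change", {})
        # "before" or "after" can be null when a resource was deleted or created
        before = change.get("before") or {}
        after = change.get("after") or {}
        # Top-level attributes whose values differ between state and refreshed reality
        changed = sorted(
            key for key in set(before) | set(after)
            if before.get(key) != after.get(key)
        )
        findings.append({
            "address": item.get("address"),
            "actions": change.get("actions", []),
            "changed_attributes": changed,
        })
    return findings


# Hypothetical excerpt of a JSON plan, reduced to the fields used above
sample_plan = {
    "resource_drift": [
        {
            "address": "aws_security_group.bastion",
            "change": {
                "actions": ["update"],
                "before": {
                    "name": "bastion",
                    "ingress": [{"cidr_blocks": ["10.0.0.0/8"], "from_port": 22, "to_port": 22}],
                },
                "after": {
                    "name": "bastion",
                    "ingress": [{"cidr_blocks": ["0.0.0.0/0"], "from_port": 22, "to_port": 22}],
                },
            },
        }
    ]
}

for finding in summarize_drift(sample_plan):
    print(finding["address"], finding["actions"], finding["changed_attributes"])
```

The input would typically come from `terraform plan -out=plan.out` followed by `terraform show -json plan.out`. The limit described above still applies: attributes or resources that Terraform does not manage will not appear here at all.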
In internal audits this shows up as inconsistencies: the control requires a posture (for example, “no administrative ports open to the Internet”), but the environment has exceptions without a ticket, without justification, and without expiration. The critical point is cultural and operational: the console becomes a change channel without governance.
Drift is not a detail: it breaks tfstate, evidence, and control expectations
tfstate is not a source of truth for security; it is a record of the last state known by Terraform. If someone changes a rule in a Security Group, an NSG, or a firewall outside the IaC cycle, tfstate keeps saying “fine” until the next refresh while the cloud is already “different”. In an investigation, that difference translates into hours lost working out what is really applied.
The typical enterprise case is drift in security rules: temporarily opening an administration port, widening a CIDR range “to test”, or adding an exception because a third party “can’t reach it”. When this does not make it back into code, the team loses the ability to secure environments consistently (prod vs preprod) because the console creates unique, non-repeatable configurations.
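The comparison behind that kind of finding is a set difference between what the code declares and what the cloud reports. A minimal sketch, assuming both sides have already been normalized into the same rule shape; the `Rule` type and the sample values are hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    protocol: str
    from_port: int
    to_port: int
    cidr: str


def diff_rules(declared: set, actual: set) -> dict:
    """Split the differences into rules added by hand and rules missing in the cloud."""
    return {
        "only_in_cloud": sorted(actual - declared),  # likely console or CLI changes
        "only_in_code": sorted(declared - actual),   # declared but removed or never applied
    }


# Hypothetical data: what the module declares vs. what the provider API returns
declared = {
    Rule("tcp", 443, 443, "0.0.0.0/0"),
    Rule("tcp", 22, 22, "10.0.0.0/8"),
}
actual = declared | {Rule("tcp", 22, 22, "0.0.0.0/0")}

report = diff_rules(declared, actual)
print(report["only_in_cloud"])
```

Rules that exist only in the cloud are the ones that need a decision: revert them, or bring them into code with a justification.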
Another classic: drift in IAM. An inline policy is added “as an emergency”, an overly broad managed policy is attached, or a trust policy is changed to unblock an assume-role. The effect is worse than in networking: the exposure can be lateral (other services, other data) and is harder to detect at a glance. Networking is not spared either: routes, peering, or egress rules are changed, or WAF parameters are touched to “reduce false positives”, and no trace is left in the pipeline.
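Because broad IAM permissions are hard to spot at a glance, a simple automated pass over policy documents helps. This is a minimal sketch, assuming the standard IAM policy JSON layout; it deliberately ignores `NotAction`, `NotResource`, and conditions, and the function names and sample policy are hypothetical.

```python
def _as_list(value):
    """IAM allows a single value or a list for Statement, Action, and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def broad_statements(policy: dict) -> list:
    """Return Allow statements that combine a wildcard action with a wildcard resource."""
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action and wildcard_resource:
            flagged.append(statement)
    return flagged


# Hypothetical inline policy added "as an emergency"
emergency_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::app-bucket/*"},
        {"Effect": "Allow", "Action": ["iam:*"], "Resource": "*"},
    ],
}

for statement in broad_statements(emergency_policy):
    print("Overly broad:", statement["Action"])
```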
Operational signals that usually appear when drift is already doing damage:
- “Terraform wants to change things that nobody touched”.
This usually comes up in PRs or plans with unexpected changes. In practice, it is the first clue that the console is being used as a change channel and Terraform is trying to return to the declared world, sometimes in critical components.
- “In the repo it’s closed, but the scanner sees open ports”.
When security validates posture with external tools (scanner, configuration review, findings from Config/Policy), discrepancies appear. This is when the team discovers that real control was never in Terraform, but in discipline and guardrails.
How to detect it after it happened: continuous audit, not “waiting for the next apply”
Effective detection of security drift requires signals outside Terraform. In AWS, AWS Config evaluates configurations against rules (managed or custom) and triggers alerts when a condition is violated, for example, “Security Groups must not expose 0.0.0.0/0 on administrative ports” or “S3 must block public access”. In Azure, the equivalent role is filled by Azure Policy, with evaluation and remediation for resources that drift. In GCP, Organization Policy and audit logging complement that control.
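To make the first example rule concrete, this is roughly the logic such a check has to apply. It is a minimal sketch, not the implementation of any managed rule: it assumes the permission shape returned by the EC2 `describe_security_groups` call (`IpPermissions`), only considers TCP, and the function name and sample data are hypothetical.

```python
ADMIN_PORTS = {22, 3389}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def exposed_admin_ports(ip_permissions: list) -> set:
    """Return the administrative ports reachable from any address."""
    exposed = set()
    for perm in ip_permissions:
        cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & OPEN_CIDRS:
            continue  # this permission is not open to the Internet
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            exposed |= ADMIN_PORTS  # "-1" means all protocols and all ports
            continue
        if protocol != "tcp":
            continue
        low, high = perm.get("FromPort"), perm.get("ToPort")
        if low is None or high is None:
            continue
        exposed |= {port for port in ADMIN_PORTS if low <= port <= high}
    return exposed


# Hypothetical inbound permissions of one security group
permissions = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 3389, "ToPort": 3389, "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
]
print(exposed_admin_ports(permissions))
```

Port ranges matter here: a rule that opens 0 to 65535 exposes both administrative ports even though neither is named explicitly.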
The key is not “having Config/Policy enabled”, but operating it intentionally: what is considered relevant drift, who receives alerts, and what SLA the correction has. In enterprises, a finding without an owner ends up being noise; a finding with an owner and a defined correction window ends up being control.
How to do it in practice in an AWS environment (execution-oriented, not theory):
- Enable AWS Config in relevant accounts and regions and aggregate it at the organization level.
If Config is only in one account or in one region, drift migrates to where nobody looks. Operationally, this translates into “controls that work in prod, but not in satellite accounts”. Ensure coverage where resources are actually created.
- Configure rules that reflect your security non-negotiables (for example, not allowing ports 22/3389 from the Internet, requiring encryption, prohibiting public resources).
Don’t try to model your entire posture on day one. Start with what breaks most with urgent console changes: inbound/outbound rules and public exposure. That’s where drift generates immediate impact and usually appears in incidents.
- Integrate non-compliances with your operations: automatic ticket, alert channel, and escalation path.
A Config finding without a workflow fixes nothing. What works in enterprises is: alert → ticket with context (resource, change, actor) → business verification → revert or formalize the exception in code with expiration.
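The “alert → ticket with context” step can be sketched as a small transformation. Everything in it is an assumption for illustration: the finding fields, the SLA values per severity, the owner mapping, and the fallback queue name would come from your own tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical correction windows per severity
SLA_BY_SEVERITY = {
    "critical": timedelta(hours=4),
    "high": timedelta(hours=24),
    "medium": timedelta(days=7),
}
DEFAULT_SLA = timedelta(days=7)
DEFAULT_OWNER = "cloud-security-oncall"  # hypothetical fallback queue


@dataclass
class Ticket:
    resource_id: str
    rule: str
    actor: str
    owner: str
    due: datetime


def open_ticket(finding: dict, owners: dict, now: datetime) -> Ticket:
    """Turn a non-compliance finding into a ticket with an owner and a deadline."""
    sla = SLA_BY_SEVERITY.get(finding["severity"], DEFAULT_SLA)
    return Ticket(
        resource_id=finding["resource_id"],
        rule=finding["rule"],
        actor=finding.get("actor", "unknown"),
        owner=owners.get(finding["account_id"], DEFAULT_OWNER),
        due=now + sla,
    )


# Hypothetical finding and account-to-owner mapping
finding = {
    "resource_id": "sg-0123456789abcdef0",
    "rule": "no-admin-ports-from-internet",
    "severity": "critical",
    "account_id": "111122223333",
    "actor": "arn:aws:iam::111122223333:user/alice",
}
owners = {"111122223333": "payments-platform-team"}
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

ticket = open_ticket(finding, owners, now)
print(ticket.owner, ticket.due.isoformat())
```

The point of the design is that no finding leaves this step without an owner and a due date, which is what separates control from noise.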
Practical validation: in AWS, review the AWS Config timeline of the affected resource to confirm when it changed, which attribute changed, and whether there was remediation. In addition, correlate with CloudTrail to identify the principal (user/role) that made the change. That correlation is what turns suspicion into actionable evidence.
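The CloudTrail side of that correlation can be sketched as a filter over event records. This is a minimal sketch, assuming records that carry `eventName`, `eventTime`, `userIdentity.arn`, and `sourceIPAddress`, and assuming the security group ID is available under `requestParameters.groupId` for the authorize and revoke events; check that against your own trail, since the layout varies by API call. The function name and sample records are hypothetical.

```python
SG_CHANGE_EVENTS = {
    "AuthorizeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupIngress",
    "RevokeSecurityGroupEgress",
}


def who_changed(records: list, group_id: str) -> list:
    """List rule changes on one security group, oldest first, with the acting principal."""
    hits = []
    for record in records:
        if record.get("eventName") not in SG_CHANGE_EVENTS:
            continue
        params = record.get("requestParameters") or {}
        if params.get("groupId") != group_id:  # assumed field location
            continue
        identity = record.get("userIdentity") or {}
        hits.append({
            "time": record.get("eventTime") or "",
            "event": record.get("eventName"),
            "principal": identity.get("arn", "unknown"),
            "source_ip": record.get("sourceIPAddress"),
        })
    # ISO 8601 UTC timestamps sort correctly as plain strings
    return sorted(hits, key=lambda hit: hit["time"])


# Hypothetical records, reduced to the fields used above
records = [
    {
        "eventTime": "2024-05-02T03:15:00Z",
        "eventName": "AuthorizeSecurityGroupIngress",
        "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/Admin/alice"},
        "sourceIPAddress": "203.0.113.10",
        "requestParameters": {"groupId": "sg-0123456789abcdef0"},
    },
    {
        "eventTime": "2024-05-01T09:00:00Z",
        "eventName": "DescribeSecurityGroups",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"},
        "requestParameters": None,
    },
]

for hit in who_changed(records, "sg-0123456789abcdef0"):
    print(hit["time"], hit["event"], hit["principal"])
```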
When someone touches the console: typical scenarios that degrade security without anyone noticing
In real life, most “manual” changes are not malicious: they are shortcuts under pressure. That’s why they are so dangerous: they become normalized. A recurring example is opening a port to a broad range “so it works now”, with the intention of narrowing it later. That “later” competes with roadmap, on-call, and projects, and rarely arrives.
In IAM, the typical scenario is the temporary permission that becomes permanent. A broad managed policy is attached to an application role to unblock a deployment; then the permission stays and nobody removes it because “if I touch it I’ll break something”. That debt accumulates and, when there is a credential compromise or internal abuse, the blast radius is larger than expected.
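A temporary permission only stays temporary if something checks its expiration date. A minimal sketch, assuming exceptions are recorded in a small registry kept next to the code; the registry format, its field names, and the sample entries are hypothetical.

```python
from datetime import date


def expired_exceptions(exceptions: list, today: date) -> list:
    """Return the IDs of exceptions whose expiration date has already passed."""
    return [
        entry["id"]
        for entry in exceptions
        if date.fromisoformat(entry["expires"]) < today
    ]


# Hypothetical exception registry
registry = [
    {"id": "EXC-101", "resource": "role/app-deployer", "expires": "2024-03-01"},
    {"id": "EXC-102", "resource": "role/vendor-access", "expires": "2024-09-30"},
]

print(expired_exceptions(registry, date(2024, 6, 1)))
```

Run on a schedule, a check like this turns “nobody removes it” into a ticket with a name on it.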
In networking, changes to routes, gateways, peering, or egress rules are made to restore connectivity with third parties or with an on-prem environment. If that change does not go through the declarative model, the team loses reproducibility: the same Terraform module applied in another account does not generate the same behavior, which breaks testing, DR, and controlled scaling.
A very corporate (and expensive) anti-pattern is allowing “break-glass” without real operational limits. A super-privileged role is created for emergencies, but it is used for daily tasks because “it’s faster”. Without limits, that role becomes the usual drift path: it not only changes resources, it also changes the conditions so others can change resources.
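One operational limit that is cheap to enforce is checking break-glass activity against declared incidents. A minimal sketch, assuming CloudTrail-style records with `eventTime` in UTC and an assumed-role ARN in `userIdentity.arn`; the role name, the incident windows, and the sample records are hypothetical.

```python
from datetime import datetime, timezone

BREAK_GLASS_ROLE = "BreakGlassAdmin"  # hypothetical role name


def _parse(timestamp: str) -> datetime:
    """Parse a UTC timestamp such as 2024-05-02T03:15:00Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def unjustified_use(records: list, incident_windows: list) -> list:
    """Return break-glass events that fall outside every declared incident window."""
    flagged = []
    for record in records:
        arn = (record.get("userIdentity") or {}).get("arn", "")
        if f":assumed-role/{BREAK_GLASS_ROLE}/" not in arn:
            continue
        when = _parse(record["eventTime"])
        if not any(start <= when <= end for start, end in incident_windows):
            flagged.append(record)
    return flagged


# Hypothetical activity and one declared incident
activity = [
    {
        "eventTime": "2024-05-02T03:15:00Z",
        "eventName": "AuthorizeSecurityGroupIngress",
        "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/BreakGlassAdmin/alice"},
    },
    {
        "eventTime": "2024-05-09T10:00:00Z",
        "eventName": "AttachRolePolicy",
        "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/BreakGlassAdmin/bob"},
    },
    {
        "eventTime": "2024-05-09T10:05:00Z",
        "eventName": "DescribeInstances",
        "userIdentity": {"arn": "arn:aws:sts::111122223333:assumed-role/Developer/carol"},
    },
]
incidents = [(
    datetime(2024, 5, 2, 2, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 2, 6, 0, tzinfo=timezone.utc),
)]

for record in unjustified_use(activity, incidents):
    print(record["eventTime"], record["eventName"], record["userIdentity"]["arn"])
```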
Recommendations for corporate environments
Terraform does not guarantee security control if there is a parallel channel of changes without governance. Drift between what is declared and what is executed appears especially in security rules, IAM, and networking, and is usually discovered through side effects (findings, unexpected plans, audits) rather than through a “visible failure”.
The practical response in the enterprise combines guardrails (to prevent or limit unsafe changes from the console) with continuous audit (to detect and manage deviations when they occur). Configure detection (AWS Config / Azure Policy and event auditing), integrate alerts with operations (tickets, owners, SLA), and validate with evidence (CloudTrail and configuration timelines) so control doesn’t depend on “seeing if the next apply fixes it”.