There are architectures that “pass” reviews because they have IAM with MFA, segmented networks, encryption, and a connected SIEM. And still, the day something happens, the team is left unable to answer the basic questions: who did what, from where, what exactly changed, which tokens were used, which resources were touched?
The pattern repeats itself: security looked solid in the design, but when reviewing the logs it turns out that logging is partial, that CloudTrail/Activity Logs were misconfigured, or that retention and integrity do not allow the evidence to be trusted. The worst part is not “not seeing”; the worst part is believing that you are seeing.
What went wrong: security was in the diagram, not in the evidence
In real incidents, the first shock is usually operational: the team tries to reconstruct a timeline and discovers gaps. For example, there are access records to an application, but there is no control-plane traceability (who created a key, who modified a policy, who disabled a control). The result is that the analysis relies on hypotheses and not on facts, and that prolongs containment.
The cause is rarely “there is no logging.” Almost always there is incomplete logging: CloudTrail enabled in a single account but not at the organization level, Activity Logs without export to a central workspace, or logs for critical resources (KMS, S3, Key Vault, IAM) without the events that actually explain a change. In companies this translates into costly decisions: massive credential rotation due to lack of certainty, preventive shutdowns, or excessive restrictions that impact the business.
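A quick way to surface this gap in AWS is to enumerate trails and flag any that are not organization-level or multi-region. A minimal sketch with boto3 (the output format is illustrative):

import boto3

# List every trail visible from this account, including shadow trails
# replicated from other regions, and flag partial coverage.
ct = boto3.client('cloudtrail')
trails = ct.describe_trails(includeShadowTrails=True)['trailList']

for t in trails:
    flags = []
    if not t.get('IsMultiRegionTrail'):
        flags.append('single-region')
    if not t.get('IsOrganizationTrail'):
        flags.append('not org-level')
    status = ', '.join(flags) or 'ok'
    print(f"{t['Name']} -> {t.get('S3BucketName')} [{status}]")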
Another typical failure is blind trust that “the SIEM already collects everything.” If the ingestion pipeline fails or filters, the SIEM can show a clean picture while the source is incomplete. That gap is the breeding ground for a false sense of control: compliance is reported, but an incident cannot be answered precisely.
Common causes: CloudTrail and Activity Logs configured “halfway”
CloudTrail is usually “enabled,” but without the details that matter when there is credential abuse or permission changes. The classic example: not logging Data events in S3 or Lambda due to cost/volume without a risk analysis, and then not being able to prove which objects were read or exfiltrated. Another: not enabling management events in all regions, and discovering late that the attacker operated in a “forgotten” region.
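Whether Data events are actually captured can be checked by inspecting a trail's event selectors. A sketch assuming a trail named org-audit-trail (hypothetical name):

import boto3

ct = boto3.client('cloudtrail')
sel = ct.get_event_selectors(TrailName='org-audit-trail')

# Classic selectors: DataResources lists which Data events are logged.
for s in sel.get('EventSelectors', []):
    print('Management events:', s.get('IncludeManagementEvents'),
          '| ReadWriteType:', s.get('ReadWriteType'))
    for dr in s.get('DataResources', []):
        print('  Data events:', dr['Type'], dr['Values'])

# Advanced selectors, if configured, replace the classic ones.
for s in sel.get('AdvancedEventSelectors', []):
    print('Advanced selector:', s.get('Name'))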
In Azure, the frequent equivalent is staying with the default Activity Log without exporting it to Log Analytics/Storage/Event Hub with adequate retention. When you have to investigate a privilege escalation or a policy modification, the team finds that retention was short or that the export did not include relevant categories (for example, changes to resources, RBAC, Key Vault). The incident is not more severe due to lack of control; it is more severe because you cannot prove the scope.
There is also an organizational component: each team enables logs in its own way. The result is an architecture with islands: some projects send to a central bucket, others to a local workspace, others export nothing. In a crisis, coordinating searches across three systems and with different formats is wasted time paid for with exposure.
- Multi-account/multi-subscription without unified logging
When there is no organization-level strategy (AWS Organizations / Management Group), it is common for new accounts/subscriptions to be created without the logging “baseline.” In audits the main account is reviewed; in incidents, the attacker moves where no one is looking.
- Insufficient retention to investigate
Many corporate investigations do not start the same day as the event: they are detected through late indicators (fraud, anomalous billing, third-party findings). If log retention is days or a few weeks, you lose the window to attribute actions and delimit impact, and you end up applying global measures “just in case.” Pinning retention explicitly on the central repository, rather than relying on defaults, avoids this; see the sketch after this list.
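A hedged sketch of explicit retention on the central bucket with boto3 (the bucket name is taken from the example below; the periods are assumptions to align with your investigation window):

import boto3

s3 = boto3.client('s3')

# Keep audit logs hot for 90 days, in cold storage for two years, then expire.
s3.put_bucket_lifecycle_configuration(
    Bucket='mi-bucket-auditoria',
    LifecycleConfiguration={'Rules': [{
        'ID': 'audit-log-retention',
        'Status': 'Enabled',
        'Filter': {'Prefix': ''},
        'Transitions': [{'Days': 90, 'StorageClass': 'GLACIER'}],
        'Expiration': {'Days': 730},
    }]},
)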
Early signals: when logging lies to you without you noticing
A practical signal is the asymmetry between what the team expects to see and what actually appears. For example: a modified IAM policy is detected, but there is no event showing who changed it; or a created resource is seen, but there is no trace of the session or role that executed it. Those kinds of gaps usually point to logging being disabled in certain regions or accounts, or to an incomplete export.
Another signal: log volume metrics that are “too stable.” In environments with real change, the control plane generates variability. If CloudTrail or Activity Logs remain flat for weeks while there are deployments, changes, and rotations, it is reasonable to suspect that not everything is being collected or that the pipeline is filtering too much.
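One cheap heuristic is to count daily control-plane events and look for implausible flatness. A sketch using the CloudTrail lookup API (note it only covers roughly the last 90 days of management events in the current region, so this is a sanity check, not full validation):

import boto3
from datetime import datetime, timedelta, timezone

ct = boto3.client('cloudtrail')
paginator = ct.get_paginator('lookup_events')

# Daily management-event counts for the last week; a flat line during
# active change windows is suspicious.
end = datetime.now(timezone.utc)
for _ in range(7):
    start = end - timedelta(days=1)
    count = sum(len(page['Events'])
                for page in paginator.paginate(StartTime=start, EndTime=end))
    print(f'{start.date()}: {count} management events')
    end = start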
In companies, these signals are usually discovered by accident: during an incident, when preparing evidence for an audit, or when an analyst tries to correlate events and finds nothing. The cost is double: the response is compromised and trust in the controls is eroded, which often leads to manual “micro-controls” and friction between teams.
- Alerts that never fire
If you have rules for Security Group changes, access key creation, MFA deactivation, or role modifications, and they never fire despite those changes occurring, it’s probably not that “everything is perfect”: it’s that the source is not arriving or arrives incomplete.
- Investigations that depend on application logs
When the team is forced to reconstruct control-plane actions from application logs (or CI/CD logs), it is usually because the cloud provider’s native logs are missing or are not centralized. It is a fragile dependency: application logs do not replace evidence from IAM, KMS, or policies.
How to do it in practice: validate that logs exist, cover, and are trustworthy
Effective validation is not “CloudTrail is ON.” It is checking coverage, centralization, retention, and integrity with controlled tests. In AWS, a realistic practice is to execute a set of expected actions (create a role, attach a policy, modify a Security Group, read an S3 object in a sensitive bucket) and verify that each action generates the corresponding event in the central repository, with the correct actor and enough context (sourceIPAddress, userAgent, ARN, requestParameters).
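A minimal sketch of that loop, assuming a dedicated test security group (the group ID is a placeholder): make a benign, reversible change, wait for delivery, then confirm CloudTrail recorded it with the right actor and context.

import json
import time
import boto3

ec2 = boto3.client('ec2')
ct = boto3.client('cloudtrail')
SG_ID = 'sg-0123456789abcdef0'  # dedicated test security group (placeholder)

# Benign, reversible control-plane change.
rule = dict(GroupId=SG_ID, IpProtocol='tcp', FromPort=2222, ToPort=2222,
            CidrIp='203.0.113.10/32')
ec2.authorize_security_group_ingress(**rule)
ec2.revoke_security_group_ingress(**rule)

time.sleep(300)  # CloudTrail delivery typically lags several minutes

events = ct.lookup_events(LookupAttributes=[{
    'AttributeKey': 'EventName',
    'AttributeValue': 'AuthorizeSecurityGroupIngress'}], MaxResults=5)['Events']
assert events, 'change executed but not recorded: coverage gap'

# Confirm the actor and context fields the investigation would rely on.
detail = json.loads(events[0]['CloudTrailEvent'])
print(detail['userIdentity'].get('arn'),
      detail['sourceIPAddress'], detail['userAgent'])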
Concrete actions that often yield immediate results in corporate environments: create an organization-level trail, force it to be multi-region, and send it to a central bucket with strict write policies. If you decide to log Data events, do it at least for “crown jewels” (buckets with sensitive data, critical functions). In Azure, the operational equivalent is configuring Diagnostic settings to export Activity Logs and resource logs to a central destination with retention aligned with investigation needs.
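A sketch of that baseline with boto3, run from the Organizations management account (trail and bucket names are illustrative; an organization trail also requires CloudTrail trusted access to be enabled in AWS Organizations):

import boto3

ct = boto3.client('cloudtrail')

# Organization-level, multi-region trail into a central bucket,
# with log file integrity validation enabled.
ct.create_trail(
    Name='org-audit-trail',
    S3BucketName='mi-bucket-auditoria',
    IsMultiRegionTrail=True,
    IsOrganizationTrail=True,
    EnableLogFileValidation=True,
)
ct.start_logging(Name='org-audit-trail')

# Data events only for "crown jewels" to keep volume/cost under control.
ct.put_event_selectors(
    TrailName='org-audit-trail',
    EventSelectors=[{
        'ReadWriteType': 'All',
        'IncludeManagementEvents': True,
        'DataResources': [{
            'Type': 'AWS::S3::Object',
            'Values': ['arn:aws:s3:::sensitive-data-bucket/'],  # placeholder
        }],
    }],
)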
A critical point: trustworthiness. If an attacker with permissions can delete or alter logs, your “evidence” stops being evidence. That’s why, in addition to collecting, you have to protect storage and access, and test it: try deleting log objects with a normal operational role and confirm it fails; validate that platform teams do not have modification permissions over the audit repository except for a controlled break-glass.
Example S3 bucket policy (AWS) to prevent deletion or alteration of logs in the destination bucket (adapt it to your case; the focus here is denying object deletion and policy changes):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDeleteOnAuditLogs",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
      "Resource": "arn:aws:s3:::mi-bucket-auditoria/*"
    },
    {
      "Sid": "DenyBucketPolicyChanges",
      "Effect": "Deny",
      "Principal": "*",
      "Action": ["s3:PutBucketPolicy", "s3:DeleteBucketPolicy"],
      "Resource": "arn:aws:s3:::mi-bucket-auditoria"
    }
  ]
}
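To confirm the policy actually bites, attempt a deletion with a normal operational role and expect AccessDenied. A hedged sketch (the object key is a placeholder):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')  # run under a normal operational role, not break-glass

try:
    s3.delete_object(Bucket='mi-bucket-auditoria',
                     Key='AWSLogs/sample-log.json.gz')  # placeholder key
    print('UNEXPECTED: delete succeeded; the audit bucket is not tamper-resistant')
except ClientError as e:
    assert e.response['Error']['Code'] == 'AccessDenied'
    print('OK: deletion denied as expected')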
Validation in AWS (operational, not theoretical): in CloudTrail Lake or in the central bucket, look for events such as AttachRolePolicy, PutRolePolicy, CreateAccessKey, UpdateAssumeRolePolicy, AuthorizeSecurityGroupIngress and confirm that they appear for your tests. In Azure, validate in Log Analytics that the expected tables receive events after real changes (RBAC, deployments, Key Vault operations) and that the timestamp and the actor match.
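If you use CloudTrail Lake, the same check can be run as a query. A sketch with boto3 (the event data store ID and the date are placeholders):

import time
import boto3

ct = boto3.client('cloudtrail')
EDS_ID = 'example-event-data-store-id'  # your event data store ID (placeholder)

query = f"""
SELECT eventTime, userIdentity.arn, eventName, sourceIPAddress
FROM {EDS_ID}
WHERE eventName IN ('AttachRolePolicy', 'PutRolePolicy', 'CreateAccessKey',
                    'UpdateAssumeRolePolicy', 'AuthorizeSecurityGroupIngress')
  AND eventTime > '2024-01-01 00:00:00'
"""

query_id = ct.start_query(QueryStatement=query)['QueryId']
while ct.describe_query(QueryId=query_id)['QueryStatus'] in ('QUEUED', 'RUNNING'):
    time.sleep(2)

status = ct.describe_query(QueryId=query_id)['QueryStatus']
assert status == 'FINISHED', f'query ended with status {status}'
for row in ct.get_query_results(QueryId=query_id)['QueryResultRows']:
    print(row)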
Recommendations for corporate environments
If an architecture “seems secure” but does not allow events to be reconstructed accurately, in practice it is not: security without operational evidence becomes irrelevant when decisions have to be made under pressure. The most damaging failures are not usually the total absence of logs, but partial coverage, incomplete centralization, and insufficient retention, which feed a false sense of control.
To avoid this, the criterion must be verifiable: execute controlled actions and confirm they are recorded in a central repository, with adequate retention and with protections that prevent deletion or tampering. If, when reviewing CloudTrail/Activity Logs, gaps appear, it is not a detail: it is a direct risk to response capability, compliance, and operational cost during an incident.