In the cloud, most of the serious compromises I have seen in enterprises do not start with a 0-day or an exposed port. They start with an identity and a token that “fit” into the normal flow: a pipeline that deploys, a SaaS connector that syncs data, a service account that signs requests, or a role assumed by automation. The attacker does not need to break the system if they can enter through the same door the processes use.
The key change is operational: the perimeter is no longer the network. The real unit of control is identity and its associated authentication material (tokens and keys). And when we talk about non-human identities, the risk multiplies because they are persistent, widely distributed (CI/CD, runners, integrations), and often hold broad permissions for “convenience”.
Real attack surface: non-human identities and tokens that behave like master keys
Non-human identities (NHI) share a common pattern: they are harder to “see” than a user, but they operate all day. In a typical corporate environment there are dozens or hundreds: service accounts, automation roles, provider integrations, webhooks, bots, monitoring agents. Each one needs to authenticate, and that ends up materializing as a secret: an API key, a refresh token, a client secret, a certificate, or temporary credentials issued by an STS.
In real incidents, the breaking point is usually token management, not system logic: a token leaked in job logs, in a build artifact, in poorly protected environment variables, or in a private repository that “wasn’t that private”. The practical consequence is that the attacker can operate as if they were that automation: list secrets, read buckets, pivot between accounts or projects, or even create new identities to persist.
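The leak patterns above (job logs, artifacts, environment variables) can often be caught early with even a crude scanner. A minimal sketch, with two illustrative regexes only; real scanners such as gitleaks or trufflehog cover far more formats and entropy-based detection:

```python
import re

# Illustrative patterns only: AWS long-lived access key IDs and GitHub PATs.
TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def scan_lines(lines):
    """Return (line_number, pattern_name) hits for suspected secrets in log output."""
    hits = []
    for i, line in enumerate(lines, 1):
        for name, pattern in TOKEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((i, name))
    return hits
```

Run on CI logs and build artifacts before they are stored or shared; any hit means the credential must be treated as compromised and rotated.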
- Long-lived tokens in SaaS integrations
A SaaS connector that uses refresh tokens without real rotation becomes a stable credential. If it is exfiltrated, the attacker does not need to maintain a channel: they renew access and blend into the connector’s normal traffic.
- Automation with broad permissions “to avoid tickets”
When a pipeline has administrative permissions “just in case”, the attacker does not need to escalate: they arrive already escalated. In enterprises this translates into fast impact (deletion of resources, creation of access, changes to logging) and a higher MTTR, because it is hard to separate legitimate actions from malicious ones.
Abuse scenarios that repeat: they don’t break, they log in
The first typical scenario is credential theft in the delivery chain. A compromised (or poorly isolated) CI runner leaks environment variables that contain cloud or SaaS tokens. From there, the attacker assumes a role, obtains temporary credentials and starts enumerating: “what can I read?”, “what can I modify?”, “what can I create?”. None of this requires exploiting a vulnerability in the cloud provider: it is IAM working exactly as designed.
The second scenario is abuse of legitimate federation flows. Many companies federate corporate identity with the cloud and with SaaS; that is the right approach, but if a session token, a refresh token or an app’s client secret leaks, the attacker enters through the “official” route. In the logs it looks like a successful authentication followed by standard API calls. The damage appears later: policy changes, creation of access keys, disabled alerts or lateral movement toward data.
- Persistence via creation of new non-human identities
Once inside, it is common to create a new service account/role “for maintenance”, assign it permissions and leave it as a backdoor. In large organizations, a name similar to existing ones goes unnoticed if there is no inventory hygiene and periodic reviews.
- Silent exfiltration using read permissions
With read permissions over buckets, queues, managed databases or secrets, the attacker can extract data gradually. The traffic can look like just another sync job if identity, origin, volume and target are not correlated.
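The lookalike-name persistence trick described above is exactly what a periodic inventory review catches. A minimal sketch, assuming a hypothetical inventory of known NHI names, that flags identities absent from the inventory and notes near-matches to known names:

```python
from difflib import SequenceMatcher

# Hypothetical inventory of known non-human identity names.
INVENTORY = {"ci-deploy-role", "monitoring-agent", "backup-sync"}

def review_identities(observed, inventory=INVENTORY, threshold=0.8):
    """Flag identities missing from the inventory; attach the closest known name
    when it is suspiciously similar (a common backdoor-naming trick)."""
    findings = []
    for name in observed:
        if name in inventory:
            continue
        closest = max(inventory,
                      key=lambda known: SequenceMatcher(None, name, known).ratio())
        ratio = SequenceMatcher(None, name, closest).ratio()
        findings.append((name, closest if ratio >= threshold else None))
    return findings
```

A finding with a near-match (for example `ci-depl0y-role` next to `ci-deploy-role`) deserves immediate review; a finding with no near-match is simply an uninventoried identity, which is a hygiene problem in itself.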
Early signals and telemetry: what gives away a stolen token
Effective detection almost always starts with a question: “is this non-human identity acting as it always does?” NHIs should be predictable: same endpoints, same regions, same schedules, same resources. When a token is compromised, the attacker breaks that predictability: not because they “make mistakes”, but because they need to explore and expand permissions or scope.
In real operations, the most useful signals are not generic; they are deviations against baseline. For example: a CI role that suddenly calls IAM APIs to create policies; a SaaS integration that starts enumerating buckets; an access token used from a region where operations never happen; an identity that does massive List* when it normally makes pinpoint calls.
- Changes in the API call pattern
If a service account normally executes 3–4 actions (for example, read a secret and deploy), any appearance of administrative actions (create roles, attach policies, disable logging) must be treated as a high-priority signal.
- Context anomalies: region, IP, user-agent, provider
Stolen tokens are often used from infrastructure different from the corporate one. If context is not validated (network, geography, provider, consistent headers), an attacker can operate “legitimately” from anywhere.
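The baseline-deviation idea above can be sketched in a few lines. This is a simplified model with hypothetical identity names, baseline actions and regions; a real implementation would learn the baseline from CloudTrail history rather than hard-code it:

```python
# Per-identity baseline: expected API actions and regions (hypothetical values).
BASELINES = {
    "ci-deploy-role": {
        "actions": {"secretsmanager:GetSecretValue", "ecs:UpdateService"},
        "regions": {"eu-west-1"},
    },
}

# Actions that warrant an alert from any automation identity, baseline or not.
HIGH_RISK = {"iam:CreatePolicy", "iam:AttachRolePolicy", "cloudtrail:StopLogging"}

def score_event(identity, action, region):
    """Return deviation reasons for one API event; an empty list means 'fits baseline'."""
    reasons = []
    base = BASELINES.get(identity)
    if base is None:
        return ["unknown identity"]
    if action in HIGH_RISK:
        reasons.append("high-risk action")
    if action not in base["actions"]:
        reasons.append("action outside baseline")
    if region not in base["regions"]:
        reasons.append("region outside baseline")
    return reasons
```

The point is that several weak signals (new action, new region) compound: a high-risk action outside both the action and region baselines should page someone, while a single mild deviation may only need review.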
How to do it in practice: IAM guardrails and token control for NHI (with examples in AWS)
Useful mitigation is not “put MFA on everything” (many NHIs cannot use it). It is designing token issuance and use so tokens are short-lived, narrowly scoped and attributable. In AWS, the goal is that almost no automation uses long-lived keys, and that assumed roles carry conditions that prevent use outside the expected context.
The practical starting point is to inventory NHIs by function (what it does), authentication method (how it gets a token) and blast radius (what it can touch). With that you can enforce short session duration, least privilege and trust conditions (trust policy) that limit who can assume roles, and how.
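The inventory-plus-enforcement idea can be expressed as a small data model. A sketch under stated assumptions: the field names, risk categories and the 3600-second session cap are illustrative choices, not a standard:

```python
from dataclasses import dataclass

@dataclass
class NonHumanIdentity:
    name: str
    function: str           # what it does
    auth_method: str        # how it gets a token: "oidc", "access_key", "client_secret"...
    blast_radius: str       # what it can touch: "low", "medium", "high"
    max_session_seconds: int

def review(nhi, session_cap=3600):
    """Flag the combinations that concentrate risk: long-lived keys,
    long sessions, and high blast radius without short-lived federation."""
    issues = []
    if nhi.auth_method == "access_key":
        issues.append("long-lived key: migrate to OIDC/STS")
    if nhi.max_session_seconds > session_cap:
        issues.append("session too long: shorten SessionDuration")
    if nhi.blast_radius == "high" and nhi.auth_method != "oidc":
        issues.append("high blast radius without short-lived federation")
    return issues
```

Running this over the inventory gives a prioritized remediation list: identities that trip all three checks are the master keys the article warns about.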
- Restrict the CI/CD role to OIDC and block long-lived keys
If you use GitHub Actions or a similar provider, prioritize OIDC to assume roles in AWS. The trust policy must require the expected `sub`/`aud` (repository, branch, workflow). That way, even if someone steals a token from another environment, they will not be able to assume the role without meeting the conditions.
- Enforce context and reduce surface with conditions
Introduce conditions like `aws:RequestedRegion`, `aws:PrincipalArn`, or session tags when applicable. The goal is that “the role only works” from the intended flow. This reduces the value of an exfiltrated token.
Example trust policy (AWS IAM) for OIDC with GitHub Actions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "token.actions.githubusercontent.com:sub": "repo:mi-org/mi-repo:ref:refs/heads/main"
        }
      }
    }
  ]
}
```
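To see why the two conditions matter, the evaluation can be approximated in a few lines. This is a simplified model of how STS matches `StringEquals` and `StringLike` (glob-style, `*` and `?` wildcards) against the OIDC token claims, not a reimplementation of AWS policy evaluation:

```python
from fnmatch import fnmatch

# The two conditions from the trust policy above.
EXPECTED_AUD = "sts.amazonaws.com"
EXPECTED_SUB_PATTERN = "repo:mi-org/mi-repo:ref:refs/heads/main"

def would_allow(claims):
    """Approximate the trust-policy condition check on OIDC token claims."""
    aud_ok = claims.get("aud") == EXPECTED_AUD                     # StringEquals
    sub_ok = fnmatch(claims.get("sub", ""), EXPECTED_SUB_PATTERN)  # StringLike
    return aud_ok and sub_ok
```

The `sub` check is what stops a token minted for a fork or another repository: the claim encodes repository and ref, so a stolen workflow token from elsewhere simply does not match.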
Validation in AWS (what to review, not just “configure”):
- CloudTrail: verify that CI calls come in as `AssumeRoleWithWebIdentity` and not as IAM user access keys. If `accessKeyId` values from users appear, long-lived credentials are still in play.
- Session duration: review the role’s `SessionDuration` and, in STS events, the `sessionContext`. Excessively long sessions widen the abuse window.
- Unexpected IAM actions: create alerts when automation identities call `iam:Create*`, `iam:Attach*`, `cloudtrail:StopLogging` or make policy changes. If your pipeline does not manage IAM, those actions should be “impossible”.
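The first check in the list is easy to automate. A minimal sketch over CloudTrail-style event dicts (the shape here is a simplified mock of the real record format, which nests `accessKeyId` inside `userIdentity`); long-lived IAM user keys start with `AKIA`, while temporary STS credentials start with `ASIA`:

```python
def find_long_lived_credential_use(events):
    """Return event names authenticated with an IAM user access key
    instead of an AssumeRoleWithWebIdentity session."""
    findings = []
    for e in events:
        identity = e.get("userIdentity", {})
        if (identity.get("type") == "IAMUser"
                and identity.get("accessKeyId", "").startswith("AKIA")):
            findings.append(e.get("eventName"))
    return findings
```

Run it over the CI role’s recent CloudTrail history; any finding means a long-lived key is still in the pipeline and the OIDC migration is incomplete.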
An anti-pattern I see frequently: “since the runner is inside our VPC, it is safe”. If the token leaves the runner (logs, artifacts, variables), the VPC protects nothing. Control must be in the issuer (IAM/IdP) and in the token’s scope.
Recommendations for corporate environments
Today’s cloud compromises are better explained by identity and tokens than by network failures. In particular, non-human identities concentrate risk because they operate continuously, rely on reusable secrets and tend to accumulate permissions. When an attacker obtains a valid token, the activity blends with normal operations: they don’t “break”, they log in and use APIs.
In practice, the most tangible improvement comes from treating tokens as critical material: reduce lifetime, eliminate long-lived keys where possible, scope permissions to the real function and add trust conditions that prevent out-of-context use. At the operations level, detecting compromises requires baselines per non-human identity and alerts on clear deviations (context, region, anomalous IAM actions), because that is where a stolen token starts to give itself away.