The pattern repeats itself: a vulnerability in an API inside a container (RCE, SSRF, insecure deserialization) looks like “just” an application incident. But in modern EKS/AKS, the Pod usually has a cloud identity (IRSA in AWS, Workload Identity in Azure) to access resources. If that identity is oversized or poorly scoped, the attacker does not need to break the Kubernetes control plane: valid credentials are enough to escalate in the underlying cloud to levels equivalent to “root” (for example, full control of the tenant or the account).
The critical thing is not the existence of IRSA/Workload Identity, but the permissions model and isolation: what that Pod can assume, what resources it can touch, what it can create/modify, and what “escalation doors” exist (broad IAM policies, roles with delegation capability, identities with permissions over RBAC, etc.).
What went wrong: when the Pod identity becomes a master key
In AWS, a Pod with IRSA obtains temporary credentials for an IAM role. In Azure, a Pod with Workload Identity (OIDC federation) obtains tokens for an identity (Managed Identity or App Registration/Service Principal). In both cases, the design aims to eliminate static secrets. The problem appears when that identity is granted “platform” permissions for convenience: administer IAM/Entra ID, create roles, modify policies, operate networks, or manage clusters.
In real incidents, the starting point is usually mundane: a container image with an exposed server, a vulnerable library, or an endpoint that allows SSRF to the metadata service. The attacker does not need sophisticated persistence; with a few calls to AWS STS or to Azure’s token endpoint from the Pod, they already have an operational cloud identity to pivot with.
The most common business consequence is that the incident stops being “an app down” and becomes a cloud security event: creation of users/roles, data exfiltration in buckets/storage accounts, modification of network rules to maintain access, and deployment of workloads for mining or backdoors. In environments with broad automation (CI/CD with privileges, IaC with extensive permissions), escalation can be a matter of minutes.
Typical attack chain: from RCE/SSRF in the container to tenant control
The chain usually does not require exploiting Kubernetes itself. The attacker compromises a Pod, obtains its cloud identity, and executes actions in the AWS/Azure API. In AWS, the frequent goal is to find a role with permissions to create/modify policies or to assume more privileged roles. In Azure, the goal is usually to obtain permissions over subscriptions/resource groups, or over Entra ID (directory roles) to persist and escalate.
A common example in AWS: the IRSA role has permissions to manage IAM “because the platform team needed a controller to work”. With that, the attacker can attach a broad managed policy or create a new policy and attach it to the current role, or create a new role with an open trust policy and assume it. In Azure, a Managed Identity with Contributor at the subscription level can create resources with identities, assign roles (if it also has RBAC permissions), or modify infrastructure to open outbound paths.
- Write permissions over identity: allowing iam:CreatePolicy, iam:AttachRolePolicy, or the Azure equivalents (roleAssignments write) often turns the Pod identity into a direct “escalator”.
In enterprises, this means a vulnerability in an internal app (for example, a billing microservice) can end up allowing modification of the policy of a role shared by multiple services, amplifying the impact. The team sees “legitimate” activity in CloudTrail/Activity Logs because, technically, it is: it comes from a valid identity.
- Ability to create privileged infrastructure: if the Pod can create instances/VMs, functions, or resources with attached roles/identities, it can manufacture a more “comfortable” execution point with greater permissions and persistence outside the cluster.
The operational pattern behind these incidents is the lack of separation between “workload” identities (microservices) and “platform” capabilities (IAM/RBAC, networks, clusters). When mixed, the Pod becomes an entry point into the account.
Early signals that reveal escalation from a Pod (before it is too late)
Detection should not depend only on “container breakout” alerts. Here the attacker operates through cloud APIs with temporary credentials. In AWS, CloudTrail shows AssumeRoleWithWebIdentity and then a sequence of calls unusual for that role. In Azure, Activity Logs and Entra ID sign-in logs reflect federated access and role assignments or resource changes outside the workload’s normal pattern.
In corporate environments, a strong signal is divergence between the Pod’s expected behavior (read from S3/Storage, query a queue) and the actual activity (create policies, list secrets, describe VPC/VNet, create role assignments). Another useful indicator: permission increases (policy attachments, role assignments) at unusual times or from unusual regions, or from identities associated with “application” namespaces rather than “platform”.
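The divergence between expected and actual behavior can be sketched as an allowlist comparison. This is a minimal illustration, not a detection product: it assumes log entries have already been normalized into dictionaries with hypothetical `identity` and `action` keys (from CloudTrail or Activity Logs), and that each workload has a hand-maintained baseline of approved actions. The identity and action values below are made up for the example.

```python
from collections import Counter


def out_of_baseline(events, baseline):
    """Count calls that fall outside the approved action set of each identity.

    events:   iterable of {"identity": str, "action": str} (hypothetical shape)
    baseline: {identity: set of approved actions}
    An identity with no baseline entry has every action reported.
    """
    unexpected = Counter()
    for event in events:
        approved = baseline.get(event["identity"], set())
        if event["action"] not in approved:
            unexpected[(event["identity"], event["action"])] += 1
    return unexpected


# Hypothetical baseline: the payments API only reads objects and one secret.
BASELINE = {
    "payments/svc-payments-api": {"s3:GetObject", "secretsmanager:GetSecretValue"},
}

EVENTS = [
    {"identity": "payments/svc-payments-api", "action": "s3:GetObject"},
    {"identity": "payments/svc-payments-api", "action": "iam:CreatePolicy"},
    {"identity": "payments/svc-payments-api", "action": "iam:CreatePolicy"},
    {"identity": "payments/svc-payments-api", "action": "ec2:DescribeVpcs"},
]

for (identity, action), count in out_of_baseline(EVENTS, BASELINE).items():
    print(f"{identity} called {action} {count} time(s) outside its baseline")
```

An allowlist is used here rather than a list of known-bad actions because the baseline of a microservice is usually small and stable, so anything outside it is worth a look.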
- AWS: bursts of iam:* or sts:AssumeRole from an IRSA role, especially if the role historically only used s3:GetObject or secretsmanager:GetSecretValue.
In day-to-day operations, this is validated by correlating CloudTrail with the role name used by IRSA and the corresponding namespace/serviceAccount. If the Pod identity executes actions outside the approved set for that service, it is not “noise”: it is often the start (or continuation) of an escalation.
- Azure: creation/modification of roleAssignments, changes to Microsoft.Authorization/*, or administrative actions from a Workload Identity associated with a microservice.
A real difficulty is cultural: many teams accept broad permissions “temporarily” to unblock deployments. That temporariness is rarely reversed, and detection is complicated because actions pass standard authentication and authorization controls.
How to do it in practice: hardening IRSA (AWS) and Workload Identity (Azure) to cut off escalation
The practical goal is for a compromised Pod to have a small “blast radius”: minimal permissions, no delegation capability, no identity administration, and conditions that bind federation to the exact service account. This is achieved by combining well-scoped IAM/RBAC policies, permission boundaries, and identity assignment controls by namespace.
AWS (EKS + IRSA): start by hardening the role’s trust policy and then trimming its permissions policy. The trust policy must restrict by sub (serviceAccount) and aud, avoiding wildcards by namespace. An example of a correct trust policy (adjust your OIDC provider and namespace):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"},
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com",
        "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:payments:svc-payments-api"
      }
    }
  }]
}
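A trust policy like this one can be linted before it is applied. The following is a minimal sketch, assuming the policy is available as a parsed JSON document; the function name `trust_policy_findings` and the `LOOSE` counterexample are hypothetical, and the check only covers the two conditions discussed here (exact `sub`, exact `aud`), not every IAM condition operator.

```python
def _as_list(value):
    """IAM JSON allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def trust_policy_findings(policy):
    """Return findings for statements that allow sts:AssumeRoleWithWebIdentity."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRoleWithWebIdentity" not in _as_list(stmt.get("Action", [])):
            continue
        subs, auds = [], []
        # Collect every :sub and :aud condition together with its operator.
        for operator, pairs in stmt.get("Condition", {}).items():
            for key, value in pairs.items():
                for item in _as_list(value):
                    if key.endswith(":sub"):
                        subs.append((operator, item))
                    elif key.endswith(":aud"):
                        auds.append((operator, item))
        if not subs:
            findings.append(f"statement {index}: no :sub condition")
        for operator, item in subs:
            # Wildcards only act as wildcards under operators such as StringLike.
            if operator != "StringEquals" and ("*" in item or "?" in item):
                findings.append(f"statement {index}: wildcard subject {item!r} under {operator}")
        if ("StringEquals", "sts.amazonaws.com") not in auds:
            findings.append(f"statement {index}: :aud is not pinned to sts.amazonaws.com")
    return findings


OIDC = "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
PRINCIPAL = {"Federated": f"arn:aws:iam::123456789012:oidc-provider/{OIDC}"}

STRICT = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow", "Principal": PRINCIPAL,
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringEquals": {
        f"{OIDC}:aud": "sts.amazonaws.com",
        f"{OIDC}:sub": "system:serviceaccount:payments:svc-payments-api"}}}]}

# Hypothetical counterexample: any service account in the namespace, no audience check.
LOOSE = {"Version": "2012-10-17", "Statement": [{
    "Effect": "Allow", "Principal": PRINCIPAL,
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {"StringLike": {f"{OIDC}:sub": "system:serviceaccount:payments:*"}}}]}

print(trust_policy_findings(STRICT))
for finding in trust_policy_findings(LOOSE):
    print(finding)
```

A check of this kind fits naturally in a CI step for infrastructure code, so a wildcard subject is rejected before the role exists.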
Afterwards, validate that the role cannot escalate via IAM. In practice, that means: removing any iam:* that is not strictly necessary; avoiding write permissions over policies/roles; and, if your organization supports it, applying a permissions boundary that prevents attaching administrative privileges even if someone manages to introduce a new permission by mistake.
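The review of the permissions policy can be partly automated. The sketch below is an assumption-laden illustration: the list of escalation-relevant actions is a hand-picked, non-exhaustive selection, and the function deliberately ignores `Deny` statements, `NotAction`, resource scoping, conditions, and permissions boundaries, so a hit means "look closer", not "exploitable". The function name `risky_actions` is hypothetical.

```python
from fnmatch import fnmatchcase

# Non-exhaustive selection of actions that allow delegation or self-escalation.
ESCALATION_ACTIONS = (
    "iam:AttachRolePolicy",
    "iam:CreatePolicy",
    "iam:CreatePolicyVersion",
    "iam:CreateRole",
    "iam:PassRole",
    "iam:PutRolePolicy",
    "iam:UpdateAssumeRolePolicy",
    "sts:AssumeRole",
)


def _as_list(value):
    """IAM JSON allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def risky_actions(policy):
    """Return the escalation-relevant actions granted by Allow statements."""
    hits = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        for pattern in _as_list(stmt.get("Action", [])):
            for action in ESCALATION_ACTIONS:
                # IAM action names are case-insensitive and accept * and ? wildcards.
                if fnmatchcase(action.lower(), pattern.lower()):
                    hits.add(action)
    return sorted(hits)


# Hypothetical policy of an application role that picked up an IAM wildcard.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"},
        {"Effect": "Allow", "Action": ["iam:Attach*", "iam:passrole"], "Resource": "*"},
    ],
}

print(risky_actions(POLICY))
```

Expanding wildcards against a fixed list, instead of searching for the literal string `iam:*`, also catches partial patterns such as `iam:Attach*`.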
Validation in AWS: review in IAM the role associated with the service account and verify (1) trust policy without wildcards by namespace, (2) attached policies without dangerous IAM/STS actions, (3) CloudTrail to confirm the role only calls expected services. A concrete and repeatable check is to look for events of AttachRolePolicy, PutRolePolicy, CreatePolicyVersion or AssumeRole emitted by that role: if they appear, the design is allowing delegation or escalation.
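The CloudTrail check described above can be sketched as a filter over exported records. This is a minimal example under stated assumptions: the records are already loaded as dictionaries in the CloudTrail record format (with `eventName` and `userIdentity.sessionContext.sessionIssuer.arn`), and the role ARN and sample events are invented for illustration.

```python
# Event names taken from the check described in the text.
WATCHED_EVENTS = {"AttachRolePolicy", "PutRolePolicy", "CreatePolicyVersion", "AssumeRole"}


def escalation_events(records, role_arn):
    """Return (eventTime, eventName) for watched calls made by sessions of role_arn."""
    hits = []
    for record in records:
        identity = record.get("userIdentity", {})
        issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
        # Only sessions issued from the IRSA role under review are relevant.
        if identity.get("type") != "AssumedRole" or issuer.get("arn") != role_arn:
            continue
        if record.get("eventName") in WATCHED_EVENTS:
            hits.append((record.get("eventTime"), record["eventName"]))
    return hits


def _record(time, name, arn):
    """Build a reduced, hypothetical CloudTrail record for the example."""
    return {
        "eventTime": time,
        "eventName": name,
        "userIdentity": {
            "type": "AssumedRole",
            "sessionContext": {"sessionIssuer": {"type": "Role", "arn": arn}},
        },
    }


ROLE = "arn:aws:iam::123456789012:role/svc-payments-api"   # hypothetical role
OTHER = "arn:aws:iam::123456789012:role/platform-admin"    # hypothetical role

RECORDS = [
    _record("2024-05-01T10:00:00Z", "GetObject", ROLE),
    _record("2024-05-01T10:02:00Z", "AttachRolePolicy", ROLE),
    _record("2024-05-01T10:03:00Z", "AssumeRole", ROLE),
    _record("2024-05-01T10:04:00Z", "PutRolePolicy", OTHER),
]

for when, name in escalation_events(RECORDS, ROLE):
    print(f"{when} {name}")
```

Any non-empty result for an application role is the signal the text describes: the design is allowing delegation or escalation.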
Azure (AKS + Workload Identity): the key control is that OIDC federation is restricted to the service account subject and that the identity has minimal roles at the smallest scope possible (resource group or resource, not subscription). In corporate practice, the most frequent failure is assigning Contributor at the subscription level to “speed things up”. That enables lateral movement through infrastructure and, if there are also permissions for role assignments, escalation is direct.
Validation in Azure: confirm that the identity used by the workload does not have broad roles at high scopes (subscription/management group) and review Activity Logs for authorization operations (Microsoft.Authorization/roleAssignments/write) originating from that identity. If they appear, the workload is in a dangerous zone: even an SSRF can trigger permission changes.
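The scope part of this validation can be sketched as follows. It assumes role assignments have been exported as a list of dictionaries with `principalId`, `roleDefinitionName`, and `scope` fields (the shape assumed here resembles what `az role assignment list` returns as JSON, but treat that as an assumption to verify); the identifiers in the sample data are invented.

```python
import re

_SUBSCRIPTION = re.compile(r"^/subscriptions/[^/]+/?$", re.IGNORECASE)
_MANAGEMENT_GROUP = re.compile(
    r"^/providers/Microsoft\.Management/managementGroups/[^/]+/?$", re.IGNORECASE
)


def scope_level(scope):
    """Classify an Azure RBAC scope string by how broad it is."""
    if scope == "/":
        return "root"
    if _MANAGEMENT_GROUP.match(scope):
        return "management group"
    if _SUBSCRIPTION.match(scope):
        return "subscription"
    return "narrower"  # resource group or individual resource


def broad_assignments(assignments, principal_id):
    """Return (role, level, scope) for assignments of principal_id at high scopes."""
    findings = []
    for item in assignments:
        if item.get("principalId") != principal_id:
            continue
        level = scope_level(item.get("scope", ""))
        if level != "narrower":
            findings.append((item.get("roleDefinitionName"), level, item["scope"]))
    return findings


WORKLOAD = "11111111-1111-1111-1111-111111111111"          # hypothetical principal
SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"  # hypothetical subscription

ASSIGNMENTS = [
    {"principalId": WORKLOAD, "roleDefinitionName": "Contributor", "scope": SUB},
    {"principalId": WORKLOAD, "roleDefinitionName": "Storage Blob Data Reader",
     "scope": SUB + "/resourceGroups/payments-rg"},
    {"principalId": "22222222-2222-2222-2222-222222222222",
     "roleDefinitionName": "Owner", "scope": SUB},
]

for role, level, scope in broad_assignments(ASSIGNMENTS, WORKLOAD):
    print(f"{role} at {level} scope: {scope}")
```

The Activity Log side of the validation (looking for Microsoft.Authorization/roleAssignments/write from the same identity) is a separate query and is not covered by this sketch.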
Typical mistakes that open the door to “root” without touching Kubernetes
The most expensive mistake is treating workload identities as platform identities. In AWS, granting an IRSA role permissions intended for operators (IAM, organizations, networks) because “the controller needs it” is often a tactical decision that becomes structural. In Azure, the equivalent is using application Managed Identities with Contributor/Owner to avoid friction with deployments and then reusing them across several namespaces.
Another repeated failure is identity reuse: the same role/identity for multiple service accounts or namespaces. That turns a compromise in one microservice into compromise of “everything that shares identity”. In internal incidents, this design complicates response: revoking the role breaks production for multiple apps, so containment is postponed, and the attacker gains time.
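Identity reuse can be spotted from the cluster side. The sketch below assumes service accounts are available as Kubernetes objects in dictionary form (for example, the `items` of a JSON listing across namespaces) and relies on the `eks.amazonaws.com/role-arn` and `azure.workload.identity/client-id` annotations; the helper names and sample objects are hypothetical.

```python
from collections import defaultdict

# Annotations that bind a service account to a cloud identity.
IDENTITY_ANNOTATIONS = (
    "eks.amazonaws.com/role-arn",
    "azure.workload.identity/client-id",
)


def shared_identities(service_accounts):
    """Map each cloud identity to the service accounts using it, if more than one."""
    users = defaultdict(list)
    for account in service_accounts:
        meta = account.get("metadata", {})
        annotations = meta.get("annotations") or {}
        for key in IDENTITY_ANNOTATIONS:
            if key in annotations:
                users[annotations[key]].append(f"{meta.get('namespace')}/{meta.get('name')}")
    return {identity: sorted(names) for identity, names in users.items() if len(names) > 1}


def _sa(namespace, name, role_arn):
    """Build a reduced, hypothetical ServiceAccount object for the example."""
    return {"metadata": {"namespace": namespace, "name": name,
                         "annotations": {"eks.amazonaws.com/role-arn": role_arn}}}


SHARED = "arn:aws:iam::123456789012:role/shared-app-role"   # hypothetical role
SERVICE_ACCOUNTS = [
    _sa("payments", "svc-payments-api", SHARED),
    _sa("billing", "svc-billing", SHARED),
    _sa("search", "svc-search", "arn:aws:iam::123456789012:role/svc-search"),
    {"metadata": {"namespace": "default", "name": "default"}},  # no cloud identity
]

for identity, names in shared_identities(SERVICE_ACCOUNTS).items():
    print(f"{identity} is shared by: {', '.join(names)}")
```

Each entry in the result is a containment problem in waiting: revoking that identity affects every service account listed next to it.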
- Wildcards in trust/federation: allowing the role to be assumed from any system:serviceaccount:* or not pinning the subject in Azure makes it easier for any Pod that manages to create/use a service account to “point” to that identity.
In enterprises, this often appears after fast migrations: a configuration “that works” is copied and generalized. The result is that namespace isolation is more cosmetic than real.
- Permissions to read secrets + unrestricted egress: if the Pod can read Secrets Manager/Key Vault without limits and also has uncontrolled Internet egress, exfiltration is trivial and hard to distinguish from normal traffic.
This is not theory: many teams discover the problem when auditing after a pentest. The typical finding is not “container escape”, but “the Pod identity can list secrets from other systems and modify permissions”.
Recommendations for corporate environments
The risk of going from a compromised Pod to full cloud control does not depend on “hacking Kubernetes”, but on how identities and permissions have been modeled around IRSA/Workload Identity. If the workload identity can delegate, administer IAM/RBAC, or create privileged infrastructure, escalation is an operational consequence, not a surprise.
In corporate practice, risk reduction comes from: strictly binding federation to the correct service account, removing platform permissions from application identities, avoiding reuse of roles/identities across workloads, and continuously validating with traces (CloudTrail/Activity Logs) that Pod roles only execute expected actions. When this is done well, a vulnerability in a microservice stays within its perimeter and stops being a bridge to “root” in the account.