Outbound (Egress) Control in Serverless: how to prevent your functions from leaking data

In serverless, the classic perimeter disappears quickly: you don’t manage hosts, scaling is automatic, and functions inherit connectivity that often goes unquestioned. The problem is that egress (the destinations a function can connect to) usually stays in the most convenient mode: open Internet. When a function processes sensitive data or has access to secrets, that outbound path is the perfect channel for exfiltrating data or reaching a Command & Control (C2) server if someone manages to inject code or abuse a dependency.

Controlling egress is not a matter of “put it in a VPC and you’re done”. In enterprises, the real goal is that only what is necessary can be reached (internal cloud services, corporate APIs, databases), and that any attempt to reach an unauthorized destination fails and is logged. Below is an operational approach focused exclusively on egress control for serverless.

What egress control means in serverless when the attacker is already “inside”

Most incidents I’ve seen with functions don’t start with “someone hacked Lambda/Cloud Functions”, but with something more mundane: a compromised dependency, an environment variable exposed by a logging error, temporary credentials used out of context, or an input that ends up executing where it shouldn’t. At that point, the attacker doesn’t need sophisticated persistence: it’s enough that the function can make outbound connections without restrictions.

If your function can resolve DNS freely and open sockets to the Internet, the abuse patterns are predictable: upload data to an external bucket, send fragments to an HTTPS endpoint controlled by the attacker, or keep beaconing to a C2 to receive commands. In enterprises, the risk grows when the function also has read permissions on internal stores (S3/Blob, databases, queues). Egress control acts as a “last barrier”: even if the runtime is compromised, traffic should not be able to reach arbitrary destinations.

The practical consequence of not doing it is usually twofold: data loss (through silent exfiltration) and long containment times, because without outbound controls and without telemetry, it is hard to prove what data could have left and where.

Connectivity design: VPC, private endpoints, and controlled egress (without relying on “application-level blocks”)

The most effective pattern in cloud is to force the function to operate in a network where no direct route to the Internet exists and where managed services are reached through private endpoints. In AWS, that usually implies Lambda inside a VPC, private subnets, VPC Endpoints (Gateway/Interface) for the required services, and route tables with no 0.0.0.0/0 route to an Internet Gateway. If some outbound Internet access is unavoidable, concentrate it at a single point (NAT or firewall) with controls.

This not only reduces surface area; it also changes the operational game: when the security team needs to block a destination or investigate a flow, it no longer depends on “which library is making the call?” but on network rules and centralized logs. In audited environments, this approach makes it easier to demonstrate that “by design” the function cannot talk to the Internet except for explicit exceptions.

  • VPC Endpoints for cloud services: they allow the function to consume S3/Secrets Manager/DynamoDB, etc., without going out to the Internet.

    The real implication is that a compromised function will still be able to talk to those services, but only to the approved ones. Additionally, you can scope access with endpoint policies (for example, to specific buckets), reducing the exfiltration blast radius even within the cloud itself.

  • NAT or egress firewall as the single outbound point: if there are unavoidable external dependencies (third-party APIs), traffic must exit through a controlled point.

    In enterprises, this enables allowlists by FQDN/IP, TLS inspection where corporate policy requires it, and above all flow logging. The trade-off is cost and complexity: an “open” NAT is not a control; a well-managed firewall is, but it requires day-to-day operation.

  • Private subnets with no Internet route: if there is no explicit Internet requirement, remove the path entirely.

    It is the difference between “we trust that nobody calls out” and “even if it tries, there is no route”. This reduces incidents where a minor change (e.g., a new dependency that does telemetry) starts sending data out without anyone noticing.
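Before removing the Internet path, it is worth confirming that every managed service the function needs actually has an endpoint. The following is a minimal sketch under a few assumptions. The input is taken to be the `VpcEndpoints` list returned by EC2 DescribeVpcEndpoints (for example, boto3's `describe_vpc_endpoints`). Service names are assumed to follow the standard `com.amazonaws.<region>.<service>` pattern. The function name and the VPC/region values are hypothetical.

```python
from __future__ import annotations


def endpoint_gaps(endpoints: list[dict], vpc_id: str, region: str,
                  required_services: list[str]) -> list[str]:
    """Report required services with no usable endpoint in the given VPC."""
    # Keep only endpoints of this VPC that are in the "available" state.
    usable = {
        ep["ServiceName"]: ep
        for ep in endpoints
        if ep.get("VpcId") == vpc_id and str(ep.get("State", "")).lower() == "available"
    }
    gaps = []
    for service in required_services:
        ep = usable.get(f"com.amazonaws.{region}.{service}")
        if ep is None:
            # No endpoint: calls either fail or need a NAT path to the Internet.
            gaps.append(f"{service}: no endpoint (calls fail or go through NAT)")
        elif ep.get("VpcEndpointType") == "Interface" and not ep.get("PrivateDnsEnabled"):
            # Without private DNS, the default SDK hostname still resolves to public IPs.
            gaps.append(f"{service}: interface endpoint without private DNS")
    return gaps
```

For example, `endpoint_gaps(resp["VpcEndpoints"], "vpc-0example", "eu-west-1", ["s3", "secretsmanager", "kms", "sts"])` lists what is still missing before the NAT can be removed. An empty list is a precondition, not proof. Flow Logs still have to show that the traffic actually uses the endpoints.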

How to do it in practice in AWS: close Internet access and allow only endpoints (with verifiable examples)

The operational goal is clear: no route to the Internet from Lambda subnets, and managed AWS access through VPC Endpoints with restrictive policies. The typical change in enterprises fails due to “half measures”: the function is put in a VPC, a NAT is created for convenience, and 0.0.0.0/0 is left without controls. That is connectivity, not egress control.

Concrete (minimum) actions I usually require before considering egress “closed”:

  • Create private subnets for the functions and associate them with route tables that have no 0.0.0.0/0 route to an Internet Gateway.

    Real validation: review the effective route table of those subnets. If there is a default route to an IGW or to a NAT “without policy”, you are leaving the door open (directly or indirectly).

  • Create VPC Endpoints for the required services (for example, S3 as a Gateway Endpoint; Secrets Manager/KMS/STS as Interface Endpoints depending on the case).

    Real validation: from the function, call those services and check that their hostnames resolve to private IPs (Interface endpoints with private DNS enabled), or confirm in Flow Logs that the traffic does not traverse the NAT. If traffic goes to the NAT, you are not using the endpoint.

  • Restrict the endpoint with policy when the service allows it.

    Typical example with an S3 Gateway Endpoint: allow only specific corporate buckets. This limits exfiltration “within S3” (e.g., uploading to a bucket that should not be used).
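The route-table check from the first step can be automated. The sketch below assumes the `RouteTables` list from EC2 DescribeRouteTables, using the field names of that API's response. For each Lambda subnet, it reports any active default route (`0.0.0.0/0` or `::/0`) together with its target. Subnets without an explicit association fall back to the VPC's main route table, and the sketch accounts for that. The function names and all IDs are illustrative.

```python
from __future__ import annotations

DEFAULT_DESTINATIONS = {"0.0.0.0/0", "::/0"}
# Route fields that can name the next hop (igw-..., nat-..., etc.).
TARGET_KEYS = ("GatewayId", "NatGatewayId", "EgressOnlyInternetGatewayId",
               "TransitGatewayId", "NetworkInterfaceId", "VpcPeeringConnectionId")


def table_for_subnet(route_tables: list[dict], subnet_id: str) -> dict | None:
    """Return the explicitly associated table, else the VPC's main table."""
    main = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main = table
    return main


def default_routes(table: dict) -> list[str]:
    findings = []
    for route in table.get("Routes", []):
        dest = route.get("DestinationCidrBlock") or route.get("DestinationIpv6CidrBlock")
        if dest not in DEFAULT_DESTINATIONS or route.get("State") == "blackhole":
            continue  # not a default route, or one that drops traffic anyway
        target = next((route[k] for k in TARGET_KEYS if k in route), "unknown")
        findings.append(f"{dest} -> {target}")
    return findings


def audit_lambda_subnets(route_tables: list[dict], subnet_ids: list[str]) -> dict[str, list[str]]:
    """Map each subnet with an outbound default route to its findings."""
    report = {}
    for subnet_id in subnet_ids:
        table = table_for_subnet(route_tables, subnet_id)
        if table is None:
            report[subnet_id] = ["no route table found"]
            continue
        findings = default_routes(table)
        if findings:
            report[subnet_id] = findings
    return report
```

A NAT target is not automatically a problem. It becomes one when that NAT has no firewall or policy behind it, and the route table alone cannot tell you that.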

Example of a policy for an S3 VPC Endpoint that only allows access to a specific bucket and denies the rest. Adjust the bucket ARNs to your case:

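VPC Endpoint Policy (S3)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSpecificBucketOnly",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::mi-bucket-corporativo",
        "arn:aws:s3:::mi-bucket-corporativo/*"
      ]
    },
    {
      "Sid": "DenyAllOtherBuckets",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "NotResource": [
        "arn:aws:s3:::mi-bucket-corporativo",
        "arn:aws:s3:::mi-bucket-corporativo/*"
      ]
    }
  ]
}

The deny statement uses NotResource on purpose. An explicit Deny always overrides an Allow, so a Deny on "Resource": "*" would also block the corporate bucket, and the endpoint would reject every S3 request.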
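Before deploying an endpoint policy, it helps to check its effect on a few representative requests. The sketch below is a deliberately simplified evaluator, not AWS's implementation. It models only `Effect`, `Action` and `Resource`/`NotResource`, with the basic rule that an explicit Deny wins, an Allow applies otherwise, and anything unmatched is implicitly denied. It ignores `Principal`, `Condition` and `NotAction`, and it ignores the fact that the IAM role and the bucket policy must also allow the request. Wildcards are approximated with `fnmatch`, which also interprets `[...]`, unlike IAM. The embedded policy uses the same bucket name, with the deny expressed through `NotResource`.

```python
from __future__ import annotations

import fnmatch

BUCKET = "arn:aws:s3:::mi-bucket-corporativo"
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AllowSpecificBucketOnly", "Effect": "Allow", "Principal": "*",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
         "Resource": [BUCKET, BUCKET + "/*"]},
        {"Sid": "DenyAllOtherBuckets", "Effect": "Deny", "Principal": "*",
         "Action": "s3:*", "NotResource": [BUCKET, BUCKET + "/*"]},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(patterns, value, case_sensitive=True):
    if not case_sensitive:  # IAM action names are case-insensitive
        value, patterns = value.lower(), [p.lower() for p in patterns]
    return any(fnmatch.fnmatchcase(value, p) for p in patterns)


def _applies(stmt, action, resource):
    """True if the statement's action and resource conditions match the request."""
    if not _matches(_as_list(stmt["Action"]), action, case_sensitive=False):
        return False
    if "NotResource" in stmt:
        return not _matches(_as_list(stmt["NotResource"]), resource)
    return _matches(_as_list(stmt["Resource"]), resource)


def evaluate(policy, action, resource):
    """Simplified ordering: explicit deny, then allow, else implicit deny."""
    stmts = policy["Statement"]
    if any(s["Effect"] == "Deny" and _applies(s, action, resource) for s in stmts):
        return "explicit deny"
    if any(s["Effect"] == "Allow" and _applies(s, action, resource) for s in stmts):
        return "allow"
    return "implicit deny"
```

For example, `evaluate(POLICY, "s3:PutObject", "arn:aws:s3:::attacker-bucket/loot.zip")` should report an explicit deny, while a `GetObject` on the corporate bucket is allowed. Checks like these can run in CI next to the infrastructure code, so that a policy edit that widens or breaks access fails before deployment.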

To verify that egress is truly closed, enable VPC Flow Logs on the relevant subnets/ENIs and look for connections to public destinations, whether accepted (ACCEPT) or rejected (REJECT). If traffic to the Internet appears, the question is not “why is the function doing it?” but “why does the network allow it?”.
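Reviewing Flow Logs by hand does not scale. A minimal sketch can filter connections from the functions' ENIs to globally routable destinations. It assumes records in the default (version 2) Flow Logs format, with 14 space-separated fields, and a known set of ENI IDs used by the functions. The IDs and addresses below are illustrative. Response traffic coming back to the ENI has a private destination, so this filter leaves it out.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = ("version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status")


@dataclass
class EgressAttempt:
    interface_id: str
    srcaddr: str
    dstaddr: str
    dstport: int
    action: str  # "ACCEPT" or "REJECT"


def public_egress(lines, function_enis: set[str]) -> list[EgressAttempt]:
    """Return flows from the given ENIs to globally routable destinations."""
    attempts = []
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        rec = dict(zip(FIELDS, parts))
        # Skips the header line and NODATA/SKIPDATA records.
        if rec["log_status"] != "OK" or rec["interface_id"] not in function_enis:
            continue
        try:
            dst = ipaddress.ip_address(rec["dstaddr"])
        except ValueError:
            continue
        if dst.is_global:
            attempts.append(EgressAttempt(rec["interface_id"], rec["srcaddr"],
                                          rec["dstaddr"], int(rec["dstport"]),
                                          rec["action"]))
    return attempts
```

Both outcomes matter. A REJECT shows the control working, and someone trying to get out. An ACCEPT to a public address means a path out exists and needs an explanation.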

Egress policies: security groups, NACLs, DNS, and the real problem with FQDN allowlists

In serverless inside a VPC, the first tangible control is usually the function’s security group. It lets you restrict ports and destinations, with a known limitation: rules are expressed as IPs/CIDRs (or references to security groups and prefix lists), not domain names. In enterprises, many third parties expose endpoints with changing IPs (CDNs), so an IP allowlist becomes fragile: either the service breaks every week, or you end up opening too much.

When you need allowlists by logical destination, the practical alternative is to introduce a component that understands FQDNs and enforces policy at the application or network layer: a managed egress firewall or a corporate proxy. Without that component, egress control ends up degrading into “we allow 443 to any site”, which is exactly what you wanted to avoid.
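At its core, an FQDN allowlist in a proxy or firewall is a name-matching decision on the destination the client asks for, for example the CONNECT target or the TLS SNI. A minimal sketch of that matching, with hypothetical domain names, shows the details that usually go wrong. Names must be normalized for case and the trailing dot. Wildcard entries must not match look-alike domains, and in this sketch they do not match the apex either.

```python
from __future__ import annotations


def normalize(host: str) -> str:
    # DNS names are case-insensitive and may carry a trailing root dot.
    return host.strip().rstrip(".").lower()


def is_allowed(host: str, allowlist: list[str]) -> bool:
    """Exact entries match one name; "*.domain" entries match any subdomain."""
    host = normalize(host)
    for entry in map(normalize, allowlist):
        if entry.startswith("*."):
            suffix = entry[1:]  # "*.payments.example" -> ".payments.example"
            # The leading dot rejects look-alikes such as "evilpayments.example"
            # and the apex itself; the length check rejects a bare suffix.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == entry:
            return True
    return False
```

With the allowlist `["api.partner.example", "*.payments.example"]`, `eu.payments.example` is allowed. `payments.example` is not allowed, and neither is `api.partner.example.attacker.net`.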

  • Security Groups as the “lowest common denominator”: limit outbound to strictly necessary ports (e.g., 443) and to internal destinations (corporate CIDRs, database subnets, private endpoints).

    This prevents exfiltration through trivial channels (SMTP/IRC/weird ports) and reduces the C2 surface, but it does not by itself solve Internet access over 443 if you leave it open.

  • NACLs for additional controls when there are stricter segmentation needs or audit requirements.

    In practice, NACLs add operational complexity: they are stateless, so return traffic needs its own rules, and rules are evaluated in order. Use them to reinforce “no egress” or to block obvious ranges, not as a substitute for a design with endpoints and a single egress point.

  • Controlled DNS as part of egress: if the function can resolve any domain using uncontrolled resolvers, you lose visibility.

    Forcing the use of corporate resolvers (and logging queries) helps detect C2 domains and reduces investigation time. Note: DNS-based blocking without network-level blocking is easy to evade if a direct IP route still exists.
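The security-group side can be audited in the same way. The sketch below assumes the `SecurityGroups` list from EC2 DescribeSecurityGroups, plus a list of internal CIDRs (the `10.0.0.0/8` in the example is an assumption). It reports egress rules whose CIDR destinations fall outside those ranges. Rules that reference other security groups or prefix lists, such as the S3 gateway endpoint prefix list, are not evaluated here.

```python
from __future__ import annotations

import ipaddress


def _rule_label(rule: dict) -> str:
    if rule.get("IpProtocol") == "-1":
        return "all traffic"
    return f'{rule.get("IpProtocol")}/{rule.get("FromPort")}-{rule.get("ToPort")}'


def _is_internal(cidr: str, internal: list) -> bool:
    net = ipaddress.ip_network(cidr, strict=False)
    # subnet_of() requires both networks to be the same IP version.
    return any(net.version == n.version and net.subnet_of(n) for n in internal)


def audit_egress(groups: list[dict], internal_cidrs: list[str]) -> list[str]:
    """List egress rules whose CIDR destination is outside the internal ranges."""
    internal = [ipaddress.ip_network(c) for c in internal_cidrs]
    findings = []
    for group in groups:
        for rule in group.get("IpPermissionsEgress", []):
            cidrs = [r["CidrIp"] for r in rule.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if not _is_internal(cidr, internal):
                    findings.append(f'{group["GroupId"]}: {_rule_label(rule)} to {cidr}')
    return findings
```

The default egress rule of a new security group (all traffic to `0.0.0.0/0`) shows up immediately, which is usually the first finding in functions that were moved into a VPC without further review.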

A frequent anti-pattern in enterprises is implementing “DNS blocking” as the primary control while keeping an open NAT. When an incident hits, the attacker doesn’t need your DNS: they can use direct IPs, or DNS over HTTPS (DoH) to an allowed destination. Egress control must be anchored in routes and outbound policies, not only in name resolution.

Exfiltration signals and how to validate that egress control works (before the incident)

If your outbound control is real, it must produce operable evidence: blocked attempts, metrics, and traceability of “which function tried to go out, where, and when”. In real life, the first indicator is usually subtle: latency increases due to timeouts (when you correctly block a destination), spikes in connection errors, or strange DNS patterns from functions that “should never” talk outside.
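For the DNS signal, a rough sketch over resolver query logs can count, per source IP, the queries for names outside the suffixes a function is expected to resolve. It assumes one JSON record per line with `srcaddr` and `query_name` fields, as in Route 53 Resolver query logs. The function IP and the suffix list are hypothetical.

```python
from __future__ import annotations

import json
from collections import Counter

# Hypothetical suffixes the functions are expected to resolve.
EXPECTED_SUFFIXES = (".amazonaws.com", ".corp.example")


def unexpected_queries(log_lines, function_ips: set[str],
                       expected_suffixes=EXPECTED_SUFFIXES) -> Counter:
    """Count (source IP, query name) pairs outside the expected suffixes."""
    counts: Counter = Counter()
    for line in log_lines:
        record = json.loads(line)
        if record.get("srcaddr") not in function_ips:
            continue
        name = record.get("query_name", "").rstrip(".").lower()
        expected = any(name == s.lstrip(".") or name.endswith(s) for s in expected_suffixes)
        if not expected:
            counts[(record["srcaddr"], name)] += 1
    return counts
```

Repeated queries for the same unexpected name from a function that “should never” talk outside are exactly the beaconing pattern described earlier.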

Validation should be part of your change routine. Every time a new function or an external vendor comes into play, you validate that the allowed flow is minimal and that the denied traffic is logged. Without that discipline, egress control degrades over time due to urgent exceptions that nobody rolls back.

Tests that work well in enterprise environments:

  • Controlled negative test: from a test function, try to connect to a disallowed public domain (for example, your own test endpoint) and confirm it fails.

    The key is that the failure has traceability: in Flow Logs you should see the REJECT (or the block in firewall/proxy), and in the function execution logs the expected error should appear. If it “fails” but there is no trace on the network side, you won’t be able to investigate a real case.

  • Minimal positive test: validate that the required endpoints work (S3/Secrets Manager/internal DB) without going through NAT.

    This avoids the classic scenario where “everything works” because NAT is acting as a bypass, and nobody realizes until you try to close it and break production.
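Both tests can be scripted and run from a test function. A minimal sketch uses only the standard library. `connection_blocked` attempts a TCP connection that should fail. `resolves_privately` checks that a service hostname maps only to private addresses, which is what an interface endpoint with private DNS should produce. That resolution check does not apply to gateway endpoints (S3, DynamoDB), which keep public IPs and are routed via a prefix list; for those, Flow Logs are the evidence. The hostnames in `handler` are placeholders, and the test endpoint is hypothetical.

```python
from __future__ import annotations

import ipaddress
import socket


def connection_blocked(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Negative test: True if no TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return False
    except OSError:
        # Refused, unreachable, timeouts and DNS failures (gaierror) land here.
        # A DNS failure also counts as "blocked", so also test a literal public IP
        # to prove the network layer itself blocks the traffic.
        return True


def all_private(addresses) -> bool:
    """True if there is at least one address and every address is private."""
    parsed = [ipaddress.ip_address(a.split("%")[0]) for a in addresses]
    return bool(parsed) and all(a.is_private for a in parsed)


def resolves_privately(host: str, port: int = 443) -> bool:
    """Positive test for interface endpoints: the hostname maps only to private IPs."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    return all_private({info[4][0] for info in infos})


def handler(event, context):
    # Placeholder hostnames: replace with your own test endpoint and region.
    return {
        "public_test_endpoint_blocked": connection_blocked("egress-test.example.net"),
        "secretsmanager_private": resolves_privately("secretsmanager.eu-west-1.amazonaws.com"),
    }
```

Returning the results as a structured object makes it easy to run this function after every network change and alert when either value flips.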

When control is well implemented, the impact on incident response is direct: you can respond with quick containment (revoke a route/policy), and you can bound exposure with network evidence instead of assumptions.

Recommendations for enterprise environments

Outbound control in serverless is effective when it is supported by network design: private subnets with no Internet route, consuming services via VPC Endpoints and, if Internet is needed, a single egress point with policies and logging. That drastically reduces the probability of exfiltration or C2 even if a function is compromised.

In operations, the standard should be: private endpoints and restrictive policies first; open NAT never as a “shortcut”; and continuous validation with negative/positive tests and telemetry (Flow Logs and firewall/proxy logs). The result is fewer silent incidents and, when one happens, a much shorter investigation that is defensible under audit.

