Today ateom-gvisor gives an active actor network connectivity by moving the worker pod's real Kubernetes eth0 into the actor/gVisor network namespace. That works for basic actor execution, but it has an important side effect: while the actor is running, the worker pod no longer owns its pod network interface.
This makes the worker pod partially disappear from the normal Kubernetes networking model during actor activation. It also prevents us from cleanly adding pod-local networking components, such as an egress proxy or policy enforcement sidecar, because those components remain in the worker pod network namespace while actor traffic leaves through an interface that has been moved elsewhere.
We should replace the eth0 move approach with an explicit veth pair between the worker pod namespace and the actor/gVisor namespace.
Rationale
Keeping the pod's real eth0 in the worker namespace gives us a stable networking boundary:
- The worker pod keeps normal Kubernetes network connectivity while an actor is running.
- Sidecars or worker-local helper processes can remain reachable and can participate in actor networking.
- Actor networking becomes explicit and owned by Substrate instead of depending on moving the CNI-provided interface.
- This creates the foundation for transparent egress capture and policy enforcement.
- It avoids coupling actor lifecycle to destructive mutation of the pod's primary network interface.
This is especially important for planned transparent egress policy work. We want actor traffic to be captured without requiring application opt-in via HTTP_PROXY or HTTPS_PROXY. To do that cleanly, actor traffic needs to cross a worker-owned interface where Substrate can install forwarding, NAT, and later proxy-capture rules.
Proposed approach
For each active actor, ateom-gvisor should create a point-to-point veth pair:
- Worker pod namespace side: ateom0, for example 10.200.0.1/30.
- Actor/gVisor namespace side: renamed to eth0, for example 10.200.0.2/30.
- Actor default route points to the worker-side veth IP.
- The Kubernetes-provided pod eth0 remains in the worker pod namespace.
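The link setup above can be sketched as a list of iproute2 invocations. This is a minimal illustration, not the ateom-gvisor implementation: the function name vethCommands, the temporary peer name ateom0-p, and the use of a named network namespace are assumptions made for the example. How the gVisor sandbox picks up the address and route is outside the sketch. The function only builds the commands, so it can be inspected without root.

```go
package main

import (
	"fmt"
	"net/netip"
)

// vethCommands returns the iproute2 invocations (as argv slices) for one
// actor link. It only builds the commands; executing them requires root.
// subnet must be an IPv4 /30; actorNetns is a named network namespace
// (an assumption of this sketch).
func vethCommands(subnet, actorNetns string) ([][]string, error) {
	p, err := netip.ParsePrefix(subnet)
	if err != nil {
		return nil, fmt.Errorf("parse subnet: %w", err)
	}
	if !p.Addr().Is4() || p.Bits() != 30 {
		return nil, fmt.Errorf("expected an IPv4 /30, got %s", subnet)
	}
	worker := p.Masked().Addr().Next() // first host address, e.g. 10.200.0.1
	actor := worker.Next()             // second host address, e.g. 10.200.0.2

	const workerIf = "ateom0"  // worker-side name from the proposal
	const peerTmp = "ateom0-p" // hypothetical temporary peer name

	// inNS prefixes a command so that it runs inside the actor namespace.
	inNS := func(args ...string) []string {
		return append([]string{"ip", "-n", actorNetns}, args...)
	}
	return [][]string{
		// Create the pair in the worker namespace, then move one end.
		{"ip", "link", "add", workerIf, "type", "veth", "peer", "name", peerTmp},
		{"ip", "link", "set", peerTmp, "netns", actorNetns},
		// The actor sees its end as eth0.
		inNS("link", "set", peerTmp, "name", "eth0"),
		{"ip", "addr", "add", netip.PrefixFrom(worker, 30).String(), "dev", workerIf},
		{"ip", "link", "set", workerIf, "up"},
		inNS("addr", "add", netip.PrefixFrom(actor, 30).String(), "dev", "eth0"),
		inNS("link", "set", "eth0", "up"),
		// Default route points at the worker-side veth IP.
		inNS("route", "add", "default", "via", worker.String()),
	}, nil
}
```

Calling vethCommands("10.200.0.0/30", "actor-ns") yields eight commands. The pod's Kubernetes eth0 never appears as the device being moved, which is the point of the change.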
For compatibility with the current routing model, the initial implementation can install temporary nftables rules:
- Masquerade actor egress behind the worker pod IP.
- DNAT inbound traffic addressed to the worker pod IP on the actor service port to the actor veth IP.
- Enable forwarding between the actor veth and pod eth0.
These rules are intended as a compatibility bridge, not the final egress policy implementation. The later egress work should replace broad actor egress NAT with transparent capture into AgentGateway and default-deny policy rules.
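One possible shape for the compatibility rules is sketched below as a Go helper that renders an nftables ruleset. The table name ateom_compat, the type and function names, the example addresses, and the choice of TCP for the service port are all assumptions for illustration; the real rules may differ. Forwarding also depends on the net.ipv4.ip_forward sysctl being enabled in the worker namespace, which this sketch does not handle.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// compatRules holds the individual nftables rules for one actor.
type compatRules struct {
	DNAT       string
	Masquerade string
	Forward    []string
}

// buildCompatRules renders the rules. podIP is the worker pod address,
// actorIP the actor-side veth address, port the actor service port.
// TCP is assumed for the service port.
func buildCompatRules(podIP, actorIP string, port uint16) (compatRules, error) {
	pod, err := netip.ParseAddr(podIP)
	if err != nil {
		return compatRules{}, fmt.Errorf("pod IP: %w", err)
	}
	actor, err := netip.ParseAddr(actorIP)
	if err != nil {
		return compatRules{}, fmt.Errorf("actor IP: %w", err)
	}
	if !pod.Is4() || !actor.Is4() {
		return compatRules{}, fmt.Errorf("this sketch only handles IPv4")
	}
	return compatRules{
		// Inbound: pod IP and service port are rewritten to the actor veth IP.
		DNAT: fmt.Sprintf("ip daddr %s tcp dport %d dnat to %s", pod, port, actor),
		// Egress: actor traffic leaves eth0 with the pod IP as its source.
		Masquerade: fmt.Sprintf(`ip saddr %s oifname "eth0" masquerade`, actor),
		// Forwarding in both directions between the veth and pod eth0.
		Forward: []string{
			`iifname "ateom0" oifname "eth0" accept`,
			`iifname "eth0" oifname "ateom0" accept`,
		},
	}, nil
}

// Ruleset renders a complete table that could be fed to `nft -f -`.
func (r compatRules) Ruleset() string {
	var b strings.Builder
	b.WriteString("table ip ateom_compat {\n")
	b.WriteString("\tchain prerouting {\n")
	b.WriteString("\t\ttype nat hook prerouting priority dstnat; policy accept;\n")
	fmt.Fprintf(&b, "\t\t%s\n\t}\n", r.DNAT)
	b.WriteString("\tchain postrouting {\n")
	b.WriteString("\t\ttype nat hook postrouting priority srcnat; policy accept;\n")
	fmt.Fprintf(&b, "\t\t%s\n\t}\n", r.Masquerade)
	b.WriteString("\tchain forward {\n")
	b.WriteString("\t\ttype filter hook forward priority filter; policy accept;\n")
	for _, rule := range r.Forward {
		fmt.Fprintf(&b, "\t\t%s\n", rule)
	}
	b.WriteString("\t}\n}\n")
	return b.String()
}
```

Keeping all rules in one dedicated table means the later egress work can remove the bridge by deleting a single table rather than hunting for individual rules.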
Expected outcome
After this change:
- Actors still start, checkpoint, restore, and receive inbound traffic as before.
- The worker pod retains network connectivity during actor execution.
- ateom-gvisor no longer needs to scrape, move, or restore the pod's real eth0.
- Substrate has a cleaner networking foundation for transparent egress capture, policy enforcement, and future worker-local AgentGateway integration.
Validation
The implementation should be validated with:
- Existing focused Go tests for ateom-gvisor and related networking/server boot packages.
- The demo e2e flow that exercises actor startup, golden snapshot creation, restore, and inbound routing.
- A runtime check that the worker pod still has its Kubernetes eth0 while an actor is active.
- A runtime check that actor traffic routes through the veth pair and inbound traffic still reaches the actor.
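The interface-presence check could look roughly like the following. It is a sketch under assumptions: the helper names are hypothetical, and it only confirms that eth0 and ateom0 exist in the namespace where it runs. It does not prove that traffic actually flows over the veth pair, which still needs a connectivity test.

```go
package main

import (
	"fmt"
	"net"
)

// missingInterfaces returns the required names that are absent from present.
func missingInterfaces(present []string, required ...string) []string {
	seen := make(map[string]bool, len(present))
	for _, name := range present {
		seen[name] = true
	}
	var missing []string
	for _, name := range required {
		if !seen[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

// checkWorkerInterfaces inspects the current network namespace and reports
// an error if the pod eth0 or the worker-side veth is missing. It is meant
// to run inside the worker pod while an actor is active.
func checkWorkerInterfaces() error {
	ifaces, err := net.Interfaces()
	if err != nil {
		return fmt.Errorf("list interfaces: %w", err)
	}
	names := make([]string, 0, len(ifaces))
	for _, iface := range ifaces {
		names = append(names, iface.Name)
	}
	if m := missingInterfaces(names, "eth0", "ateom0"); len(m) > 0 {
		return fmt.Errorf("worker namespace is missing interfaces: %v", m)
	}
	return nil
}
```

Under the current eth0-move behaviour this check would fail on eth0 during actor execution, so it doubles as a regression guard for the new design.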