scripts: probe worker pod egress #541

Open
benben wants to merge 2 commits into main from ben/probe-worker-egress-script

benben commented May 7, 2026

Summary

Probe a worker pod's network egress and produce a pass/fail matrix for an allow/deny target list. Designed to run before and after a Cilium NetworkPolicy lands so a diff highlights what got blocked.

Companion to charts PR #10949 (`feat(duckgres): cluster-wide Cilium egress policy for worker pods`).

Mechanism

Each probe is its own `kubectl debug` ephemeral container attached to the worker pod's network namespace. Cilium endpoints cover all containers in a pod, so the ephemeral container inherits the worker's policy.

Two non-obvious things the script handles:

  • A consolidated single-debug-call form (all probes run inline with base64 + while-read) silently dropped output past ~3 probes due to the streaming-buffer behaviour of ephemeral containers. The script therefore makes one debug call per probe, which is slower but robust.
  • `kubectl debug --attach=true` does not propagate the inner shell's exit code (always returns 0 once attached). The script embeds `__PROBE_EXIT=$?` as a sentinel in stdout and parses it back.
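
The sentinel approach in the second bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the script in this PR: `run_probe`, `parse_probe_exit`, the `busybox` image, the 255 fallback, and the namespace/pod/container arguments are all hypothetical. Only `kubectl debug`, `--attach=true`, and the `__PROBE_EXIT=$?` sentinel come from the description above.

```shell
# Sketch only: function names, image, and arguments are assumptions.

# Extract the inner exit code from captured probe output.
# Prints 255 if no sentinel line is found (for example, truncated output).
parse_probe_exit() {
  local out="$1" code
  code=$(printf '%s\n' "$out" | tr -d '\r' \
    | sed -n 's/^__PROBE_EXIT=\([0-9][0-9]*\)$/\1/p' | tail -n 1)
  printf '%s\n' "${code:-255}"
}

# Run one probe in its own ephemeral container and return the inner exit code.
run_probe() {
  local ns="$1" pod="$2" container="$3" inner="$4" out
  # \$? is escaped so the shell inside the container expands it, not this one.
  out=$(kubectl debug -n "$ns" "$pod" \
        --image=busybox --target="$container" \
        --attach=true --quiet \
        -- sh -c "$inner; echo __PROBE_EXIT=\$?" 2>&1) || true
  return "$(parse_probe_exit "$out")"
}
```

A call such as `run_probe my-ns my-worker-pod worker 'nc -z -w 3 169.254.169.254 80'` (all names illustrative) would then return the inner `nc` status instead of the `kubectl debug` status.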

Result on mw-dev (post-policy)

```
KIND   TARGET                         EXPECTED  RESULT     VERDICT
TCP    cache-proxy (node-local)       allow     reachable  PASS
DNS    kube-dns resolution            allow     reachable  PASS
HTTPS  S3 region endpoint             allow     reachable  PASS
HTTPS  public internet (example.com)  allow     reachable  PASS
TCP    EC2 IMDS (169.254.169.254)     block     blocked    PASS
HTTP   EC2 IMDS                       block     blocked    PASS
TCP    kube-apiserver                 block     blocked    PASS
TCP    other tenant RDS (world)       allow     reachable  PASS (documented trade-off)
TCP    other worker (Flight)          block     blocked    PASS
```
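
The VERDICT column follows from comparing EXPECTED with RESULT. A minimal sketch of that comparison is below. The helpers `result_from_exit`, `verdict`, and `print_row` are hypothetical, and treating any non-zero probe exit code as "blocked" is an assumption; the script's actual rule may differ.

```shell
# Sketch only: helper names and the exit-code rule are assumptions.

# Map a probe's inner exit code to the RESULT column.
result_from_exit() {
  if [ "$1" -eq 0 ]; then echo reachable; else echo blocked; fi
}

# Compare the expectation (allow|block) with the observed result.
verdict() {
  local expected="$1" result="$2"
  case "$expected:$result" in
    allow:reachable|block:blocked) echo PASS ;;
    *) echo FAIL ;;
  esac
}

# Print one aligned matrix row: KIND TARGET EXPECTED RESULT VERDICT.
print_row() {
  printf '%-6s %-30s %-9s %-10s %s\n' "$1" "$2" "$3" "$4" "$(verdict "$3" "$4")"
}
```

For example, `print_row TCP kube-apiserver block "$(result_from_exit 1)"` prints a row ending in `PASS`.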

benben added 2 commits May 7, 2026 08:34
Drives kubectl debug ephemeral containers attached to a worker pod's
network namespace and reports a pass/fail matrix for known allow/deny
destinations. Designed to be run before and after the Cilium worker
egress policy lands so a diff highlights what got blocked.

Targets:
  allow: cache-proxy on node, kube-dns, S3 region endpoint, public
         internet (example.com), and (per documented trade-off) any
         RDS endpoint reachable as the world entity.
  block: EC2 IMDS, kube-apiserver, peer worker pods (Flight port).

One kubectl debug call per probe — the inline-loop variant silently
dropped output past ~3 probes due to streaming-buffer behaviour of
ephemeral containers, and kubectl debug --attach=true does NOT
propagate the inner shell's exit code, so the script embeds a
sentinel `__PROBE_EXIT=$?` in stdout and parses it back.

Adds two block-expectation rows that catch the case where the world
allow rule is widened past TCP 443 + 5432:

  example.com:80   — Cloudflare actually serves HTTP there
  github.com:22    — GitHub's real SSH endpoint

Targets chosen so the destination port is genuinely listening when there
is no policy in place. A blocked outcome against a host that wouldn't
have answered anyway proves nothing about the policy; both targets here
flip from reachable to blocked exactly because of Cilium's port scoping.

Validated on mw-dev: with the policy applied, both rows PASS as blocked
alongside the existing IMDS / kube-apiserver / cross-worker checks.
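
The reasoning above, that a block row is only evidence when the target was reachable without the policy, can be sketched as a before/after comparison. `classify_flip` and its labels are hypothetical and not part of the script:

```shell
# Sketch only: helper name and labels are assumptions.

# Classify a target from its RESULT before and after the policy is applied.
classify_flip() {
  local before="$1" after="$2"
  case "$before:$after" in
    reachable:blocked)   echo "blocked by policy" ;;
    reachable:reachable) echo "still allowed" ;;
    blocked:blocked)     echo "inconclusive" ;;   # never answered, so the block proves nothing
    blocked:reachable)   echo "unexpected" ;;
    *)                   echo "invalid input" ;;
  esac
}
```

Under this sketch, `example.com:80` and `github.com:22` would be expected to land in the "blocked by policy" case, since both ports answer when no policy is in place.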