fix(sovereign-ci): chown sibling workspace back from root on container exit #28

Merged
noahgift merged 1 commit into main from fix/sovereign-ci-root-workspace-poison on Apr 20, 2026

@noahgift
Contributor

Summary

  • Every container job now chowns $GITHUB_WORKSPACE/.. back to uid 1000 on exit (if: always()), preventing root-owned files from leaking onto the runner host via bind mounts.
  • Security job gets a defense-in-depth sudo chown pre-step to recover from pre-fix residue.
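
For illustration, the body of such a tail step could look roughly like the sketch below. This is not the diff from this PR: the function name reclaim_sibling_workspace and its optional uid/gid arguments are hypothetical, added only so the logic can be exercised without root. The uid 1000 default is taken from the summary above; the gid 1000 default is an assumption.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not the actual workflow step.
# Assumption: the runner user is uid 1000 (per the PR) and gid 1000 (assumed).
reclaim_sibling_workspace() {
    workspace="${1:?workspace path required}"
    owner_uid="${2:-1000}"
    owner_gid="${3:-1000}"
    # Resolve the parent of the workspace, where the sibling checkouts live.
    parent="$(cd "$workspace/.." && pwd)" || return 1
    # Recursively hand the whole tree back to the runner user.
    chown -R "$owner_uid:$owner_gid" "$parent"
}

# Inside a container job (already uid 0) the step might call:
#   reclaim_sibling_workspace "$GITHUB_WORKSPACE"
```

Inside the container the process is already root, so no sudo is needed there. The security job runs on the host as the runner user, which is why its pre-step needs sudo chown instead.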

Root cause (Five Whys — paiml/infra#69)

  1. The security job on rmedia PR #104 fails in 15s with rm: cannot remove 'aprender/.../*.rs': Permission denied.
  2. Those files are owned by root:root; security runs as the runner user and can't rewrite them.
  3. Root-owned files come from test/lint/coverage/bench — they run in a container with uid 0.
  4. The sibling-checkout step (cd $GITHUB_WORKSPACE/..) writes into a bind-mounted host dir, so root ownership lands on the runner host, not just the container.
  5. Subsequent non-container jobs — and the container jobs' own rm -rf cleanup — can't reclaim the tree. Result: silent 15s failures that recur on every PR.

Scope

Reusable workflow used by ~38 repos (course-studio, rmedia, aprender, bashrs, …). After merge, every job run gets the cleanup automatically; no per-repo change needed.

Immediate unblock

Ahead of this patch, 83,838 root-owned files across the 16 runners on intel were chowned manually to unblock existing PRs. This patch prevents re-accumulation.

Test plan

  • Merge
  • Kick rmedia PR #104 security job — should now pass
  • Kick one course-studio PR — confirm no regression
  • Watch next 10 container-job runs on intel — no new root-owned files in /home/noah/data/actions-runner*/_work/
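
One way to do the last check is to count entries under each runner's _work tree that are not owned by the runner uid. A minimal sketch follows. The helper name count_foreign_owned is hypothetical, and the expected uid defaults to 1000 as stated in the summary.

```shell
#!/bin/sh
# Sketch only: hypothetical helper for spotting ownership leaks.
# Prints the number of entries under DIR whose owner is not EXPECTED_UID.
count_foreign_owned() {
    dir="${1:?directory required}"
    expected_uid="${2:-1000}"
    # `! -uid N` matches anything not owned by uid N (root included).
    # tr strips the padding some wc implementations add.
    find "$dir" ! -uid "$expected_uid" 2>/dev/null | wc -l | tr -d ' '
}

# Possible use on the runner host:
#   for d in /home/noah/data/actions-runner*/_work; do
#       echo "$d: $(count_foreign_owned "$d")"
#   done
```

A count of 0 for every runner after the next container-job runs would indicate the tail step is doing its job.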

🤖 Generated with Claude Code

…r exit

Root cause (Five Whys — paiml/infra#69):

1. Security job fails in 15s with rm: EACCES on aprender/**/.git and
   aprender-present-widgets/src/*.rs in sibling workspace.
2. Files are owned by root:root; the security job runs as the runner
   user (uid 1000) and can't rewrite them.
3. Root-owned files come from the test/lint/coverage/bench jobs — those
   run inside a container whose process uid is 0.
4. The container mounts $GITHUB_WORKSPACE (the runner's _work tree) and
   the sibling checkout writes to $GITHUB_WORKSPACE/.., which is also
   bind-mounted via the host filesystem. Files written by root inside
   the container land on the host as root-owned.
5. Subsequent non-container jobs (security) — and every *future* run
   of the container jobs themselves, which start by `rm -rf`'ing the
   stale clones — cannot reclaim the tree.

Fix: every container job chowns $GITHUB_WORKSPACE/.. back to uid 1000
in an `if: always()` tail step, so root-owned files never escape the
container. Security job gets a defense-in-depth `sudo chown` at the
top to recover from any pre-fix residue.

Manually chowned 83838 files across the 16 runners on intel to unblock
existing PRs; this patch prevents the poison from re-accumulating.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 5105268 into main Apr 20, 2026
2 checks passed
@noahgift noahgift deleted the fix/sovereign-ci-root-workspace-poison branch April 20, 2026 11:33