ci: pre-build chown of per-RUN target dir to fix root-owned bind mount #1798
Merged
Adds a chown step BEFORE the cargo step that runs `docker run --rm` as root and chowns the per-RUN target dir + cargo registry to noah:1000.

## Why

Docker's bind-mount creates missing host directories with the daemon's uid (root). Since #1693 switched to per-RUN target dirs (`/mnt/nvme-raid0/targets/aprender-ci/<PR>/run-<RUN_ID>`), every fresh run gets a root-owned target dir. Cargo (running as uid 1000 inside the container) cannot write to it and fails with:

    error: failed to create directory `/workspace/target/debug`: No such file or directory (os error 2)

The existing post-job chown (line 245) was meant to fix this for the NEXT run's git-clean, but per-RUN paths invalidate that since each run gets a brand-new root-owned dir. First-runs always fail.

This was observed across 6+ in-flight PRs (#1784, #1791-#1797) on 2026-05-18: every "infrastructure flake" turned out to be the same ownership bug at different cargo entry points.

## Fix

Pre-cargo chown step. Idempotent (`|| true`). Runs the existing sovereign-ci image as root for the chown, then exits; adds maybe 2s to runs. Matches the pattern of the post-job chown step that already exists; just moves it to BEFORE cargo as well.

## Manual one-shot

The 6 currently-stuck PRs were unblocked by manually chowning their per-RUN dirs on the runner host:

    ssh intel sudo chown -R 1000:1000 \
      /mnt/nvme-raid0/targets/aprender-ci/{1792,1793,1794,1796,1797,main}/run-*

After this PR lands, future runs will fix themselves.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Adds a chown step BEFORE the cargo step in `workspace-test` that runs the sovereign-ci image as root and chowns the per-RUN target dir (`/mnt/nvme-raid0/targets/aprender-ci/<PR>/run-<RUN_ID>`) and cargo registry to noah:1000.
Why — five-whys
Empirical (2026-05-18 lambda-vector ssh intel)
Inspection showed 6 in-flight runs hit the same error simultaneously across unrelated PRs:
```
ROOT-OWNED: /mnt/nvme-raid0/targets/aprender-ci/1792/run-26040118649 (root:root)
ROOT-OWNED: /mnt/nvme-raid0/targets/aprender-ci/1793/run-26040120976 (root:root)
ROOT-OWNED: /mnt/nvme-raid0/targets/aprender-ci/1794/run-26040155236 (root:root)
ROOT-OWNED: /mnt/nvme-raid0/targets/aprender-ci/1796/run-26040126733 (root:root)
ROOT-OWNED: /mnt/nvme-raid0/targets/aprender-ci/1797/run-26040155057 (root:root)
ROOT-OWNED: /mnt/nvme-raid0/targets/aprender-ci/main/run-26040057475 (root:root)
```
All resolved immediately after manual `sudo chown -R 1000:1000` on the runner host. Disk space and inode counts were fine (67% used / 22% inodes); this was pure ownership.
This was misdiagnosed as a transient infra flake across the day's session — every "workspace-test failure" turned out to be the same root cause at different cargo entry points (lint, test, coverage, gate all error in the same way).
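The ownership inspection above can be approximated with a small helper. This is a minimal sketch, not the command that was actually run on the host: the function name `list_misowned_run_dirs` and the `MISOWNED:` output format are hypothetical, and it assumes a Linux host with GNU coreutils `stat` and the `<base>/<PR>/run-<RUN_ID>` layout described in this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list per-run target dirs NOT owned by the expected uid.
# Assumes the <base>/<PR>/run-<RUN_ID> layout and GNU coreutils `stat -c`.
list_misowned_run_dirs() {
  local base="$1" want_uid="$2" dir owner
  for dir in "$base"/*/run-*; do
    [ -d "$dir" ] || continue            # glob matched nothing, or not a directory
    owner=$(stat -c '%u:%g' "$dir")      # numeric uid:gid of the directory
    if [ "${owner%%:*}" != "$want_uid" ]; then
      printf 'MISOWNED: %s (%s)\n' "$dir" "$owner"
    fi
  done
}

# Example on the runner host (path taken from this PR):
#   list_misowned_run_dirs /mnt/nvme-raid0/targets/aprender-ci 1000
```

Empty output means every per-run dir is already owned by the expected uid; any line printed points at a dir that cargo, running as that uid, would likely be unable to write to.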
Fix
A new step BEFORE `Workspace lib tests`:
```yaml
run: |
  docker run --rm \
    -v "/mnt/nvme-raid0/targets/aprender-ci/${PR_OR_REF}/run-${GITHUB_RUN_ID}:/workspace/target" \
    -v "/mnt/nvme-raid0/cargo-ci/registry/${PR_OR_REF}:/usr/local/cargo/registry" \
    "$IMAGE" \
    bash -c 'chown -R 1000:1000 /workspace/target /usr/local/cargo/registry 2>/dev/null || true'
```
Idempotent: chown on a dir that is already noah-owned is a no-op, and `|| true` keeps the step from failing the job if chown itself errors. Adds ~2s per run, since chown of an empty fresh dir is fast.
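Because the chown step swallows its own errors, a follow-up preflight check could make any remaining ownership problem fail loudly instead of surfacing as cargo's "failed to create directory" message. The sketch below is not part of this PR: `require_writable_dir` is a hypothetical name, and it assumes it runs inside the container as the same uid cargo uses, with GNU coreutils `stat` available.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of this PR): report ownership
# problems explicitly before cargo runs.
require_writable_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "preflight: $dir does not exist" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    # stat -c is GNU coreutils syntax; a Linux container is assumed.
    echo "preflight: $dir is not writable by uid $(id -u) (owner $(stat -c '%u:%g' "$dir"))" >&2
    return 1
  fi
}

# Example, run inside the container before cargo:
#   require_writable_dir /workspace/target
```

The function returns non-zero with a message naming the directory and its numeric owner, which would make a root-owned bind mount easier to tell apart from a genuine infrastructure flake.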
Test plan
Cross-refs
🤖 Generated with Claude Code