ci+fix: opt into --workspace --lib + detached-HEAD branch fallback (PMAT-159)#934

Closed
noahgift wants to merge 8 commits into main from ci/test-args-exclude-gpu-pmat-159

@noahgift noahgift commented Apr 20, 2026

Summary

PMAT-159 (paiml/infra#33): the reusable sovereign-ci workflow historically ran cargo nextest run --lib (root package only). For aprender — where lib.rs is a stub — that meant ci/test ran 0 tests while F11 measured compile/cache-fetch latency and called it signal.

paiml/.github#29 added an opt-in test_workspace: true input. This PR opts aprender into that flag so ci/test actually exercises workspace-member lib tests (the interesting ~25k-test suite).

Changes

  1. .github/workflows/ci.yml: set test_workspace: true and pair it with the existing test_args exclusions (aprender-gpu, aprender-cuda-edge, and aprender-compute either fail to build or crash at test-harness exit in the sovereign-ci container).
  2. crates/aprender-orchestrate/src/oracle/local_workspace.rs: get_git_status now falls back to HEAD@<short-sha> when git branch --show-current returns an empty string. Surfaced by the first canary run: actions/checkout leaves the container workspace on a detached HEAD, so the test's !branch.is_empty() assertion (and downstream consumers) needed a non-empty fallback.

Five-whys (branch fallback)

  1. Why did test_get_git_status_current_repo fail? — assertion failed: !status.branch.is_empty().
  2. Why was status.branch empty? — git branch --show-current emits "" on detached HEAD.
  3. Why is the workspace on detached HEAD? — actions/checkout default checkout style.
  4. Why didn't the existing unwrap_or_else("unknown") fire? — Ok("") maps to Some(""), not None; the .unwrap_or_else only triggers on I/O errors.
  5. Why did the bug hide until now? — the test lived in a workspace-member crate that ci/test never ran because the workflow used --lib (root) not --workspace --lib.
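Step 4 of the five-whys can be reproduced in isolation. The sketch below is a minimal model, not the code from local_workspace.rs: `branch_before_fix` is a hypothetical name, and the `io::Result<String>` parameter stands in for whatever the real lookup returns from running git.

```rust
use std::io;

/// Hypothetical model of the pre-fix lookup: only an `Err` (for example,
/// git could not be spawned) ever reaches the "unknown" sentinel.
fn branch_before_fix(show_current: io::Result<String>) -> String {
    show_current
        .ok() // Ok("") becomes Some(""), not None
        .map(|s| s.trim().to_string())
        .unwrap_or_else(|| "unknown".to_string()) // runs only when the value is None
}
```

Passing `Ok(String::new())`, which is what a detached HEAD produces, yields an empty branch name; only an `Err` produces "unknown".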

Test plan

  • Local: cargo test -p aprender-orchestrate --lib oracle::local_workspace passes
  • Local: same test passes with workspace detached via git checkout --detach HEAD
  • CI: ci/test runs with workspace-member tests, not 0 (will be visible in job log: TEST_SCOPE: --workspace --lib)
  • CI: all 25k+ lib tests pass (canary re-run triggered by the fix push)

Refs paiml/infra#33 · paiml/.github#29

@noahgift noahgift enabled auto-merge (squash) April 20, 2026 13:32
PMAT-155 investigation (paiml/infra#70) found the sovereign-ci.yml reusable
workflow runs `cargo nextest run --lib` at the repo root, which only tests
the root package. For aprender, the root lib.rs is a stub (added in #19) so
ci/test runs 0 tests — all 60 workspace-crate libs are silent.

paiml/.github#29 switches the reusable workflow's primary invocation to
`cargo nextest run --workspace --lib $TEST_ARGS`. When that merges, aprender's
ci/test will try to compile workspace members that don't build in the
sovereign-ci container:

  - aprender-gpu (cuBLAS not present)
  - aprender-cuda-edge (CUDA toolchain not present)
  - aprender-compute (SIGSEGV at test-harness exit; workspace-test handles
    this crate specially with a grep-based pass check)
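As an illustration of what a grep-based pass check for aprender-compute might look like, the sketch below decides pass or fail from the test output instead of the process exit status. It assumes the output contains the standard `cargo test` (libtest) summary lines beginning with `test result:`; the actual pattern used by workspace-test is not shown here, and `harness_passed` is a hypothetical name.

```rust
/// Hypothetical stand-in for the grep-based check: ignore the exit status
/// (which a SIGSEGV at harness exit would poison) and read the summary lines.
fn harness_passed(output: &str) -> bool {
    let mut saw_ok = false;
    for line in output.lines() {
        let line = line.trim_start();
        // Any failed test binary means the run failed.
        if line.starts_with("test result: FAILED") {
            return false;
        }
        if line.starts_with("test result: ok.") {
            saw_ok = true;
        }
    }
    // Require at least one passing summary so empty output is not a pass.
    saw_ok
}
```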

This commit pre-stages `test_args` on the sovereign-ci call so ci/test excludes
those three crates, matching the workspace-test job's current exclusions. Safe
to land BEFORE paiml/.github#29 — while the reusable workflow still uses --lib
(root only), the --exclude flags are no-ops on a 0-test run.

Refs PMAT-155, PMAT-159, paiml/.github#29, paiml/infra#70.
@noahgift noahgift force-pushed the ci/test-args-exclude-gpu-pmat-159 branch from 3635cbf to ffd7ed5 Compare April 20, 2026 13:57
paiml/.github#29 merged an opt-in test_workspace input that switches the
reusable workflow's test invocation from `cargo nextest run --lib` (root
only) to `cargo nextest run --workspace --lib`. Without this, aprender's
workspace-member lib tests (the interesting suite) never run in ci/test.

Pair it with the existing exclusions so aprender-gpu/cuda-edge/compute
are skipped — they don't build in the sovereign-ci container (cuBLAS /
CUDA / SIGSEGV at exit). The workspace-test job below still covers them
on GPU-ready hosts.

Refs paiml/infra#33
…d HEAD

Found by the PMAT-159 canary: with the reusable workflow now exercising
`--workspace --lib`, aprender-orchestrate's test_get_git_status_current_repo
ran in the sovereign-ci container and failed because `actions/checkout`
leaves the workspace on a detached HEAD. `git branch --show-current`
emits an empty string in that state, so `status.branch` was "".

`unwrap_or_else("unknown")` never fired because `Ok("")` still maps to
`Some("")`, not None.

Fall back to `git rev-parse --short HEAD` formatted as `HEAD@<sha>` when
the branch lookup yields empty. Keeps the "unknown" sentinel for the
no-git-dir case. Existing test assertions (`!branch.is_empty()`,
`branch != "unknown"`) now hold in both interactive and CI contexts.

Verified locally with and without detached HEAD.
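
The fallback described above can be sketched as follows. This is an approximation under stated assumptions, not the code in local_workspace.rs: the names `git_stdout`, `resolve_branch`, and `current_branch` are hypothetical, and returning "unknown" when the short SHA is also unavailable is a choice made for this sketch.

```rust
use std::process::Command;

/// Runs `git <args>` in `dir`; returns trimmed stdout on success, None otherwise.
fn git_stdout(dir: &str, args: &[&str]) -> Option<String> {
    let out = Command::new("git").current_dir(dir).args(args).output().ok()?;
    if !out.status.success() {
        return None;
    }
    Some(String::from_utf8_lossy(&out.stdout).trim().to_string())
}

/// Pure decision logic, kept separate so it can be tested without a repository.
fn resolve_branch(show_current: Option<&str>, short_sha: Option<&str>) -> String {
    match show_current.map(str::trim) {
        Some(name) if !name.is_empty() => name.to_string(),
        // Detached HEAD: the branch lookup succeeded but printed nothing.
        Some(_) => match short_sha.map(str::trim) {
            Some(sha) if !sha.is_empty() => format!("HEAD@{sha}"),
            _ => "unknown".to_string(), // assumption: no SHA available either
        },
        // The lookup itself failed (for example, no git directory).
        None => "unknown".to_string(),
    }
}

/// Combines the two git calls named in the commit message.
fn current_branch(dir: &str) -> String {
    let branch = git_stdout(dir, &["branch", "--show-current"]);
    let sha = git_stdout(dir, &["rev-parse", "--short", "HEAD"]);
    resolve_branch(branch.as_deref(), sha.as_deref())
}
```

Splitting the decision from the process calls lets both the interactive and detached-HEAD cases be asserted without changing the state of a real checkout.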

Refs paiml/infra#33
@noahgift noahgift changed the title from "ci: pre-stage test_args exclusions for paiml/.github#29 (PMAT-159)" to "ci+fix: opt into --workspace --lib + detached-HEAD branch fallback (PMAT-159)" Apr 20, 2026
noahgift added a commit to paiml/.github that referenced this pull request Apr 20, 2026
)

* ci(sovereign): cargo test --workspace --lib (PMAT-159)

F11 falsifier blind-spot from PMAT-155 investigation (paiml/infra#70):
cargo nextest run --lib at repo root only tests the root package, leaving
workspace-member libs silent.

Per-pilot impact (pre-fix):
  copia    - 227 tests (valid, single-crate repo)
  bashrs   - 5 tests   (root only; specs/runtime/oracle/wasm silent)
  aprender - 0 tests   (root lib.rs is a stub; all 60 workspace crates silent)

Fix: the primary invocation is now cargo nextest run --workspace --lib $TEST_ARGS.
Coverage updated to match. The -p REPO_NAME fallback is retained for harness
quirks. Callers that need to skip workspace members (e.g. aprender's GPU crates)
pass test_args: --exclude X --exclude Y.

Blast radius: every repo's ci / test and ci / coverage will start running
workspace-member lib tests that were previously silent. May surface real bugs.

Recommended canary: merge, watch copia (no workspace effect), bashrs (4 new
members surface), aprender (requires test_args exclusions first).

Refs paiml/infra#70, PMAT-155, PMAT-159.

* ci(sovereign): gate --workspace --lib behind opt-in test_workspace input

The initial PMAT-159 change force-switched every caller to `--workspace --lib`.
That breaks any repo whose workspace members don't build in the sovereign-ci
container (e.g. aprender-gpu needs cuBLAS, aprender-cuda-edge needs CUDA).

Switch to an opt-in `test_workspace` input (default false → current behavior).
Callers that want workspace-wide coverage pair it with `test_args` exclusions:

    with:
      test_workspace: true
      test_args: "--exclude aprender-gpu --exclude aprender-cuda-edge"
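
How the two inputs combine into a single test command can be modeled roughly as below. The real reusable workflow is a YAML/shell job, so this Rust function is only a hypothetical model of it: `nextest_args` is an invented name, and the assumption that `test_args` is split on whitespace and appended after the scope flags is mine.

```rust
/// Hypothetical model of the command assembly: `test_workspace` widens the
/// scope from the root package to all workspace members, and `test_args`
/// is appended verbatim (split on whitespace).
fn nextest_args(test_workspace: bool, test_args: &str) -> Vec<String> {
    let mut args = vec!["nextest".to_string(), "run".to_string()];
    if test_workspace {
        args.push("--workspace".to_string());
    }
    args.push("--lib".to_string());
    args.extend(test_args.split_whitespace().map(String::from));
    args
}
```

With the default `test_workspace: false` and no extra arguments this yields `nextest run --lib`, matching the behavior described as unchanged for existing callers.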

Refs paiml/infra#33

* ci(sovereign): bump test/lint/coverage timeout 30→60 min (PMAT-159)

First aprender#934 canary of `test_workspace: true` timed out at exactly
30:28 on the test job: the container has to compile 60+ crates cold before
running 25k+ lib tests. Default --lib callers finish in under 5 min, so
the extra headroom is invisible to them; it only binds for large workspaces.

Applied to all three 30-min jobs (test, lint, coverage) for consistency —
lint also benefits if clippy ever hits a large workspace.

Refs paiml/infra#33 · paiml/aprender#934
noahgift and others added 5 commits April 20, 2026 21:15
…4-19 (Refs #934)

Flip enable_sccache: false → true. The "missing wrapper" disablement from
2026-04-19 is stale — paiml/infra#66 shipped the exec-script shim in
sovereign-ci:stable the same day, and the fleet default is now true (PMAT-061).

Diagnostic: run 24685501433 (test_workspace: true, sccache: false) hit the
60-min ceiling on both ci/test and ci/coverage. Cold compile of 60+ APR-MONO
crates × 3 concurrent jobs × jobserver oversubscription × llvm-cov
instrumentation on coverage exceeds 60 min without a warm sccache.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…der (Refs #934)

PMAT-159's test_workspace opt-in duplicates what workspace-test already
does (same --workspace --lib + GPU excludes). Running both on the same
Intel host triples cargo jobserver pressure (cargo #12912) and costs
~30 min of redundant compile per PR.

F11 falsifier gets a per-repo override in infra so aprender's measurement
points at `workspace-test` instead of `ci / test`. ci/test here stays in place
as a root-stub gate (0 tests, green) so org ruleset contexts remain
populated; no behavior regression.

Keeps:
- enable_sccache: true (independent PMAT-061 fix, fleet default)
- aprender-orchestrate detached-HEAD fallback (real bug from actions/checkout)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift

Triaged as stale (last touched 2026-04). This was a CI fix from 2026-04; sccache plus a per-PR target dir have since superseded its concerns, and ci.yml has diverged significantly. Closing as superseded.

@noahgift noahgift closed this May 12, 2026
auto-merge was automatically disabled May 12, 2026 15:59

Pull request was closed
