ci(sovereign): bump test/lint/coverage timeout 30→60 min (PMAT-159)#30

Merged
noahgift merged 3 commits into main from chore/bump-test-timeout-pmat-159
Apr 20, 2026

@noahgift

Summary

First aprender#934 canary of test_workspace: true (#29) timed out at exactly 30:28 on the test job. The container has to compile 60+ crates cold before running 25k+ lib tests.

Default --lib callers finish in under 5 min, so the extra headroom is invisible to them — the ceiling only binds for workspace-mode callers on large workspaces.

Changes

  • test, lint, and coverage self-hosted jobs: timeout-minutes: 30 → 60
  • Rationale comment added at each site so the reason survives the next edit
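
As a rough illustration of the change described above, one of the three job definitions might look like the sketch below. Only the job name `test`, the `timeout-minutes` key, and the 30 → 60 values come from this PR; the runner label, the step, and the comment wording are assumptions, not the actual workflow contents.

```yaml
# Sketch only: runner label and step are assumed, not copied from the workflow.
jobs:
  test:
    runs-on: self-hosted
    # 60 min (was 30): workspace-mode callers compile 60+ crates cold before
    # running 25k+ lib tests. Default --lib callers finish in under 5 min,
    # so the higher ceiling does not affect them.
    timeout-minutes: 60
    steps:
      - run: cargo nextest run --lib
```

Under the same assumption, the lint and coverage jobs would carry the same value and a matching comment.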

Why all three, not just test

Lint also benefits if clippy ever hits a large workspace (it already runs --all-targets). Keeping them in lock-step avoids surprise timeouts when one job's scope grows.

Test plan

  • YAML validates (paiml/.github CI)
  • After merge: aprender#934 rerun — expect ci/test to finish (observed ~35-40 min test+compile wall-clock)
  • F11 daily snapshot next tick — aprender p95 should drop into the useful-signal band

Refs paiml/infra#33 · paiml/aprender#934

F11 falsifier blind-spot from PMAT-155 investigation (paiml/infra#70):
cargo nextest run --lib at repo root only tests the root package, leaving
workspace-member libs silent.

Per-pilot impact (pre-fix):
  copia    - 227 tests (valid, single-crate repo)
  bashrs   - 5 tests   (root only; specs/runtime/oracle/wasm silent)
  aprender - 0 tests   (root lib.rs is a stub; all 60 workspace crates silent)

Fix: primary invocation now cargo nextest run --workspace --lib TEST_ARGS.
Coverage updated to match. The -p REPO_NAME fallback is retained for harness
quirks. Callers that need to skip workspace members (e.g. aprender's GPU crates)
pass test_args: --exclude X --exclude Y.

Blast radius: every repo's ci / test and ci / coverage will start running
workspace-member lib tests that were previously silent. May surface real bugs.

Recommended canary: merge, watch copia (no workspace effect), bashrs (4 new
members surface), aprender (requires test_args exclusions first).

Refs paiml/infra#70, PMAT-155, PMAT-159.

The initial PMAT-159 change force-switched every caller to `--workspace --lib`.
That breaks any repo whose workspace members don't build in the sovereign-ci
container (e.g. aprender-gpu needs cuBLAS, aprender-cuda-edge needs CUDA).

Switch to an opt-in `test_workspace` input (default false → current behavior).
Callers that want workspace-wide coverage pair it with `test_args` exclusions:

    with:
      test_workspace: true
      test_args: "--exclude aprender-gpu --exclude aprender-cuda-edge"
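
The caller fragment above implies a matching input on the reusable-workflow side. A minimal sketch of how that side could be wired, assuming a `workflow_call` trigger: only the input names `test_workspace` and `test_args`, the default of false, and the `cargo nextest run` flags come from this PR; the step layout, environment variable names, and shell logic are hypothetical.

```yaml
# Hypothetical reusable-workflow excerpt, not the actual sovereign workflow.
on:
  workflow_call:
    inputs:
      test_workspace:
        type: boolean
        default: false   # keeps the existing root-package-only behavior
      test_args:
        type: string
        default: ""

jobs:
  test:
    runs-on: self-hosted
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
      - name: Run lib tests
        shell: bash
        env:
          TEST_WORKSPACE: ${{ inputs.test_workspace }}
          TEST_ARGS: ${{ inputs.test_args }}
        run: |
          scope=""
          if [ "$TEST_WORKSPACE" = "true" ]; then
            scope="--workspace"
          fi
          # TEST_ARGS is left unquoted on purpose so it splits into separate flags.
          cargo nextest run $scope --lib $TEST_ARGS
```

Passing the inputs through `env` rather than interpolating them directly into the script keeps caller-supplied text out of the shell source. Note that `--exclude` only takes effect together with `--workspace`, which is why the exclusions are paired with `test_workspace: true`.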

Refs paiml/infra#33

First aprender#934 canary of `test_workspace: true` timed out at exactly
30:28 on the test job — container has to compile 60+ crates cold before
testing 25k+ lib tests. Default --lib callers finish in under 5 min, so
the extra headroom is invisible to them; it only binds for large workspaces.

Applied to all three 30-min jobs (test, lint, coverage) for consistency —
lint also benefits if clippy ever hits a large workspace.

Refs paiml/infra#33 · paiml/aprender#934
@noahgift noahgift merged commit aae01f0 into main Apr 20, 2026
2 checks passed
@noahgift noahgift deleted the chore/bump-test-timeout-pmat-159 branch April 20, 2026 19:13