fix(ci): bump workspace-test timeout 55→75min — followup to per-run P0 fix #1704

Merged
noahgift merged 4 commits into main from fix/ci-timeout-bump-75min
May 15, 2026
@noahgift

Andon followup

The P0 per-run target-dir fix (#1693, 2026-05-15) eliminated cross-run
warmth in cargo's incremental cache. Cold compiles now happen on every
run; sccache covers codegen (~80% hit rate on a warm cache), but cargo's
metadata, linking, and test-binary builds still cost ~40-50 min cold.

Under runner-pool saturation (7+ concurrent CI runs observed today),
5 simultaneously rebasing PRs hit the previous 55min timeout exactly
(run 25919246467 and siblings all timed out at 55:00.0 with no real
test failure).

Fix

Bumps:
workspace-test step: 55min → 75min
workspace-test job: 65min → 85min (10min overhead headroom)
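
For orientation, the two budgets above could look roughly like this in a GitHub Actions workflow. This is a minimal sketch, not the repository's actual workflow file: the job id, runner label, checkout step, and test command are assumptions for illustration. Only the `timeout-minutes` values come from this PR.

```yaml
# Hypothetical workflow fragment. Job id, runner label, and the test
# command are assumed; only the timeout values are taken from this PR.
jobs:
  workspace-test:
    runs-on: self-hosted
    timeout-minutes: 85          # job budget (was 65): step budget + 10min overhead
    steps:
      - uses: actions/checkout@v4
      - name: workspace-test
        timeout-minutes: 75      # step budget (was 55)
        run: cargo test --workspace
```

Keeping the job limit above the step limit means a hung test step is cancelled by its own timeout first, leaving the remaining 10min for setup and teardown steps.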

Trade-off

A genuinely stuck run now eats up to 75min of runner time instead of
55min. Acceptable: we have 16 self-hosted runners, and a
timeout-false-alarm cascade (5 PRs simultaneously red) costs far more
than an extra 20min of waiting on a truly hung run.

Why this is needed now

The per-run target-dir fix is correct (no more cancel-corrupt-state
race), but it shifted cargo into a perpetually-cold mode for the
incremental layer. The old 55min budget assumed cross-run warmth that
no longer exists. This bump rebases the budget against reality.

Refs Toyota Way: don't ignore the line stoppage — extend the time
window to match the new cold-compile reality.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 15, 2026 14:05
@noahgift noahgift merged commit 2ceedf3 into main May 15, 2026
10 checks passed
@noahgift noahgift deleted the fix/ci-timeout-bump-75min branch May 15, 2026 17:23