fix(ci): bump workspace-test timeout 55→75min, followup to per-run P0 fix #1704
Merged
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Andon followup
The P0 per-run target-dir fix (#1693, 2026-05-15) eliminated
cargo-incremental cross-run warmth, so every run now compiles cold.
sccache covers codegen (~80% hit rate on a warm cache), but cargo's
metadata, linking, and test-binary builds still cost ~40-50 min cold.
Under runner-pool saturation (7+ concurrent CI runs observed today),
5 PRs rebasing at the same time all hit the previous 55min timeout
exactly: run 25919246467 and its siblings all timed out at 55:00.0
with no real test failure.
Fix
Bumps:
- workspace-test step: 55min → 75min
- workspace-test job: 65min → 85min (10min headroom for job overhead outside the step)
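GitHub Actions accepts `timeout-minutes` at both the job level and the step level. A minimal sketch of where the two new values would sit, assuming a GitHub Actions workflow; the runner label, the checkout step, and the `cargo test --workspace` command are hypothetical placeholders, not taken from this PR's diff:

```yaml
# Hypothetical excerpt: runner label, steps, and command are assumptions;
# only the two timeout values (85 job / 75 step) come from this PR.
jobs:
  workspace-test:
    runs-on: [self-hosted]
    # Job-level cap: the step budget plus ~10 min for setup and teardown steps.
    timeout-minutes: 85
    steps:
      - uses: actions/checkout@v4
      - name: workspace-test
        # Step-level cap: sized for a fully cold cargo build plus the test run.
        timeout-minutes: 75
        run: cargo test --workspace
```

As long as the other steps together take under 10 minutes, the step-level limit fires before the job-level one, so a slow run is reported as a timeout of the test step itself.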
Trade-off
A genuinely stuck run now holds a runner for up to 75min instead of
55min. That is acceptable: we have 16 self-hosted runners (per memory
`reference_self_hosted_runner_disk_guard.md`), and the cost of a
timeout false-alarm cascade (5 PRs red at once) is much higher than
the cost of an extra 20min of waiting on a truly hung run.
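A rough runner-time comparison using only the figures above, counting just the five timed-out runs and not any reruns they triggered:

$$
\begin{aligned}
\text{extra cost of one hung run} &= 75 - 55 = 20 \text{ runner-min} \\
\text{observed false-alarm cascade} &\ge 5 \times 55 = 275 \text{ runner-min}
\end{aligned}
$$

Before any reruns, that one cascade discarded more runner time than the extra budget of thirteen hung runs (13 × 20 = 260).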
Why this is needed now
The per-run target-dir fix is correct (no more cancel-corrupt-state
race), but it left cargo's incremental layer permanently cold. The
old 55min budget assumed cross-run warmth that no longer exists; this
bump resizes the budget to match cold builds.
Refs Toyota Way: don't ignore the line stoppage — extend the time
window to match the new cold-compile reality.
🤖 Generated with Claude Code