ci: Retry host shard on Vite dynamic-import flakes too#4706
Conversation
The shard-retry block only matched `ChunkLoadError` (webpack-era), so Vite's `Failed to fetch dynamically imported module` failures (e.g. the `monaco.contribution-*.js` chunk dropping mid-fetch) bypassed the retry and failed the shard outright. Broadens the trigger to all three transient signatures, prints the matching lines so the cause is visible in the run log, and emits a clear recovered-vs-persistent banner after the retry so a second-run failure reads as deterministic rather than another flake. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3953df21cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Without this, a shard that fails on chunk-fetch and recovers on retry would still publish memory data extracted from the failed first run's log — the downstream `Extract memory report` step reads `/tmp/test-output.log` only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the host-test CI shard retry logic in .github/workflows/ci-host.yaml so that it also retries on Vite-era dynamic import / chunk-fetch flake signatures (not just the older webpack ChunkLoadError), and surfaces useful diagnostics directly in the Actions log.
Changes:
- Expand the retry trigger to match
ChunkLoadError,Failed to fetch dynamically imported module, andNetworkError when attempting to fetch resource. - Print the first 5 matching lines in a grouped log section to make the flake cause visible in the job output.
- On retry success, promote the retry log to
/tmp/test-output.logso downstream parsing uses the successful run; on retry failure, emit explicit “failed twice” messaging and (if applicable) show retry-run matches.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Host Test Results 1 files 1 suites 1h 46m 35s ⏱️ Results for commit 0fa718f. |
backspace
left a comment
There was a problem hiding this comment.
I wish we could add resilience that would let us retry loading vs having to restart the whole job (which breaks Percy), but getting back to what we had with Webpack is baseline at least
|
Thanks for the approve. Agree on the longer-term shape — an in-process retry around the dynamic |
|
lol, sorry my claude PR comment monitoring is very eager |
Summary
The host-shard CI retry only matched
ChunkLoadError, the webpack-era flake signature. After the host build moved to Vite, the same transient class of failure surfaces asFailed to fetch dynamically imported module(and occasionallyNetworkError when attempting to fetch resource), which never tripped the retry and instead failed the shard outright.A recent example: Host Tests (4, 20) failed on
monaco.contribution-DyRU9U2K.jsfailing to fetch dynamically. The grep forChunkLoadErrorreturned zero matches, the retry skipped, the shard turned red.Changes
ChunkLoadErrorto a regex covering all three transient chunk-fetch signatures.::notice::"recovered on retry" when the shard turns green — confirms the original failure was a flake.::error::"failed twice" when both runs fail — flags the failure as deterministic so investigators don't waste time treating it as flake. If the retry hit the same transient pattern again, that's also called out separately as a possible serving-side issue worth investigating.The realm-shim layer already enumerates these same signatures for its in-process retry (
packages/runtime-common/package-shim-handler.ts). This brings the CI shard-retry layer into alignment.Test plan
python3 -c "import yaml; yaml.safe_load(open('.github/workflows/ci-host.yaml'))")bash -non extracted step)Failed to fetch dynamically imported module, the retry triggers and the new diagnostic lines appear in the run log