Skip to content

ci: Retry host shard on Vite dynamic-import flakes too#4706

Merged
habdelra merged 2 commits intomainfrom
ci-retry-vite-dynamic-import-flake
May 7, 2026
Merged

ci: Retry host shard on Vite dynamic-import flakes too#4706
habdelra merged 2 commits intomainfrom
ci-retry-vite-dynamic-import-flake

Conversation

@habdelra
Copy link
Copy Markdown
Contributor

@habdelra habdelra commented May 7, 2026

Summary

The host-shard CI retry only matched ChunkLoadError, the webpack-era flake signature. After the host build moved to Vite, the same transient class of failure surfaces as Failed to fetch dynamically imported module (and occasionally NetworkError when attempting to fetch resource), which never tripped the retry and instead failed the shard outright.

A recent example: Host Tests (4, 20) failed on monaco.contribution-DyRU9U2K.js failing to fetch dynamically. The grep for ChunkLoadError returned zero matches, the retry skipped, the shard turned red.

Changes

  • Broaden the retry trigger from ChunkLoadError to a regex covering all three transient chunk-fetch signatures.
  • Print the first 5 matched lines so the cause shows up directly in the GitHub Actions run log instead of buried in the per-shard testem log artifact.
  • Tee the retry attempt's output and emit one of two banners after the retry:
    • ::notice:: "recovered on retry" when the shard turns green — confirms the original failure was a flake.
    • ::error:: "failed twice" when both runs fail — flags the failure as deterministic so investigators don't waste time treating it as flake. If the retry hit the same transient pattern again, that's also called out separately as a possible serving-side issue worth investigating.

The realm-shim layer already enumerates these same signatures for its in-process retry (packages/runtime-common/package-shim-handler.ts). This brings the CI shard-retry layer into alignment.

Test plan

  • YAML parses (python3 -c "import yaml; yaml.safe_load(open('.github/workflows/ci-host.yaml'))")
  • Inline shell parses (bash -n on extracted step)
  • CI green
  • On the next host shard that hits Failed to fetch dynamically imported module, the retry triggers and the new diagnostic lines appear in the run log

The shard-retry block only matched `ChunkLoadError` (webpack-era), so
Vite's `Failed to fetch dynamically imported module` failures (e.g. the
`monaco.contribution-*.js` chunk dropping mid-fetch) bypassed the retry
and failed the shard outright.

Broadens the trigger to all three transient signatures, prints the
matching lines so the cause is visible in the run log, and emits a
clear recovered-vs-persistent banner after the retry so a second-run
failure reads as deterministic rather than another flake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3953df21cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/ci-host.yaml
Without this, a shard that fails on chunk-fetch and recovers on retry
would still publish memory data extracted from the failed first run's
log — the downstream `Extract memory report` step reads
`/tmp/test-output.log` only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@habdelra habdelra requested review from a team and Copilot May 7, 2026 12:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the host-test CI shard retry logic in .github/workflows/ci-host.yaml so that it also retries on Vite-era dynamic import / chunk-fetch flake signatures (not just the older webpack ChunkLoadError), and surfaces useful diagnostics directly in the Actions log.

Changes:

  • Expand the retry trigger to match ChunkLoadError, Failed to fetch dynamically imported module, and NetworkError when attempting to fetch resource.
  • Print the first 5 matching lines in a grouped log section to make the flake cause visible in the job output.
  • On retry success, promote the retry log to /tmp/test-output.log so downstream parsing uses the successful run; on retry failure, emit explicit “failed twice” messaging and (if applicable) show retry-run matches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

Host Test Results

    1 files      1 suites   1h 46m 35s ⏱️
2 634 tests 2 619 ✅ 15 💤 0 ❌
2 653 runs  2 638 ✅ 15 💤 0 ❌

Results for commit 0fa718f.

Copy link
Copy Markdown
Contributor

@backspace backspace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we could add resilience that would let us retry loading vs having to restart the whole job (which breaks Percy), but getting back to what we had with Webpack is baseline at least

@habdelra
Copy link
Copy Markdown
Contributor Author

habdelra commented May 7, 2026

Thanks for the approve. Agree on the longer-term shape — an in-process retry around the dynamic import('monaco-editor') (and any other lazy chunks) would mirror what package-shim-handler.ts's withResolveRetry already does for the realm shim layer (#4531) and would dodge the Percy / wall-clock cost of restarting the whole shard. Happy to file a follow-up branch that lifts that helper out and applies it here; this PR just restores the webpack-era baseline so the bleeding stops.

@habdelra habdelra merged commit cb409e9 into main May 7, 2026
58 checks passed
@habdelra
Copy link
Copy Markdown
Contributor Author

habdelra commented May 7, 2026

lol, sorry my claude PR comment monitoring is very eager

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants