Harden Playground snippet-fetch retry against transient failures #1690
Open
bghgary wants to merge 2 commits into BabylonJS:master
The Win32_x64_D3D11 Playground job fails intermittently with five identical "[Error] SyntaxError: JSON.parse Error: Unexpected input at position:0" lines followed by "[Log] Running the playground failed." on commits that don't touch playground code.
Root cause: validation_native.js fetches each test's snippet from https://snippet.babylonjs.com/<id>/<rev> and unconditionally calls JSON.parse(xmlHttp.responseText) on readyState === 4, ignoring xmlHttp.status. When the snippet service returns a transient error (5xx, 429, gateway timeout, empty body), the parse fails and falls through to the catch, which calls onError. The retry policy is maxRetry=5 with a fixed 500ms delay, a 2-second total budget that cannot ride out a normal CDN/upstream blip.
Three changes:
1. Check xmlHttp.status === 200 before parsing. Non-200 responses are logged with the status code and the playground id, then routed to onError instead of bubbling up as a misleading SyntaxError.
2. Increase maxRetry from 5 to 8.
3. Replace the fixed 500ms delay with exponential backoff capped at 30 seconds (500ms, 1s, 2s, 4s, 8s, 16s, 30s).
Total budget grows from ~2s to ~60s, which is sufficient to ride out typical service blips without changing the eventual fail-fast behavior on persistent outages.
Validated against the canonical snippet loader in BabylonJS/Babylon.js (packages/tools/snippetLoader/src/fetchSnippet.ts), which also checks response.ok before calling response.json().
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Improves the robustness of BabylonNative’s Playground visual validation runner by hardening the snippet-fetch path against transient HTTP failures from snippet.babylonjs.com, reducing flaky CI failures unrelated to code changes.
Changes:
- Add an HTTP status check before attempting to parse the snippet response as JSON.
- Increase retry attempts (5 → 8) and implement exponential backoff (capped at 30s).
- Improve the final failure log message to reflect the number of attempts.
The readystatechange listener is registered via addEventListener, not via the onreadystatechange property; those are separate slots in the XHR API. Setting the property to null does not detach the listener, just as the reviewer observed. The spec also guarantees that readystatechange fires only once on the transition to DONE, so the line was a misleading no-op.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
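The listener-versus-property distinction can be illustrated with a small sketch. It assumes a runtime with the standard EventTarget and Event globals (for example Node 15+), and a plain EventTarget stands in for XMLHttpRequest, which is not available there. The helper name countListenerCalls is hypothetical.

```javascript
// Stand-in demo: a plain EventTarget plays the role of the XHR object.
// countListenerCalls is a made-up helper for illustration only.
function countListenerCalls() {
    const target = new EventTarget();
    let calls = 0;
    const listener = () => { calls += 1; };

    // Register the handler the same way the commit message describes.
    target.addEventListener("readystatechange", listener);

    // Clearing the on-property does not touch listeners added via addEventListener.
    target.onreadystatechange = null;
    target.dispatchEvent(new Event("readystatechange"));
    const afterPropertyCleared = calls;

    // removeEventListener with the same function reference is what detaches it.
    target.removeEventListener("readystatechange", listener);
    target.dispatchEvent(new Event("readystatechange"));
    const afterRemoved = calls;

    return { afterPropertyCleared, afterRemoved };
}

const result = countListenerCalls();
console.log(result.afterPropertyCleared, result.afterRemoved);
```

The listener still runs after the property is cleared and stops only once removeEventListener is called with the original function reference.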
ryantrem approved these changes on May 8, 2026
Context
The Win32_x64_D3D11 / buildPlayground job in BabylonNative CI has been failing intermittently on commits that don't touch playground code, always with the same pattern: five identical "[Error] SyntaxError: JSON.parse Error: Unexpected input at position:0" lines followed by "[Log] Running the playground failed."
Five identical errors → all maxRetry=5 attempts are blowing up on the same parse step.
Root cause
Apps/Playground/Scripts/validation_native.js:178-247 fetches each test's snippet from https://snippet.babylonjs.com/<id>/<rev> and unconditionally calls JSON.parse(xmlHttp.responseText) on readyState === 4, without checking xmlHttp.status. When the snippet service returns a transient error (503, 429, gateway timeout, empty body), responseText is empty or HTML and the parse fails with "Unexpected input at position:0". The catch path correctly retries, but the retry policy is too weak: maxRetry = 5, retryTime = 500ms, no backoff. Two seconds is nowhere near enough to ride out a normal CDN/upstream blip (typically 10–30s).
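As a rough illustration of the failure mode, the sketch below feeds an empty body and an HTML error page to JSON.parse. The helper tryParseBody and the sample bodies are made up for this example and are not taken from validation_native.js; the exact error message text varies by JavaScript engine.

```javascript
// Hypothetical helper: mimics an unguarded JSON.parse on a response body.
function tryParseBody(responseText) {
    try {
        return { ok: true, value: JSON.parse(responseText) };
    } catch (e) {
        // Both empty and non-JSON bodies land here as a SyntaxError.
        return { ok: false, errorName: e.name };
    }
}

// Made-up bodies standing in for what a transient failure might return.
const emptyBody = tryParseBody("");
const htmlBody = tryParseBody("<html><body>503 Service Unavailable</body></html>");
// Made-up JSON body; the real snippet payload shape is not shown here.
const jsonBody = tryParseBody('{"id":"example"}');

console.log(emptyBody.errorName, htmlBody.errorName, jsonBody.ok);
```

In both failing cases the error carries no hint of the HTTP status, which is why the log shows only a parse error.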
Change
Three small fixes to the snippet-fetch retry path:
1. Status check before parse. Check xmlHttp.status === 200 before calling JSON.parse. Non-200 responses are logged with the status code and the playground id, then routed to onError instead of bubbling up as a misleading SyntaxError.
2. More retries. Increase maxRetry from 5 to 8.
3. Exponential backoff. Replace the fixed 500ms delay with min(500ms × 2^(retry-1), 30s). Schedule: 500ms, 1s, 2s, 4s, 8s, 16s, 30s.
Total budget grows from ~2s to ~60s. Persistent outages still fail fast (just at ~60s instead of ~2s); transient blips now have room to recover.
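The status guard and the backoff formula can be sketched roughly as follows. This is a minimal illustration, not the code in validation_native.js: the constant names, backoffDelayMs, and handleSnippetResponse are hypothetical, and it assumes a delay is taken before each retry after the first attempt, so 8 attempts imply 7 delays.

```javascript
// Hypothetical names; values taken from the description above.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;
const MAX_RETRY = 8;

// Delay before retry number `retry` (1-based): min(500ms * 2^(retry-1), 30s).
function backoffDelayMs(retry) {
    return Math.min(BASE_DELAY_MS * Math.pow(2, retry - 1), MAX_DELAY_MS);
}

// Parse the body only on HTTP 200; otherwise report the status through onError.
function handleSnippetResponse(status, responseText, playgroundId, onSuccess, onError) {
    if (status !== 200) {
        onError(new Error("Snippet " + playgroundId + " returned HTTP " + status));
        return;
    }
    try {
        onSuccess(JSON.parse(responseText));
    } catch (e) {
        onError(e);
    }
}

// Assumed: one delay between consecutive attempts, so MAX_RETRY - 1 delays in total.
const schedule = [];
for (let retry = 1; retry < MAX_RETRY; retry++) {
    schedule.push(backoffDelayMs(retry));
}
const totalMs = schedule.reduce((sum, ms) => sum + ms, 0);

console.log(schedule.join(", "), totalMs); // 500, 1000, 2000, 4000, 8000, 16000, 30000 61500
```

Under that assumption the delays sum to 61.5s, matching the ~60s budget; the cap only takes effect on the last delay, where the uncapped value would be 32s.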
Validation against upstream
The canonical snippet loader in BabylonJS/Babylon.js (packages/tools/snippetLoader/src/fetchSnippet.ts) also checks response.ok before calling response.json(). The status-check approach is consistent with the upstream pattern.
Scope
Targeted at the playgroundId retry path (validation_native.js:152-259), which is the only one we have evidence of flaking. The scriptToRun branch and BABYLON.Tools.LoadFile calls are unchanged; they have their own (retry-less) error handling that can be hardened in a follow-up if they ever flake.
Observed failure example
CI run on PR #1687, commit d3cdfeab (a comment-only change to a different test file): job 25524000188 failed with the snippet-parse-error pattern above. Re-run passed without any code change.
[Created by Copilot on behalf of @bghgary]