Reworked threading model.#1652
Open
bkaradzic-microsoft wants to merge 22 commits intomasterfrom
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR reworks the graphics threading/synchronization model by replacing per-thread encoders and the SafeTimespan/UpdateToken mechanism with a single per-frame bgfx encoder owned by DeviceImpl and a new FrameCompletionScope gate/counter for cross-thread frame completion synchronization.
Changes:
- Removed
SafeTimespanGuarantor,UpdateToken/Update, and per-thread encoder management; introduced a single frame encoder lifecycle inDeviceImpl. - Added
FrameCompletionScope+ frame-start/before-render/after-render scheduling to coordinate JS-thread work withbgfx::frame(). - Updated NativeEngine/NativeXR/Canvas/FrameBuffer APIs to use
GetActiveEncoder()and internal encoder acquisition patterns instead of passing encoders through call chains.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| Core/Graphics/Source/DeviceImpl.h | Adds single-frame encoder pointer and frame gate (mutex/CV + pending scope count). |
| Core/Graphics/Source/DeviceImpl.cpp | Implements frame open/close sequencing, scope blocking, and frame-start dispatcher tick. |
| Core/Graphics/InternalInclude/Babylon/Graphics/DeviceContext.h | Replaces Update/UpdateToken with FrameCompletionScope, adds frame-start scheduler + encoder accessors. |
| Core/Graphics/Source/DeviceContext.cpp | Implements FrameCompletionScope RAII and forwards new scheduling/encoder methods. |
| Core/Graphics/Include/Shared/Babylon/Graphics/Device.h | Converts DeviceUpdate into a no-op compatibility shim and inlines GetUpdate. |
| Core/Graphics/Source/Device.cpp | Removes old GetUpdate implementation. |
| Core/Graphics/InternalInclude/Babylon/Graphics/FrameBuffer.h | Removes encoder parameters from bind/unbind/viewport/scissor APIs. |
| Core/Graphics/Source/FrameBuffer.cpp | Updates FrameBuffer internals to acquire view IDs without needing an encoder parameter. |
| Plugins/NativeEngine/Source/NativeEngine.h | Switches to active-encoder access + outside-frame scope holding; adds begin/end frame no-ops. |
| Plugins/NativeEngine/Source/NativeEngine.cpp | Updates command submission and RAF scheduling to use frame-start scheduling + scopes; updates ReadTexture blit scheduling. |
| Plugins/NativeEngine/Source/PerFrameValue.h | Adjusts API to remove encoder parameters (but currently has a type-signature issue). |
| Plugins/NativeXr/Source/NativeXrImpl.h | Removes stored Update member from XR session state. |
| Plugins/NativeXr/Source/NativeXrImpl.cpp | Migrates XR scheduling to frame-start + scopes; updates encoder usage. |
| Polyfills/Canvas/Source/Context.h | Removes stored Update member. |
| Polyfills/Canvas/Source/Context.cpp | Uses GetActiveEncoder() and acquires a FrameCompletionScope when flushing outside the frame cycle. |
| Polyfills/Canvas/Source/nanovg/nanovg_babylon.cpp | Updates FrameBuffer binding calls to new signature (no encoder param). |
| Apps/HeadlessScreenshotApp/Win32/App.cpp | Wraps startup wait with Start/Finish frame calls to keep gate open during startup. |
| Apps/StyleTransferApp/Win32/App.cpp | Same startup framing change as HeadlessScreenshotApp. |
| Core/Graphics/CMakeLists.txt | Removes SafeTimespanGuarantor sources from build. |
Comments suppressed due to low confidence (1)
Plugins/NativeEngine/Source/PerFrameValue.h:33
PerFrameValueis templated onT, butSetcurrently takes aboolparameter. This makes the template misleading and will break or silently convert ifPerFrameValueis ever instantiated with a non-bool type. ChangeSetto acceptT(e.g.,const T&orT) to match the class template parameter.
T Get() const
{
return m_value;
}
void Set(bool value)
{
m_value = value;
if (!m_isResetScheduled)
{
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9f25ed5 to
0b8bfad
Compare
…er FrameCompletionScope acquisition.
…ompletionScope instead of deferred member release.
This reverts commit 0d315bb.
…cquired FrameCompletionScope in SubmitCommands.
…ck on FrameCompletionScope.
…enderScheduler fires one frame late when called outside RAF.
…te leaking into nanovg rendering.
… getFrameBufferData outside RAF.
0b8bfad to
e97bfdc
Compare
bghgary
added a commit
to bghgary/BabylonNative
that referenced
this pull request
Apr 23, 2026
Matches the established pattern from ExternalTexture.AddToContextAsyncAndUpdate on master: wait for JS work to complete while the frame is still open, then Finish. The new CreateForJavaScript and RenderTextureArray tests in this PR had accidentally dropped this pattern, which works on master but deadlocks under the reworked threading model in BabylonJS#1652 (Finish closes the frame gate while JS is still acquiring FrameCompletionScopes). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bghgary
added a commit
to bghgary/BabylonNative
that referenced
this pull request
Apr 29, 2026
Hypothesis test for the 4-30x Linux JSC CI runtime variance on rework-thread-model: The new FrameCompletionScope gate model on this branch makes JS-thread SubmitCommands legal only between StartRenderingCurrentFrame and FinishRenderingCurrentFrame. The previous test pump called Start and Finish back-to-back with the wait_for AFTER Finish — collapsing the open window to ~zero. JS-thread work then had to win a scheduler race to land its scope before the pump thread re-acquired the mutex and closed the gate. On contended runners, JS thread loses that race many times per second, producing 4-30x runtime variance. Restructure: open the gate, then wait_for, then briefly close+reopen each iteration. Gate is open ~99% of the time so JS thread never has to race. If this commit reduces variance to master-baseline (5-10 min) consistently, the production fix belongs in BabylonJS#1652 itself (either invert the consumer pattern or eliminate the gate). [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bghgary
added a commit
to bghgary/BabylonNative
that referenced
this pull request
Apr 29, 2026
Root cause of the 4-30x Linux JSC CI runtime variance proven on rework-thread-model-perf branch (4 baseline runs hit 30-60min, 4 fix runs ran Unit Tests in 3 sec). On rework-thread-model the new FrameCompletionScope gate model makes JS-thread SubmitCommands legal only between StartRenderingCurrentFrame and FinishRenderingCurrentFrame. The previous test pump called Start and Finish back-to-back with the wait_for AFTER Finish, collapsing the open window to ~zero. JS-thread work then had to win a scheduler race to land its scope before the pump thread re-acquired the mutex and closed the gate. On contended runners JS lost that race repeatedly, accumulating per-poll 16ms+frame() stalls into 30+ minute runtimes. Restructure: open the gate before the loop, then for each iteration wait_for(16ms) -> Finish -> Start. Gate is open ~99% of the time so JS thread never has to race. Validation: rework-thread-model-perf branch (BabylonJS#1652 base + this fix only) ran Unit Tests in 3 seconds across all reachable jobs in 4 parallel runs. The previous baseline showed Unit Tests timing out at 30-60 min on multiple runs. Note: Validation Tests (Playground pixel diff) is a separate flaky issue unrelated to this fix. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The unit-test pump has been calling Start; Finish back-to-back per loop iteration with the wait_for AFTER Finish, which collapses the FrameCompletionScope gate's open window to ~zero. With the new shared-gate model on rework-thread-model, JS-thread SubmitCommands (and similar) acquire a scope that blocks on the gate's CV; whenever they wake on notify_all from Start, they have to win a scheduler race against the pump thread re-entering Finish to land their scope before the gate closes. On contended Linux CI runners that race is lost frequently, accumulating 16ms+ per missed acquisition into 30+ minute test runtimes (vs ~5 minutes on master). Restructure the pump to follow the "Start ASAP / Finish; Start tick / final Finish" pattern: open the frame immediately after device creation so the JS thread can submit at any time, then in the pump loop just tick bgfx (Finish; Start) once per iteration, and rely on the existing trailing Finish (after nativeCanvas.reset()) to close the gate at shutdown. Drops the dead Start; Finish priming pair and the redundant post-loop Start. Validated locally (Win32 RelWithDebInfo): UnitTests --gtest_filter= JavaScript.* passes in 2.4s with all 24 sub-tests green. Validated on bghgary fork CI (4 parallel runs x 2 jobs, Linux JSC, prior version of this fix on bghgary/unittest-watchdog 140894b): all 8 jobs ran the Unit Tests step in 4 seconds consistently with total job duration 352-393s — back in master baseline range. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bghgary
added a commit
to bghgary/BabylonNative
that referenced
this pull request
Apr 29, 2026
…abylonJS#1646 In BabylonJS#1674's combination of rework-thread-model (BabylonJS#1652) and the ExternalTexture sync API (BabylonJS#1646), three apps swapped the original async AddToContextAsync flow for synchronous CreateForJavaScript but kept the two-frame skeleton from the async version. The result is that the loader.Dispatch which runs the JS startup() callback now gets queued in frame 1 even though its observable wait (startup.get_future().wait()) is in frame 2 — the dispatch can land in either frame depending on JS-thread scheduling, breaking the "each Start/Finish pair wraps a phase of work" pattern. On rework-thread-model the JS thread will block on the closed gate between frames 1 and 2 anyway, so it functionally works, but the ordering is wrong. Move the dispatch into the second frame so the texture creation, startup() call, and the wait that observes them all run inside the same frame: frame 1: load scripts (no dispatch) frame 2: dispatch[startup] -> wait frame 3+: per-asset / render-loop Three apps had this issue; UnitTests' ExternalTexture tests already do it correctly. On bare rework-thread-model these apps still use AddToContextAsync and don't have the bug. Files: - Apps/HeadlessScreenshotApp/Win32/App.cpp - Apps/PrecompiledShaderTest/Source/App.cpp - Apps/StyleTransferApp/Win32/App.cpp Local build verification (Win32 RelWithDebInfo): - HeadlessScreenshotApp: builds cleanly - StyleTransferApp: builds cleanly - PrecompiledShaderTest: not configured in this Build/Win32; change is structurally identical to HeadlessScreenshotApp which compiled. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolves four real conflicts from the integration of master's recent canvas/blit fixes (#1683) with the threading-model rework (#1652): * Core/Graphics/InternalInclude/Babylon/Graphics/DeviceContext.h: keep rework's AcquireNewViewId() (no encoder param), add master's PeekNextViewId() const. * Core/Graphics/Source/DeviceImpl.h: same — keep rework's AcquireNewViewId / IncrementPendingFrameScopes / DecrementPendingFrameScopes / SetActiveEncoder / GetActiveEncoder, add master's PeekNextViewId() const. * Plugins/NativeEngine/Source/NativeEngine.cpp::CopyTexture: adopt master's #1683 fix structurally — fetch encoder at top of function, schedule blit on PeekNextViewId() so it runs after every view used so far (matching bgfx's blit-in-numeric-view-order semantics). Substitute rework's GetEncoder() member for master's GetUpdateToken().GetEncoder() since UpdateToken is a no-op shim on rework. * Polyfills/Canvas/Source/Context.cpp::Flush: keep both — master's EnsureFontsLoaded() at top (CPU-only nanovg font registration, no GPU/encoder dependency), then rework's optional FrameCompletionScope acquisition for callers that hit Flush outside a frame cycle, then GetActiveEncoder + discard. Local validation: Win32 RelWithDebInfo UnitTests --gtest_filter= JavaScript.* passes in 2.3s, all 24 sub-tests green. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The X11 Validation Tests flake on Linux JSC was caused by a gap in FrameCompletionScope coverage between two scope acquisitions in one logical JS-side update. SubmitCommands acquired a scope locally, processed its command stream, and released the scope when the function returned. If a single JS task made multiple submit calls (or did any non-bgfx work between them), the scope count dropped to 0 between calls. A concurrent FinishRenderingCurrentFrame on the render thread could then proceed through the !m_pendingFrameScopes wait, close the gate, and run bgfx::frame() with only a partial scene submitted -- producing the empty/clear-color screenshots observed in the flake's pixel diffs. Fix: capture the scope into an m_runtime.Dispatch lambda so it survives to the end of the current JS-thread task and is released when the runtime services its queue next. Continuation work in the same JS frame sees count > 0 and remains protected from concurrent Finish. Same shape as the existing RAF dispatcher's "prevent_frame" pattern, applied at the SubmitCommands funnel which is reached by every render path. Validated locally on Win32 RelWithDebInfo (UnitTests --gtest_filter= JavaScript.* passes in 3.3s, all 24 sub-tests green). [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…fresh docs The std::move(scope) capture in 7f98c7c produced a move-only lambda. Dispatchable at the AppRuntime layer is move-only and accepts it, but JsRuntime downstream type-erases the callable through std::function which requires copy-constructibility: std_function.h:439:18: error: static assertion failed due to requirement 'is_copy_constructible<...>::value': std::function target must be copy-constructible MSVC accepted this; gcc 14 (Ubuntu CI) rejects it. Wrap the scope in shared_ptr so the lambda is copyable, matching the existing RAF dispatcher pattern in ScheduleRequestAnimationFrameCallbacks. Also refresh the comments on FrameCompletionScope and AcquireFrameCompletionScope. The previous class-level comment listed three usage patterns, but one of them ("NativeEngine::GetEncoder() acquires lazily") didn't match the code — GetEncoder doesn't acquire a scope. Replace with the two patterns that are actually in use: 1. JS-frame scoped via m_runtime.Dispatch capture (RAF, SubmitCommands) 2. Block scoped on the stack (Canvas::Flush fallback, ReadTextureAsync) [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Schedule the FrameCompletionScope-releasing lambda before the command stream is processed, not after. Equivalent in the success path, but makes the JS-frame coverage exception-safe — if the command processing throws, the lambda is already queued and the scope survives to the next dispatch cycle, instead of being destroyed at the throw and dropping the count to 0 on the render thread's view. Also drops the explicit local in favour of an inline make_shared in the lambda capture — the local was only there to be captured. [Created by Copilot on behalf of @bghgary] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What Changed
The old model used per-thread encoders (
GetEncoderForThread/EndEncoders)and UpdateToken (wrapping
SafeTimespanGuarantor::SafetyGuarantee) for encoderaccess. The new model replaces all of this with a single frame encoder owned
by
DeviceImpland FrameCompletionScope for synchronization.SafeTimespanGuarantor,UpdateToken, andUpdateclasses are fully removed.DeviceUpdateremains as a no-op shim inDevice.h— its removal is aseparate PR because it is an API breaking change.
Summary of Changes
Core/Graphics — Removed
SafeTimespanGuarantor.h/SafeTimespanGuarantor.cpp— deletedUpdateToken,Updateclasses — removed fromDeviceContext.hGetSafeTimespanGuarantor()— removed fromDeviceImplGetEncoderForThread(),EndEncoders()— removed fromDeviceImplm_threadIdToEncoder,m_updateSafeTimespans— removed fromDeviceImplDevice::GetUpdate()— replaced with inline no-op inDevice.hDeviceUpdateclass — replaced with no-op shim (Start/Finish/RequestFinish do nothing)bgfx::Encoder¶meter — removed fromAcquireNewViewId,FrameBuffermethodsCore/Graphics — Added
m_frameEncoder— singlebgfx::Encoder*inDeviceImpl, acquired inStartRenderingCurrentFrame, ended inFinishRenderingCurrentFrameFrameCompletionScope— RAII counter that preventsFinishRenderingCurrentFramefrom proceeding while held
m_frameBlocked+m_pendingFrameScopes+ mutex/CV — blocking gate mechanismm_frameStartDispatcher— ticked inStartRenderingCurrentFramefor RAF schedulingm_firstFrameStarted— flag to distinguish first-call no-op from lifecycle errorsin
FinishRenderingCurrentFrameSetActiveEncoder()/GetActiveEncoder()— on DeviceImpl, forwarded by DeviceContextFrameStartScheduler(),AcquireFrameCompletionScope()— on DeviceContextCore/Graphics — Modified
StartRenderingCurrentFrame(): acquires encoder, opens gate, ticks frame startdispatcher
FinishRenderingCurrentFrame(): waits for scopes==0, closes gate, ticks beforeRenderdispatcher, ends encoder, calls
Frame(). Returns early on first call before anyStarthas been called; throws on subsequent out-of-order calls.FrameBuffermethods: get encoder internally viaGetActiveEncoder()instead ofparameter
Plugins/NativeEngine
GetUpdateToken().GetEncoder()withGetEncoder()→GetActiveEncoder()+ assertBeginFrame()/EndFrame()— no-ops (encoder lifecycle is in DeviceImpl)SubmitCommands()always acquires a stack-scopedFrameCompletionScope. Whenwithin a RAF callback the frame is already open and acquisition returns immediately.
When outside (e.g., XHR callback), it blocks until
StartRenderingCurrentFrameprovides the encoder. The scope lives on the stack — no deferred release, no member
variable, no risk of mid-execution release via microtask draining.
ReadTexture(): acquires its ownFrameCompletionScopeto ensure the encoderis available. When a blit is needed, executes it inline using
GetEncoder().Both blit and
ReadTextureAsyncland in the same frame. The scope keeps theframe open for the entire operation.
GetEncoder()includesassert(encoder != nullptr)— all call-site asserts removedm_update,m_updateTokenLoadTextureFromImage/LoadCubeTextureFromImages: cachedm_numMipsbeforeloop to avoid reading from potentially freed image (from PR Fixed race condition when passing data to bgfx. Issue #1398 #1628)
Plugins/NativeXr
Update.Scheduler()→FrameStartScheduler()Update.GetUpdateToken()→AcquireFrameCompletionScope()GetUpdateToken().GetEncoder()→GetActiveEncoder()Updatemember fromSessionStatePolyfills/Canvas
m_update.GetUpdateToken().GetEncoder()→GetActiveEncoder()Flush(): acquiresFrameCompletionScopewhen encoder is null (outside frame),plus defensive null check after acquisition — returns early if encoder still null
(shutdown edge case). Discards encoder state before nanovg rendering to prevent
NativeEngine state from leaking into Canvas rendering (old model had separate
per-thread encoder with clean state).
m_updatememberApps
Start/Finishframe aroundstartup wait to keep encoder available while JS runs
while waiting for JS test completion. Keep frame open during shutdown so JS
thread can complete pending work.
bgfx Fix (applied to build deps)
m_encoderBeginLockinbgfx::frame()to prevent deadlock withbgfx::destroy()Architecture
Single Encoder Lifecycle
Encoder Access
Per-Frame Sequence Diagram
ReadTexture
ReadTextureacquires its ownFrameCompletionScope— it can be called duringRAF, from
getFrameBufferDatacallbacks, or during initialization:SubmitCommands Scope Acquisition
SubmitCommandsalways acquires a stack-scopedFrameCompletionScope:This eliminates:
drainMicrotasks)m_outsideFrameScopemember variable and deferred release patternCanvas Flush
Synchronization Rules
Single encoder, shared by all consumers — DeviceImpl acquires the encoder in
StartRenderingCurrentFrameand all consumers read it viaGetActiveEncoder().Encoder acquired before gate opens —
bgfx::begin(true)is called BEFOREm_frameBlockedis set tofalse. Mutex provides memory ordering guarantee.Encoder ended after gate closes —
bgfx::end()is called AFTER allFrameCompletionScopes are released andm_frameBlockedis set totrue.SubmitCommands always holds a scope — Prevents encoder from being ended
mid-command-processing, regardless of microtask draining or reentrancy.
Apps must keep frame open when waiting for JS — Any main thread wait for JS
work must be bracketed by
Start/Finish. OtherwiseSubmitCommandsblocksforever waiting for the gate to open.
ReadTexture holds its own scope — Acquires
FrameCompletionScopeto ensureencoder is available for blit and
ReadTextureAsync. Both land in the same frame.Works during RAF, from
getFrameBufferDatacallbacks, and during initialization.Files Changed (vs 413cb49)
Core/Graphics/Source/DeviceImpl.hCore/Graphics/Source/DeviceImpl.cppCore/Graphics/InternalInclude/.../DeviceContext.hCore/Graphics/Source/DeviceContext.cppCore/Graphics/Include/.../Device.hCore/Graphics/Source/Device.cppCore/Graphics/InternalInclude/.../FrameBuffer.hCore/Graphics/Source/FrameBuffer.cppCore/Graphics/InternalInclude/.../SafeTimespanGuarantor.hCore/Graphics/Source/SafeTimespanGuarantor.cppPlugins/NativeEngine/Source/NativeEngine.hPlugins/NativeEngine/Source/NativeEngine.cppPlugins/NativeXr/Source/NativeXrImpl.h/cppPolyfills/Canvas/Source/Context.h/cppApps/HeadlessScreenshotApp/Win32/App.cppApps/StyleTransferApp/Win32/App.cppApps/UnitTests/Source/Tests.JavaScript.cppbgfx/src/bgfx.cpp+bgfx_p.hWhat Remains for Separate PR
Remove DeviceUpdate —
DeviceUpdateis a no-op shim retained because itsremoval is an API breaking change (apps call
GetUpdate/Start/Finish).All other changes in this branch are non-breaking. Removing DeviceUpdate will
be a separate PR that updates all app call sites.