Skip to content

[draft] Combined: bgfx MSAA+mipmaps fix + Canvas FB-pool survivability (CI smoke)#1715

Draft
bkaradzic-microsoft wants to merge 5 commits into
BabylonJS:masterfrom
bkaradzic-microsoft:weekend/2026-05-21-msaa-cascade
Draft

[draft] Combined: bgfx MSAA+mipmaps fix + Canvas FB-pool survivability (CI smoke)#1715
bkaradzic-microsoft wants to merge 5 commits into
BabylonJS:masterfrom
bkaradzic-microsoft:weekend/2026-05-21-msaa-cascade

Conversation

@bkaradzic-microsoft
Copy link
Copy Markdown
Contributor

Note

Draft PR for CI smoke-test only. Combines the two weekend feature branches into one tree to exercise all platform builds at once. Do not review or land from here — splits below.

This combined branch contains two independent fixes:

  1. MSAA + mipmaps fails on D3D11/D3D12/Vulkan (bgfx) #1714 — bgfx MSAA + mipmaps fix (bn-fix-msaa-mips on BabylonJS/bgfx, picked up via bgfx.cmake bump). Sets MipLevels=1 on the MSAA target texture in all four backends (D3D11, D3D12, Vulkan, Metal); the resolve target still gets m_numMips. Without this, MSAA-enabled render targets that also request mipmaps fail validation (E_INVALIDARG on D3D11/12, VUID on Vulkan, MTL validation on Metal). Three-level pin chain:

    • BabylonJS/bgfx branch bn-fix-msaa-mips @ b617ec8f4
    • BabylonJS/bgfx.cmake branch bn-bump-bgfx-msaa-mips @ 2bd6f6214 (advances bgfx submodule)
    • This branch advances CMakeLists.txt GIT_TAG for bgfx.cmake to 2bd6f6214
  2. Canvas framebuffer-pool survivability fix (cascade root cause). Many playground tests were marked excludeFromAutomaticTesting with reasons like "Suspected to cause Win32 D3D11 cascade crash". Empirically: the cascade is V8 GC pacing outpacing bgfx's default 128-slot framebuffer pool; whichever test ran when the pool finally exhausted got blamed, varying from run to run, so dozens of innocent tests accumulated the exclusion.

    This branch:

    • Bumps default BGFX_CONFIG_MAX_FRAME_BUFFERS from 128 → 512 in Dependencies/CMakeLists.txt and propagates CMake -D overrides to bgfx's compile defines (the existing CI -D BGFX_CONFIG_MAX_FRAME_BUFFERS=256 was previously a silent no-op).
    • Replaces three bare assert(handle.idx != bgfx::kInvalidHandle) sites with std::runtime_error throws + texture cleanup, so pool exhaustion no longer abort()s the process. The Canvas Context::Flush JS binding now catches and rethrows as Napi::Error so just the offending test fails instead of killing the whole sweep.
    • Re-enables 112 playground tests in config.json that were marked excluded as cascade victims. All 112 verified to pass when run sequentially as one batch on Win32 D3D11 Release with the FB pool bump.

CI expectations

This is the first time these two changes have been exercised on the non-Win32-D3D11 backends. Things to watch for:

  • bgfx MSAA + mipmaps fix: validation layers on D3D12 / Vulkan / Metal should all be clean now where they previously errored. If any backend still errors, the fix needs further work in the respective renderer_*.cpp.
  • Canvas FB-pool survivability: tests should pass without --include-excluded. The 112 re-enabled tests' new validated lines should appear in the sweep output. A test that genuinely fails on a non-Win32-D3D11 backend (real rendering bug, not cascade) will show up as a new pixel-diff or JS error in that backend's CI log.

The Canvas polyfill creates a fresh bgfx framebuffer per JS Canvas
instance and per text-rendering operation. Under V8's GC pacing,
finalizers (which release the framebuffers) only run in batches, so
a long playground sweep can outpace the default 128-slot bgfx pool.
The first allocation past the cliff returned BGFX_INVALID_HANDLE,
the bare assert() at Canvas.cpp tripped (Debug) or the invalid handle
silently bled into Graphics::FrameBuffer/bgfx-internal arrays (Release,
later AV at bgfx_p.h:5526). The whole run aborted regardless of which
test happened to be unlucky.

Three changes:

- Dependencies/CMakeLists.txt now propagates BGFX_CONFIG_MAX_FRAME_BUFFERS
  to the bgfx target as a compile definition, defaulting to 512 (4x the
  bgfx default) and honoring any -D override the way CI workflows
  already pass it. Previously the CI -D was a no-op.

- Canvas.cpp UpdateRenderTarget() now throws std::runtime_error when
  createFrameBuffer returns kInvalidHandle and frees the two textures
  it just allocated. Context::Flush wraps the call and rethrows as
  Napi::Error so the offending test fails cleanly without taking the
  rest of the sweep down.

- FrameBufferPool::Add gets the same treatment for symmetry.

The pool bump alone is expected to remove the cascade in practice; the
throw paths are defense-in-depth so future regressions surface as a
catchable JS error instead of a silent process kill.

Refs BabylonJS/BabylonNative cascade-victim discussion (113 playground
tests previously excluded with reasons like "Suspected to corrupt state
causing later D3D11 cascade crash").
@bkaradzic-microsoft bkaradzic-microsoft force-pushed the weekend/2026-05-21-msaa-cascade branch 5 times, most recently from 6d2a348 to b467942 Compare May 21, 2026 22:19
Re-enables 112 tests previously excluded with "Suspected to corrupt state
causing later Win32 D3D11 cascade crash" reasons, now that PR BabylonJS#1573
(Canvas framebuffer pool survivability fix) makes the in-suite cascade
failure mode survivable.

For 21 of the 112 cascade-victims whose original reason also mentioned
Linux/OpenGL or D3D11 issues, re-add "excludedGraphicsApis: ["OpenGL"]"
or keep "excludeFromAutomaticTesting: true" so they stay excluded on the
affected backend(s):

  OpenGL-only exclusion (15 tests):
    47 Fresnel, 54 Scissor test w/ 0.9 HW scaling, 101 Gaussian Splatting
    viewports, 137 Volumetric Light Scattering PP w/ Morph Targets,
    159 Node material 0, 161 Node material 2, 163 Node material 4,
    165 Node material 6, 166 Node material 7, 189 Fresnel,
    240 Depth of field, 258 Local cubemaps,
    312 Scissor test w/ 0.9 HW scaling, 378 SerializeScene + ImportMesh
    w/ MorphTargetManager, 385 Command encoder order in WebGPU 2.

  Excluded on both D3D11 and OpenGL (6 tests):
    358 Roundtrip babylon file (skeletal/morph), 397 simple-sphere-in-4-
    mirrors, 398 procedural-texture-with-node-material, 400 simple-
    custom-shader, 409 lens-flare, 410 change-texture-of-material.

Also fixes Dependencies/CMakeLists.txt:62 to guard the BGFX_CONFIG_MAX_
FRAME_BUFFERS bump with "if(NOT X)" instead of "if(NOT DEFINED X)"
since bgfx.cmake declares the variable as "" CACHE STRING (defined but
empty). Without this fix, all non-Win32 CI builds compiled with an
empty -D flag and broke bgfx_p.h template instantiation.
@bkaradzic-microsoft bkaradzic-microsoft force-pushed the weekend/2026-05-21-msaa-cascade branch from b467942 to 9dac136 Compare May 21, 2026 22:57
bkaradzic-microsoft added a commit to BabylonJS/bgfx that referenced this pull request May 22, 2026
The SortKey struct in src/bgfx_p.h declared its members with no inline
initializer and relied on every caller to invoke reset() before reading
m_hasAlphaRef, m_blend, etc. via encodeDraw(). On at least one code path
exercised by BabylonNative's Playground sweep, encodeDraw(SortSequence)
runs before reset() / setState(), causing UndefinedBehaviorSanitizer to
fire with:

    bgfx_p.h:1339:42: runtime error: load of value 190, which is not a
    valid value for type 'bool'

at the m_hasAlphaRef read. Add in-class default initializers to all six
members so a default-constructed SortKey is in a valid (zeroed) state.
The reset() method continues to work unchanged.

Reported on the application side as BabylonJS/BabylonNative#1715
(MacOS_Sanitizers job).
@bkaradzic-microsoft bkaradzic-microsoft force-pushed the weekend/2026-05-21-msaa-cascade branch 2 times, most recently from 920b18e to a720c85 Compare May 22, 2026 00:44
…xes (BabylonJS#1714).

bgfx.cmake (BabylonJS) branch: bn-bump-bgfx-msaa-mips-and-ubsan @ 4a53db5
- bgfx submodule -> 89dcbaea (b617ec8f MSAA+mips + cherry-picked upstream
  bgfx PR #3688 UBSan: SortKey reset() in EncoderImpl ctor + BitMask
  shift-by-width guard)
- bx and bimg unchanged from BabylonJS/bgfx.cmake master
@bkaradzic-microsoft bkaradzic-microsoft force-pushed the weekend/2026-05-21-msaa-cascade branch from a720c85 to 1c1942e Compare May 22, 2026 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant