Browser async/sync: core async virtuals + real ReduceAsync + race detector #6
Merged
…Async

Root cause: ILGPU core had no overridable async drain/readback, so the algorithm layer (which references only core) could not reach a backend's real async wait. AcceleratorStream.SynchronizeAsync was a non-virtual Task.Run(sync Synchronize) - fake on Wasm, where Synchronize is a no-op - and ReductionExtensions.ReduceAsync was Task.Run(sync Reduce->T), whose inner CopyToCPU throws on WebGPU (no sync GPU->CPU readback) and reads stale data on Wasm (the reduction kernel is still in flight).

Core (ILGPU):
- AcceleratorStream.SynchronizeAsync() made virtual.
- Accelerator.SynchronizeAsync() added (virtual; default runs sync Synchronize + completed task).
- MemoryBuffer.CopyToRawAsync(stream, offsetBytes, lengthBytes) added (virtual; default drains then sync CopyTo) + ArrayView<T>.CopyToCPUAsync extension.

Backends (SpawnDev.ILGPU): Wasm/WebGPU/WebGL override SynchronizeAsync (accelerator + stream) and CopyToRawAsync with their real async drain + readback (worker-dispatch await, queue.OnSubmittedWorkDone + mapAsync, GL-worker readback).

Algorithms: ReduceAsync (both overloads) rewritten to real async (dispatch -> SynchronizeAsync -> CopyToCPUAsync). Synchronous Reduce->T now throws a clear NotSupportedException on Wasm/WebGL/WebGPU instead of returning stale data.

Test: ILGPUReduceAsyncTest exercises dispatch -> real async drain -> async readback. PMT scoped run green on CPU/CUDA/OpenCL/WebGPU/Wasm (48 passed, 0 failed); WebGL skips (no shared memory/barriers).

Docs: Wasm/CLAUDE.md async drain/readback section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
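The layering issue in this commit can be modelled outside ILGPU. The sketch below is TypeScript rather than C#, and every name in it (`BaseAccelerator`, `WorkerAccelerator`, `reduceSumAsync`) is hypothetical. It only illustrates why an overridable async drain lets a generic algorithm reach a backend's real wait; it is not the ILGPU implementation.

```typescript
// Toy model only: none of these types are ILGPU APIs.
class BaseAccelerator {
  protected data: number[] = [];

  // "Kernel launch": the base backend runs the kernel immediately.
  dispatch(kernel: (d: number[]) => void): void {
    kernel(this.data);
  }

  // Blocking drain; a no-op here because dispatch already ran the kernel.
  synchronize(): void {}

  // Overridable async drain. Default: sync drain, then a completed promise.
  synchronizeAsync(): Promise<void> {
    this.synchronize();
    return Promise.resolve();
  }

  // Overridable async readback. Default: drain, then copy synchronously.
  async copyToCpuAsync(): Promise<number[]> {
    await this.synchronizeAsync();
    return [...this.data];
  }
}

// Browser-like backend: kernels run on a later turn of the event loop.
class WorkerAccelerator extends BaseAccelerator {
  private pending: Promise<void>[] = [];

  dispatch(kernel: (d: number[]) => void): void {
    this.pending.push(
      new Promise<void>((resolve) =>
        setTimeout(() => { kernel(this.data); resolve(); }, 0))
    );
  }

  // The single thread cannot block, so the sync drain stays a no-op ...
  synchronize(): void {}

  // ... and the real wait lives in the async override.
  async synchronizeAsync(): Promise<void> {
    const inFlight = this.pending;
    this.pending = [];
    await Promise.all(inFlight);
  }
}

// Generic "algorithm layer": it knows only the base type.
async function reduceSumAsync(acc: BaseAccelerator, input: number[]): Promise<number> {
  acc.dispatch((d) => { d.length = 0; d.push(input.reduce((a, b) => a + b, 0)); });
  await acc.synchronizeAsync();            // reaches the backend's real drain
  const out = await acc.copyToCpuAsync();  // async readback
  return out[0] ?? NaN;
}
```

Because `reduceSumAsync` calls the drain through the base type, the same code is correct on both toy backends. If the async drain were a fixed wrapper around `synchronize()`, the worker-style backend would return before its kernel had run.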
…raversal)

GLSL codegen now builds hoisted struct/local default initializers with per-field constructors (GetStructDefaultInitializer) instead of a single flat constructor, and hoists all PointerType values as int. glWorker keys the program cache by shader source so a changed source recompiles (and the stale program/shaders are deleted) instead of returning a cached mismatch.

Version -> 4.9.10-local.15.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
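The cache change is described only in prose. A minimal sketch of the idea follows, assuming a hypothetical `ProgramCache` that keeps one compiled program per kernel name together with the source it was built from. The real glWorker structure is not shown in this PR, and `compile`/`destroy` merely stand in for the GL calls.

```typescript
// Hypothetical model; not the glWorker code.
interface Program { source: string; id: number }

class ProgramCache {
  private entries = new Map<string, Program>();
  private nextId = 1;
  readonly destroyed: number[] = []; // ids of programs that were deleted

  private compile(source: string): Program {
    return { source, id: this.nextId++ };
  }

  private destroy(p: Program): void {
    this.destroyed.push(p.id);
  }

  // Reuse the cached program only if it was built from exactly this source.
  get(kernelName: string, source: string): Program {
    const cached = this.entries.get(kernelName);
    if (cached !== undefined && cached.source === source) return cached;
    if (cached !== undefined) this.destroy(cached); // stale: source changed
    const fresh = this.compile(source);
    this.entries.set(kernelName, fresh);
    return fresh;
  }
}
```

Comparing the stored source on every lookup means a regenerated shader can never be served from an entry compiled for older source, and the stale entry is released at the point it is replaced.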
Lets the shared test project exercise ML pipelines directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Completes the browser-safe async buffer-op surface alongside CopyFromAsync / CopyToHostAsync / SynchronizeAsync. On Wasm the sync MemSetToZero writes the SharedArrayBuffer immediately and bypasses the dispatch queue, racing in-flight worker kernels; MemSetToZeroAsync awaits the accelerator drain first so the zero-fill is correctly ordered after pending dispatches. WebGPU/WebGL/desktop ordering is already handled by their encoder/worker/stream, so the explicit wait is Wasm-only (mirrors CopyFromAsync).

Test MemSetToZeroAsyncTest: kernel fills nonzero (unawaited) -> MemSetToZeroAsync -> readback all zeros. PMT green on CPU/CUDA/OpenCL/WebGPU/Wasm (6 pass/0 fail); WebGL skips (MemSet is deferred CPU-side upload).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
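The ordering problem can be reproduced with a toy buffer. This TypeScript sketch is not ILGPU code: `SharedBuffer` and its methods are hypothetical stand-ins, and a `setTimeout` callback plays the role of a worker kernel that writes the buffer on a later turn.

```typescript
// Toy model of a buffer shared between the host and a worker.
class SharedBuffer {
  readonly cells: number[];
  private pending: Promise<void>[] = [];

  constructor(length: number) {
    this.cells = new Array<number>(length).fill(0);
  }

  // Queue a kernel; it writes the buffer on a later event-loop turn.
  dispatchFill(value: number): void {
    this.pending.push(
      new Promise<void>((resolve) =>
        setTimeout(() => { this.cells.fill(value); resolve(); }, 0))
    );
  }

  // Wait for every kernel queued so far.
  async synchronizeAsync(): Promise<void> {
    const inFlight = this.pending;
    this.pending = [];
    await Promise.all(inFlight);
  }

  // Sync zero-fill: writes immediately, so a queued kernel overwrites it later.
  memSetToZero(): void {
    this.cells.fill(0);
  }

  // Async zero-fill: drain first, then write, so the zeros land after the kernel.
  async memSetToZeroAsync(): Promise<void> {
    await this.synchronizeAsync();
    this.cells.fill(0);
  }
}
```

In this model, calling `memSetToZero()` right after `dispatchFill(7)` leaves the buffer full of 7s once the kernel runs, whereas `await memSetToZeroAsync()` leaves it zeroed.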
WasmMemoryBuffer.DetectHostBufferRaces (default false): when true, the synchronous host ops (MemSet / CopyTo / CopyToHost) throw if the buffer has an in-flight dispatch (_pendingSnapshotIntents > 0, incremented synchronously at queue time in RunKernel) - i.e. the host is reading/zeroing a SharedArrayBuffer that worker kernels may still be writing. CopyFrom* are NOT guarded; the lazy snapshot mechanism (PrepareHostWrite) protects them by design.

This mechanizes the async/sync audit: enable it in a PMT sweep to ENUMERATE any remaining sync-readback race sites that the async APIs (CopyToHostAsync / CopyFromAsync / MemSetToZeroAsync / SynchronizeAsync) replace. A properly-drained path never trips it.

Test WasmTests.DetectHostBufferRaceTest: a sync read on the same JS turn as an unawaited dispatch deterministically throws; the identical read succeeds after SynchronizeAsync. PMT green on Wasm (1 pass/0 fail). Wasm/CLAUDE.md documented.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
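The detector's mechanism (a counter bumped synchronously at queue time, checked by the sync host ops) can be sketched in a few lines. This is a TypeScript toy model, not the Wasm backend: `GuardedBuffer`, `pendingDispatches`, and the static flag are hypothetical names that loosely follow the description above.

```typescript
// Toy model; names are illustrative, not the WasmMemoryBuffer implementation.
class GuardedBuffer {
  static detectHostBufferRaces = false; // opt-in, off by default
  private pendingDispatches = 0;
  private inFlight: Promise<void>[] = [];
  private cells: number[] = [0, 0, 0, 0];

  // The counter is incremented synchronously at queue time, before the kernel runs.
  dispatchFill(value: number): void {
    this.pendingDispatches++;
    this.inFlight.push(
      new Promise<void>((resolve) =>
        setTimeout(() => {
          this.cells.fill(value);
          this.pendingDispatches--;
          resolve();
        }, 0))
    );
  }

  // Wait for every dispatch queued so far.
  async synchronizeAsync(): Promise<void> {
    const waiting = this.inFlight;
    this.inFlight = [];
    await Promise.all(waiting);
  }

  // Guarded sync read: throws if the flag is on and a dispatch is still in flight.
  copyToHost(): number[] {
    if (GuardedBuffer.detectHostBufferRaces && this.pendingDispatches > 0) {
      throw new Error("sync host read raced an in-flight dispatch");
    }
    return [...this.cells];
  }
}
```

Because the counter changes at queue time rather than when the kernel starts, a read on the same turn as the dispatch always sees a nonzero count. That is what makes the failure deterministic in this model instead of timing-dependent.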
Closes the browser single-threaded-async bug class Trip's research (_research/08-browser-async-vs-ilgpu-sync.md) surfaced: ILGPU core had no overridable async drain/readback, so ILGPU.Algorithms (references only core) couldn't reach a backend's real async wait. AcceleratorStream.SynchronizeAsync was a non-virtual Task.Run(sync Synchronize), which is fake on Wasm, and ReduceAsync was Task.Run(sync Reduce->T), whose CopyToCPU threw on WebGPU / read stale on Wasm.

Commits

- (2717f90): Accelerator/AcceleratorStream.SynchronizeAsync made virtual; new MemoryBuffer.CopyToRawAsync + ArrayView<T>.CopyToCPUAsync. Wasm/WebGPU/WebGL override all three with their real drain + async readback. ReduceAsync rewritten real-async; sync Reduce->T throws NotSupportedException on browser backends instead of silent-stale.
- (d73880c): carried-forward, version-noted local.15.
- (e7e0ff1).
- MemSetToZeroAsync (be02c8e): the missing async sibling of CopyFromAsync.
- DetectHostBufferRaces (97da1f2): opt-in detector that throws on a sync MemSet/CopyTo/CopyToHost against an in-flight buffer; mechanizes the audit.

Verification (PMT)

- ILGPUReduceAsyncTest: green on CPU/CUDA/OpenCL/WebGPU/Wasm (WebGL skips); previously WebGPU threw, Wasm stale.
- MemSetToZeroAsyncTest: green ×5 backends.
- DetectHostBufferRaceTest: green on Wasm (deterministic: sync read on the same JS turn as an unawaited dispatch throws; succeeds after SynchronizeAsync).

Not in scope (documented for follow-up)

Confirmed-broken sites needing async-signature changes (OptimizationEngine.FetchToCPUAsync, LoadParametersInternal, Optimizer.MemSetToZero, SparseMatrix.CopyToCPU, ConcurrentStreamProcessor) and the 10 ML sync sites are deferred to avoid rushing invasive changes.

🤖 Generated with Claude Code