Skip to content

Signal handling for non-Python WASM modules#6154

Open
logan-gatlin wants to merge 21 commits intomainfrom
logan/wasm-signal-handling
Open

Signal handling for non-Python WASM modules#6154
logan-gatlin wants to merge 21 commits intomainfrom
logan/wasm-signal-handling

Conversation

@logan-gatlin
Copy link

@logan-gatlin logan-gatlin commented Feb 24, 2026

Continuation of 6098

This adds a shim to WebAssembly.Instantiate and friends which looks for modules that export two i32 globals: __instance_signal and __instance_terminated. Modules matching one of these signatures will receive a hook, allowing us to read and write to those memory locations, allowing for signal-based communication with modules.

The addition of __instance_terminated is required, since we must know when the module is dead so we can stop sending it signals. This is also useful for modules that want to know when they have been terminated, so that they can re-initialize.

Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

Summary: Adds a JS shim around WebAssembly.instantiate/Instance to detect WASM modules exporting __signal_address/__terminated_address globals, registering them for signal-based shutdown notification via a new signal-safe AtomicList data structure.

Issues ranked by severity:

  1. [High] Missing compat flag — behavioral change applied unconditionally to all workers. The shim wraps WebAssembly.instantiate's return promise with an extra .then(), which adds a microtask tick before results are delivered. This subtly changes promise scheduling for every worker that uses WebAssembly.instantiate, not just ones with signal globals. workerd has a strong backward-compat commitment; this kind of observable behavioral change must be gated behind a compatibility flag in compatibility-date.capnp.

  2. [High] Missing autogate for staged rollout. Even with a compat flag, this is a risky change (shimming fundamental WebAssembly APIs, new signal-handler code path). It should additionally be gated behind an autogate flag for safe incremental rollout.

  3. [High] No tests. There are no tests for any part of this feature: AtomicList, the JS shim, registerWasmShutdownSignal, or the GC prologue filtering. AtomicList is a custom concurrent data structure used from signal handlers — this absolutely needs unit tests, including tests that verify signal-safety invariants (e.g., that iterate() sees a consistent list while filter() is interrupted).

  4. [Medium] WebAssembly.Instance shim changes behavior when called without new. The original WebAssembly.Instance throws if called as a regular function (not as a constructor). The shim is a plain JS function that internally does new originalInstance(...), so WebAssembly.Instance(module, imports) (without new) would succeed instead of throwing. This is a spec deviation.

  5. [Medium] AtomicList uses GCC __atomic builtins on non-atomic types. Node* head and Node* next are plain pointers, not std::atomic<Node*>. The GCC builtins work in practice, but this is technically undefined behavior per the C++ memory model. Consider using std::atomic<Node*> for head and Node::next — the generated code is identical, and it makes the concurrency intent explicit and portable.

  6. [Low] <atomic> header is included but std::atomic is not used. Either switch to std::atomic (see issue 5) or remove the unused include.

  7. [Low] signalByteOffset and terminatedByteOffset can overlap. The bounds checks don't verify the two offsets point to non-overlapping memory. Overlapping offsets would cause the runtime's own signal write to be misinterpreted as the module's termination flag. A simple KJ_REQUIRE(signalOffset != terminatedOffset) would catch this.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Feb 24, 2026

I'm Bonk, and I've done a quick review of your PR.

Summary: Adds a JS shim around WebAssembly.instantiate/Instance to detect WASM modules exporting __signal_address/__terminated_address globals, registering them for signal-based shutdown notification via a new signal-safe AtomicList data structure.

I found 7 issues and posted them as a batch review on the PR. The highest severity items are:

  1. [High] Missing compat flag — The .then() wrapper adds a microtask tick to WebAssembly.instantiate for all workers, which is an observable behavioral change that needs a compat flag.
  2. [High] Missing autogate — Shimming fundamental WebAssembly APIs with new signal-handler code is risky and should use an autogate for staged rollout.
  3. [High] No tests — No tests for AtomicList, the JS shim, registration, or GC prologue filtering.
  4. [Medium] Instance shim spec deviationWebAssembly.Instance() without new would succeed instead of throwing (suggestion provided).
  5. [Medium] UB with __atomic builtins on non-atomic typeshead/next should be std::atomic<Node*> (suggestion provided).
  6. [Low] Unused <atomic> include.
  7. [Low] No overlap check for signal/terminated offsets (suggestion provided).

github run

@codecov-commenter
Copy link

codecov-commenter commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 84.68992% with 79 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.65%. Comparing base (c0f9d42) to head (fb6d55a).
⚠️ Report is 230 commits behind head on main.

Files with missing lines Patch % Lines
src/workerd/io/wasm-shutdown-signal-test.c++ 83.56% 0 Missing and 58 partials ⚠️
src/workerd/io/worker.c++ 71.64% 12 Missing and 7 partials ⚠️
src/workerd/io/wasm-shutdown-signal.h 96.82% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6154      +/-   ##
==========================================
- Coverage   70.70%   70.65%   -0.05%     
==========================================
  Files         409      416       +7     
  Lines      109411   110493    +1082     
  Branches    18018    18207     +189     
==========================================
+ Hits        77356    78070     +714     
- Misses      21246    21553     +307     
- Partials    10809    10870      +61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial feedback, please let me know when this is ready for final review.

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 25, 2026

Merging this PR will not alter performance

✅ 70 untouched benchmarks
⏩ 129 skipped benchmarks1


Comparing logan/wasm-signal-handling (fb6d55a) with main (c8cd8f1)

Open in CodSpeed

Footnotes

  1. 129 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@guybedford guybedford marked this pull request as ready for review February 25, 2026 19:57
@guybedford guybedford requested review from a team as code owners February 25, 2026 19:57
Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be great to see this feature shipped. There is an integration test PR already in cloudflare/workers-rs#939. @jasnell would value your input on if we need to think more about detached buffers here before landing.

auto backingStore = memory->Buffer()->GetBackingStore();
auto wasmMemory =
kj::arrayPtr(static_cast<kj::byte*>(backingStore->Data()), backingStore->ByteLength())
.attach(kj::mv(backingStore));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, may need to consider v8 sandbox / memory protection key interactions here. @erikcorry

Copy link
Author

@logan-gatlin logan-gatlin Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with Corry on this, but any misstatements are my own. I believe that MPK interactions are actually an existing concern with Python workers. Currently WASM linear memories are not MPK protected, which would explain why we have not had issues so far, and why this code is fine for the moment. I will make a follow-up to this PR which future proofs both implementations for when we enable protections in the future.

Copy link
Contributor

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some nitpicky comments, not blocking merge.

@logan-gatlin
Copy link
Author

Minor note: I added wasm-tools as a test dependency, so that the wasm test files can be compiled instead of committed as binaries.

Co-authored-by: Guy Bedford <gbedford@cloudflare.com>
auto backingStore = memory->Buffer()->GetBackingStore();
auto wasmMemory =
kj::arrayPtr(static_cast<kj::byte*>(backingStore->Data()), backingStore->ByteLength())
.attach(kj::mv(backingStore));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if signalOffset is the offset into backingStore->Data(), it might be safer here to limit the kj::arrayPtr(...) to start at that point.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little bit tricky because there are two signals that need to be in scope. Could restrict it to the minimum span containing both maybe? Seems overkill to me

…posal

Each WasmShutdownSignal entry holds a shared_ptr<v8::BackingStore> via
kj::Array::attach(). In Worker::Isolate, member destruction order destroys
the V8 isolate (api) before limitEnforcer (which owns the AtomicList).
The BackingStore destructor lives in libv8.so and may access V8 internal
state, so dropping these shared_ptrs after V8 disposal is a use-after-free.

Fix: explicitly clear the AtomicList in ~Isolate() while V8 is still alive.
Copy link
Contributor

@guybedford guybedford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needs the suggested memory lookup adjustment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants