Skip to content

fix(rivetkit-wasm): fix mem leaks#4880

Merged
NathanFlurry merged 1 commit intomainfrom
05-02-fix_rivetkit-wasm_fix_mem_leaks
May 2, 2026
Merged

fix(rivetkit-wasm): fix mem leaks#4880
NathanFlurry merged 1 commit intomainfrom
05-02-fix_rivetkit-wasm_fix_mem_leaks

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 2, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry mentioned this pull request May 2, 2026
11 tasks
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 2, 2026

Code Review: fix(rivetkit-wasm): fix mem leaks (#4880)

This is a solid, focused PR. The work falls into four categories: fixing Box::leak memory leaks by interning schemas globally, centralizing HTTP status promotion in rivetkit-core, fixing the wasm websocket callback region slot management, and wiring register_task correctly. The test coverage is unusually good. A few issues and one genuine bug are noted below.


Bug: Duplicate import in napi_actor_events.rs test file

File: rivetkit-typescript/packages/rivetkit-napi/tests/napi_actor_events.rs

The same RivetTransportError alias appears to be declared twice via separate use rivet_error::... lines. If this is the case, the compiler will emit "the name RivetTransportError is defined multiple times" and the test binary will fail to compile. Check that only a single combined import form exists, e.g. use rivet_error::{RivetError as RivetTransportError, RivetErrorSchema};


Bug/Logic: anyhow_to_bridge_rivet_error_payload - the should_promote condition may incorrectly override a deliberately-set error

Files: rivetkit-typescript/packages/rivetkit-napi/src/lib.rs, rivetkit-typescript/packages/rivetkit-wasm/src/lib.rs

The guard uses an || chain where promotion fires if any single condition is true, including context.public_ != Some(true). An error with public_: Some(false), status_code: Some(404) (a deliberate non-public override) would still be promoted because public_ != Some(true) is satisfied. The intent appears to be: only override errors that were not explicitly set by the caller. The safer condition should use && instead of ||:

context.public_ != Some(true) && context.status_code.map_or(true, |s| s == 500)

Duplication: should_promote logic and payload serialization between NAPI and wasm

The anyhow_to_bridge_rivet_error_payload body is duplicated verbatim in both rivetkit-napi/src/lib.rs and rivetkit-wasm/src/lib.rs. The new public_error_status_code function in core is a good start, but the full function could eventually move to rivetkit-core::error and be called by both adapters. Low severity for this PR, but worth a follow-up task.


Missing tracing log in wasm anyhow_to_bridge_rivet_error_payload

The NAPI version emits a tracing::error! with structured fields whenever it encodes a bridge error. The wasm version emits nothing. At minimum, a tracing::warn! or console_error in debug builds would bring the two paths to parity and make wasm bridge errors diagnosable.


Off-by-one in allocate_websocket_callback_region_id

The loop for _ in 0..u32::MAX iterates u32::MAX - 1 times, not u32::MAX times. In the completely impractical scenario where exactly u32::MAX - 1 slots are occupied, the one free slot would be missed and the function would return None spuriously. Use 0..=u32::MAX or restructure as a loop with a visited-count guard.


Overly strict monotonic assertion in test

The inline test websocket_callback_regions_are_removed_after_end asserts region_id > previous_id, which will fail once the ID counter wraps past u32::MAX. Replace with assert_ne!(region_id, 0) plus an assertion that the ID is not currently live in the map.


Global schema-count assertion is racey under parallel tests

The inline test parse_bridge_rivet_error_reuses_interned_schema snapshots BRIDGE_RIVET_ERROR_SCHEMAS.len() before and after without holding a lock. Concurrent test threads inserting their own entries can cause the delta assertion to fail spuriously. Either drop the count assertion and instead check only that the cached pointer address is stable, or annotate with #[serial_test::serial].


Missing trailing newline in error artifact

engine/artifacts/errors/wasm.invalid_config.json is missing the trailing newline that all other artifact files in that directory have. Minor, but inconsistent.


Positive observations

  • Moving from Vec<Option<WebSocketCallbackRegion>> to HashMap<u32, WebSocketCallbackRegion> backed by scc::HashMap is a clear improvement: no retained None slots, and end_websocket_callback becomes a clean remove.
  • STRUCTURED_TIMEOUT_SCHEMAS correctly uses scc::HashMap with entry_sync for lock-free interning, which is the right pattern.
  • register_task now routes through ActorContext::register_task in wasm (instead of the previous wait_until path), and UserTaskKind::RegisteredTask is correctly added to the ALL array.
  • Removing the duplicated JS-side normalizeDecodedBridgePayload / promoteKnownBridgeError functions from multiple TypeScript files is a good cleanup.
  • Test coverage is thorough: the inline wasm tests cover schema-cache interning, status promotion, ID wrap, and per-end cleanup.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 2, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4880

All packages published as 0.0.0-pr.4880.34d3f72 with tag pr-4880.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-34d3f72
docker pull rivetdev/engine:full-34d3f72
Individual packages
npm install rivetkit@pr-4880
npm install @rivetkit/react@pr-4880
npm install @rivetkit/rivetkit-napi@pr-4880
npm install @rivetkit/workflow-engine@pr-4880

@NathanFlurry NathanFlurry force-pushed the 05-02-fix_rivetkit-wasm_fix_mem_leaks branch from b8a3dc6 to 466c831 Compare May 2, 2026 21:32
@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 2 times, most recently from 7600ef8 to ecba65a Compare May 2, 2026 22:13
@NathanFlurry NathanFlurry force-pushed the 05-02-fix_rivetkit-wasm_fix_mem_leaks branch from 466c831 to f914045 Compare May 2, 2026 22:13
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 2, 2026

PR Review: fix(rivetkit-wasm): fix mem leaks

Status: DRAFT

Overview

This PR addresses memory leaks and correctness gaps in the wasm runtime identified in a review of #4860:

  • Intern leaked RivetErrorSchema values via process-global LazyLock<SccHashMap> caches (each (group, code) pair leaked at most once)
  • Move HTTP-status promotion to rivetkit-core::error::public_error_status_code; remove duplicate TS-side promotion tables
  • Fix WebSocket callback region management: unbounded Vec<Option<_>> replaced with HashMap<u32, _> and a wrapping ID allocator
  • Properly wire registerTask to ActorContext::register_task (was incorrectly forwarding to wait_until)
  • Pin wasm-pack@0.14.0 as a devDependency instead of npx -y wasm-pack
  • Add validate_wasm_serve_config to reject engine_binary_path before core setup

Bugs / Potential Compile Errors

1. Duplicate import in tests/napi_actor_events.rs

The file already contains use rivet_error::RivetError as RivetTransportError;. The diff adds a second import of the same alias, which is a compile error.

2. NAPI STRUCTURED_TIMEOUT_SCHEMAS key lifetime mismatch

The cache is declared with key type (&'static str, &'static str), but structured_timeout_schema(group: &str, code: &str, ...) passes non-'static slices to entry_sync((group, code)). scc::HashMap::entry_sync takes K by value, so (&str, &str) against K = (&'static str, &'static str) is a lifetime mismatch. The Vacant branch also stores the raw group/code params in the leaked RivetErrorSchema without leaking the strings first.

The wasm-side counterpart (BRIDGE_RIVET_ERROR_SCHEMAS) correctly uses (String, String) as the key. The napi-side should match.


Logic Concerns

should_promote third arm

In both rivetkit-napi/src/lib.rs and rivetkit-wasm/src/lib.rs, a payload with public: true but statusCode: 500 still triggers re-promotion via the context.status_code == Some(500) arm. This is probably intentional (user-produced 500 should not suppress a semantic status from core), but a short inline comment would clarify the invariant.

Misleading monotonicity assertion in wasm websocket test

assert!(region_id > previous_id) holds for 1000 iterations but IDs wrap at u32::MAX. Wrap behavior is already covered by websocket_callback_region_ids_wrap_without_collision. Consider assert_ne!(region_id, 0) to avoid a confusing failure if the iteration count is ever raised.


Missing Method Reference

The #[cfg(target_arch = "wasm32")] tests call WasmActorContext::from_core(...) but only a new(inner, callbacks) constructor is visible in the diff. If from_core is not defined elsewhere in lib.rs, these tests will fail to compile on wasm targets.


Dependency Concerns

wasm-pack@0.14.0 brings in deprecated transitive packages:

  • binary-install@1.1.2 — marked deprecated in pnpm-lock
  • tar@6.2.1 — deprecated with "widely publicized security vulnerabilities"

These are build-only dev dependencies so user impact is nil, but worth checking whether a newer wasm-pack version clears them.


What is Done Well

  • Schema interning pattern is correctly implemented in wasm: (String, String) key, Box::leak only on first insertion per unique (group, code).
  • ACTION_NOT_FOUND_SCHEMA static is the right call — all fields are compile-time constants, matching the new CLAUDE.md rule.
  • registerTask vs wait_until fix is semantically correct. UserTaskKind::RegisteredTask + shutdown-deadline cancellation is the right pattern; the new Rust test verifies it.
  • requestSaveAndWait for durability is validated by the parity test gate: action must not settle until the save gate is released.
  • Dual-runtime parity test (runtime-parity.test.ts) using fake bindings is a strong addition — no real NAPI or wasm artifacts needed to verify behavioral parity.
  • validate_wasm_serve_config correctly rejects before side effects and is called consistently in both serve and serverless_runtime.
  • Pinned wasm-pack via require.resolve eliminates the npx -y reproducibility hazard.
  • CLAUDE.md entries are concise one-line bullets with the canonical example reference embedded.

This was referenced May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant