Skip to content

chore(rivetkit): wasm support#4860

Merged
NathanFlurry merged 1 commit intomainfrom
04-29-chore_rivetkit_wasm_support
May 2, 2026
Merged

chore(rivetkit): wasm support#4860
NathanFlurry merged 1 commit intomainfrom
04-29-chore_rivetkit_wasm_support

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 30, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 2 times, most recently from c2da5e3 to cc67013 Compare April 30, 2026 20:24
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Review: chore(rivetkit): wasm support

This is a substantial Phase 1 PR that adds remote SQLite execution (v4 envoy protocol) and a WASM runtime path (rivetkit-wasm). The architecture is well-designed — the split into native-transport/wasm-transport features, the typed ProtocolCompatibilityError hierarchy, and the RAII inflight guard pattern are all solid. A few issues below.


Bugs / Correctness

WasmActorConfig has dead fields that are silently dropped

rivetkit-typescript/packages/rivetkit-wasm/src/lib.rs defines create_state_timeout_ms, on_create_timeout_ms, on_wake_timeout_ms, on_before_actor_start_timeout_ms, and on_request_timeout_ms in WasmActorConfig, but the From<WasmActorConfig> for ActorConfigInput conversion doesn't map them (they don't exist in ActorConfigInput). Users who set these will get silently ignored config. Either remove them from WasmActorConfig or, if they should be supported, add them to ActorConfigInput with corresponding NAPI exposure.

remote_sqlite_inflight_counter is incremented before the semaphore is acquired

In spawn_tracked_remote_sqlite_task (ws_to_tunnel_task.rs), the inflight counter increments at track_remote_sqlite_inflight before worker_permits.acquire_owned().await. If all 32 worker slots are full, the counter will be non-zero while the task is queued waiting for a permit — so actor_stopped will block for the full remote_sqlite_stop_timeout even if the actual SQL work hasn't started. This is the safer direction (avoid closing the DB while tasks are pending), but worth documenting explicitly since it can make actor stop slower than expected under load.

actor_stopped spawned unconditionally without structured cleanup on drain timeout

The caller in ToRivetEvents now unconditionally spawns actor_stopped and drops the handle. If the SQLite drain times out (the ensure! in actor_lifecycle.rs), the error is logged but the active_actors entry is never removed and the executor cell leaks for the lifetime of the connection. The previous in-line call also swallowed errors with warn!, but consider whether the executor/inflight cleanup should still run on drain timeout to avoid accumulated state.


Convention Violations

Inline #[cfg(test)] mod tests in rivetkit-sqlite-types/src/lib.rs

CLAUDE.md: Rust tests live under tests/, not inline #[cfg(test)] mod tests in src/.
The two tests (execute_result_preserves_result_and_route_metadata and execute_result_projects_query_and_exec_results) should move to rivetkit-rust/packages/rivetkit-sqlite-types/tests/.

Missing doc comment on ActorConfig::remote_sqlite

The adjacent has_database field has a full explanation of what it controls and why. remote_sqlite has none. Add a one-liner explaining it selects SqliteBackend::RemoteEnvoy vs LocalNative.


Performance / Design Notes

handle_message clones Arc<Conn> for every message

The signature change from &Conn to Arc<Conn> adds an atomic refcount inc/dec per parsed message, even for the majority of message types that never need the Arc (all existing SQLite page I/O paths just reborrow with &conn). Consider keeping the parameter as &Arc<Conn> and only cloning in the ActorStateStopped and remote SQL dispatch branches.

Global 32-worker limit per connection, not per actor

REMOTE_SQLITE_WORKER_LIMIT = 32 is per-envoy-connection. A single high-traffic actor can saturate the pool. This is a reasonable starting point, but worth a TODO comment flagging per-actor limiting as a follow-up (the spec already notes this gap).


What's well done

  • Typed ProtocolCompatibilityError (versioned.rs): replacing bail!("sqlite requests require envoy-protocol v2") with a structured error carrying feature, direction, required_version, and target_version is a clear improvement. The compat tests in remote_sql_compat.rs cover all four cross-version directions cleanly.
  • compile_error! mutual-exclusion guards in connection/mod.rs: enforcing exactly one transport feature at compile time is the right approach.
  • RAII RemoteSqliteInflightGuard: incrementing on construction and decrementing on Drop is correct and doesn't require the spawned task to manage cleanup explicitly.
  • OnceCell per (actor_id, generation) executor: lazy init on first SQL request, reused for the generation lifetime, torn down on actor stop — exactly what the spec describes.
  • runtime-import-guard.test.ts: enforcing that raw NAPI/WASM binding imports stay behind the two adapter files is an excellent lightweight architectural lint.
  • rivetkit-sqlite-types extraction: removing the duplicated stub enums (BindParam, ColumnValue, ExecuteResult) under #[cfg(not(feature = "sqlite"))] in favor of a shared crate is a clean improvement.
  • open_database_from_engine generation fence check: the ensure! on generation equality before registering the VFS is the correct guard against stale executor reuse.
  • validate_remote_sqlite_request trust boundary: the namespace + actor state + generation checks are well-placed at the envoy edge per the trust boundary conventions.

Minor

  • scripts/ralph/prd.json, scripts/ralph/progress.txt, scripts/ralph/.last-branch, and the archived copies under scripts/ralph/archive/ look like agent working state included accidentally.
  • .mcp.json deletion is unrelated to the feature; confirm it's intentional.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from cc67013 to 716eee0 Compare April 30, 2026 20:29
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Code Review: WASM Support + Remote SQLite Execution

This PR adds WASM support for rivetkit-core and introduces remote SQLite execution over the envoy protocol (v4). The architecture is well-structured and the invariants are clearly documented in .agent/specs/rivetkit-core-wasm-support.md. Overall this is a solid implementation — feedback below is mostly medium/minor.


Protocol versioning (engine/sdks/rust/envoy-protocol/src/versioned.rs)

_ => Ok(()) on owned enum variants — CLAUDE.md violation

ensure_to_envoy_v1_compatible and ensure_to_rivet_v1_compatible (and their v2/v3 variants) use _ => Ok(()) as a catch-all on v4::ToEnvoy / v4::ToRivet. Per CLAUDE.md, wildcard fall-through on owned enum types means a future protocol message variant could silently pass a compatibility gate without a compile-time reminder to update the exclusion list.

// ensure_to_envoy_v1_compatible line 128
_ => Ok(()),  // new v5 message types would slip through here

Recommendation: enumerate every variant explicitly in each ensure_*_compatible function, or at minimum document that the _ => Ok(()) arm is deliberately inclusive and describe the invariant that makes it safe (e.g. "all v4 messages not listed above existed in v1").

_ => bail!(...) on v1::ToEnvoy in convert_to_envoy_v1_to_v2

Same rule. v1:: is a published, frozen schema so new variants won't appear in practice, but the same lint applies. _ => unreachable!() would at least make the assertion explicit.


Error classification (rivetkit-rust/packages/rivetkit-core/src/actor/sqlite.rs)

remote_sqlite_error_response uses string matching to distinguish "unavailable" from "execution failed":

fn remote_sqlite_error_response(message: String) -> anyhow::Error {
    if message.contains("unavailable") || message.contains("unsupported") {
        return SqliteRuntimeError::RemoteUnavailable { reason: message }.build();
    }
    SqliteRuntimeError::RemoteExecutionFailed { message }.build()
}

This is fragile — changing the error message wording in the envoy would silently reclassify errors. Since the protocol already has typed union responses (SqliteExecResponse = SqliteExecOk | SqliteFenceMismatch | SqliteErrorResponse), consider wrapping the error earlier in the response-to-anyhow conversion with a typed sentinel (similar to how ProtocolCompatibilityError is used for the unavailable-version case) so callers can downcast_ref deterministically instead of scanning strings.


Message serialization panic (engine/sdks/rust/envoy-client/src/connection/mod.rs)

let encoded = crate::protocol::versioned::ToRivet::wrap_latest(message)
    .serialize(protocol::PROTOCOL_VERSION)
    .expect("failed to encode message");  // panics instead of returning error

The .expect here would take down the entire envoy connection on an unexpected serialization failure. A tracing::error! + early return (or propagating with ?) would be more graceful.


runtime.rs duplicated #[cfg] blocks

native-runtime and the "neither runtime feature" fallback have identical bodies for RuntimeBoxFuture, RuntimeFuture, RuntimeFutureOutput, and RuntimeSpawner::spawn. The duplication compiles correctly but creates maintenance surface. A single #[cfg(not(feature = "wasm-runtime"))] guard would cover both without repeating:

#[cfg(not(feature = "wasm-runtime"))]
pub type RuntimeBoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

Minor: commented-out code in test support

engine/packages/pegboard-envoy/tests/support/ws_to_tunnel_task.rs has dead code left over from development:

// use scc::HashMap;
// let authorized_tunnel_routes = HashMap::new();
// let authorized_tunnel_routes = HashMap::new();

These should be removed.


What looks good

  • Protocol compatibility gating: ProtocolCompatibilityError { feature: RemoteSqliteExecution, required_version: 4 } is a clean, typed mechanism; rivetkit-core correctly maps it to SqliteRuntimeError::RemoteUnavailable with downcast_ref.
  • Input validation: 1 MiB SQL limit, 1024 param limit, and 1 MiB bind-param limit in ws_to_tunnel_task.rs are applied before the request reaches the executor.
  • Indeterminate result semantics: fail_sent_remote_sqlite_requests_with_indeterminate_result correctly separates "unsent, safe to retry" from "sent, outcome unknown" on WS disconnect.
  • AsyncCounter: wait_zero correctly arms the Notify permit before the counter check, avoiding the race described in CLAUDE.md ("arm the notification before re-checking").
  • Lock choices: parking_lot::Mutex in actor/sqlite.rs has justifying comments ("Forced-sync: native SQLite handles are read from synchronous diagnostic accessors..."). scc::HashMap is used correctly for the actor map in ws_to_tunnel_task.rs.
  • Feature gating: wasm-runtime / sqlite-local gates are consistently applied and connection/mod.rs uses compile_error! to catch mutually exclusive feature combinations at build time.
  • Test coverage: remote_execution_parity.rs and rivetkit-core/tests/sqlite.rs cover protocol conversion, error mapping, and backend selection across feature combinations.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 30, 2026

Review: chore(rivetkit): wasm support (Draft)

This PR lays the infrastructure groundwork for WebAssembly compilation of rivetkit-core. It is a draft but already contains meaningful pieces: feature-gated transport selection, a moved AsyncCounter, stub remote-SQLite protocol handlers, a Serverful/Serverless transition-state bug fix, a guard path filter fix, and a new CI job for the wasm package. The spec docs are thorough and well-structured.


Bug: Serverful actor transition state (actor2/runtime.rs)

This is a genuine bug fix embedded in the PR. Before this change, the handle_stopped path always set Transition::Allocating (with actor_allocation_threshold) regardless of whether the allocation was Serverful or Serverless. Serverful actors should use Transition::Starting with actor_start_threshold. The fix is correct. Worth calling out because it affects production actor lifecycle, not just wasm scaffolding.


AsyncCounter: atomic ordering inconsistency

pub fn increment(&self) {
    self.value.fetch_add(1, Ordering::Relaxed);  // Relaxed
    self.notify_change();
}

pub fn decrement(&self) {
    let prev = self.value.fetch_sub(1, Ordering::AcqRel);  // AcqRel

increment uses Relaxed, decrement uses AcqRel. For a counter whose sole synchronization role is "did all work finish?" the ordering works in practice because wait_zero arms the Notified future before the Acquire load, and the zero-notification path is AcqRel. But Relaxed on increment means a reader can theoretically observe a stale non-zero count on weakly-ordered hardware (ARM), leading to a spurious extra wait loop iteration. Using Release on increment costs nothing and closes the gap.


AsyncCounter: dead-Weak accumulation in zero_observers

change_observers prunes dead Weak entries on every notify_change call (every increment and decrement). But zero_observers is only pruned inside the prev == 1 branch of decrement. If the counter is long-lived and registrations are frequent but it rarely or never hits zero, dead Weak<Notify> entries accumulate indefinitely. Either prune zero_observers in notify_change too, or extract a shared prune helper called from both paths.


AsyncCounter: change_callbacks clone under lock

fn notify_change(&self) {
    let callbacks = self.change_callbacks.lock().clone();
    for callback in callbacks {
        callback();
    }
}

The lock is released before invoking callbacks (correct - avoids re-entrant deadlock), but the Vec<Arc<...>> is cloned while the lock is still held. For the current use case this is acceptable; the only cost is an Arc::clone per registered callback. Not a blocking concern.


Stub SQLite handlers return raw string errors

protocol::ToRivet::ToRivetSqliteExecRequest(req) => {
    send_sqlite_exec_response(
        &conn,
        req.request_id,
        protocol::SqliteExecResponse::SqliteErrorResponse(protocol::SqliteErrorResponse {
            message: "remote sqlite exec handling is not wired".to_string(),
        }),
    )
    .await?;
}

All three stub handlers (Exec, Execute, ExecuteWrite) return a free-form string that crosses the wire to actor code. Per project conventions, actor-facing errors should be structured group/code payloads (e.g. sqlite.remote_unavailable). The spec explicitly names sqlite.remote_unavailable for this case. Free-form strings in wire-facing messages make structured error handling in actor code impossible and can leak internal implementation detail to production.


connection/mod.rs: wasm cfg gating asymmetry

send_initial_metadata, forward_to_envoy, and ws_url are gated with:

#[cfg(any(
    feature = "native-transport",
    all(feature = "wasm-transport", target_arch = "wasm32")
))]

But ws_send is unconditionally pub with no cfg gate. This is fine - ws_send is transport-agnostic - but the asymmetry is surprising. A short comment explaining that ws_send is transport-independent would help.

Also: connection/mod.rs declares mod wasm; behind #[cfg(feature = "wasm-transport")] but no wasm.rs file is present in this diff. The compile_error! guard prevents breakage today, but the wasm-transport feature is currently non-functional even when enabled.


Guard path filter (pegboard_gateway/mod.rs)

if !req_ctx.is_websocket() && !is_actor_http_request_path(req_ctx.path()) {
    return Ok(None);
}

The is_actor_http_request_path logic is correct: strips /request, then accepts empty, /, or ? continuations. This is a good defensive fix - before this, non-WebSocket requests to arbitrary paths would fall through to actor routing logic.


Cargo.toml: rivet-envoy-client default-features = false

[workspace.dependencies.rivet-envoy-client]
path = "engine/sdks/rust/envoy-client"
default-features = false

Disabling default features at the workspace level means every consumer that previously relied on the implicit native-transport default must now explicitly opt in. Audit all workspace consumers of rivet-envoy-client and confirm they have features = ["native-transport"]. A missed opt-in is silent until runtime since the compile_error! only fires when neither feature is present (not when the intended feature is absent).


CI: build-wasm job

The new job is well-structured. One note: the validation step checks for exactly rivetkit_wasm.js, rivetkit_wasm.d.ts, and rivetkit_wasm_bg.wasm by filename, which is sensitive to wasm-pack output naming. If the crate name in Cargo.toml changes, validation silently stops catching regressions. Consider a glob check (test -f pkg/*.wasm) or deriving expected filenames from the package name.


namespace_name added to Conn

Adding namespace_name: String to Conn alongside the existing namespace_id: Id means both fields are present. The spec says remote SQL validation needs namespace checking - confirm both are needed long-term and that namespace_id (UUID) is not made redundant by namespace_name (string).


Minor

  • The indentation fix in init_conn (misaligned fields on the Conn struct literal) is a clean structural fix.
  • Tests in ws_to_tunnel_task.rs are thorough and well-designed. The unique_actor_id helper uses nanosecond wall-clock time; uuid::Uuid::new_v4().to_string() would be more robust, but nanosecond resolution is unlikely to cause collisions in practice.
  • CLAUDE.md additions follow the one-line-bullet convention and document genuine invariants rather than wiring.

Summary

The PR scaffolds wasm transport gating, moves AsyncCounter, adds stub protocol handlers, and fixes the Serverful transition-state bug. Three issues worth resolving before merge:

  1. Raw-string errors in stub SQLite handlers - replace with structured sqlite.remote_unavailable error codes before these handlers go to production.
  2. AsyncCounter::increment memory ordering - use Release instead of Relaxed to close the ARM ordering gap.
  3. Workspace default-features change - audit all consumers of rivet-envoy-client to ensure explicit native-transport opt-ins after disabling the workspace default.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 716eee0 to 8da73a1 Compare April 30, 2026 20:35
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4860

All packages published as 0.0.0-pr.4860.2a8ccdf with tag pr-4860.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-2a8ccdf
docker pull rivetdev/engine:full-2a8ccdf
Individual packages
npm install rivetkit@pr-4860
npm install @rivetkit/react@pr-4860
npm install @rivetkit/rivetkit-napi@pr-4860
npm install @rivetkit/workflow-engine@pr-4860

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 3 times, most recently from 6213c3a to 4966b21 Compare May 2, 2026 06:38
@NathanFlurry NathanFlurry mentioned this pull request May 2, 2026
11 tasks
@NathanFlurry NathanFlurry force-pushed the 04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizations branch from 3412c8f to b178905 Compare May 2, 2026 21:32
@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from b084261 to 7600ef8 Compare May 2, 2026 21:32
@NathanFlurry NathanFlurry force-pushed the 04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizations branch from b178905 to e5c68c2 Compare May 2, 2026 22:13
@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 7600ef8 to ecba65a Compare May 2, 2026 22:13
@NathanFlurry NathanFlurry force-pushed the 04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizations branch from e5c68c2 to 9d83f98 Compare May 2, 2026 22:44
@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from ecba65a to 4bb9291 Compare May 2, 2026 22:44
Base automatically changed from 04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizations to main May 2, 2026 22:44
@NathanFlurry NathanFlurry merged commit 4bb9291 into main May 2, 2026
4 of 10 checks passed
@NathanFlurry NathanFlurry deleted the 04-29-chore_rivetkit_wasm_support branch May 2, 2026 22:44
This was referenced May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant