Skip to content

fix(rivetkit): preserve internal bridge errors#4993

Open
NathanFlurry wants to merge 1 commit intomainfrom
runtime-options/preserve-bridge-errors
Open

fix(rivetkit): preserve internal bridge errors#4993
NathanFlurry wants to merge 1 commit intomainfrom
runtime-options/preserve-bridge-errors

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 7, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 7, 2026

PR Review: fix(rivetkit): preserve internal bridge errors

Overview

This PR makes four related improvements:

  1. Eliminates Box::leak memory leaks for bridge errors in actor_factory.rs, napi_actor_events.rs, and rivetkit-wasm/src/lib.rs by replacing the interned-schema pattern with a new RivetErrorKind::Dynamic variant.
  2. Adds ActorSpecifier to the error system, allowing callers to attach actor identity (id, generation, key) to errors as anyhow context so it survives chain traversal and is visible to clients.
  3. Bumps client protocol to v4, adding actor: optional<ActorSpecifier> to Error and HttpResponseError BARE types with proper upgrade/downgrade paths.
  4. Masks private error details at the client boundary via client_error_message() / client_error_metadata() — internal errors get generic messages, only public error codes pass their real message and metadata through.

The structural work is well-thought-out and the test coverage is thorough.


Issues

impl std::error::Error for ActorSpecifier (engine/packages/error/src/error.rs)

ActorSpecifier is not an error — it is an identity tag. Implementing std::error::Error solely to make it usable as anyhow context is an abuse of the trait. This shows up in IDE tooling, source() chains, and any code that inspects error chains expecting error-type semantics. A dedicated newtype (e.g. struct ActorContext(ActorSpecifier)) would be cleaner. If keeping the current approach, add a comment explaining why.

actor_id not sanitized in HTTP response headers (rivetkit-core/src/registry/http.rs, attach_actor_response_headers)

The key field is correctly guarded against CRLF injection:

if !key.contains(chr_r) && !key.contains(chr_n) {
    response.headers.insert(HEADER_RIVET_ACTOR_KEY...);
}

But actor_id is inserted without the same guard. Since actor IDs are UUIDs in practice this is low-risk, but the inconsistency is worth noting. Either apply the same guard or add a comment documenting why actor IDs are trusted.

Misleading comment in RivetError::extract (engine/packages/error/src/error.rs)

The comment says:

anyhow::Error::downcast_ref walks both the chain and any .context(...) wrappers, so this finds an ActorSpecifier no matter where it was attached.

downcast_ref only checks the outermost error — it does not walk the chain. The search for ActorSpecifier is only on the outermost context layer via error.downcast_ref::<ActorSpecifier>(), meaning if ActorSpecifier is buried mid-chain under another context wrapper it would not be found. Verify this is intentional and correct the comment.

callback_error fallback now loses structured error type (rivetkit-napi/src/actor_factory.rs)

Previously, unrecognized NAPI callback errors returned a structured JsCallbackUnavailable error. The new fallback is anyhow::anyhow!(reason). This means unrecognized NAPI failures silently become core.internal_error without any distinguishing code, losing the JsCallbackUnavailable diagnosis path. The removed tracing logs also make these failures harder to observe. Consider preserving the structured error type here.

Stale error message in impl_versioned_v3_only! macro (rivetkit-rust/packages/client-protocol/src/versioned.rs)

The bail message in deserialize_version / serialize_version still reads "only exists in client protocol v3" even though the macro now supports v4. Minor but will confuse future debugging.


Observations

format_actor_key with empty keysactor_specifier_for_instance always calls .with_key(format_actor_key(...)), setting a non-None key even when the actor has no meaningful key. Clients may receive key: "" in actor specifiers. Consider keeping key as None when the formatted value is empty.

Duplicate test bodies — Tests appear in both the new moved location (tests/modules/registry_http.rs) and the old shim (tests/registry_http.rs). Looks intentional per the CLAUDE.md move-test pattern, but confirming the shim is a module re-export rather than duplicate logic would help.

WS "outgoing message too long" has actor: None (websocket.rs) — Fine since the actor specifier is not available at that call site, but it is the one WS error that will never carry actor context.


Positives

  • Memory leak elimination: Removal of the interned Box::leak / SccHashMap schema caches in favor of owned RivetErrorKind::Dynamic is the right call. The updated CLAUDE.md bullet codifies the new rule correctly.
  • Error masking: The client_error_message / client_error_metadata gate cleanly prevents leaking private details like SQL text, stack traces, or internal state from actor errors.
  • Protocol versioning: The v3→v4 upgrade/downgrade converters correctly null out actor when downgrading, preserving backward compat with older clients.
  • Test coverage: New tests cover the sanitization boundary, actor specifier serialization, header presence, and log field assertions — all the right things to pin.

@NathanFlurry NathanFlurry force-pushed the runtime-options/preserve-bridge-errors branch from 5613480 to 814b9ab Compare May 7, 2026 04:26
@NathanFlurry NathanFlurry force-pushed the 05-06-fix_rivetkit_fix_default_spawn_engine_behavior branch from 7685070 to 077bd4e Compare May 7, 2026 04:26
@NathanFlurry NathanFlurry changed the base branch from 05-06-fix_rivetkit_fix_default_spawn_engine_behavior to graphite-base/4993 May 7, 2026 04:27
@NathanFlurry NathanFlurry force-pushed the runtime-options/preserve-bridge-errors branch from 814b9ab to 811f9c5 Compare May 7, 2026 04:28
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4993 to runtime-options/remove-deprecated-actor-options May 7, 2026 04:28
@NathanFlurry NathanFlurry changed the base branch from runtime-options/remove-deprecated-actor-options to graphite-base/4993 May 7, 2026 04:34
@NathanFlurry NathanFlurry force-pushed the runtime-options/preserve-bridge-errors branch from 811f9c5 to b24f3be Compare May 7, 2026 04:34
@NathanFlurry NathanFlurry force-pushed the graphite-base/4993 branch from eb0a964 to 395aa83 Compare May 7, 2026 04:34
@NathanFlurry NathanFlurry changed the base branch from graphite-base/4993 to main May 7, 2026 04:34
@NathanFlurry NathanFlurry marked this pull request as ready for review May 7, 2026 04:34
@NathanFlurry NathanFlurry force-pushed the runtime-options/preserve-bridge-errors branch from b24f3be to f5ede57 Compare May 7, 2026 07:03
@NathanFlurry NathanFlurry force-pushed the runtime-options/preserve-bridge-errors branch from f5ede57 to ac0098e Compare May 8, 2026 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant