Skip to content

fix(rivetkit): keep actors awake until keepAwake work finishes#4937

Merged
NathanFlurry merged 1 commit intomainfrom
05-04-fix_rivetkit_keepawake_sleep_grace
May 5, 2026
Merged

fix(rivetkit): keep actors awake until keepAwake work finishes#4937
NathanFlurry merged 1 commit intomainfrom
05-04-fix_rivetkit_keepawake_sleep_grace

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 4, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

Review: fix(rivetkit): keep actors awake until keepAwake work finishes

Overview

This PR fixes a real race condition where actors could have their JS-side sleep cleanup (database close, context disposal) run while keepAwake work was still in progress. The fix has two components:

  1. Rust side: A new KeepAwakeRegion RAII guard (modeled after WebSocketCallbackRegion) that synchronously increments the core keep-awake counter. This eliminates the old async counter-arm race where the guard increment was delayed until first poll of the Rust future.
  2. TypeScript side: A keepAwakeCount JS counter that defers the onSleep cleanup (DB close, context dispose) until all tracked promises settle.

The approach is sound and well-targeted. A few specific issues worth addressing.


Issues

Bug: deferred cleanup can execute against a live actor

The deferred cleanup path in native.ts:4013-4023 is triggered when keepAwakeCount reaches 0 after deferSleepCleanupUntilKeepAwakeIdle is set. These fields are never reset when the actor wakes. The problematic sequence:

  1. keepAwake(longWork) called during an action -> keepAwakeCount = 1, Rust region created
  2. Explicit c.sleep() fires -> actor sleeps -> cleanup deferred because keepAwakeCount > 0
  3. Actor woken by an incoming action before longWork finishes
  4. longWork eventually settles -> keepAwakeCount hits 0 -> cleanupDeferredNativeSleepRuntimeState runs
  5. Cleanup calls closeNativeDatabaseClient / closeNativeSqlDatabase / clearNativeRuntimeState on the currently live actor

The fields keepAwakeCount, deferSleepCleanupUntilKeepAwakeIdle, and deferredSleepCleanupActorCtx need to be reset in the wake path, or the deferred cleanup needs a generation counter to abort if the actor has since woken.

Unhandled error from deferred cleanup

cleanupDeferredNativeSleepRuntimeState is called inside a .finally(async () => {...}) callback. If it throws, the rejection propagates into trackedPromise which is passed fire-and-forget to native with no .catch() -- the error silently becomes an unhandled promise rejection. Add a .catch(error => logger().error(...)) after the .then(() => null) on trackedPromise, or wrap the cleanup call in a try/catch with logging.

WASM deprecation warning via console_error

rivetkit-wasm/src/lib.rs:1381 uses console_error to emit a deprecation notice. That maps to console.error in the browser and will appear as a hard error in devtools. Use web_sys::console::warn_1 or remove the log if the old binding stays as a forwarding shim.

begin_keep_awake allocation inconsistent with begin_websocket_callback

begin_websocket_callback uses a simple monotonic counter with no collision check. begin_keep_awake uses a collision-avoidance loop up to u32::MAX iterations. The extra complexity is unnecessary -- the same simple pattern is sufficient.

actorBeginKeepAwake / actorEndKeepAwake are unused in the TypeScript layer

Both are added to the CoreRuntime interface (runtime.ts:392-393) and implemented in both runtimes, but nothing in native.ts or ActorContextHandleAdapter calls them. If they are placeholders for future use, add a comment; otherwise remove them from the interface.


Minor / Style

  • Redundant Drop impl on KeepAwakeRegion (context.rs:1678): self.guard.take() in Drop is equivalent to letting Option<KeepAwakeGuard> drop naturally. Mirrors WebSocketCallbackRegion -- no action needed unless both are cleaned up together.
  • Comment documents a delta (context.rs:1681): "Take the guard explicitly to mirror WebSocketCallbackRegion." Per project conventions, avoid documenting that something mirrors another thing. Can be removed.
  • SQL string interpolation in test fixtures (sleep-db.ts:175-189): Event strings are interpolated directly into SQL. Low-risk in tests, but parameterized queries would match production patterns.

What works well

  • Synchronous KeepAwakeRegion creation via callNativeSync correctly eliminates the old async counter-arm race.
  • The registered flag in keepAwake correctly prevents double-decrement when native registration throws.
  • keepAwakeCount is incremented before callNativeSync, so the sleep cleanup observer always sees JS work even if the promise settles immediately.
  • Tests cover single and nested keepAwake with DB log ordering assertions; the indexOf comparison approach is resilient to timing variation.
  • reset_runtime_state in Rust clears keep_awake_regions on wake, bounding Rust-level region leaks to one per wake cycle.

The live-actor deferred cleanup bug is the main concern before merging. The rest are refinements.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4937

All packages published as 0.0.0-pr.4937.f0c1a00 with tag pr-4937.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-f0c1a00
docker pull rivetdev/engine:full-f0c1a00
Individual packages
npm install rivetkit@pr-4937
npm install @rivetkit/react@pr-4937
npm install @rivetkit/rivetkit-napi@pr-4937
npm install @rivetkit/workflow-engine@pr-4937

@NathanFlurry NathanFlurry force-pushed the 05-04-fix_rivetkit_keepawake_sleep_grace branch from e887f7e to d3d9f0e Compare May 4, 2026 13:58
Base automatically changed from 05-04-fix_rivetkit_persist_onsleep_error_state to main May 5, 2026 14:57
@NathanFlurry NathanFlurry merged commit d3d9f0e into main May 5, 2026
23 of 29 checks passed
@NathanFlurry NathanFlurry deleted the 05-04-fix_rivetkit_keepawake_sleep_grace branch May 5, 2026 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant