Skip to content

feat: SQLite VFS Pool - share WASM instances across actors#4454

Open
NathanFlurry wants to merge 27 commits intomainfrom
ralph/sqlite-vfs-pool
Open

feat: SQLite VFS Pool - share WASM instances across actors#4454
NathanFlurry wants to merge 27 commits intomainfrom
ralph/sqlite-vfs-pool

Conversation

@NathanFlurry
Copy link
Member

Summary

  • Adds SqliteVfsPool to share WASM SQLite instances across multiple actors, reducing memory from ~16.6 MB per actor to ~16.6 MB per 50 actors (200 actors: 3.3 GB -> 66 MB)
  • Implements bin-packing assignment, idle timer scale-down, short name lifecycle with poisoning, WASM module caching via promise memoization, and graceful shutdown
  • Integrates pool into file-system and engine drivers via ActorDriver.createSqliteVfs(actorId) interface change
  • Includes adversarial review findings (US-018 through US-032) covering: prefix collision fix, Database.close() idempotency, concurrent acquire() deduplication, trackOp coverage for all DB operations, TOCTOU race fix, and more

Key changes

  • @rivetkit/sqlite-vfs: SqliteSystem supports multiple registered files, SqliteVfs supports multiple open databases per instance, new SqliteVfsPool and PooledSqliteHandle classes
  • rivetkit drivers: ActorDriver.createSqliteVfs() now accepts actorId, FS and engine drivers use pool
  • ISqliteVfs interface: Extracted for interchangeable use of SqliteVfs and PooledSqliteHandle
  • Defensive fixes: HEAPU8 re-read after await, Database.close() idempotency guard, forceCloseByFilePrefix exact match

Test plan

  • Run cd rivetkit-typescript/packages/sqlite-vfs && npx tsc --noEmit to verify typecheck
  • Run existing driver tests: pnpm test driver-file-system
  • Verify pool behavior with multiple actors sharing one instance
  • Test idle timer scale-down (instance destroyed after grace period with no actors)
  • Test burst acquisition (200 concurrent acquire() calls)
  • Test short name poisoning on failed close
  • Verify no prefix collision (releasing actor "1" does not affect actor "10")

🤖 Generated with Claude Code

NathanFlurry and others added 27 commits March 17, 2026 21:37
Added US-018 through US-032 covering bugs found during adversarial
review: prefix collision in forceCloseByFilePrefix, Database.close()
idempotency, WASM promise rejection caching, trackOp coverage gaps,
concurrent acquire() burst duplication, and more.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@railway-app
Copy link

railway-app bot commented Mar 18, 2026

🚅 Deployed to the rivet-pr-4454 environment in rivet-frontend

Service Status Web Updated (UTC)
website ❌ Build Failed (View Logs) Web Mar 18, 2026 at 6:23 am
frontend-cloud ❌ Build Failed (View Logs) Web Mar 18, 2026 at 6:22 am
frontend-inspector ❌ Build Failed (View Logs) Web Mar 18, 2026 at 6:22 am
ladle ❌ Build Failed (View Logs) Web Mar 18, 2026 at 6:21 am
mcp-hub ✅ Success (View Logs) Web Mar 18, 2026 at 6:21 am

@claude
Copy link

claude bot commented Mar 18, 2026

PR Review: SQLite VFS Pool

This is a substantial and well-designed PR that solves a real memory problem. The adversarial review process (US-018 through US-032) shows careful attention to edge cases. Here are my observations:


Must Address Before Merging

1. scripts/ralph/ directory should not be committed

The scripts/ralph/ directory contains internal agent orchestration artifacts (CLAUDE.md, prd.json, progress.txt, ralph.sh). These are development tooling files that don't belong in the main repo — they expose internal implementation details and add noise to the codebase history. Please remove them before merging or add to .gitignore.

2. US-027 through US-032 have passes: false

The prd.json in scripts/ralph/ shows that 6 of the 32 user stories are unverified/failing. Even if these are tracked separately, it would help to clarify in the PR description which ones remain open, whether they're blocking, and if there are follow-up issues filed.


Code Concerns

3. forceCloseByFilePrefix naming mismatch

Despite being named ByFilePrefix, the method uses exact match (not startsWith). The comment explains why (to avoid '1' matching '10', '11'), but the method name is actively misleading. Consider renaming to forceCloseByFileName or forceCloseExact to match its actual semantics.

4. Short name exhaustion under repeated poisoning

When an actor closes uncleanly, its short name is "poisoned" and never reused. If actors crash frequently enough, a PoolInstance could exhaust all available short names without ever reaching capacity in terms of concurrent actors. There should be a safety valve: either a maximum poison count after which the instance is destroyed and rebuilt, or a documented upper bound on how many short names exist per instance.

5. Mutex scope narrowing in db/mod.ts and db/drizzle/mod.ts

The mutex is now scoped only to guard the closed flag rather than wrapping entire DB operations. This is correct if database operations are otherwise safe to call concurrently (SQLite in WAL mode is generally fine for reads, but writes serialize internally). However, this is a behavioral change from the original design — if the mutex was previously protecting against concurrent writes at the application layer, removing it could surface issues. A comment explaining why the narrower scope is safe would help future readers.

6. WASM module cache — failed compilation not cleared

In #getModule(), the compiled module promise is cached. If the initial WebAssembly.compile() call fails (e.g., out of memory, malformed WASM), the rejected promise will be stored in #wasmModuleCache and all subsequent acquire() calls will immediately reject with the same error. The cache should be cleared on rejection so a retry is possible:

this.#wasmModuleCache = WebAssembly.compile(wasmBuffer).catch(err => {
  this.#wasmModuleCache = undefined;
  throw err;
});

Positive Observations

  • HEAPU8 re-read after await is a critical correctness fix that's easy to miss. Good catch.
  • Database.close() idempotency via the #closed flag is clean and correct.
  • ISqliteVfs interface extraction is the right abstraction to enable pool and standalone VFS to be used interchangeably without coupling the actor layer to the pool implementation.
  • #resolveFile O(1) lookup (Map vs. linear scan) is a straightforward improvement.
  • TrackedDatabase wrapping all operations with opsInFlight provides a reliable drain-before-destroy guarantee.
  • Promise-level deduplication for concurrent acquire() calls is the correct pattern to avoid creating multiple WASM instances under concurrent startup load.
  • Double-release guard in PooledSqliteHandle prevents subtle bugs from callers accidentally releasing twice.

Minor

  • wasm.d.ts provides minimal WebAssembly type declarations for Node. Make sure these don't conflict with lib.dom.d.ts declarations in browser-targeted builds — checking tsconfig.json lib targets would be worthwhile.
  • The idleDestroyMs default (30s) seems reasonable, but it would be worth documenting the tradeoff: lower values save memory faster but add WASM recompilation overhead on the next actor start.

Overall this is a solid change that addresses a meaningful operational problem. The main blockers are removing the scripts/ralph/ artifacts and clarifying the status of the unverified user stories.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant