
test(kitchen-sink): add sqlite memory soak harness #4910

Closed
NathanFlurry wants to merge 1 commit into sqlite-soak/runtime-diagnostics from sqlite-soak/kitchen-harness

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app Bot commented May 3, 2026

🚅 Deployed to the rivet-pr-4910 environment in rivet-frontend

Service            | Status                      | Web | Updated (UTC)
kitchen-sink       | ❌ Build Failed (View Logs) | Web | May 4, 2026 at 4:13 am
website            | 😴 Sleeping (View Logs)     | Web | May 3, 2026 at 10:59 pm
ladle              | ❌ Build Failed (View Logs) | Web | May 3, 2026 at 10:54 pm
mcp-hub            | ✅ Success (View Logs)      | Web | May 3, 2026 at 10:50 pm
frontend-cloud     | ❌ Build Failed (View Logs) | Web | May 3, 2026 at 10:49 pm
frontend-inspector | ❌ Build Failed (View Logs) | Web | May 3, 2026 at 10:49 pm

@railway-app railway-app Bot temporarily deployed to rivet-frontend / rivet-pr-4910 May 3, 2026 22:47 Destroyed
Member Author

NathanFlurry commented May 3, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Contributor

claude Bot commented May 3, 2026

PR Review: test(kitchen-sink): add sqlite memory soak harness

Overview

This PR adds a SQLite memory soak test harness for the kitchen-sink example, consisting of:

  • Test actor (sqlite-memory-pressure.ts): A Rivet Actor that exercises SQLite under load (bulk inserts, scans, aggregates, and integrity checks).
  • Soak harness (sqlite-memory-soak.ts): Spawns engine + kitchen-sink, drives concurrent actors, samples /proc memory metrics, and verifies sleep-based memory reclamation.
  • Report generator (proc-metrics-report.ts): Offline Plotly-based HTML chart renderer from the harness JSONL event log.
  • Agent notes (sqlite-memory-soak-issues.md): Run diary documenting four soak runs, open issues, and theories.

Overall this is solid exploratory tooling. The structured JSONL event log + offline renderer is a clean design. A few issues worth addressing before relying on results from this harness:


Issues

Correctness

  • quantile is off-by-one (sqlite-memory-soak.ts): sorted[Math.floor(q * sorted.length)] is wrong; for p95 on a 100-element array it returns sorted[95] (the 96th element), and at q = 1 it indexes past the end of the array entirely. Use sorted[Math.floor(q * (sorted.length - 1))] instead; a corrected sketch follows this list. As written, it gives a slightly pessimistic reading of latency percentiles.

  • copyNativeMetrics camelCase/snake_case dual-handling (sqlite-memory-pressure.ts): The defensive property-name fallback (e.g. m.pageCache ?? m.page_cache) suggests SqliteNativeMetrics returns inconsistent field names from the NAPI layer. The workaround is correct but masks an upstream bug. Per the layer constraints in CLAUDE.md, NAPI bindings must translate cleanly — the inconsistency should be fixed in rivetkit-napi rather than papered over in a test file.

  • configureServerlessRunner uses datacenters[0] without name matching: If the local engine returns multiple datacenters or in an unexpected order, the wrong one gets configured. Add a filter by a known datacenter name/tag.
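
For reference, a minimal corrected sketch of the quantile helper. Only the function name and the indexing expression come from the review; the surrounding shape (raw sample array in, copy and sort) is an assumption about the harness, not code copied from the PR:

```ts
// Corrected quantile helper (nearest-rank style over a zero-based array).
function quantile(samples: number[], q: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  // Scale q over [0, length - 1] so q = 1 hits the last element.
  // The original Math.floor(q * sorted.length) reads one slot too far:
  // sorted[95] for p95 of 100 samples, and sorted[100] (undefined) at q = 1.
  return sorted[Math.floor(q * (sorted.length - 1))];
}

// p95 of 1..100 is now 95 (sorted[94]), where the old indexing returned 96.
const latenciesMs = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(quantile(latenciesMs, 0.95)); // 95
```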

Reliability

  • Sleep verification is racy (waitForActorSleepLog): The log-scraping approach polls at 100ms intervals against a saved byte offset. Log buffering delays are exactly what caused the "timed out waiting for actor sleeping log" failure documented in the notes. Consider using the engine's actor status API to poll for sleep state instead of scraping log text — that removes the dependency on log buffering and text matching.

  • Drain after startup failure hangs indefinitely: When failed = true stops new spawns in runWithSpikeConcurrency, already-running workers continue, and Promise.allSettled(workers) waits forever if any worker is stuck (e.g. actor_ready_timeout). The notes document this as Issue 3. A drain grace period would prevent the harness from hanging; see the sketch after this list.

  • DEFAULT_ENGINE_PORT = 6520 vs blocked port 6420: The harness blocks port 6420 to avoid colliding with the normal local engine, but starts its own engine on 6520. If a dev engine is already on 6520, the failure at waitForHttpOk won't have a clear message. Consider also blocking 6520 with an explicit early error, or documenting that only non-default ports are supported.
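
A sketch of the suggested drain grace period, assuming workers is the array of in-flight worker promises that runWithSpikeConcurrency already awaits; drainWorkers and drainTimeoutMs are hypothetical names for illustration:

```ts
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Bounded drain: race the normal Promise.allSettled drain against a grace
// period so a single stuck worker (e.g. one blocked on actor_ready_timeout)
// cannot hang the whole harness.
async function drainWorkers(
  workers: Promise<unknown>[],
  drainTimeoutMs: number,
): Promise<boolean> {
  const drained = await Promise.race([
    Promise.allSettled(workers).then(() => true),
    sleep(drainTimeoutMs).then(() => false),
  ]);
  if (!drained) {
    // Abandon in-flight workers rather than awaiting them forever; log so
    // the run report records that the drain was partial.
    console.warn(
      `drain timed out after ${drainTimeoutMs}ms; abandoning in-flight workers`,
    );
  }
  return drained;
}
```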

Code Quality

  • Commented-out dead code in the test actor: The releaseStorage action body and the runCycle delete/VACUUM block are fully commented out with inline context. The intent is clear, but dead code in a test actor can confuse future readers about which invariants are active. Either remove and note intent in a doc comment, or gate on a runtime argument.

  • --actors pool exhaustion is silent in fixed-concurrency mode: The default is 4 actors, but only runWithSpikeConcurrency throws when the pool is exhausted. In fixed-concurrency mode, workers silently stop once nextActor >= args.actors, which produces a misleading measurement if a user forgets to set --actors high enough. A guard sketch follows this list.
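
One possible guard for the fixed-concurrency path, sketched under the assumption that workers pull actor indices from a shared counter. nextActor and args.actors use the names quoted above; takeNextActor is a hypothetical helper:

```ts
let nextActor = 0;
const args = { actors: 4 }; // 4 is the default mentioned in the review

function takeNextActor(): number {
  if (nextActor >= args.actors) {
    // Mirror what runWithSpikeConcurrency already does on exhaustion: abort
    // the run instead of quietly idling workers and skewing the measurement.
    throw new Error(
      `actor pool exhausted at ${args.actors}; rerun with a larger --actors`,
    );
  }
  return nextActor++;
}
```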


Positive Notes

  • The JSONL event log + offline renderer split is clean — measurement and reporting are fully decoupled.
  • /proc-based sampling (RSS, HWM, VmSize, smaps PSS, IO, FD count) gives a much more complete picture than process.memoryUsage() alone.
  • MALLOC_ARENA_MAX=2 + MALLOC_TRIM_THRESHOLD_ defaults are good for a soak measuring RSS reclamation.
  • Triangle-wave spike shape (runWithSpikeConcurrency) provides a repeating, predictable load profile that's easy to correlate with memory traces.
  • The agent notes are thorough — capturing raw numbers, open issues, and theories for each run is exactly the kind of context that makes soak results usable later.

Summary

The two most impactful fixes are the sleep verification reliability (log-scraping vs. status API) and the unbounded drain on failure — both directly caused run failures documented in the notes. The quantile off-by-one and the NAPI field naming inconsistency are correctness issues worth fixing before drawing latency conclusions. The rest are minor quality improvements.

@NathanFlurry
Member Author

Landed in main via a stack-merge fast-forward push. The commits are already in main, so closing this PR to match.
