Skip to content

feat(kitchen-sink): add mock agentic lifecycle loop#4914

Closed
NathanFlurry wants to merge 1 commit intomock-agentic-loop/kitchen-sink-examplefrom
05-03-feat_kitchen-sink_add_mock_agentic_lifecycle_loop
Closed

feat(kitchen-sink): add mock agentic lifecycle loop#4914
NathanFlurry wants to merge 1 commit intomock-agentic-loop/kitchen-sink-examplefrom
05-03-feat_kitchen-sink_add_mock_agentic_lifecycle_loop

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented May 4, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

PR Review: feat(kitchen-sink): add mock agentic lifecycle loop

Overview

This PR substantially expands the mock agentic lifecycle test harness in examples/kitchen-sink. The script gains the ability to auto-spawn a local kitchen-sink server, run a concurrent probe loop validating actor liveness during sleep/reconnect cycles, track close events, validate full history consistency at teardown, and report aggregate statistics. The actor gains onSleep delay, ping/pong WebSocket support, and a verifyAll action.


Issues

1. Timer leak in runProbeAttempt

The timeout() helper creates a setTimeout that is never cancelled when Promise.race resolves via a competing branch. Each probe attempt leaks up to two pending timers (open and pong phases). The inner pong promise already uses cleanup() correctly -- apply the same pattern to the outer timeout() helper.

2. Platform-specific process management

pidsWithEnvValue reads from a Linux-only pseudo-filesystem, failing silently or erroring on macOS. listenerPids uses lsof flags that differ across platforms. Add a runtime platform guard (return empty array on non-Linux) or document the assumption.

3. Hard-coded 2-second sleeps in cleanup helpers

stopListeners and stopProcessesWithEnvValue sleep 2 seconds after SIGTERM then force-kill. Polling the port check with a short interval would be more reliable.

4. DEFAULT_ON_SLEEP_DELAY_MS = 15_000 has no comment

This 15-second delay in onSleep is non-obvious and extends test runtime. Per CLAUDE.md, add a comment explaining why it exists (presumably to let probe connections observe the actor stopping).

5. IIFE for final history validation is hard to read

The inline self-invoking async function around line 1121 should be extracted as a named function (e.g. getFinalHistory()).

6. Double cast as unknown as ActionVerifier

Bypasses the type system entirely. If actor action signatures change there is no compile-time check. Define the interface against the typed handle's action surface.

7. Unbounded backlog in RawSession

The #backlog array has no size cap. Add a cap (~500 entries) and shift off the oldest when exceeded.

8. Repeated reconnect-time check

if (reconnectMs > MAX_RECONNECT_MS) throw appears three times in runWorkload. Extract into a helper.


Minor

  • Removed verifier param from runInference: good -- verification is now explicit at the call site with layered checks.
  • connect() returns 0 when already open (reads as "opened instantly" vs "was already open"). A named result would be clearer.
  • --import args in startLocalKitchenSinkServer: add a comment that @rivetkit/sql-loader is required for SQLite.

Summary

Architecture is sound; the probe loop + sleep-close observation pattern is a solid approach for validating lifecycle behavior under reconnect. Main blocking issue: timer leak in runProbeAttempt. Linux-only assumptions and the onSleep delay need documentation. IIFE and double-cast are maintainability items.

@NathanFlurry NathanFlurry force-pushed the mock-agentic-loop/kitchen-sink-example branch from 2e3e511 to 2261075 Compare May 4, 2026 00:44
@NathanFlurry NathanFlurry force-pushed the 05-03-feat_kitchen-sink_add_mock_agentic_lifecycle_loop branch from 58a8948 to d4ad418 Compare May 4, 2026 00:44
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 4, 2026

PR Review: feat(kitchen-sink) — add mock agentic lifecycle loop

Overview

This PR substantially expands the kitchen-sink mock agentic loop test harness with:

  • Auto-managed local server: optionally spawns/stops the kitchen sink server + engine in-process so the script is fully self-contained
  • Probe loop: continuously fires WebSocket ping/pong connections in parallel to the main workload, observing expected actor.stopping closes during sleep transitions
  • Richer history validation: validates full history (row counts, contiguity, expected request IDs) on every reconnect and at the end
  • Sleep-close observation: the main session now explicitly waits for and validates the close event triggered by sleep posts
  • verifyAll action: new bulk verification action that validates all rows in a single DB query
  • onSleep hook with configurable delay: adds a sleep(delayMs) inside onSleep to give the probe loop time to observe the stopping close

Issues

Hard: Logic / Correctness

sleepPostsObservedByMain guard can silently skip a close observation

There are two separate places that check sleepStats.posts > sleepPostsObservedByMain. If the connection drops before runInference starts (the !session.open branch fires first), sleepPostsObservedByMain is bumped. The post-inference guard then skips, so waitForClose(SLEEP_CLOSE_TIMEOUT_MS) is never called and the close event goes unvalidated. Consider a single unified reconnect path rather than two parallel guards.

IIFE in runWorkload is hard to follow and has a missing finally

validateHistory(await (async () => {
    const finalSession = new RawSession(...);
    ...
    finalSession.close();   // not called if an exception is thrown first
    return history;
})(), expectedRequests);

Extract this into a named async function (e.g. requestFinalHistory). The inline IIFE is difficult to read and finalSession.close() is not in a finally block, so the session leaks on error.

Unbounded #backlog in RawSession

Messages pushed to #backlog are only consumed if a subsequent waitFor finds them. Stray progress or done events from a prior inference can accumulate and never be drained. For a long-running workload this is a slow memory leak. Consider capping the backlog or logging when messages are silently dropped.


Medium: Platform Assumptions

Linux-only: pidsWithEnvValue reads the proc filesystem

The function iterates over /proc entries and reads each process's environment file to find processes by env-var value. This is Linux-only: it silently returns nothing on macOS and fails on Windows. lsof in listenerPids has a similar constraint (macOS has lsof, Windows does not). These are test-only scripts so impact is limited, but a comment explaining the Linux assumption would prevent confusion when running locally on macOS.

resolveEngineBinary only looks at the debug build path

target/debug/rivet-engine is the only fallback. A release build (target/release/rivet-engine) is not tried. Developers who built with --release get a confusing "binary not found" error. Either check both paths or update the error message to mention RIVET_ENGINE_BINARY and --release as alternatives.


Medium: Style / Conventions

numberFromEnv duplicated between script and actor

The function appears verbatim in both scripts/mock-agentic-loop.ts and src/actors/testing/mock-agentic-loop.ts. A shared utility module would avoid drift.

Error event in waitForOpen drops the error detail

const onError = () => {
    cleanup();
    reject(new Error("websocket error"));
};

The ErrorEvent argument is discarded. Include event.message in the rejection message to aid debugging.

Probe timeout helper leaks its setTimeout handle

const timeout = (phase: string) =>
    new Promise<never>((_resolve, reject) => {
        setTimeout(() => reject(...), PROBE_TIMEOUT_MS);
    });

The setTimeout is never cleared if the Promise.race resolves on another branch first. The timer fires, calls reject on a settled promise (harmless but noisy), and holds a reference until it fires. Store and clear the handle, or use AbortSignal.timeout.

verifyAll action discards validation return values

stringValue(request.requestId, "requestId");
positiveInteger(request.seconds, "seconds");

These are called only for their throw-on-invalid side effect. A comment or renaming to assertStringValue / assertPositiveInteger would make the intent clear.


Low: Minor Notes

  • stopLocalKitchenSinkServer calls stopListeners for each port sequentially; each call sleeps 2 s between SIGTERM and SIGKILL. Use Promise.all to run the two port cleanups in parallel.
  • appendServerLog uses Array.shift() in a while loop to maintain a 200-entry cap; shift is O(n). For 200 items this is negligible, but a single splice(0, logs.length - 200) is cleaner.
  • connect() return type changed from void to number (openMs) without a type annotation update. Existing callers that ignore the value still compile, but the changed contract is invisible at call sites.
  • The final [done] log line is already long; splitting probe stats into a separate [done-probe] line would improve readability.

Summary

The core design is solid: the probe loop and sleep-close observation are good additions that catch real liveness regressions. The main asks before merging are (1) unifying the two sleepPostsObservedByMain guards so every sleep close is validated, (2) extracting the final-session IIFE into a named function with proper finally cleanup, and (3) adding comments about the Linux-only platform assumptions.

@NathanFlurry NathanFlurry force-pushed the mock-agentic-loop/kitchen-sink-example branch from 2261075 to f116193 Compare May 4, 2026 00:54
@NathanFlurry NathanFlurry force-pushed the 05-03-feat_kitchen-sink_add_mock_agentic_lifecycle_loop branch from d4ad418 to 2483b16 Compare May 4, 2026 00:54
@NathanFlurry NathanFlurry force-pushed the 05-03-feat_kitchen-sink_add_mock_agentic_lifecycle_loop branch from 2483b16 to 39bcaec Compare May 4, 2026 03:01
@NathanFlurry NathanFlurry force-pushed the mock-agentic-loop/kitchen-sink-example branch from f116193 to 0290633 Compare May 4, 2026 03:01
@NathanFlurry NathanFlurry marked this pull request as ready for review May 4, 2026 03:32
@NathanFlurry
Copy link
Copy Markdown
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant