Skip to content

feat(engine): portable @relaycast/engine package + independent publish flow#138

Merged
willwashburn merged 9 commits into
mainfrom
feat/engine-package
May 29, 2026
Merged

feat(engine): portable @relaycast/engine package + independent publish flow#138
willwashburn merged 9 commits into
mainfrom
feat/engine-package

Conversation

@willwashburn
Copy link
Copy Markdown
Member

What

Adds @relaycast/engine — a platform-agnostic extraction of the Relaycast server that runs as a plain Node + SQLite process (self-host) or behind a Cloudflare Durable Object adapter (the hosted product, which lives in the cloud repo). This is the open-core core: auth, entitlements/billing, and telemetry are injected providers, so the hosted layer composes on top without forking the engine.

It's set up on an independent publish flow (separate from the lockstep @relaycast/* packages) so it can ship on its own cadence (e.g. 1.3.0-rc.x on the next tag).

The engine is sourced from the cloud copy (the most up-to-date implementation), so it ships at cloud's feature level. The existing @relaycast/server Cloudflare worker is untouched by this PR.

Architecture — the seams

9 infrastructure ports + 3 provider seams (src/ports/), aggregated into EngineDeps:

Ports (infra) Providers (open-core seam)
Database, RealtimeBus, ConnectionRegistry, PresenceTracker, RateLimiter, McpSessionHost, FileStorage, KeyValueStore, EventQueue AuthProvider, EntitlementsProvider, TelemetrySink

createEngine(deps) returns the Hono app; every route/engine module reads infra off the injected runtime instead of Cloudflare c.env bindings. OSS defaults ship in-package: SqliteApiKeyAuthProvider, StaticEntitlementsProvider, NoopTelemetrySink.

Node in-process adapter (src/adapters/node/)

Single-process implementations that preserve the Cloudflare DO wire behavior:

  • Realtime: channel_seq/agent_seq stamping, 500-event resync ring, mute filtering, resync/resync_ack protocol, SQLite member-cache reload on cold start.
  • Presence: heartbeat + online→offline setInterval sweep.
  • Rate limiter, TTL KV, in-process webhook queue, local-fs file storage (HMAC-signed upload/download URLs replacing R2 presigning), MCP session host (real MCP SDK transport), better-sqlite3 driver + migration runner.
  • relaycast-engine bin@hono/node-server + ws upgrade handling.

Verification

  • tsc: 0 errors; clean dist/ emit; npm pack includes the bin + all 14 migrations.
  • Conformance suite: 13/13 green — agent_seq monotonic under 100 concurrent pushes, channel_seq monotonic under concurrent broadcasts, mute filtering, resync ring replay, resync gap→DB-fallback, presence online+sweep-offline, workspace-stream fanout, rate-limit window, MCP unknown-session 404, force-disconnect, full HTTP→realtime delivery.
  • Real-network E2E: relaycast-engine --db … --port … boots, serves /health, creates a workspace + agents over HTTP, and a real ws:// client receives message.created with monotonic channel_seq/agent_seq.

Publish flow

  • New publish-engine.yml — builds, tests, and publishes only @relaycast/engine on its own version input + dist-tag (defaults: prerelease bump, next tag, dry-run on by default).
  • publish-npm.yml lockstep sweep now skips engine and won't rewrite other packages' dependency on @relaycast/engine, keeping it independently versioned.

How to publish (after merge)

Run the Publish Engine workflow (Actions → Publish Engine) with e.g. custom_version=1.3.0-rc.0, tag=next, dry_run=false.

Not in this PR

Cloudflare DO adapter + cloud consuming the engine (next), and the self-host guide.

🤖 Generated with Claude Code

…blish flow

Introduce `@relaycast/engine`: a platform-agnostic extraction of the Relaycast
server that runs on plain Node + SQLite (self-host) or behind a Cloudflare
Durable Object adapter (hosted, lives in the cloud repo). Open-core seam:
auth, entitlements/billing, and telemetry are injected providers.

What's included:
- Ports contract (src/ports): Database, RealtimeBus, ConnectionRegistry,
  PresenceTracker, RateLimiter, McpSessionHost, FileStorage, KeyValueStore,
  EventQueue + AuthProvider/EntitlementsProvider/TelemetrySink.
- createEngine(deps) Hono factory; all domain logic reads infra off the
  injected runtime instead of Cloudflare c.env bindings.
- OSS default providers: SqliteApiKeyAuthProvider, StaticEntitlementsProvider,
  NoopTelemetrySink.
- Node in-process adapter (src/adapters/node): in-memory realtime with
  channel_seq/agent_seq + 500-event resync ring, presence sweep, rate limiter,
  KV, local-fs file storage (HMAC-signed URLs), in-process webhook queue, MCP
  session host, better-sqlite3 driver + migration runner.
- `relaycast-engine` bin (Node entrypoint via @hono/node-server + ws upgrade).
- Conformance suite (13 tests) proving seq monotonicity under concurrency, mute
  filtering, resync ring + DB gap-fill, presence sweep, workspace stream, rate
  limiting, MCP routing, force-disconnect, and HTTP->realtime delivery.

Publish flow:
- New publish-engine.yml: builds, tests, and publishes ONLY @relaycast/engine
  on its own version + dist-tag (defaults: prerelease bump, `next` tag, dry-run).
- publish-npm.yml lockstep sweep now skips `engine` and won't rewrite other
  packages' dependency on @relaycast/engine, keeping it independently versioned.

The engine is sourced from the cloud copy (the most up-to-date implementation),
so it ships at cloud's feature level. The existing @relaycast/server Cloudflare
worker is untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a new @relaycast/engine package with Node runtime adapters, full DB schema/migrations, engine domain modules, HTTP routes and middleware, WebSocket handling, server entrypoint/CLI, tests, and CI workflows to independently publish the engine.

Changes

Engine package, runtime, and APIs

Layer / File(s) Summary
End-to-end engine implementation and wiring
.github/workflows/*, packages/engine/**, package.json, .trajectories/index.json
Introduces the engine package with ports, adapters (realtime/presence/rate-limit/kv/files/mcp), DB schema and migrations, core engine modules, REST/WS routes and middleware, Node server/CLI, conformance tests, and independent publish workflows.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Hono as Hono Router
  participant Auth as AuthProvider
  participant Engine as Engine Modules
  participant DB as SQLite/Drizzle
  participant RT as InProcessRealtime
  Client->>Hono: HTTP request (JSON)
  Hono->>Auth: authenticate(token)
  Auth-->>Hono: workspace/agent
  Hono->>Engine: handle(route, body)
  Engine->>DB: read/write
  Engine->>RT: broadcast/push
  RT-->>Client: WS event (if subscribed)
  Engine-->>Hono: JSON response
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~150 minutes

Possibly related PRs

  • AgentWorkforce/relaycast#128: Extends DM routing to support "@self" and stable websocket event IDs built on new engine DM and event-id utilities.

Poem

A rabbit built an engine—click, whirr, zing!
SQLite burrows, sockets on a string.
Hono routes hop-light through moonlit logs,
Realtime ripples race like meadow frogs.
With keys and queues, migrations tight—
We publish, thump, and launch tonight! 🐇✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/engine-package

@github-actions
Copy link
Copy Markdown

Preview deployed!

Environment URL
API https://pr138-api.relaycast.dev
Health https://pr138-api.relaycast.dev/health
Observer https://pr138-observer.relaycast.dev/observer

This preview shares the staging database and will be cleaned up when the PR is merged or closed.

Run E2E tests

npm run e2e -- https://pr138-api.relaycast.dev --ci

Open observer dashboard

https://pr138-observer.relaycast.dev/observer

@agent-relay-bot
Copy link
Copy Markdown
Contributor

Reviewed and fixed PR #138 locally.

Changes made:

  • Fixed Zod 4 / TS 5.9 type breakage in the engine dependency path:
    • packages/types/src/api.ts now preserves typed API response envelope data.
    • packages/sdk-typescript/src/casing.ts now camelizes normal object/interface shapes, not only Record<string, unknown>.
  • Verified the new @relaycast/engine package builds, tests, lints, and packs with expected dist contents including the CLI bin and migrations.

Local verification run:

  • npx turbo build --filter=@relaycast/engine --force
  • npx turbo test --filter=@relaycast/engine --force
  • npx turbo lint --filter=@relaycast/engine --force
  • npm publish --dry-run --workspace=@relaycast/engine --access public --tag next --ignore-scripts
  • npm ci --ignore-scripts

Note: trail could not record decisions because existing trajectory files fail the installed agent-trajectories schema validation.

@agent-relay-bot
Copy link
Copy Markdown
Contributor

pr-reviewer applied fixes — committed and pushed 210e629 to this PR. The notes below describe what changed.

Reviewed and fixed PR #138 locally.

Changes made:

  • Fixed Zod 4 / TS 5.9 type breakage in the engine dependency path:
    • packages/types/src/api.ts now preserves typed API response envelope data.
    • packages/sdk-typescript/src/casing.ts now camelizes normal object/interface shapes, not only Record<string, unknown>.
  • Verified the new @relaycast/engine package builds, tests, lints, and packs with expected dist contents including the CLI bin and migrations.

Local verification run:

  • npx turbo build --filter=@relaycast/engine --force
  • npx turbo test --filter=@relaycast/engine --force
  • npx turbo lint --filter=@relaycast/engine --force
  • npm publish --dry-run --workspace=@relaycast/engine --access public --tag next --ignore-scripts
  • npm ci --ignore-scripts

Note: trail could not record decisions because existing trajectory files fail the installed agent-trajectories schema validation.

Copy link
Copy Markdown
Contributor

@agent-relay-bot agent-relay-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr-reviewer applied fixes — committed and pushed 210e629 to this PR. The notes below describe what changed.

Reviewed and fixed PR #138 locally.

Changes made:

  • Fixed Zod 4 / TS 5.9 type breakage in the engine dependency path:
    • packages/types/src/api.ts now preserves typed API response envelope data.
    • packages/sdk-typescript/src/casing.ts now camelizes normal object/interface shapes, not only Record<string, unknown>.
  • Verified the new @relaycast/engine package builds, tests, lints, and packs with expected dist contents including the CLI bin and migrations.

Local verification run:

  • npx turbo build --filter=@relaycast/engine --force
  • npx turbo test --filter=@relaycast/engine --force
  • npx turbo lint --filter=@relaycast/engine --force
  • npm publish --dry-run --workspace=@relaycast/engine --access public --tag next --ignore-scripts
  • npm ci --ignore-scripts

Note: trail could not record decisions because existing trajectory files fail the installed agent-trajectories schema validation.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

29 issues found

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/engine/src/engine/cache.ts">

<violation number="1" location="packages/engine/src/engine/cache.ts:48">
P2: Cache evict only on read. Stale keys that never read again stay forever. Prune expired entries during writes (or on interval) to prevent unbounded memory growth.</violation>
</file>

<file name="packages/engine/src/db/migrations/meta/_journal.json">

<violation number="1" location="packages/engine/src/db/migrations/meta/_journal.json:2">
P2: _journal.json tracks 1 migration (0000_broad_dagger) but 14 SQL migration files exist (0000–0013). Drizzle Kit reads this journal to determine migration state; future `drizzle-kit generate` calls will see only migration 0000 and generate a new migration duplicating changes already in 0001–0013, producing corrupt history.</violation>
</file>

<file name="packages/engine/src/auth/index.ts">

<violation number="1" location="packages/engine/src/auth/index.ts:47">
P2: Bad agent token returns generic `unauthorized` code. Use `agent_token_invalid` here to keep API contract and client handling stable.</violation>
</file>

<file name="packages/engine/src/adapters/node/kv.ts">

<violation number="1" location="packages/engine/src/adapters/node/kv.ts:28">
P2: Expired keys can stick forever in memory. TTL cleanup runs only on `get`. Add periodic/scheduled eviction so one-shot idempotency keys do not leak memory.</violation>
</file>

<file name="packages/engine/src/adapters/node/presence.ts">

<violation number="1" location="packages/engine/src/adapters/node/presence.ts:105">
P2: Empty workspace maps never cleaned up. Memory grows over time with workspace churn. Delete workspace entry when its agent map becomes empty.</violation>
</file>

<file name="packages/engine/src/engine/file.ts">

<violation number="1" location="packages/engine/src/engine/file.ts:19">
P2: Create URL before DB insert. Avoid orphan pending rows on storage error.</violation>

<violation number="2" location="packages/engine/src/engine/file.ts:61">
P2: Move status update after URL generation. Keep completeUpload retry-safe on transient storage errors.</violation>
</file>

<file name="packages/engine/src/engine/activity.ts">

<violation number="1" location="packages/engine/src/engine/activity.ts:12">
P2: Limit clamp not handling NaN. Bad `limit` query can pass through and break DB limit behavior. Guard non-finite values before clamp.</violation>

<violation number="2" location="packages/engine/src/engine/activity.ts:28">
P2: Sort has no tie-breaker. Same-second messages can shuffle and make feed order unstable. Add deterministic secondary sort.</violation>
</file>

<file name="packages/engine/src/engine/eventQueue.ts">

<violation number="1" location="packages/engine/src/engine/eventQueue.ts:35">
P2: Cleanup uses created time, not completion time. Old queued jobs can get deleted right after finishing. Filter by `completedAt` instead.</violation>
</file>

<file name="packages/engine/src/adapters/node/index.ts">

<violation number="1" location="packages/engine/src/adapters/node/index.ts:81">
P2: Migration failure leaves SQLite handle open. Close handle before rethrow.</violation>
</file>

<file name="packages/engine/src/engine/a2a-health.ts">

<violation number="1" location="packages/engine/src/engine/a2a-health.ts:30">
P1: Health ping need timeout. No timeout means one bad endpoint can block whole sweep. Add abort signal with short deadline.</violation>

<violation number="2" location="packages/engine/src/engine/a2a-health.ts:73">
P1: External URL fetch missing SSRF guard. Engine can call internal/private hosts if bad URL gets stored. Validate protocol/host before fetch.</violation>
</file>

<file name="packages/engine/src/db/migrations/0013_status_active.sql">

<violation number="1" location="packages/engine/src/db/migrations/0013_status_active.sql:4">
P2: Data migration updates existing rows but leaves column DEFAULT as 'online'. New rows inserted without an explicit status get the deprecated default. Migration 0012 acknowledged the same SQLite limitation with a comment; this migration should too, or handle it if possible.</violation>
</file>

<file name="packages/engine/src/engine/search.ts">

<violation number="1" location="packages/engine/src/engine/search.ts:32">
P1: When a `channel` filter is provided but not found, the filter is silently dropped and search returns unfiltered results. Return an empty list for unknown channels instead of broadening the query.</violation>

<violation number="2" location="packages/engine/src/engine/search.ts:41">
P1: When a `from` filter is provided but no matching agent exists, results are returned without agent filtering. Unknown senders should produce zero matches.</violation>

<violation number="3" location="packages/engine/src/engine/search.ts:67">
P1: Cursor and sort do not match. ID cursor with rank order causes unstable paging. Use rank-aware cursor or sort by ID when before/after is used.</violation>
</file>

<file name="packages/engine/src/db/migrations/0012_delivery_and_sequence_constraints.sql">

<violation number="1" location="packages/engine/src/db/migrations/0012_delivery_and_sequence_constraints.sql:3">
P1: Unique index added before data cleanup. Migration can fail on existing duplicate deliveries. Deduplicate first, then add index.</violation>

<violation number="2" location="packages/engine/src/db/migrations/0012_delivery_and_sequence_constraints.sql:12">
P1: Unique sequence index missing pre-dedup step. Existing duplicate session events can break migration.</violation>
</file>

<file name="packages/engine/src/engine/dmAll.ts">

<violation number="1" location="packages/engine/src/engine/dmAll.ts:48">
P2: Latest message lookup use text max on id. Wrong row possible. Use numeric compare for id before picking last_message.</violation>

<violation number="2" location="packages/engine/src/engine/dmAll.ts:140">
P1: Cursor compare and sort use text id. Pagination can skip or repeat messages. Compare/sort on numeric id value instead.</violation>
</file>

<file name="packages/engine/src/engine/event-id.ts">

<violation number="1" location="packages/engine/src/engine/event-id.ts:55">
P2: Empty input and `"empty-event"` input collide to same event id. Different events can look identical to dedupe/order logic. Make empty-case namespace unique (for example include arity marker).</violation>
</file>

<file name="packages/engine/src/adapters/node/database.ts">

<violation number="1" location="packages/engine/src/adapters/node/database.ts:78">
P1: Migration apply not atomic. Crash/fail after DDL can leave schema changed but migration unmarked, then next boot replays and fails on existing tables/indexes. Wrap DDL + migration record write in one transaction.</violation>
</file>

<file name="packages/engine/src/db/migrations/0010_actions_and_harness.sql">

<violation number="1" location="packages/engine/src/db/migrations/0010_actions_and_harness.sql:92">
P2: Default status wrong here. Set to accepted to match schema/runtime.</violation>
</file>

<file name="packages/engine/src/engine/index.ts">

<violation number="1" location="packages/engine/src/engine/index.ts:1">
P2: Barrel exports have no consumers and duplicate public API symbols.

The entire `engine/index.ts` barrel is unused — no internal file imports from `./engine/index` or `./engine/index.js`, the `package.json` exports map does not expose it, and routes import directly from individual engine modules (e.g., `'../engine/workspace.js'`). The Snowflake symbols (`SnowflakeGenerator`, `getSnowflakeGenerator`, `generateId`) on line 1 duplicate the same re-exports already in `src/index.ts` (line 32). This adds maintenance surface (9 exports to keep in sync) with zero active consumers.</violation>
</file>

<file name="packages/engine/src/db/migrations/0001_fts5_search.sql">

<violation number="1" location="packages/engine/src/db/migrations/0001_fts5_search.sql:21">
P1: This migration never backfills the new external-content FTS index, so existing messages can be missing from search results after migration.</violation>
</file>

<file name="packages/engine/src/engine/eventDelivery.ts">

<violation number="1" location="packages/engine/src/engine/eventDelivery.ts:68">
P2: Treat HTTP 429 as retryable; currently it is incorrectly grouped into non-retryable 4xx failures.</violation>

<violation number="2" location="packages/engine/src/engine/eventDelivery.ts:71">
P1: Do not swallow subscription lookup errors; this causes silent event loss because callers only handle thrown delivery failures.</violation>
</file>

<file name="packages/engine/src/engine/receipt.ts">

<violation number="1" location="packages/engine/src/engine/receipt.ts:30">
P1: `markRead` is missing a channel-membership authorization check, so non-members can mark messages as read if they know the message ID.</violation>
</file>

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic

Comment thread packages/engine/src/engine/a2a-health.ts
Comment thread packages/engine/src/engine/a2a-health.ts
Comment thread packages/engine/src/engine/search.ts
.groupBy(messages.channelId);

const latestMessageIds = await db
.select({ channelId: messages.channelId, lastId: sql<string>`max(${messages.id})` })
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Latest message lookup use text max on id. Wrong row possible. Use numeric compare for id before picking last_message.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/engine/dmAll.ts, line 48:

<comment>Latest message lookup use text max on id. Wrong row possible. Use numeric compare for id before picking last_message.</comment>

<file context>
@@ -0,0 +1,150 @@
+    .groupBy(messages.channelId);
+
+  const latestMessageIds = await db
+    .select({ channelId: messages.channelId, lastId: sql<string>`max(${messages.id})` })
+    .from(messages)
+    .where(inArray(messages.channelId, channelIds))
</file context>

Comment thread packages/engine/src/engine/event-id.ts Outdated
reason TEXT,
priority TEXT NOT NULL DEFAULT 'normal',
deadline INTEGER DEFAULT NULL,
status TEXT NOT NULL DEFAULT 'pending',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Default status wrong here. Set to accepted to match schema/runtime.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/db/migrations/0010_actions_and_harness.sql, line 92:

<comment>Default status wrong here. Set to accepted to match schema/runtime.</comment>

<file context>
@@ -0,0 +1,108 @@
+  reason TEXT,
+  priority TEXT NOT NULL DEFAULT 'normal',
+  deadline INTEGER DEFAULT NULL,
+  status TEXT NOT NULL DEFAULT 'pending',
+  retryable INTEGER DEFAULT NULL,
+  available_at INTEGER DEFAULT NULL,
</file context>

Comment thread packages/engine/src/engine/index.ts Outdated
Comment thread packages/engine/src/engine/eventDelivery.ts Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

29 issues found

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/engine/src/engine/cache.ts">

<violation number="1" location="packages/engine/src/engine/cache.ts:48">
P2: Cache evict only on read. Stale keys that never read again stay forever. Prune expired entries during writes (or on interval) to prevent unbounded memory growth.</violation>
</file>

<file name="packages/engine/src/db/migrations/meta/_journal.json">

<violation number="1" location="packages/engine/src/db/migrations/meta/_journal.json:2">
P2: _journal.json tracks 1 migration (0000_broad_dagger) but 14 SQL migration files exist (0000–0013). Drizzle Kit reads this journal to determine migration state; future `drizzle-kit generate` calls will see only migration 0000 and generate a new migration duplicating changes already in 0001–0013, producing corrupt history.</violation>
</file>

<file name="packages/engine/src/auth/index.ts">

<violation number="1" location="packages/engine/src/auth/index.ts:47">
P2: Bad agent token returns generic `unauthorized` code. Use `agent_token_invalid` here to keep API contract and client handling stable.</violation>
</file>

<file name="packages/engine/src/adapters/node/kv.ts">

<violation number="1" location="packages/engine/src/adapters/node/kv.ts:28">
P2: Expired keys can stick forever in memory. TTL cleanup runs only on `get`. Add periodic/scheduled eviction so one-shot idempotency keys do not leak memory.</violation>
</file>

<file name="packages/engine/src/adapters/node/presence.ts">

<violation number="1" location="packages/engine/src/adapters/node/presence.ts:105">
P2: Empty workspace maps never cleaned up. Memory grows over time with workspace churn. Delete workspace entry when its agent map becomes empty.</violation>
</file>

<file name="packages/engine/src/engine/file.ts">

<violation number="1" location="packages/engine/src/engine/file.ts:19">
P2: Create URL before DB insert. Avoid orphan pending rows on storage error.</violation>

<violation number="2" location="packages/engine/src/engine/file.ts:61">
P2: Move status update after URL generation. Keep completeUpload retry-safe on transient storage errors.</violation>
</file>

<file name="packages/engine/src/engine/activity.ts">

<violation number="1" location="packages/engine/src/engine/activity.ts:12">
P2: Limit clamp not handling NaN. Bad `limit` query can pass through and break DB limit behavior. Guard non-finite values before clamp.</violation>

<violation number="2" location="packages/engine/src/engine/activity.ts:28">
P2: Sort has no tie-breaker. Same-second messages can shuffle and make feed order unstable. Add deterministic secondary sort.</violation>
</file>

<file name="packages/engine/src/engine/eventQueue.ts">

<violation number="1" location="packages/engine/src/engine/eventQueue.ts:35">
P2: Cleanup uses created time, not completion time. Old queued jobs can get deleted right after finishing. Filter by `completedAt` instead.</violation>
</file>

<file name="packages/engine/src/adapters/node/index.ts">

<violation number="1" location="packages/engine/src/adapters/node/index.ts:81">
P2: Migration failure leaves SQLite handle open. Close handle before rethrow.</violation>
</file>

<file name="packages/engine/src/engine/a2a-health.ts">

<violation number="1" location="packages/engine/src/engine/a2a-health.ts:30">
P1: Health ping need timeout. No timeout means one bad endpoint can block whole sweep. Add abort signal with short deadline.</violation>

<violation number="2" location="packages/engine/src/engine/a2a-health.ts:73">
P1: External URL fetch missing SSRF guard. Engine can call internal/private hosts if bad URL gets stored. Validate protocol/host before fetch.</violation>
</file>

<file name="packages/engine/src/db/migrations/0013_status_active.sql">

<violation number="1" location="packages/engine/src/db/migrations/0013_status_active.sql:4">
P2: Data migration updates existing rows but leaves column DEFAULT as 'online'. New rows inserted without an explicit status get the deprecated default. Migration 0012 acknowledged the same SQLite limitation with a comment; this migration should too, or handle it if possible.</violation>
</file>

<file name="packages/engine/src/engine/search.ts">

<violation number="1" location="packages/engine/src/engine/search.ts:32">
P1: When a `channel` filter is provided but not found, the filter is silently dropped and search returns unfiltered results. Return an empty list for unknown channels instead of broadening the query.</violation>

<violation number="2" location="packages/engine/src/engine/search.ts:41">
P1: When a `from` filter is provided but no matching agent exists, results are returned without agent filtering. Unknown senders should produce zero matches.</violation>

<violation number="3" location="packages/engine/src/engine/search.ts:67">
P1: Cursor and sort do not match. ID cursor with rank order causes unstable paging. Use rank-aware cursor or sort by ID when before/after is used.</violation>
</file>

<file name="packages/engine/src/db/migrations/0012_delivery_and_sequence_constraints.sql">

<violation number="1" location="packages/engine/src/db/migrations/0012_delivery_and_sequence_constraints.sql:3">
P1: Unique index added before data cleanup. Migration can fail on existing duplicate deliveries. Deduplicate first, then add index.</violation>

<violation number="2" location="packages/engine/src/db/migrations/0012_delivery_and_sequence_constraints.sql:12">
P1: Unique sequence index missing pre-dedup step. Existing duplicate session events can break migration.</violation>
</file>

<file name="packages/engine/src/engine/dmAll.ts">

<violation number="1" location="packages/engine/src/engine/dmAll.ts:48">
P2: Latest message lookup use text max on id. Wrong row possible. Use numeric compare for id before picking last_message.</violation>

<violation number="2" location="packages/engine/src/engine/dmAll.ts:140">
P1: Cursor compare and sort use text id. Pagination can skip or repeat messages. Compare/sort on numeric id value instead.</violation>
</file>

<file name="packages/engine/src/engine/event-id.ts">

<violation number="1" location="packages/engine/src/engine/event-id.ts:55">
P2: Empty input and `"empty-event"` input collide to same event id. Different events can look identical to dedupe/order logic. Make empty-case namespace unique (for example include arity marker).</violation>
</file>

<file name="packages/engine/src/adapters/node/database.ts">

<violation number="1" location="packages/engine/src/adapters/node/database.ts:78">
P1: Migration apply not atomic. Crash/fail after DDL can leave schema changed but migration unmarked, then next boot replays and fails on existing tables/indexes. Wrap DDL + migration record write in one transaction.</violation>
</file>

<file name="packages/engine/src/db/migrations/0010_actions_and_harness.sql">

<violation number="1" location="packages/engine/src/db/migrations/0010_actions_and_harness.sql:92">
P2: Default status wrong here. Set to accepted to match schema/runtime.</violation>
</file>

<file name="packages/engine/src/engine/index.ts">

<violation number="1" location="packages/engine/src/engine/index.ts:1">
P2: Barrel exports have no consumers and duplicate public API symbols.

The entire `engine/index.ts` barrel is unused — no internal file imports from `./engine/index` or `./engine/index.js`, the `package.json` exports map does not expose it, and routes import directly from individual engine modules (e.g., `'../engine/workspace.js'`). The Snowflake symbols (`SnowflakeGenerator`, `getSnowflakeGenerator`, `generateId`) on line 1 duplicate the same re-exports already in `src/index.ts` (line 32). This adds maintenance surface (9 exports to keep in sync) with zero active consumers.</violation>
</file>

<file name="packages/engine/src/db/migrations/0001_fts5_search.sql">

<violation number="1" location="packages/engine/src/db/migrations/0001_fts5_search.sql:21">
P1: This migration never backfills the new external-content FTS index, so existing messages can be missing from search results after migration.</violation>
</file>

<file name="packages/engine/src/engine/eventDelivery.ts">

<violation number="1" location="packages/engine/src/engine/eventDelivery.ts:68">
P2: Treat HTTP 429 as retryable; currently it is incorrectly grouped into non-retryable 4xx failures.</violation>

<violation number="2" location="packages/engine/src/engine/eventDelivery.ts:71">
P1: Do not swallow subscription lookup errors; this causes silent event loss because callers only handle thrown delivery failures.</violation>
</file>

<file name="packages/engine/src/engine/receipt.ts">

<violation number="1" location="packages/engine/src/engine/receipt.ts:30">
P1: `markRead` is missing a channel-membership authorization check, so non-members can mark messages as read if they know the message ID.</violation>
</file>

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic

Comment thread packages/engine/src/engine/a2a-health.ts
Comment thread packages/engine/src/engine/a2a-health.ts
Comment thread packages/engine/src/engine/search.ts
Comment thread packages/engine/src/engine/dmAll.ts
Comment thread packages/engine/src/engine/event-id.ts Outdated
reason TEXT,
priority TEXT NOT NULL DEFAULT 'normal',
deadline INTEGER DEFAULT NULL,
status TEXT NOT NULL DEFAULT 'pending',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Default status wrong here. Set to accepted to match schema/runtime.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/db/migrations/0010_actions_and_harness.sql, line 92:

<comment>Default status wrong here. Set to accepted to match schema/runtime.</comment>

<file context>
@@ -0,0 +1,108 @@
+  reason TEXT,
+  priority TEXT NOT NULL DEFAULT 'normal',
+  deadline INTEGER DEFAULT NULL,
+  status TEXT NOT NULL DEFAULT 'pending',
+  retryable INTEGER DEFAULT NULL,
+  available_at INTEGER DEFAULT NULL,
</file context>

Comment thread packages/engine/src/engine/index.ts Outdated
Comment thread packages/engine/src/engine/eventDelivery.ts Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".trajectories/index.json">

<violation number="1" location=".trajectories/index.json:4">
P2: Clearing `trajectories` to an empty object orphaned existing active/completed trajectory files and effectively drops discoverability/history from the index.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread .trajectories/index.json
}
}
"lastUpdated": "2026-05-29T15:00:49.465Z",
"trajectories": {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Clearing trajectories to an empty object orphaned existing active/completed trajectory files and effectively drops discoverability/history from the index.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .trajectories/index.json, line 4:

<comment>Clearing `trajectories` to an empty object orphaned existing active/completed trajectory files and effectively drops discoverability/history from the index.</comment>

<file context>
@@ -1,1062 +1,5 @@
-    }
-  }
+  "lastUpdated": "2026-05-29T15:00:49.465Z",
+  "trajectories": {}
 }
\ No newline at end of file
</file context>

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🧹 Nitpick comments (23)
packages/engine/src/adapters/node/realtime.ts (1)

257-315: ⚡ Quick win

Validate inbound frames with a zod schema.

Inbound ping/resync frames are parsed via JSON.parse and validated with ad-hoc typeof checks. A small z.discriminatedUnion on type would centralize shape/last_seen_seq/since validation and reject malformed frames consistently.

As per coding guidelines: "Prefer zod schemas for validation instead of ad-hoc manual checks".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/adapters/node/realtime.ts` around lines 257 - 315,
Replace the ad-hoc JSON.parse + typeof checks in onAgentMessage and
onWorkspaceMessage with Zod validation: define a z.discriminatedUnion schema
(e.g., AgentFrameSchema) for agent frames with variants {type: 'ping'}, {type:
'resync', last_seen_seq: z.number(), since: z.string().optional()} and use
AgentFrameSchema.safeParse(parsed) (or parse inside try/catch) to reject
malformed frames early; similarly add a minimal WorkspaceFrameSchema for
workspace frames (type: 'ping') used in onWorkspaceMessage. Update
onAgentMessage to only handle validated frames (use the discriminant to narrow
types so you can safely access last_seen_seq and since) and return on validation
failure, and add the necessary zod import and exported schema names so types
(e.g., AgentFrame, WorkspaceFrame) can be reused elsewhere.
packages/engine/src/adapters/node/rate-limit.ts (1)

3-38: 💤 Low value

This is a fixed-window limiter, not sliding-window.

A single bucket with one expiresAt resets entirely at the window boundary, so a client can issue limit requests at the end of one window and limit more immediately after — up to ~2× the intended rate across a boundary. The doc comment ("sliding-window") overstates the guarantee. Either rename/clarify the comment, or implement a true sliding window (e.g. dual-bucket weighting) if the boundary burst is a concern for the protected routes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/adapters/node/rate-limit.ts` around lines 3 - 38, The
implementation of InProcessRateLimiter is actually a fixed-window limiter
(single bucket with expiresAt) rather than a sliding-window: update either the
documentation or the implementation. To fix, choose one: (A) change the class
comment to say "fixed-window" and clarify behavior so callers know bursts across
window boundaries are allowed, referencing InProcessRateLimiter, check, cleanup,
and buckets; or (B) implement a true sliding-window algorithm (e.g., keep two
buckets per bucketKey with timestamps and weight counts by time elapsed, or
store timestamped event counts and compute the sliding sum) in
InProcessRateLimiter.check/cleanup so requests are counted proportionally across
the window and the burst at boundaries is prevented. Ensure you update the class
comment accordingly and keep bucket storage/cleanup logic consistent with the
chosen approach.
packages/engine/src/engine/agent.ts (1)

2-2: ⚡ Quick win

Remove unused imports sql and or.

Static analysis confirms these imports are never used.

-import { eq, and, lt, or, sql, inArray } from 'drizzle-orm';
+import { eq, and, lt, inArray } from 'drizzle-orm';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/agent.ts` at line 2, The import statement in
packages/engine/src/engine/agent.ts currently brings in unused symbols `sql` and
`or`; update the import from 'drizzle-orm' to remove `sql` and `or` (keeping
only used imports `eq`, `and`, `lt`, `inArray`) and run a quick grep/IDE search
for `sql` and `or` to confirm there are no remaining references before
committing.
packages/engine/src/engine/channel.ts (1)

276-299: ⚡ Quick win

Consider using onConflictDoNothing for membership insert in joinChannel.

The check at lines 277-289 prevents duplicate membership, but a concurrent join request could pass the check and fail at the INSERT (line 291-295) with a constraint error. Using onConflictDoNothing() would make this idempotent and race-safe without changing behavior.

Suggested fix
-  await db.insert(channelMembers).values({
-    channelId: channel.id,
-    agentId,
-    role: 'member',
-  });
+  await db.insert(channelMembers).values({
+    channelId: channel.id,
+    agentId,
+    role: 'member',
+  }).onConflictDoNothing();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/channel.ts` around lines 276 - 299, The
joinChannel flow currently queries channelMembers then inserts, which can race;
modify the insert in the joinChannel function to use onConflictDoNothing() on
channelMembers (matching the unique constraint on channelId+agentId) so
concurrent inserts won't raise a constraint error, then determine already_member
by checking the pre-check "existing" or the insert result (affectedRows === 0)
before calling invalidateChannelCache(workspaceId, channelName); keep references
to channelMembers, joinChannel, and invalidateChannelCache to locate the change.
packages/engine/src/engine/action.ts (1)

1-1: ⚡ Quick win

Remove unused import sql.

Static analysis confirms sql is imported but never used.

-import { eq, and, sql } from 'drizzle-orm';
+import { eq, and } from 'drizzle-orm';
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/action.ts` at line 1, The import statement in
engine/action.ts currently brings in an unused symbol `sql` from 'drizzle-orm';
remove `sql` from the import list so only used symbols (`eq`, `and`) are
imported, e.g., update the import that references eq, and, sql to exclude `sql`
to satisfy static analysis and prevent unused-import warnings.
packages/engine/src/engine/directory.ts (1)

621-640: ⚡ Quick win

Wrap JSON.parse calls in try-catch to handle malformed JSON gracefully.

Lines 632-634 call JSON.parse on raw SQL results. If the stored JSON is malformed (e.g., due to a migration issue or direct DB edit), this will throw and crash the search. Consider defensive parsing.

Suggested defensive parsing
+function safeJsonParse<T>(value: string | null, fallback: T): T {
+  if (!value) return fallback;
+  try {
+    return JSON.parse(value) as T;
+  } catch {
+    return fallback;
+  }
+}
+
 const hydratedRows: DirectoryAgentRow[] = rows.map((row) => ({
   id: row.id,
   workspaceId: row.workspace_id,
   sourceAgentId: row.source_agent_id,
   slug: row.slug,
   name: row.name,
   description: row.description,
   provider: row.provider,
   endpointUrl: row.endpoint_url,
   documentationUrl: row.documentation_url,
   version: row.version,
-  tags: JSON.parse(row.tags || '[]'),
-  capabilities: JSON.parse(row.capabilities || '{}'),
-  metadata: JSON.parse(row.metadata || '{}'),
+  tags: safeJsonParse<string[]>(row.tags, []),
+  capabilities: safeJsonParse<Record<string, unknown>>(row.capabilities, {}),
+  metadata: safeJsonParse<Record<string, unknown>>(row.metadata, {}),
   status: row.status,
   ratingSum: row.rating_sum,
   ratingCount: row.rating_count,
   createdAt: new Date(row.created_at * 1000),
   updatedAt: new Date(row.updated_at * 1000),
 }));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/directory.ts` around lines 621 - 640, The
JSON.parse calls inside the rows.map that builds hydratedRows (producing
DirectoryAgentRow objects) can throw on malformed DB values; update the mapping
in the DirectoryAgentRow construction (the rows.map callback) to defensively
parse tags, capabilities, and metadata by wrapping each JSON.parse in try-catch
(or extract a small helper like safeParseJson used by the map) and return sane
defaults (e.g., [] for tags, {} for capabilities/metadata) on error and
optionally log/debug the parse failure; ensure the rest of the mapped fields
remain unchanged.
packages/engine/src/engine/dmAll.ts (1)

13-23: Unbounded workspace-wide listing.

listAllDmConversations fetches every conversation (plus participants, counts, and latest messages) for a workspace with no limit/pagination. This is fine for small workspaces but can grow expensive for large ones on an admin/console path. Consider adding a limit + cursor.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/dmAll.ts` around lines 13 - 23, The function
listAllDmConversations performs an unbounded workspace-wide select; update it to
support paginated listing by adding optional parameters (e.g., limit: number and
cursor: string or {createdAt: string, id: string}) to the function signature and
use them in the query (apply an ORDER BY on createdAt, id for deterministic
paging and a WHERE clause to filter >/< the cursor), use .limit(limit + 1) to
detect next page, and return a page payload (items, nextCursor). Change
references in callers of listAllDmConversations accordingly so admin/console
paths request a sensible default limit.
packages/engine/src/engine/groupDm.ts (1)

10-38: 💤 Low value

fetchAttachmentsBatch duplicates the identical helper in dm.ts.

Consider extracting this into a shared module (e.g. an attachments helper) to avoid divergence between the two copies.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/groupDm.ts` around lines 10 - 38,
fetchAttachmentsBatch is duplicated in this file and dm.ts; extract it into a
single shared helper to avoid divergence. Create a new shared attachments helper
module that exports fetchAttachmentsBatch (keeping the signature
Promise<Map<string, AttachmentRow[]>>) and ensure it imports/uses the same query
symbols (messageAttachments, files, eq, inArray, asc, and types like Db and
AttachmentRow); then replace the local implementations in both groupDm.ts and
dm.ts with a single import from that helper and remove the duplicated function
bodies. Ensure TypeScript types and exports/imports match the consumers so
callers compile without changes.
packages/engine/src/engine/snowflake.ts (1)

36-41: Worker-id collisions are likely across multiple processes.

The implicit worker id is a 10-bit hash of a random UUID (1024 slots), so independent processes/isolates hashing to the same value can mint identical (timestamp, worker, sequence) IDs and collide on the primary key. This is safe for single-process self-host and per-DO isolation, but if a future deployment runs multiple engine instances against one SQLite DB, expose an explicit worker id (e.g. via env) so each instance is distinct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/snowflake.ts` around lines 36 - 41, The implicit
10-bit worker id from fnv1a10Bits(getIsolateId()) can collide across processes;
modify getWorkerId (and where workerId is set) to prefer an explicit configured
worker id (e.g. read from an env var like ENGINE_WORKER_ID or from startup
config) parsed as an integer and validated to be within 0..1023 (mask with 0x3ff
or throw/log if out of range), falling back to the existing
fnv1a10Bits(getIsolateId()) only when no explicit value is provided; update any
constructor/initialization code that sets this.workerId to read and validate the
env/config value and ensure process-level uniqueness documentation is noted.
packages/engine/src/engine/sessionEvent.ts (1)

44-58: ⚡ Quick win

Derive the type and the validator set from one source.

The SessionEventType union (Lines 8-42) and VALID_EVENT_TYPES (Lines 44-58) duplicate all 34 values; they match today but will drift independently. Define the values once as a const tuple and derive both the type and the Set from it.

♻️ Single source of truth
-export type SessionEventType =
-  | 'status.changed'
-  ...
-  | 'error';
-
-const VALID_EVENT_TYPES = new Set<string>([
-  'status.changed', ...
-  'log', 'error',
-]);
+const SESSION_EVENT_TYPES = [
+  'status.changed', 'status.idle', 'status.active', 'status.blocked',
+  'status.waiting', 'status.offline',
+  // ...remaining values...
+  'log', 'error',
+] as const;
+
+export type SessionEventType = (typeof SESSION_EVENT_TYPES)[number];
+const VALID_EVENT_TYPES = new Set<string>(SESSION_EVENT_TYPES);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/sessionEvent.ts` around lines 44 - 58, Replace the
duplicated literal lists by defining a single const tuple (e.g.
SESSION_EVENT_TYPES) containing all event strings, then derive the union type
and the Set from it: declare SessionEventType as typeof
SESSION_EVENT_TYPES[number] and build VALID_EVENT_TYPES via new
Set(SESSION_EVENT_TYPES); update any existing exports/uses that referred to
SessionEventType and VALID_EVENT_TYPES to use these derived values so the list
is maintained in one place.
packages/engine/src/engine/routing.ts (1)

96-106: ⚡ Quick win

Prefer a zod schema over per-field typeof checks.

withDefaultWeights parses an unknown JSON column with ad-hoc typeof === 'number' guards. A small zod schema (e.g. z.object({...}).partial().catch({})) would centralize parsing/coercion and also let you reject out-of-range values for circuit_breaker_threshold/circuit_breaker_cooldown_seconds in setRoutingConfig.

As per coding guidelines: "Prefer zod schemas for validation instead of ad-hoc manual checks".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/routing.ts` around lines 96 - 106, The
withDefaultWeights function currently uses ad-hoc typeof checks to parse an
unknown JSON into RoutingWeights; replace this with a zod schema: create a
z.object({...}).partial() schema that defines numeric fields matching
RoutingWeights, use .catch(DEFAULT_WEIGHTS) or .parse/coerce to apply defaults
from DEFAULT_WEIGHTS and ensure non-numeric/missing values fall back to
defaults; also reuse or extend that schema (and validate ranges) in
setRoutingConfig for circuit_breaker_threshold and
circuit_breaker_cooldown_seconds to reject out-of-range values instead of
relying on manual guards.
packages/engine/src/engine/usage.ts (1)

15-34: 💤 Low value

Deduplicate the metrics list.

The ['messages', 'api_calls', 'files', 'file_bytes', 'ws_minutes'] array is repeated in getUsageCounters and resetUsageCounters; they can silently diverge. Extract a shared const USAGE_METRICS (e.g. as const).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/usage.ts` around lines 15 - 34, The metrics array
is duplicated between getUsageCounters and resetUsageCounters; extract a shared
constant (e.g. const USAGE_METRICS =
['messages','api_calls','files','file_bytes','ws_minutes'] as const) and use
that constant in both functions (replace the inline arrays in getUsageCounters
and resetUsageCounters with USAGE_METRICS); optionally derive a type with typeof
USAGE_METRICS[number] for strong typing where needed to ensure the two functions
cannot diverge.
packages/engine/src/engine/workspace.ts (2)

68-92: ⚡ Quick win

Workspace + #general channel creation isn't transactional.

The workspaces insert and the channels insert run as two separate statements. If the channel insert fails (or the process dies between them), you're left with a workspace that has no default channel and no easy way to detect/repair it. Consider wrapping both inserts in a single SQLite transaction so the default channel creation is all-or-nothing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/workspace.ts` around lines 68 - 92, The workspace
and default channel creation (the insert into workspaces via
db.insert(workspaces) and the insert into channels) must be executed inside a
single database transaction so both succeed or both roll back; modify the
function that calls db.insert(workspaces) and generateId()/db.insert(channels)
to run within a transaction (use your DB client's transaction API, e.g.,
db.transaction or equivalent) and perform both inserts and
buildWorkspaceResponse(apiKey) inside that transactional callback so failure to
create the `#general` channel rolls back the workspace insert.

54-61: 💤 Low value

Document the engine-level double-flattening in createWorkspace (workspace + top-level fields).

  • packages/engine/src/engine/workspace.ts returns { workspace, created: <bool>, ...workspace } in both the reuse path (54-61) and the create path (86-91), so keys like workspace_id/api_key/created_at exist both as result.workspace.* and at the top level.
  • The HTTP endpoint POST /v1/workspaces only returns data: result.workspace, so the API shape is driven by the nested workspace object (SDK expects data.workspace_id/data.api_key/data.created_at).
  • Engine-level callers/tests appear to use the top-level fields (result.workspace_id, result.api_key, etc.), so the flattening likely supports direct engine consumers/back-compat—add a brief inline comment explaining the intent to avoid drift.
if (existing) {
  const workspace = buildWorkspaceResponse(existing, providedOwnerApiKey);
  return {
    workspace,
    created: false,
    ...workspace,
  };
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/workspace.ts` around lines 54 - 61, The return
value in createWorkspace currently returns both a nested workspace object and
also spreads its fields to the top level (return { workspace, created: false,
...workspace }), which causes an intentional “double-flattening” (workspace.*
and top-level workspace_id/api_key/created_at) for engine-level consumers and
backward compatibility; add a brief inline comment immediately above the return
in createWorkspace (and mirror in the create path where created: true is
returned) stating that we intentionally include both result.workspace and
top-level fields to preserve API shape for HTTP responses and support direct
engine callers/tests, referencing buildWorkspaceResponse to show the source of
the flattened fields.
packages/engine/src/lib/workspaceStream.ts (1)

14-78: Cache invalidation is process-local; note staleness window when scaled.

setWorkspaceStreamOverride only clears the in-process cache, so in any multi-instance deployment a change made on one instance remains stale on others for up to CACHE_TTL_MS (10s). This is fine for the single-process Node self-host targeted here, but worth tracking for the out-of-scope DO/cloud adapter where overrides would need cross-instance invalidation (or a shorter TTL). Also, expired entries are never evicted from the module-level Map; it's bounded by tenant count, so harmless now, but consider lazy eviction if workspace cardinality grows large.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/lib/workspaceStream.ts` around lines 14 - 78, The
module-level cache (cache Map) only invalidates entries in-process, so
setWorkspaceStreamOverride clears only this instance and other instances can
serve stale values for up to CACHE_TTL_MS; update getWorkspaceStreamConfig to
lazily evict expired entries by removing cache.delete(workspaceId) when an entry
is found with expiresAt <= now, and add a TODO comment on
setWorkspaceStreamOverride noting cross-instance invalidation is required for
multi-instance deployments (e.g., use pub/sub or a shorter TTL) and consider
adding a background/periodic or on-access eviction strategy if tenant
cardinality grows.
packages/engine/src/routes/agent.ts (1)

299-305: 💤 Low value

Consider extracting CLI validation list to a shared constant.

The valid CLI list is hardcoded in the spawn handler. While this works, extracting it to a shared constant (e.g., in a constants file or at module level) would improve maintainability if the list changes or needs to be referenced elsewhere.

♻️ Example refactor

At the top of the file or in a shared constants module:

const VALID_AGENT_CLIS = ['claude', 'codex', 'opencode', 'gemini', 'aider', 'goose'] as const;

Then use it in validation:

-      const validClis = ['claude', 'codex', 'opencode', 'gemini', 'aider', 'goose'];
-      if (!validClis.includes(cli)) {
-        return c.json({
-          ok: false,
-          error: { code: 'invalid_request', message: `cli must be one of: ${validClis.join(', ')}` },
-        }, 400);
-      }
+      if (!VALID_AGENT_CLIS.includes(cli as any)) {
+        return c.json({
+          ok: false,
+          error: { code: 'invalid_request', message: `cli must be one of: ${VALID_AGENT_CLIS.join(', ')}` },
+        }, 400);
+      }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/agent.ts` around lines 299 - 305, Extract the
hardcoded validClis array used in the spawn handler into a shared constant
(e.g., VALID_AGENT_CLIS) so it can be reused and maintained centrally; define it
at module scope or in a shared constants module (export if shared) as a readonly
tuple (use "as const") and then replace the local validClis reference in the
spawn handler with VALID_AGENT_CLIS, keeping the same includes-based validation
and error message construction (VALID_AGENT_CLIS.join(', ')).
packages/engine/src/routes/directory.ts (2)

69-69: ⚡ Quick win

Replace as any cast with proper ContentfulStatusCode typing.

The status code is cast to any. Import and use ContentfulStatusCode from hono/utils/http-status.

♻️ Proposed fix

Add the import:

 import { Hono } from 'hono';
+import type { ContentfulStatusCode } from 'hono/utils/http-status';
 import { z } from 'zod';

Replace the cast:

-  }, (error.status || 500) as any);
+  }, (error.status || 500) as ContentfulStatusCode);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/directory.ts` at line 69, The status code is
currently cast to any; import ContentfulStatusCode from 'hono/utils/http-status'
and replace the cast on the expression (error.status || 500) to use
ContentfulStatusCode instead of any so the response status uses the proper type
(i.e., change (error.status || 500) as any to (error.status || 500) as
ContentfulStatusCode); update the import list to include ContentfulStatusCode
and ensure the surrounding error handling that references error.status remains
unchanged.

62-62: ⚡ Quick win

Type the handleError context parameter properly.

The c parameter is typed as any, which bypasses type checking. It should be typed as Context<AppEnv> to match the route handler context type.

♻️ Proposed fix
+import type { Context } from 'hono';
+
-function handleError(c: any, err: unknown) {
+function handleError(c: Context<AppEnv>, err: unknown) {
   const error = err as Error & { code?: string; status?: number; cause?: unknown };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/directory.ts` at line 62, The handleError function
currently types its context parameter as any; change its signature to accept a
properly typed Context<AppEnv> (i.e., function handleError(c: Context<AppEnv>,
err: unknown)) and import or reference the Context type and AppEnv interface
where needed so the route handler context is type-checked; update any call sites
if necessary to satisfy the new type.
packages/engine/src/routes/console.ts (1)

55-55: ⚡ Quick win

Replace as any casts with proper ContentfulStatusCode typing.

Error handlers cast status codes to any. Use ContentfulStatusCode from hono/utils/http-status instead.

♻️ Proposed fix

Add the import:

 import { Hono } from 'hono';
+import type { ContentfulStatusCode } from 'hono/utils/http-status';
 import { z } from 'zod';

Replace the casts:

-    }, (error.status || 500) as any);
+    }, (error.status || 500) as ContentfulStatusCode);

Also applies to: 78-78, 106-106, 129-129

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/console.ts` at line 55, The error handlers in
packages/engine/src/routes/console.ts cast status codes to any; import
ContentfulStatusCode from 'hono/utils/http-status' and replace the casts such as
(error.status || 500) as any with (error.status || 500) as ContentfulStatusCode
in each error-response call (the arrow error handler callbacks used to send
responses at the locations flagged — e.g., the error handler around line 55 and
the similar handlers at the other flagged spots). Ensure the new import is added
at top of the file and each cast uses ContentfulStatusCode.
packages/engine/src/routes/action.ts (1)

78-78: ⚡ Quick win

Replace as any casts with proper ContentfulStatusCode typing.

The error handlers cast error status codes to any, which bypasses type safety. Import and use ContentfulStatusCode from hono/utils/http-status as demonstrated in agent.ts.

♻️ Proposed fix

Add the import at the top of the file:

 import { Hono } from 'hono';
+import type { ContentfulStatusCode } from 'hono/utils/http-status';
 import { z } from 'zod';

Then replace each as any with as ContentfulStatusCode:

-      (error.status || 500) as any,
+      (error.status || 500) as ContentfulStatusCode,

Also applies to: 94-94, 116-116, 141-141, 217-217, 287-287, 309-309

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/action.ts` at line 78, Import ContentfulStatusCode
from 'hono/utils/http-status' at the top of packages/engine/src/routes/action.ts
and replace every cast of (error.status || 500) as any with (error.status ||
500) as ContentfulStatusCode; specifically update each occurrence where status
is set using that expression (e.g., in the action route error handlers around
the existing uses at the noted locations) so the response status uses the proper
ContentfulStatusCode type instead of any.
packages/engine/src/routes/certify.ts (1)

56-56: ⚡ Quick win

Replace as any casts with proper ContentfulStatusCode typing.

Same issue as in action.ts: error status codes are cast to any instead of using proper typing.

♻️ Proposed fix

Import ContentfulStatusCode:

 import { Hono } from 'hono';
+import type { ContentfulStatusCode } from 'hono/utils/http-status';
 import { z } from 'zod';

Replace the casts:

-    }, (error.status || 500) as any);
+    }, (error.status || 500) as ContentfulStatusCode);

Also applies to: 76-76

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/certify.ts` at line 56, Replace the loose "as any"
status casts with the ContentfulStatusCode type: add an import for
ContentfulStatusCode at the top of packages/engine/src/routes/certify.ts, then
change the casts in the error handling expressions (e.g., the occurrences where
you have " (error.status || 500) as any") to "(error.status || 500) as
ContentfulStatusCode"; do the same for the other occurrence around line 76 so
all error status casts use ContentfulStatusCode instead of any.
packages/engine/src/routes/channel.ts (1)

90-90: ⚡ Quick win

Replace as any casts with proper ContentfulStatusCode typing.

Multiple error handlers cast status codes to any. Import and use ContentfulStatusCode from hono/utils/http-status for type safety.

♻️ Proposed fix

Add the import:

 import { Hono } from 'hono';
+import type { ContentfulStatusCode } from 'hono/utils/http-status';
 import { z } from 'zod';

Replace all occurrences:

-      }, status as any);
+      }, status as ContentfulStatusCode);
-      }, (error.status || 500) as any);
+      }, (error.status || 500) as ContentfulStatusCode);

Also applies to: 116-116, 147-147, 208-208, 271-271, 325-325, 381-381, 449-449, 475-475, 553-553, 609-609, 665-665

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/channel.ts` at line 90, Import
ContentfulStatusCode from 'hono/utils/http-status' and replace the unsafe casts
of status to any in the error handlers: find all c.json(..., status as any)
usages and change them to use status as ContentfulStatusCode (or ensure the
local status variable is typed as ContentfulStatusCode). Specifically update the
error handler responses that call c.json to remove "as any" and use the imported
ContentfulStatusCode type so the second argument is correctly typed.
packages/engine/src/routes/dm.ts (1)

181-194: ⚡ Quick win

Validate limit query param (avoid silent fallback on invalid input).

parseInt(..., 10) can produce NaN, but getDmMessages already clamps via Math.min(Math.max(opts.limit || 50, 1), 100), so NaN doesn’t propagate to the SQL LIMIT (it silently falls back to 50, same as 0). Prefer zod query parsing and returning 400 on invalid limit instead of accepting bad input.

♻️ Coerce and validate query params
-      const limit = c.req.query('limit')
-        ? parseInt(c.req.query('limit')!, 10)
-        : undefined;
-      const before = c.req.query('before');
-      const after = c.req.query('after');
-
-      const conversationId = c.req.param('conversation_id');
+      const querySchema = z.object({
+        limit: z.coerce.number().int().positive().optional(),
+        before: z.string().optional(),
+        after: z.string().optional(),
+      });
+      const parsedQuery = querySchema.safeParse({
+        limit: c.req.query('limit'),
+        before: c.req.query('before'),
+        after: c.req.query('after'),
+      });
+      if (!parsedQuery.success) {
+        return c.json({ ok: false, error: { code: 'invalid_request', message: 'invalid query parameters' } }, 400);
+      }
+      const { limit, before, after } = parsedQuery.data;
+
+      const conversationId = c.req.param('conversation_id');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/routes/dm.ts` around lines 181 - 194, The route silently
accepts invalid limit values because parseInt can return NaN; update the query
parsing for limit in dm route to validate and reject bad input (return 400)
instead of falling back silently: parse the raw c.req.query('limit') then check
Number.isInteger(parsed) and parsed between 1 and 100 (or use a zod schema to
coerce/validate the query param), and only pass a valid numeric limit to
dmEngine.getDmMessages (keep the other params and function call unchanged).
Ensure the validation logic references the same symbols (c.req.query, limit
variable, and dmEngine.getDmMessages) so invalid inputs are detected early and
cause a 400 response.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/publish-engine.yml:
- Around line 76-88: The run block currently interpolates the free-form input
github.event.inputs.custom_version directly into the shell, which allows shell
injection; instead pass the input via the workflow env (set an env: CUSTOM: ${{
github.event.inputs.custom_version }}) and in the run script reference it only
as a quoted variable ("$CUSTOM"), then use that quoted "$CUSTOM" in the npm
version call (or test/sanitize it before use) so commands like npm version
"$CUSTOM" --no-git-tag-version are executed safely without evaluating injected
shell content; update the step that defines CUSTOM and the npm version
invocations (the run block using CUSTOM, npm version, and NEW_VERSION)
accordingly.

In `@packages/engine/src/adapters/node/database.ts`:
- Around line 71-83: runMigrations applies each SQL file with sqlite.exec and
only records the file after exec succeeds, which can leave partial changes if a
mid-file statement fails; to fix, wrap each file's exec in a per-file
SAVEPOINT/RELEASE pattern: before calling sqlite.exec(ddl) create a unique
savepoint (e.g., SAVEPOINT sp_<file>), run sqlite.exec(ddl), then RELEASE the
savepoint and only then call insert.run(file, ...) and applied.push(file); on
any error, ROLLBACK TO the same savepoint and rethrow or surface the error so
the file is not marked as applied. This uses the existing loop over files, the
sqlite.exec call, and the insert.run/applied.push logic so only fully-applied
files get recorded.

In `@packages/engine/src/adapters/node/kv.ts`:
- Around line 9-33: InProcessKeyValueStore currently only evicts expired entries
lazily in live(), causing unbounded Map growth; add a time-gated cleanup to
proactively remove expired entries (similar to InProcessRateLimiter.cleanup):
add a private lastCleanup timestamp and cleanupIntervalMs constant to the
InProcessKeyValueStore class, implement a private cleanup() that iterates
this.store and deletes entries whose expiresAt is non-null and <= Date.now(),
and call cleanup() from put (and optionally get) when Date.now() - lastCleanup
>= cleanupIntervalMs to avoid doing a full sweep on every operation; ensure
cleanup updates lastCleanup after running and relies on the existing expiresAt
field structure so live(), get(), put(), and delete logic remains compatible.

In `@packages/engine/src/adapters/node/mcp.ts`:
- Line 51: The code uses global crypto.randomUUID() to create newSessionId which
can fail on Node runtimes without globalThis.crypto; update the module to import
randomUUID (or crypto) from 'node:crypto' and replace the global call with the
imported symbol (e.g., use imported randomUUID() where newSessionId is assigned)
so it matches other adapters and is robust across Node environments.

In `@packages/engine/src/engine/dm.ts`:
- Around line 387-391: The unread_count logic in listConversations currently
uses a total per-channel count (counts / countByChannel) which ignores per-agent
read state; change it to compute unread per-channel for the current agent by
using channelMembers.lastReadId (or readReceipts) instead of count(*).
Specifically, update the query that builds counts (variable counts) so it either
joins channelMembers filtered to the current agentId and counts messages where
messages.id > channelMembers.lastReadId, or if using readReceipts, count
messages not marked read by the agent; then set unread_count from this per-agent
countByChannel when constructing each conversation (replace usage of
countByChannel.get(conv.channelId) for unread_count). Reference:
listConversations, counts, messages, channelMembers.lastReadId, readReceipts,
channelIds, countByChannel, unread_count.
- Around line 493-515: The comparisons and ordering on messages.id (a TEXT
snowflake ID) must be numeric not lexicographic: change the where conditions,
ORDER BY and any max(...) usage to cast messages.id to integer (e.g. use
sql`CAST(${messages.id} AS INTEGER)` or equivalent) and compare against numeric
cursor values (parse opts.before/opts.after into numbers before calling
lt()/gt()); also update listConversations where max(${messages.id}) is used to
use max(CAST(${messages.id} AS INTEGER)) so “latest message” selection and
pagination follow numeric chronological order.

In `@packages/engine/src/engine/groupDm.ts`:
- Around line 46-91: The current implementation resolves participants in a loop
(the participantAgents resolution using db.select/from agents) and inserts
dmParticipants one-by-one, causing N+1 queries and mid-sequence partial writes;
replace the per-name loop with a single batched lookup using an IN filter (use
inArray against data.participants) to fetch all agents at once, compute the
unique agent IDs (combine creatorAgentId + fetched agent ids), and perform a
single batched insert for dmParticipants instead of per-agent inserts; also
perform the channel, dmConversations, and dmParticipants writes using your DB
abstraction’s atomic/batch primitive (e.g., a single multi-insert/batch
operation or the provided non-interactive transaction helper) so the three
entities (channels, dmConversations, dmParticipants) are created atomically
rather than relying on sequential inserts.

In `@packages/engine/src/engine/inbox.ts`:
- Around line 28-54: The mention query uses agentName (from the agent lookup)
directly which yields '%@%' when empty and treats '%'/'_' in names as wildcards;
update the code around agentName and the mentionRows DB query to first
short-circuit if agentName is empty (return an empty mentionRows / skip the LIKE
query) and otherwise escape any '%' '_' and '\' characters in agentName before
building the LIKE pattern, then use the escaped pattern in the messages.body
LIKE clause (and apply the SQL ESCAPE '\' or equivalent in your query builder)
to ensure only literal mentions like '`@agentName`' are matched; reference
variables/functions: agentName, agent lookup (const [agent] = await
db.select...), and the mentionRows query building/select from messages.

In `@packages/engine/src/engine/resyncQuery.ts`:
- Around line 17-23: In replayMissedEvents validate the incoming since timestamp
before converting to UNIX: parse the date (e.g., via new Date(since) or
Date.parse) and check for NaN (isNaN(date.getTime()) or Number.isNaN(parsed))
and if invalid throw a descriptive error (or return a clearly failing response)
instead of continuing; update the logic around sinceUnix in replayMissedEvents
so invalid cursors are surfaced immediately and not allowed to produce an empty
replay result.

In `@packages/engine/src/engine/routing.ts`:
- Around line 332-367: The current code reads existing.consecutiveFailures then
writes a fixed nextConsecutiveFailures, which races; change the
insert/onConflictDoUpdate so consecutiveFailures is computed atomically in SQL
(e.g. set consecutiveFailures = routingFailures.consecutiveFailures + 1 using
sql`${routingFailures.consecutiveFailures} + 1`) and compute circuitOpenUntil in
the same onConflict update with a CASE expression that checks the post-increment
value against config.circuit_breaker_threshold and returns either TIMESTAMP WITH
TIME ZONE (now + config.circuit_breaker_cooldown_seconds * INTERVAL '1 second')
or NULL; keep updating totalFailures as sql`${routingFailures.totalFailures} +
1`, lastFailureAt/lastError/updatedAt as now, and remove reliance on the
out-of-band `existing` value for consecutiveFailures/circuitOpenUntil
(references: routingFailures, consecutiveFailures, circuitOpenUntil,
getRoutingConfig, db.insert(...).onConflictDoUpdate).
- Around line 390-410: routeBySkill currently passes ftsQuery (from
buildFtsQuery) directly into the FTS MATCH queries, which throws when ftsQuery
is an empty string; guard by checking ftsQuery before running the two FTS rank
queries: if ftsQuery is falsy, skip the directory_skills_fts and
directory_agents_fts db.all calls and set skillFtsRaw and agentFtsRaw to empty
arrays (so downstream rank computation leaves ranks null); otherwise run them as
before—update the Promise.all construction in routeBySkill to conditionally
include those two queries or to branch early based on ftsQuery to avoid
executing MATCH "".

In `@packages/engine/src/engine/searchQuery.ts`:
- Around line 11-12: The current sanitization line .replace(/[^a-z0-9\s"]/g, '
') in packages/engine/src/engine/searchQuery.ts strips all non-ASCII characters
and must be replaced with a denylist/escaping approach so Unicode terms are
preserved; locate the MATCH-string construction where that replace appears and
change it to only remove or escape FTS5 special-syntax characters (e.g. quotes,
parentheses, operators and metacharacters like + - * " ^ ~ ( ) [ ] { } : \ / and
boolean keywords) instead of stripping non-Latin letters—ensure double quotes
used for phrase queries are properly escaped or balanced and retain all Unicode
letters/numbers (use Unicode-aware character classes for allowed letters/numbers
rather than an a-z allowlist) so non-English input (CJK, Cyrillic, accented
chars, emoji) is preserved.

In `@packages/engine/src/engine/sessionEvent.ts`:
- Around line 108-119: In listSessionEvents (sessionEventEngine) validate and
clamp options.limit before calling .limit(): parse the provided options?.limit
to a safe integer (treat non-numeric/NaN as default 100), enforce a lower bound
of 1 and an upper bound of 500 (e.g. Math.floor and clamp), then assign that
sanitized value to the local limit variable used in the query (replace the
current const limit = options?.limit ?? 100). This ensures negative or
non-numeric values cannot be passed into .limit().

In `@packages/engine/src/engine/usage.ts`:
- Around line 7-13: The incrementUsage function currently does a non-atomic
read-modify-write (kv.get → kv.put) causing lost-update races; change it to use
an atomic increment exposed by the KeyValueStore adapter (e.g., add and call
kv.increment(key, amount) or kv.transaction(key, fn)) or implement per-key
serialization/locking (mutex/queue) inside incrementUsage to ensure increments
for `usage:${workspaceId}:${metric}` are serialized; also remove the duplicated
metrics array by extracting a shared constant used by getUsageCounters and
resetUsageCounters; update the middleware usageTracker to call the new atomic
increment API for `usage:${workspace.id}:api_calls` (fire-and-forget OK only
after using atomic increment).

In `@packages/engine/src/lib/logger.ts`:
- Around line 198-208: When running in production with env.posthogApiKey the
code skips writeConsole and only calls sendToPostHog, causing silent log loss;
modify the logging path (in the logger function around
writeConsole/sendToPostHog and state.pending usage) so that sendToPostHog is
still attempted but failures fallback to console: either always call
writeConsole for level 'warn' and 'error' before/after the posthog call, or
attach a .catch on the promise returned by sendToPostHog to call
writeConsole(level, options.source, message, metadata) on rejection (and still
manage state.pending by adding the promise and deleting it in finally). Ensure
you reference writeConsole, sendToPostHog, and state.pending when updating the
logic.

In `@packages/engine/src/middleware/rateLimit.ts`:
- Around line 33-35: The entitlements.getLimits call is not covered by the
"fail-open" try/catch, so transient failures there will 500; move or guard that
lookup so an entitlements outage allows requests: wrap
entitlements.getLimits(workspace) in the same try/catch that currently surrounds
rateLimiter.check (or add a small try/catch just around getLimits) and on error
set a safe default (e.g., undefined/null) so limits.rate_per_min / globalLimit
is treated as no limit and the code proceeds to call rateLimiter.check; update
references to limits.rate_per_min / globalLimit accordingly so they handle the
missing value.

In `@packages/engine/src/middleware/usageTracker.ts`:
- Around line 8-13: The usage counter suffers TOCTOU races and NaN poisoning
because code in usageTracker.ts (and engine/usage.ts: incrementUsage) does
kv.get → parseInt(...) → kv.put using the KeyValueStore interface that only has
get/put/delete; add an atomic counter primitive to the KV port (e.g., add
increment(key: string, delta = 1): Promise<number> to KeyValueStore), implement
it in the adapters (InProcessKeyValueStore), and change usageTracker.ts and
incrementUsage to call kv.increment('usage:${workspace.id}:api_calls') and
handle/return the numeric result; also ensure the increment implementation
treats non-numeric values as 0 (not NaN) to avoid poisoning.

---

Nitpick comments:
In `@packages/engine/src/adapters/node/rate-limit.ts`:
- Around line 3-38: The implementation of InProcessRateLimiter is actually a
fixed-window limiter (single bucket with expiresAt) rather than a
sliding-window: update either the documentation or the implementation. To fix,
choose one: (A) change the class comment to say "fixed-window" and clarify
behavior so callers know bursts across window boundaries are allowed,
referencing InProcessRateLimiter, check, cleanup, and buckets; or (B) implement
a true sliding-window algorithm (e.g., keep two buckets per bucketKey with
timestamps and weight counts by time elapsed, or store timestamped event counts
and compute the sliding sum) in InProcessRateLimiter.check/cleanup so requests
are counted proportionally across the window and the burst at boundaries is
prevented. Ensure you update the class comment accordingly and keep bucket
storage/cleanup logic consistent with the chosen approach.

In `@packages/engine/src/adapters/node/realtime.ts`:
- Around line 257-315: Replace the ad-hoc JSON.parse + typeof checks in
onAgentMessage and onWorkspaceMessage with Zod validation: define a
z.discriminatedUnion schema (e.g., AgentFrameSchema) for agent frames with
variants {type: 'ping'}, {type: 'resync', last_seen_seq: z.number(), since:
z.string().optional()} and use AgentFrameSchema.safeParse(parsed) (or parse
inside try/catch) to reject malformed frames early; similarly add a minimal
WorkspaceFrameSchema for workspace frames (type: 'ping') used in
onWorkspaceMessage. Update onAgentMessage to only handle validated frames (use
the discriminant to narrow types so you can safely access last_seen_seq and
since) and return on validation failure, and add the necessary zod import and
exported schema names so types (e.g., AgentFrame, WorkspaceFrame) can be reused
elsewhere.

In `@packages/engine/src/engine/action.ts`:
- Line 1: The import statement in engine/action.ts currently brings in an unused
symbol `sql` from 'drizzle-orm'; remove `sql` from the import list so only used
symbols (`eq`, `and`) are imported, e.g., update the import that references eq,
and, sql to exclude `sql` to satisfy static analysis and prevent unused-import
warnings.

In `@packages/engine/src/engine/agent.ts`:
- Line 2: The import statement in packages/engine/src/engine/agent.ts currently
brings in unused symbols `sql` and `or`; update the import from 'drizzle-orm' to
remove `sql` and `or` (keeping only used imports `eq`, `and`, `lt`, `inArray`)
and run a quick grep/IDE search for `sql` and `or` to confirm there are no
remaining references before committing.

In `@packages/engine/src/engine/channel.ts`:
- Around line 276-299: The joinChannel flow currently queries channelMembers
then inserts, which can race; modify the insert in the joinChannel function to
use onConflictDoNothing() on channelMembers (matching the unique constraint on
channelId+agentId) so concurrent inserts won't raise a constraint error, then
determine already_member by checking the pre-check "existing" or the insert
result (affectedRows === 0) before calling invalidateChannelCache(workspaceId,
channelName); keep references to channelMembers, joinChannel, and
invalidateChannelCache to locate the change.

In `@packages/engine/src/engine/directory.ts`:
- Around line 621-640: The JSON.parse calls inside the rows.map that builds
hydratedRows (producing DirectoryAgentRow objects) can throw on malformed DB
values; update the mapping in the DirectoryAgentRow construction (the rows.map
callback) to defensively parse tags, capabilities, and metadata by wrapping each
JSON.parse in try-catch (or extract a small helper like safeParseJson used by
the map) and return sane defaults (e.g., [] for tags, {} for
capabilities/metadata) on error and optionally log/debug the parse failure;
ensure the rest of the mapped fields remain unchanged.

In `@packages/engine/src/engine/dmAll.ts`:
- Around line 13-23: The function listAllDmConversations performs an unbounded
workspace-wide select; update it to support paginated listing by adding optional
parameters (e.g., limit: number and cursor: string or {createdAt: string, id:
string}) to the function signature and use them in the query (apply an ORDER BY
on createdAt, id for deterministic paging and a WHERE clause to filter >/< the
cursor), use .limit(limit + 1) to detect next page, and return a page payload
(items, nextCursor). Change references in callers of listAllDmConversations
accordingly so admin/console paths request a sensible default limit.

In `@packages/engine/src/engine/groupDm.ts`:
- Around line 10-38: fetchAttachmentsBatch is duplicated in this file and dm.ts;
extract it into a single shared helper to avoid divergence. Create a new shared
attachments helper module that exports fetchAttachmentsBatch (keeping the
signature Promise<Map<string, AttachmentRow[]>>) and ensure it imports/uses the
same query symbols (messageAttachments, files, eq, inArray, asc, and types like
Db and AttachmentRow); then replace the local implementations in both groupDm.ts
and dm.ts with a single import from that helper and remove the duplicated
function bodies. Ensure TypeScript types and exports/imports match the consumers
so callers compile without changes.

In `@packages/engine/src/engine/routing.ts`:
- Around line 96-106: The withDefaultWeights function currently uses ad-hoc
typeof checks to parse an unknown JSON into RoutingWeights; replace this with a
zod schema: create a z.object({...}).partial() schema that defines numeric
fields matching RoutingWeights, use .catch(DEFAULT_WEIGHTS) or .parse/coerce to
apply defaults from DEFAULT_WEIGHTS and ensure non-numeric/missing values fall
back to defaults; also reuse or extend that schema (and validate ranges) in
setRoutingConfig for circuit_breaker_threshold and
circuit_breaker_cooldown_seconds to reject out-of-range values instead of
relying on manual guards.

In `@packages/engine/src/engine/sessionEvent.ts`:
- Around line 44-58: Replace the duplicated literal lists by defining a single
const tuple (e.g. SESSION_EVENT_TYPES) containing all event strings, then derive
the union type and the Set from it: declare SessionEventType as typeof
SESSION_EVENT_TYPES[number] and build VALID_EVENT_TYPES via new
Set(SESSION_EVENT_TYPES); update any existing exports/uses that referred to
SessionEventType and VALID_EVENT_TYPES to use these derived values so the list
is maintained in one place.

In `@packages/engine/src/engine/snowflake.ts`:
- Around line 36-41: The implicit 10-bit worker id from
fnv1a10Bits(getIsolateId()) can collide across processes; modify getWorkerId
(and where workerId is set) to prefer an explicit configured worker id (e.g.
read from an env var like ENGINE_WORKER_ID or from startup config) parsed as an
integer and validated to be within 0..1023 (mask with 0x3ff or throw/log if out
of range), falling back to the existing fnv1a10Bits(getIsolateId()) only when no
explicit value is provided; update any constructor/initialization code that sets
this.workerId to read and validate the env/config value and ensure process-level
uniqueness documentation is noted.

In `@packages/engine/src/engine/usage.ts`:
- Around line 15-34: The metrics array is duplicated between getUsageCounters
and resetUsageCounters; extract a shared constant (e.g. const USAGE_METRICS =
['messages','api_calls','files','file_bytes','ws_minutes'] as const) and use
that constant in both functions (replace the inline arrays in getUsageCounters
and resetUsageCounters with USAGE_METRICS); optionally derive a type with typeof
USAGE_METRICS[number] for strong typing where needed to ensure the two functions
cannot diverge.

In `@packages/engine/src/engine/workspace.ts`:
- Around line 68-92: The workspace and default channel creation (the insert into
workspaces via db.insert(workspaces) and the insert into channels) must be
executed inside a single database transaction so both succeed or both roll back;
modify the function that calls db.insert(workspaces) and
generateId()/db.insert(channels) to run within a transaction (use your DB
client's transaction API, e.g., db.transaction or equivalent) and perform both
inserts and buildWorkspaceResponse(apiKey) inside that transactional callback so
failure to create the `#general` channel rolls back the workspace insert.
- Around line 54-61: The return value in createWorkspace currently returns both
a nested workspace object and also spreads its fields to the top level (return {
workspace, created: false, ...workspace }), which causes an intentional
“double-flattening” (workspace.* and top-level workspace_id/api_key/created_at)
for engine-level consumers and backward compatibility; add a brief inline
comment immediately above the return in createWorkspace (and mirror in the
create path where created: true is returned) stating that we intentionally
include both result.workspace and top-level fields to preserve API shape for
HTTP responses and support direct engine callers/tests, referencing
buildWorkspaceResponse to show the source of the flattened fields.

In `@packages/engine/src/lib/workspaceStream.ts`:
- Around line 14-78: The module-level cache (cache Map) only invalidates entries
in-process, so setWorkspaceStreamOverride clears only this instance and other
instances can serve stale values for up to CACHE_TTL_MS; update
getWorkspaceStreamConfig to lazily evict expired entries by removing
cache.delete(workspaceId) when an entry is found with expiresAt <= now, and add
a TODO comment on setWorkspaceStreamOverride noting cross-instance invalidation
is required for multi-instance deployments (e.g., use pub/sub or a shorter TTL)
and consider adding a background/periodic or on-access eviction strategy if
tenant cardinality grows.

In `@packages/engine/src/routes/action.ts`:
- Line 78: Import ContentfulStatusCode from 'hono/utils/http-status' at the top
of packages/engine/src/routes/action.ts and replace every cast of (error.status
|| 500) as any with (error.status || 500) as ContentfulStatusCode; specifically
update each occurrence where status is set using that expression (e.g., in the
action route error handlers around the existing uses at the noted locations) so
the response status uses the proper ContentfulStatusCode type instead of any.

In `@packages/engine/src/routes/agent.ts`:
- Around line 299-305: Extract the hardcoded validClis array used in the spawn
handler into a shared constant (e.g., VALID_AGENT_CLIS) so it can be reused and
maintained centrally; define it at module scope or in a shared constants module
(export if shared) as a readonly tuple (use "as const") and then replace the
local validClis reference in the spawn handler with VALID_AGENT_CLIS, keeping
the same includes-based validation and error message construction
(VALID_AGENT_CLIS.join(', ')).

In `@packages/engine/src/routes/certify.ts`:
- Line 56: Replace the loose "as any" status casts with the ContentfulStatusCode
type: add an import for ContentfulStatusCode at the top of
packages/engine/src/routes/certify.ts, then change the casts in the error
handling expressions (e.g., the occurrences where you have " (error.status ||
500) as any") to "(error.status || 500) as ContentfulStatusCode"; do the same
for the other occurrence around line 76 so all error status casts use
ContentfulStatusCode instead of any.

In `@packages/engine/src/routes/channel.ts`:
- Line 90: Import ContentfulStatusCode from 'hono/utils/http-status' and replace
the unsafe casts of status to any in the error handlers: find all c.json(...,
status as any) usages and change them to use status as ContentfulStatusCode (or
ensure the local status variable is typed as ContentfulStatusCode). Specifically
update the error handler responses that call c.json to remove "as any" and use
the imported ContentfulStatusCode type so the second argument is correctly
typed.

In `@packages/engine/src/routes/console.ts`:
- Line 55: The error handlers in packages/engine/src/routes/console.ts cast
status codes to any; import ContentfulStatusCode from 'hono/utils/http-status'
and replace the casts such as (error.status || 500) as any with (error.status ||
500) as ContentfulStatusCode in each error-response call (the arrow error
handler callbacks used to send responses at the locations flagged — e.g., the
error handler around line 55 and the similar handlers at the other flagged
spots). Ensure the new import is added at top of the file and each cast uses
ContentfulStatusCode.

In `@packages/engine/src/routes/directory.ts`:
- Line 69: The status code is currently cast to any; import ContentfulStatusCode
from 'hono/utils/http-status' and replace the cast on the expression
(error.status || 500) to use ContentfulStatusCode instead of any so the response
status uses the proper type (i.e., change (error.status || 500) as any to
(error.status || 500) as ContentfulStatusCode); update the import list to
include ContentfulStatusCode and ensure the surrounding error handling that
references error.status remains unchanged.
- Line 62: The handleError function currently types its context parameter as
any; change its signature to accept a properly typed Context<AppEnv> (i.e.,
function handleError(c: Context<AppEnv>, err: unknown)) and import or reference
the Context type and AppEnv interface where needed so the route handler context
is type-checked; update any call sites if necessary to satisfy the new type.

In `@packages/engine/src/routes/dm.ts`:
- Around line 181-194: The route silently accepts invalid limit values because
parseInt can return NaN; update the query parsing for limit in dm route to
validate and reject bad input (return 400) instead of falling back silently:
parse the raw c.req.query('limit') then check Number.isInteger(parsed) and
parsed between 1 and 100 (or use a zod schema to coerce/validate the query
param), and only pass a valid numeric limit to dmEngine.getDmMessages (keep the
other params and function call unchanged). Ensure the validation logic
references the same symbols (c.req.query, limit variable, and
dmEngine.getDmMessages) so invalid inputs are detected early and cause a 400
response.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

Comment thread .github/workflows/publish-engine.yml
Comment thread packages/engine/src/adapters/node/database.ts
Comment thread packages/engine/src/adapters/node/kv.ts
Comment thread packages/engine/src/adapters/node/mcp.ts Outdated
Comment on lines +387 to +391
const counts = await db
.select({ channelId: messages.channelId, count: sql<number>`count(*)` })
.from(messages)
.where(inArray(messages.channelId, channelIds))
.groupBy(messages.channelId);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd -t f --glob 'receipt.ts' packages/engine/src/engine --exec sed -n '1,80p' {}
rg -nP "lastRead|last_read|read_at|readAt" -g '**/db/schema.ts'

Repository: AgentWorkforce/relaycast

Length of output: 2713


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="packages/engine/src/engine/dm.ts"
echo "== unread_count occurrences =="
rg -n "unread_count" "$FILE" || true

echo
echo "== lastReadId / readReceipts usage =="
rg -n "lastReadId|readReceipts|read_at|readAt" "$FILE" || true

echo
echo "== dm.ts around 340-470 =="
sed -n '340,470p' "$FILE"

Repository: AgentWorkforce/relaycast

Length of output: 4279


Fix unread_count in listConversations to use read-state instead of count(*).

listConversations derives unread_count from a total message count per channel:

const counts = await db
  .select({ channelId: messages.channelId, count: sql<number>`count(*)` })
  .from(messages)
  .where(inArray(messages.channelId, channelIds))
  .groupBy(messages.channelId);

and then returns unread_count: countByChannel.get(conv.channelId) ?? 0 without consulting channelMembers.lastReadId / readReceipts, so it includes already-read (and likely the caller’s own) messages and will not decrease. Use per-agent read-state to compute actual unread for the conversation’s channel.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/dm.ts` around lines 387 - 391, The unread_count
logic in listConversations currently uses a total per-channel count (counts /
countByChannel) which ignores per-agent read state; change it to compute unread
per-channel for the current agent by using channelMembers.lastReadId (or
readReceipts) instead of count(*). Specifically, update the query that builds
counts (variable counts) so it either joins channelMembers filtered to the
current agentId and counts messages where messages.id >
channelMembers.lastReadId, or if using readReceipts, count messages not marked
read by the agent; then set unread_count from this per-agent countByChannel when
constructing each conversation (replace usage of
countByChannel.get(conv.channelId) for unread_count). Reference:
listConversations, counts, messages, channelMembers.lastReadId, readReceipts,
channelIds, countByChannel, unread_count.

Comment thread packages/engine/src/engine/sessionEvent.ts Outdated
Comment thread packages/engine/src/engine/usage.ts
Comment thread packages/engine/src/lib/logger.ts Outdated
Comment thread packages/engine/src/middleware/rateLimit.ts Outdated
Comment thread packages/engine/src/middleware/usageTracker.ts Outdated
Adds scripts/e2e-actions.ts exercising the NEW action contract end-to-end
against a running server, distinct from the legacy command API.

Actions are an async agent-to-agent RPC (not a synchronous command):
handler registers (ownership enforced) → caller invokes (gets invocation_id,
status 'invoked') → handler receives action.invoked over WS → handler completes
→ caller receives action.completed over WS → poll GET invocation shows
completed + output → delete.

10/10 pass against `relaycast-engine`. Run:
  npm run e2e:actions -- http://localhost:8787

Note: the engine intentionally does NOT alias /v1/commands → /v1/actions.
They share lineage (migration 0010 renames commands→actions and migrates rows)
but the contracts are incompatible: command-invoke returned an inline result,
whereas action-invoke is async with a handler/caller completion lifecycle.
Actions is the canonical contract going forward; the SDK's command methods are
superseded and should migrate to actions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/e2e-actions.ts">

<violation number="1" location="scripts/e2e-actions.ts:159">
P2: Assertion empty here. Mismatch does nothing. Test can pass with wrong handler. Throw on mismatch.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread scripts/e2e-actions.ts Outdated
…nds)

`npm run e2e -- --next-version` skips the legacy /v1/commands section and
instead exercises the actions contract end-to-end: handler registers (ownership
enforced) → caller invokes (invocation_id, status invoked) → handler receives
action.invoked over WS → handler completes → caller receives action.completed
over WS → GET invocation shows completed + output → delete.

Default (no flag) keeps the legacy command section unchanged. Captures the
lead/backend agent tokens (incl. across registerOrRotate) for the raw /v1/actions
HTTP calls.

Against relaycast-engine: 106 passed, 1 failed (the lone remaining failure is an
unrelated A2A re-register-after-remove edge case, not commands/actions).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/e2e.ts">

<violation number="1" location="scripts/e2e.ts:1184">
P2: The raw WebSocket message handler can throw on non-JSON payloads, which can crash the e2e run unexpectedly.</violation>

<violation number="2" location="scripts/e2e.ts:1205">
P2: Action WS connect bypasses `run(...)`. One connect error aborts whole script, even in continue mode. Wrap this await in a `run` step.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread scripts/e2e.ts Outdated
Comment thread scripts/e2e.ts Outdated
removeA2aAgent intentionally soft-removes — it deletes the a2a_agents row but
keeps the relay proxy agent (marked offline, a2a_active:false) to preserve DM
history. But registerA2aAgent then always called registerAgent(), which tripped
the agents (workspace_id, name) UNIQUE constraint on re-registration of the same
external agent ("Agent already exists").

Now registerA2aAgent detects a leftover A2A proxy of the same name and
reactivates + rotates its token instead of inserting a duplicate. A name owned
by a non-proxy agent still conflicts (409).

Fixes the "Re-register mock A2A agent" e2e failure: `npm run e2e -- --next-version`
against relaycast-engine now reports 107 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/engine/src/engine/cache.ts">

<violation number="1" location="packages/engine/src/engine/cache.ts:48">
P2: Cache evict only on read. Stale keys that never read again stay forever. Prune expired entries during writes (or on interval) to prevent unbounded memory growth.</violation>
</file>

<file name="packages/engine/src/engine/file.ts">

<violation number="1" location="packages/engine/src/engine/file.ts:19">
P2: Create URL before DB insert. Avoid orphan pending rows on storage error.</violation>

<violation number="2" location="packages/engine/src/engine/file.ts:61">
P2: Move status update after URL generation. Keep completeUpload retry-safe on transient storage errors.</violation>
</file>

<file name="packages/engine/src/engine/eventQueue.ts">

<violation number="1" location="packages/engine/src/engine/eventQueue.ts:35">
P2: Cleanup uses created time, not completion time. Old queued jobs can get deleted right after finishing. Filter by `completedAt` instead.</violation>
</file>

<file name="packages/engine/src/db/migrations/0010_actions_and_harness.sql">

<violation number="1" location="packages/engine/src/db/migrations/0010_actions_and_harness.sql:92">
P2: Default status wrong here. Set to accepted to match schema/runtime.</violation>
</file>

<file name="packages/engine/src/engine/index.ts">

<violation number="1" location="packages/engine/src/engine/index.ts:1">
P2: Barrel exports have no consumers and duplicate public API symbols.

The entire `engine/index.ts` barrel is unused — no internal file imports from `./engine/index` or `./engine/index.js`, the `package.json` exports map does not expose it, and routes import directly from individual engine modules (e.g., `'../engine/workspace.js'`). The Snowflake symbols (`SnowflakeGenerator`, `getSnowflakeGenerator`, `generateId`) on line 1 duplicate the same re-exports already in `src/index.ts` (line 32). This adds maintenance surface (9 exports to keep in sync) with zero active consumers.</violation>
</file>

<file name="packages/engine/src/db/migrations/0001_fts5_search.sql">

<violation number="1" location="packages/engine/src/db/migrations/0001_fts5_search.sql:21">
P1: This migration never backfills the new external-content FTS index, so existing messages can be missing from search results after migration.</violation>
</file>

<file name="packages/engine/src/db/migrations/meta/_journal.json">

<violation number="1" location="packages/engine/src/db/migrations/meta/_journal.json:2">
P2: _journal.json tracks 1 migration (0000_broad_dagger) but 14 SQL migration files exist (0000–0013). Drizzle Kit reads this journal to determine migration state; future `drizzle-kit generate` calls will see only migration 0000 and generate a new migration duplicating changes already in 0001–0013, producing corrupt history.</violation>
</file>

<file name="packages/engine/src/db/migrations/0013_status_active.sql">

<violation number="1" location="packages/engine/src/db/migrations/0013_status_active.sql:4">
P2: Data migration updates existing rows but leaves column DEFAULT as 'online'. New rows inserted without an explicit status get the deprecated default. Migration 0012 acknowledged the same SQLite limitation with a comment; this migration should too, or handle it if possible.</violation>
</file>

<file name=".trajectories/index.json">

<violation number="1" location=".trajectories/index.json:4">
P2: Clearing `trajectories` to an empty object orphaned existing active/completed trajectory files and effectively drops discoverability/history from the index.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread packages/engine/src/engine/a2a.ts Outdated
…tness)

Fixes legitimate review findings — including in code lifted from the cloud copy,
since the engine is now the canonical source and that code was lightly tested.

Security / correctness:
- a2a: SSRF guard (lib/ssrf.ts) on agent-card + send + health fetches; reject
  non-http(s) always, with opt-in strict mode (block loopback/private/metadata)
  for multi-tenant hosting. Self-host stays permissive (LAN/localhost agents).
- a2a-health: bound each health ping with a 5s AbortSignal timeout so one hung
  endpoint can't stall the sweep.
- a2a: tighten proxy-reuse to require the a2a marker (not just type 'external').
- search: a requested channel/from filter that resolves to nothing now returns
  no results instead of silently running unfiltered; numeric (CAST) cursor +
  stable id tiebreaker for deterministic paging.
- receipt.markRead: authorize the agent (channel member or DM participant)
  before recording a read receipt.
- eventDelivery: don't swallow subscription-lookup errors (rethrow so the queue
  retries instead of silently dropping); treat 408/429 as retryable.
- auth: invalid agent token returns `agent_token_invalid` (the documented code).
- migration 0012: deduplicate before creating the unique indexes.
- dmAll / activity: numeric id compare/sort + stable tiebreaker.
- event-id: empty-input sentinel can no longer collide with real input.

Node adapter robustness:
- database: apply each migration (DDL + record) in one transaction (atomic).
- index: close the SQLite handle if startup migrations throw.
- kv: periodic TTL prune (+ dispose) so one-shot keys can't accumulate.
- presence: reclaim empty workspace maps during sweep.
- resyncQuery: validate `since` (invalid → no replay, not a NaN query).
- rateLimit: entitlements lookup is inside the fail-open boundary.

Tooling:
- publish-engine.yml: bind inputs to env to avoid shell injection.
- e2e/e2e-actions: real assertion for "get action by name"; wrap action WS
  connect in run(); guard the raw WS handler against non-JSON frames.

Verified: tsc 0 errors, eslint 0 errors, 13/13 conformance, and
`npm run e2e -- --next-version` → 108 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 20 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/engine/src/engine/cache.ts">

<violation number="1" location="packages/engine/src/engine/cache.ts:48">
P2: Cache evict only on read. Stale keys that never read again stay forever. Prune expired entries during writes (or on interval) to prevent unbounded memory growth.</violation>
</file>

<file name="packages/engine/src/engine/eventQueue.ts">

<violation number="1" location="packages/engine/src/engine/eventQueue.ts:35">
P2: Cleanup uses created time, not completion time. Old queued jobs can get deleted right after finishing. Filter by `completedAt` instead.</violation>
</file>

<file name="packages/engine/src/db/migrations/0010_actions_and_harness.sql">

<violation number="1" location="packages/engine/src/db/migrations/0010_actions_and_harness.sql:92">
P2: Default status wrong here. Set to accepted to match schema/runtime.</violation>
</file>

<file name="packages/engine/src/db/migrations/0001_fts5_search.sql">

<violation number="1" location="packages/engine/src/db/migrations/0001_fts5_search.sql:21">
P1: This migration never backfills the new external-content FTS index, so existing messages can be missing from search results after migration.</violation>
</file>

<file name="packages/engine/src/db/migrations/meta/_journal.json">

<violation number="1" location="packages/engine/src/db/migrations/meta/_journal.json:2">
P2: _journal.json tracks 1 migration (0000_broad_dagger) but 14 SQL migration files exist (0000–0013). Drizzle Kit reads this journal to determine migration state; future `drizzle-kit generate` calls will see only migration 0000 and generate a new migration duplicating changes already in 0001–0013, producing corrupt history.</violation>
</file>

<file name=".trajectories/index.json">

<violation number="1" location=".trajectories/index.json:4">
P2: Clearing `trajectories` to an empty object orphaned existing active/completed trajectory files and effectively drops discoverability/history from the index.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread packages/engine/src/middleware/rateLimit.ts Outdated
willwashburn and others added 2 commits May 29, 2026 14:14
- search FTS sanitization keeps Unicode letters/numbers (non-English search).
- sessionEvent: clamp/validate limit (NaN/negative/over-cap).
- node/mcp: import randomUUID from node:crypto (don't rely on global crypto).
- file: sign upload URL before insert (no orphan pending row); generate download
  URL before flipping status to complete (retry-safe).
- inbox: guard the mention LIKE against an empty agent name + escape % / _.
- routing: guard routeBySkill against an empty FTS query before MATCH.
- dm: numeric (CAST) cursor compare + sort for snowflake ids.
- logger: always write to console so prod logs aren't lost if OTLP export fails.
- usage: atomic KeyValueStore.increment (new port method) replaces the racy
  get→parse→put in incrementUsage + usageTracker.
- migration 0013: document the inert SQLite ALTER-DEFAULT limitation.
- remove unused engine/index.ts barrel.

Verified: tsc/eslint clean, 13/13 conformance, e2e --next-version 108/0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lements failure

Re-review correction: moving entitlements.getLimits inside the fail-open
disabled rate limiting entirely when the lookup failed (unlimited throughput).
Now an entitlements failure falls back to a conservative 300/min limit that is
still enforced; only the limiter backend failing is fail-open.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@willwashburn
Copy link
Copy Markdown
Member Author

Review findings — resolution summary

Worked through the automated review (cubic + CodeRabbit). 65 threads resolved with fixes pushed across 4 commits. Since the engine is the canonical source going forward, I fixed legitimate findings in lifted code too — not just new code.

Fixed (security / correctness): A2A SSRF guard (scheme always; private/loopback opt-in strict for multi-tenant) + health-ping timeout; search returns empty for unresolved channel/from filters + numeric cursor & stable sort; receipt.markRead authorizes channel/DM membership; eventDelivery rethrows lookup errors (no silent drop) + 408/429 retryable; auth returns agent_token_invalid; migration 0012 dedupes before unique indexes; numeric id cursors in dm/dmAll; event-id empty-input collision; FTS keeps Unicode; A2A proxy-reuse tightened; rate-limit falls back to a conservative default on entitlements failure (still enforced).

Fixed (Node adapter robustness): atomic migrations + handle-close on failure; KV periodic prune + atomic increment (fixes usage-counter lost updates); presence map reclamation; resync since validation; always-console logging.

Fixed (tooling): publish-engine.yml shell-injection; e2e assertions + WS guards.

Intentionally left open (with rationale):

  • 0001_fts5_search.sql backfill — moot: in the engine's migration order the FTS table is created before any messages exist; triggers index all subsequent rows.
  • 0010_actions_and_harness.sql:92 default status — false positive: action_invocations.status='invoked' is correct for the action lifecycle; accepted is the unrelated deliveries status.
  • meta/_journal.json — vestigial for the engine: the Node migration runner reads .sql files directly and tracks state in its own _engine_migrations table; it never reads the drizzle journal.
  • routing.ts:367 (atomic circuit-breaker counter), dm.ts:391 (unread_count via count(*)), groupDm.ts:91 (N+1), eventQueue.ts:35 (cleanup by created vs completed — needs a schema column), dmAll.ts:48 (latest-message max(id) — correct for same-width snowflakes), cache.ts:48 (bounded by channel count) — real-but-lower-impact / heavier refactors tracked as follow-ups, not blockers for the extraction.
  • .trajectories/index.json — the pr-reviewer bot's own change, not part of this work.

Verified after every batch: tsc + eslint clean, 13/13 conformance, and npm run e2e -- --next-version108 passed, 0 failed.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 15 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/engine/src/engine/cache.ts">

<violation number="1" location="packages/engine/src/engine/cache.ts:48">
P2: Cache evict only on read. Stale keys that never read again stay forever. Prune expired entries during writes (or on interval) to prevent unbounded memory growth.</violation>
</file>

<file name="packages/engine/src/engine/eventQueue.ts">

<violation number="1" location="packages/engine/src/engine/eventQueue.ts:35">
P2: Cleanup uses created time, not completion time. Old queued jobs can get deleted right after finishing. Filter by `completedAt` instead.</violation>
</file>

<file name="packages/engine/src/db/migrations/0010_actions_and_harness.sql">

<violation number="1" location="packages/engine/src/db/migrations/0010_actions_and_harness.sql:92">
P2: Default status wrong here. Set to accepted to match schema/runtime.</violation>
</file>

<file name="packages/engine/src/db/migrations/0001_fts5_search.sql">

<violation number="1" location="packages/engine/src/db/migrations/0001_fts5_search.sql:21">
P1: This migration never backfills the new external-content FTS index, so existing messages can be missing from search results after migration.</violation>
</file>

<file name="packages/engine/src/db/migrations/meta/_journal.json">

<violation number="1" location="packages/engine/src/db/migrations/meta/_journal.json:2">
P2: _journal.json tracks 1 migration (0000_broad_dagger) but 14 SQL migration files exist (0000–0013). Drizzle Kit reads this journal to determine migration state; future `drizzle-kit generate` calls will see only migration 0000 and generate a new migration duplicating changes already in 0001–0013, producing corrupt history.</violation>
</file>

<file name=".trajectories/index.json">

<violation number="1" location=".trajectories/index.json:4">
P2: Clearing `trajectories` to an empty object orphaned existing active/completed trajectory files and effectively drops discoverability/history from the index.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread packages/engine/src/db/migrations/0013_status_active.sql
Comment thread packages/engine/src/engine/dm.ts Outdated
Comment thread packages/engine/src/adapters/node/kv.ts Outdated
- registerAgent now sets status: 'active' explicitly. It previously omitted
  status, so on a DB migrated before 0013 (column DEFAULT still 'online') new
  agents were created with the deprecated 'online' status. Makes the 0013 note
  accurate too.
- Revert the CAST(id AS INTEGER) cursor/sort changes in dm/dmAll/search: casting
  defeats the PK index (full scan on large channels). Snowflake ids are
  fixed-width (19 digits) so the indexed text order already matches numeric
  order — keep the indexed comparison.
- kv.increment: strict integer parse so a non-numeric stored value is treated as
  0 instead of parseInt's partial parse ("12abc" → 12).

Verified: tsc/eslint clean, 13/13 conformance, e2e --next-version 108/0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/engine/src/engine/dm.ts">

<violation number="1" location="packages/engine/src/engine/dm.ts:499">
P1: Use numeric compare for snowflake bounds. Text compare can break pagination when id width changes. Switch back to numeric CAST in both before/after filters.</violation>

<violation number="2" location="packages/engine/src/engine/dm.ts:517">
P1: Sort by numeric snowflake value, not text. Text DESC can return wrong message order across id-width boundaries.</violation>
</file>

<file name="packages/engine/src/engine/dmAll.ts">

<violation number="1" location="packages/engine/src/engine/dmAll.ts:129">
P2: Text compare on snowflake id can break cursor paging. Snowflake string width grows over time, so lexical order can diverge from numeric order at width boundaries. Keep numeric compare/sort for before/after queries.</violation>

<violation number="2" location="packages/engine/src/engine/dmAll.ts:143">
P1: Text compare on message id breaks pagination order. Snowflake string is variable-length, not fixed-width. Use numeric compare for before/after/order.</violation>
</file>

<file name="packages/engine/src/engine/search.ts">

<violation number="1" location="packages/engine/src/engine/search.ts:71">
P1: ID compare now text compare. Pagination and tie-break sort can go wrong on numeric snowflake IDs. Use numeric compare here.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

.from(messages)
.innerJoin(agents, eq(messages.agentId, agents.id))
.where(and(...conditions))
.orderBy(sql`${messages.id} DESC`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Sort by numeric snowflake value, not text. Text DESC can return wrong message order across id-width boundaries.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/engine/dm.ts, line 517:

<comment>Sort by numeric snowflake value, not text. Text DESC can return wrong message order across id-width boundaries.</comment>

<file context>
@@ -513,7 +514,7 @@ export async function getDmMessages(
     .innerJoin(agents, eq(messages.agentId, agents.id))
     .where(and(...conditions))
-    .orderBy(sql`CAST(${messages.id} AS INTEGER) DESC`)
+    .orderBy(sql`${messages.id} DESC`)
     .limit(limit);
 
</file context>

// (19 digits) so lexical order matches numeric order, and this keeps the PK
// index usable for range scans (a CAST would force a full scan).
if (opts.before) {
conditions.push(lt(messages.id, opts.before));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Use numeric compare for snowflake bounds. Text compare can break pagination when id width changes. Switch back to numeric CAST in both before/after filters.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/engine/dm.ts, line 499:

<comment>Use numeric compare for snowflake bounds. Text compare can break pagination when id width changes. Switch back to numeric CAST in both before/after filters.</comment>

<file context>
@@ -492,13 +492,14 @@ export async function getDmMessages(
+  // index usable for range scans (a CAST would force a full scan).
   if (opts.before) {
-    conditions.push(sql`CAST(${messages.id} AS INTEGER) < CAST(${opts.before} AS INTEGER)`);
+    conditions.push(lt(messages.id, opts.before));
   }
   if (opts.after) {
</file context>

.from(messages)
.innerJoin(agents, eq(messages.agentId, agents.id))
.where(and(...conditions))
.orderBy(sql`${messages.id} DESC`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Text compare on message id breaks pagination order. Snowflake string is variable-length, not fixed-width. Use numeric compare for before/after/order.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/engine/dmAll.ts, line 143:

<comment>Text compare on message id breaks pagination order. Snowflake string is variable-length, not fixed-width. Use numeric compare for before/after/order.</comment>

<file context>
@@ -139,7 +140,7 @@ export async function getDmMessagesForWorkspace(
     .innerJoin(agents, eq(messages.agentId, agents.id))
     .where(and(...conditions))
-    .orderBy(sql`CAST(${messages.id} AS INTEGER) DESC`)
+    .orderBy(sql`${messages.id} DESC`)
     .limit(limit);
 
</file context>

Comment on lines +71 to +73
${opts.before ? sql`AND m.id < ${opts.before}` : sql``}
${opts.after ? sql`AND m.id > ${opts.after}` : sql``}
ORDER BY rank, m.id DESC
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: ID compare now text compare. Pagination and tie-break sort can go wrong on numeric snowflake IDs. Use numeric compare here.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/engine/search.ts, line 71:

<comment>ID compare now text compare. Pagination and tie-break sort can go wrong on numeric snowflake IDs. Use numeric compare here.</comment>

<file context>
@@ -68,9 +68,9 @@ export async function searchMessages(
-      ${opts.before ? sql`AND CAST(m.id AS INTEGER) < CAST(${opts.before} AS INTEGER)` : sql``}
-      ${opts.after ? sql`AND CAST(m.id AS INTEGER) > CAST(${opts.after} AS INTEGER)` : sql``}
-    ORDER BY rank, CAST(m.id AS INTEGER) DESC
+      ${opts.before ? sql`AND m.id < ${opts.before}` : sql``}
+      ${opts.after ? sql`AND m.id > ${opts.after}` : sql``}
+    ORDER BY rank, m.id DESC
</file context>
Suggested change
${opts.before ? sql`AND m.id < ${opts.before}` : sql``}
${opts.after ? sql`AND m.id > ${opts.after}` : sql``}
ORDER BY rank, m.id DESC
${opts.before ? sql`AND CAST(m.id AS INTEGER) < CAST(${opts.before} AS INTEGER)` : sql``}
${opts.after ? sql`AND CAST(m.id AS INTEGER) > CAST(${opts.after} AS INTEGER)` : sql``}
ORDER BY rank, CAST(m.id AS INTEGER) DESC

// (19 digits) so lexical order matches numeric order, keeping the PK index
// usable for range scans (a CAST would force a full scan).
const conditions = [eq(messages.channelId, conv.channelId)];
if (opts.before) conditions.push(lt(messages.id, opts.before));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Text compare on snowflake id can break cursor paging. Snowflake string width grows over time, so lexical order can diverge from numeric order at width boundaries. Keep numeric compare/sort for before/after queries.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/engine/src/engine/dmAll.ts, line 129:

<comment>Text compare on snowflake id can break cursor paging. Snowflake string width grows over time, so lexical order can diverge from numeric order at width boundaries. Keep numeric compare/sort for before/after queries.</comment>

<file context>
@@ -122,11 +122,12 @@ export async function getDmMessagesForWorkspace(
   const conditions = [eq(messages.channelId, conv.channelId)];
-  if (opts.before) conditions.push(sql`CAST(${messages.id} AS INTEGER) < CAST(${opts.before} AS INTEGER)`);
-  if (opts.after) conditions.push(sql`CAST(${messages.id} AS INTEGER) > CAST(${opts.after} AS INTEGER)`);
+  if (opts.before) conditions.push(lt(messages.id, opts.before));
+  if (opts.after) conditions.push(gt(messages.id, opts.after));
 
</file context>

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/engine/src/engine/eventDelivery.ts (1)

108-151: Partial-failure retries re-deliver to already-succeeded targets without a dedup key.

When any target yields a retryable failure, deliverEvent throws 503 and the queue retries the whole event — but all targets were already attempted via Promise.all, so the subscribers that returned 2xx receive the event again. Compounding this, the event payload carries no stable identifier and timestamp is regenerated on every attempt (Line 111), so the body (and HMAC signature) differs across retries, leaving consumers no reliable way to deduplicate.

Consider generating a stable event id once (outside the retry path) and including it in the payload/headers (e.g. X-Relay-Event-Id) so receivers can dedup. Tracking per-target delivery state to skip already-succeeded targets on retry would further reduce duplicate webhook deliveries.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/engine/eventDelivery.ts` around lines 108 - 151, The
delivery currently generates timestamp/body per attempt and retries the whole
batch, causing duplicate deliveries; to fix, generate a stable event id once
before building eventPayload (e.g., const eventId = uuid/v4()) and include it as
event_id in eventPayload and as a header (e.g., 'X-Relay-Event-Id') alongside
the existing 'X-Relay-Timestamp', then compute body and signature from that
stable payload so retries use the same body/signature; additionally, consider
persisting per-subscription delivery state (using subscription id or URL from
filteredSubscriptions and the eventId) so retry logic in
attemptDelivery/deliverEvent can skip targets that already returned ok.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/engine/src/adapters/node/kv.ts`:
- Around line 62-72: In increment(key: string, delta: number) validate that
delta is a finite integer before mutating the store: at the top of the increment
function (before computing current/next and before calling this.store.set) check
that typeof delta === 'number' && Number.isFinite(delta) &&
Number.isInteger(delta), and if not throw a descriptive TypeError (e.g. "delta
must be an integer"); this prevents writing "NaN" or fractional values into the
map and preserves the integer-only counter semantics used by the current/next
logic.

In `@packages/engine/src/lib/ssrf.ts`:
- Around line 62-64: The strict-mode IPv6 check currently misses IPv4-mapped
IPv6 addresses (e.g. ::ffff:127.0.0.1) so add a guard in the same place that
handles IPv6 unique-local/link-local: if strict is true and host includes the
IPv4-mapped prefix (::ffff: or ::FFFF:), extract the embedded IPv4 suffix and
treat it like a normal IPv4 — either run it through the existing
private/loopback checks or simply reject it outright; update the block that
inspects the hostname (the code using the host variable and the strict flag in
packages/engine/src/lib/ssrf.ts) to detect "::ffff:" (case-insensitive), parse
the trailing IPv4, and return false when it maps to loopback/private ranges.

---

Nitpick comments:
In `@packages/engine/src/engine/eventDelivery.ts`:
- Around line 108-151: The delivery currently generates timestamp/body per
attempt and retries the whole batch, causing duplicate deliveries; to fix,
generate a stable event id once before building eventPayload (e.g., const
eventId = uuid/v4()) and include it as event_id in eventPayload and as a header
(e.g., 'X-Relay-Event-Id') alongside the existing 'X-Relay-Timestamp', then
compute body and signature from that stable payload so retries use the same
body/signature; additionally, consider persisting per-subscription delivery
state (using subscription id or URL from filteredSubscriptions and the eventId)
so retry logic in attemptDelivery/deliverEvent can skip targets that already
returned ok.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c0d2fed4-7380-4461-a0f7-b3d22c4a63b0

📥 Commits

Reviewing files that changed from the base of the PR and between 7cfd973 and e9f03da.

📒 Files selected for processing (33)
  • .github/workflows/publish-engine.yml
  • packages/engine/src/adapters/node/database.ts
  • packages/engine/src/adapters/node/index.ts
  • packages/engine/src/adapters/node/kv.ts
  • packages/engine/src/adapters/node/mcp.ts
  • packages/engine/src/adapters/node/presence.ts
  • packages/engine/src/auth/index.ts
  • packages/engine/src/db/migrations/0012_delivery_and_sequence_constraints.sql
  • packages/engine/src/db/migrations/0013_status_active.sql
  • packages/engine/src/engine/a2a-health.ts
  • packages/engine/src/engine/a2a.ts
  • packages/engine/src/engine/activity.ts
  • packages/engine/src/engine/agent.ts
  • packages/engine/src/engine/dm.ts
  • packages/engine/src/engine/dmAll.ts
  • packages/engine/src/engine/event-id.ts
  • packages/engine/src/engine/eventDelivery.ts
  • packages/engine/src/engine/file.ts
  • packages/engine/src/engine/inbox.ts
  • packages/engine/src/engine/receipt.ts
  • packages/engine/src/engine/resyncQuery.ts
  • packages/engine/src/engine/routing.ts
  • packages/engine/src/engine/search.ts
  • packages/engine/src/engine/searchQuery.ts
  • packages/engine/src/engine/sessionEvent.ts
  • packages/engine/src/engine/usage.ts
  • packages/engine/src/lib/logger.ts
  • packages/engine/src/lib/ssrf.ts
  • packages/engine/src/middleware/rateLimit.ts
  • packages/engine/src/middleware/usageTracker.ts
  • packages/engine/src/ports/kv.ts
  • scripts/e2e-actions.ts
  • scripts/e2e.ts
🚧 Files skipped from review as they are similar to previous changes (19)
  • packages/engine/src/engine/activity.ts
  • packages/engine/src/engine/searchQuery.ts
  • packages/engine/src/engine/a2a-health.ts
  • packages/engine/src/engine/sessionEvent.ts
  • .github/workflows/publish-engine.yml
  • packages/engine/src/engine/resyncQuery.ts
  • packages/engine/src/engine/inbox.ts
  • packages/engine/src/engine/receipt.ts
  • packages/engine/src/engine/file.ts
  • packages/engine/src/adapters/node/presence.ts
  • packages/engine/src/adapters/node/index.ts
  • packages/engine/src/engine/dmAll.ts
  • packages/engine/src/auth/index.ts
  • packages/engine/src/engine/routing.ts
  • packages/engine/src/engine/dm.ts
  • packages/engine/src/lib/logger.ts
  • packages/engine/src/adapters/node/mcp.ts
  • packages/engine/src/engine/a2a.ts
  • packages/engine/src/engine/agent.ts

Comment on lines +62 to +72
async increment(key: string, delta: number): Promise<number> {
// Single-process: there is no await between read and write, so this is
// atomic with respect to other JS turns — no lost-update window.
const raw = this.live(key)?.value;
// Treat missing or non-integer values as 0 (avoid parseInt's partial parse
// of e.g. "12abc" → 12).
const current = raw !== undefined && /^-?\d+$/.test(raw) ? parseInt(raw, 10) : 0;
const next = current + delta;
const existing = this.store.get(key);
this.store.set(key, { value: String(next), expiresAt: existing?.expiresAt ?? null });
return next;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject non-integer delta values before writing.

increment() hardens the stored value, but it still accepts delta = NaN or delta = 0.5. That will write "NaN"/"0.5" into the map, and subsequent reads here treat those values as 0, silently corrupting the counter. Please validate delta as an integer before updating the entry.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/adapters/node/kv.ts` around lines 62 - 72, In
increment(key: string, delta: number) validate that delta is a finite integer
before mutating the store: at the top of the increment function (before
computing current/next and before calling this.store.set) check that typeof
delta === 'number' && Number.isFinite(delta) && Number.isInteger(delta), and if
not throw a descriptive TypeError (e.g. "delta must be an integer"); this
prevents writing "NaN" or fractional values into the map and preserves the
integer-only counter semantics used by the current/next logic.

Comment on lines +62 to +64
// IPv6 unique-local (fc00::/7) and link-local (fe80::/10).
if (host.includes(':') && (host.startsWith('fc') || host.startsWith('fd') || host.startsWith('fe8') || host.startsWith('fe9') || host.startsWith('fea') || host.startsWith('feb'))) {
return false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Block IPv4-mapped IPv6 hosts in strict mode.

strict mode still allows hosts like http://[::ffff:127.0.0.1]/ or http://[::ffff:169.254.169.254]/, which tunnel loopback/private IPv4 through IPv6 notation and bypass the private-range checks above. That leaves a real SSRF path open in the hosted deployment mode this helper is meant to protect.

Suggested hardening
   // IPv6 unique-local (fc00::/7) and link-local (fe80::/10).
-  if (host.includes(':') && (host.startsWith('fc') || host.startsWith('fd') || host.startsWith('fe8') || host.startsWith('fe9') || host.startsWith('fea') || host.startsWith('feb'))) {
+  if (
+    host.startsWith('::ffff:') ||
+    (host.includes(':') && (
+      host.startsWith('fc') ||
+      host.startsWith('fd') ||
+      host.startsWith('fe8') ||
+      host.startsWith('fe9') ||
+      host.startsWith('fea') ||
+      host.startsWith('feb')
+    ))
+  ) {
     return false;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/engine/src/lib/ssrf.ts` around lines 62 - 64, The strict-mode IPv6
check currently misses IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1) so add
a guard in the same place that handles IPv6 unique-local/link-local: if strict
is true and host includes the IPv4-mapped prefix (::ffff: or ::FFFF:), extract
the embedded IPv4 suffix and treat it like a normal IPv4 — either run it through
the existing private/loopback checks or simply reject it outright; update the
block that inspects the hostname (the code using the host variable and the
strict flag in packages/engine/src/lib/ssrf.ts) to detect "::ffff:"
(case-insensitive), parse the trailing IPv4, and return false when it maps to
loopback/private ranges.

@willwashburn willwashburn merged commit 9e1d3ea into main May 29, 2026
6 checks passed
@willwashburn willwashburn deleted the feat/engine-package branch May 29, 2026 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant