CS-11045: fix Anthropic rejection from commandResult event_id drift by FadhlanR · Pull Request #4705 · cardstack/boxel

FadhlanR · 2026-05-07T10:48:28Z

Summary

Fixes ai-bot Anthropic API rejection (unexpected tool_use_id) when the host's commandResult.m.relates_to.event_id disagrees with the bot message's event_id as ai-bot sees it via /messages. Once it fires, the room is dead until reset.
Two layered changes: (1) ai-bot's getCommandResults now also pairs by commandRequestId (the primary fix — robust to the timing race where the host hasn't re-synced the normalized form yet); (2) host's command-service.ts run task sources the bot-message event_id from current room state instead of the snapshot captured during streaming (defense in depth).
getCodePatchResults is intentionally left event_id-only (no analogous unique id available; once Phase B lands the strict match suffices). The patchCode path is intentionally not touched — it currently picks the right id post-sync via the dual-Message-cache effect; touching it could regress the working path.
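
As a rough illustration of the pairing rule described above, here is a minimal sketch. It is not the code in prompt.ts: the `ToolCall`, `BotMessage`, and `CommandResult` shapes and the `pairCommandResults` name are hypothetical simplifications, and it assumes a tool call's id is the same value as the result's commandRequestId.

```typescript
// Hypothetical, flattened shapes for illustration only; the real Matrix
// event types in runtime-common are richer than this.
interface ToolCall {
  id: string; // assumed to equal the commandRequestId (a per-tool_call uuid)
  name: string;
}

interface BotMessage {
  eventId: string; // event_id as ai-bot sees it via /messages
  toolCalls: ToolCall[];
}

interface CommandResult {
  relatesToEventId: string; // stands in for content['m.relates_to'].event_id
  commandRequestId?: string;
}

// Keep a result if it matches strictly by event_id OR by commandRequestId.
// A single filter pass means a result that matches both ways is kept once.
function pairCommandResults(
  bot: BotMessage,
  results: CommandResult[],
): CommandResult[] {
  const requestIds = new Set(bot.toolCalls.map((call) => call.id));
  return results.filter(
    (result) =>
      result.relatesToEventId === bot.eventId ||
      (result.commandRequestId !== undefined &&
        requestIds.has(result.commandRequestId)),
  );
}

const botMessage: BotMessage = {
  eventId: '$normalized',
  toolCalls: [{ id: 'req-1', name: 'write-text-file' }],
};

// Only the first result is kept: its event_id drifted, but its
// commandRequestId matches a tool call on the bot message.
const paired = pairCommandResults(botMessage, [
  { relatesToEventId: '$streaming', commandRequestId: 'req-1' },
  { relatesToEventId: '$other', commandRequestId: 'req-9' },
]);
```

Using one predicate with both conditions, rather than two passes whose outputs are concatenated, is one simple way to avoid emitting a duplicate tool message when the strict match also succeeds.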

What was happening (verified against the production transcript)

Production room !djuQxlnuYUoIABVEme:boxel.ai, "create a recipe tracker":

Bot message owning the write-text-file tool_call appears in /messages at event_id = $s7uXPT1Dw… with m.relates_to.event_id = $s7uXPT1Dw… (self-referential — the matrix server collapsed the original + N m.replace events into one normalized event).
Host emits codePatchResult.m.relates_to.event_id = $s7uXPT1Dw… ✓
Host emits commandResult.m.relates_to.event_id = $eeFRvu_QCWWtidrDmb2RwG5-FAQFWiQiW4KCJyy-iWU — the original streaming id, which doesn't appear anywhere else in the room.
ai-bot's strict m.relates_to.event_id filter drops the commandResult, leaving the write-text-file tool_use without a tool_result. The next bot message's checkCorrectness tool_use makes the messages array contain two adjacent assistant {tool_calls=…} — Anthropic rejects.
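
The failing shape can be checked mechanically. The sketch below assumes an OpenAI-style chat message array (the `ChatMessage` type and `findOrphanToolCalls` are hypothetical, not names from the repo) and reports every tool call that has no tool message before the next non-tool turn.

```typescript
// Hypothetical chat message shapes, loosely modeled on an OpenAI-style array.
type ChatMessage =
  | { role: 'user'; content: string }
  | {
      role: 'assistant';
      content: string;
      toolCalls?: { id: string; name: string }[];
    }
  | { role: 'tool'; toolCallId: string; content: string };

// Return the ids of tool calls that never receive a tool message before the
// next user/assistant turn (or before the end of the array).
function findOrphanToolCalls(messages: ChatMessage[]): string[] {
  const orphans: string[] = [];
  let pending = new Set<string>();
  for (const message of messages) {
    if (message.role === 'tool') {
      pending.delete(message.toolCallId);
      continue;
    }
    // Any non-tool message closes the window for answering earlier calls.
    pending.forEach((id) => orphans.push(id));
    pending = new Set<string>();
    if (message.role === 'assistant') {
      for (const call of message.toolCalls ?? []) {
        pending.add(call.id);
      }
    }
  }
  pending.forEach((id) => orphans.push(id));
  return orphans;
}

const transcript: ChatMessage[] = [
  { role: 'user', content: 'create a recipe tracker' },
  {
    role: 'assistant',
    content: '',
    toolCalls: [{ id: 'req-write', name: 'write-text-file' }],
  },
  // The commandResult for req-write was dropped, so no tool message here.
  {
    role: 'assistant',
    content: '',
    toolCalls: [{ id: 'req-check', name: 'checkCorrectness' }],
  },
  { role: 'tool', toolCallId: 'req-check', content: 'ok' },
];

const orphanIds = findOrphanToolCalls(transcript); // ['req-write']
```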

Test plan

cd packages/ai-bot && pnpm test — 160 pass, 2 unrelated pre-existing failures (AI Bot Locking). Three new CS-11045 tests pass:
- getCommandResults pairs by commandRequestId when m.relates_to.event_id drifts
- strict event_id match still works and does not duplicate tool messages
- recipe-tracker production transcript shape — every tool_call has its tool result (mirrors the failing production transcript exactly)
All three new ai-bot tests fail on main for the right reasons (orphan tool_use shape; missing tool message for the drifted write-text-file id) before the prompt.ts fix.
Lint clean: pnpm lint:js on touched files in host and runtime-common.
CI: full host test suite (run in CI per AGENTS.md). Includes the new commands-test.gts regression covering the host fix.
Local repro after merge: fresh ai-bot room, "create a recipe tracker", let it emit SEARCH/REPLACE + tool_call, accept the patch, send a follow-up — conversation should continue normally instead of returning "There was an error processing your request".

Files

packages/runtime-common/ai/prompt.ts — getCommandResults adds commandRequestId fallback.
packages/host/app/services/command-service.ts — adds getCurrentEventIdForCommandRequest; run task uses lookup with snapshot fallback.
packages/ai-bot/tests/prompt-construction-test.ts — three new regression tests.
packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts — one new regression test.
docs/cs-11045-host-event-id-drift-plan.md — research/scoping notes.

Out of scope (intentional, per the approved plan)

Orphan-tool_use defensive sweep in buildPromptForModel — held back unless drift recurs in another shape.
Rewrite of getAggregatedReplacement / getEffectiveEventId / streaming m.replace handling.
Restructuring _messageCache keying so a single Message represents the bot message across stream-vs-normalized identities (the "real" structural fix; much larger scope; B1 sidesteps it).
getCodePatchResults (no unique id; remains event_id-only).
executeReadyCodePatches / patchCode (currently picks the right id post-sync; touching it could regress the working path).

🤖 Generated with Claude Code

… event_id from current room state

The ai-bot's prompt builder strictly filtered tool_results by m.relates_to.event_id matching the bot message's event_id. The host captured the bot message's streaming/original event_id when constructing MessageCommand; once a later m.replace event in room.events owned the commandRequest, that event's id is what the matrix server's /messages view (which ai-bot reads via getRoomEvents) showed. The result: a commandResult linked to the streaming id while ai-bot saw the bot message at the normalized id, the result got dropped, and Anthropic rejected the request with "unexpected tool_use_id", leaving the room unable to make further progress.

Two layered changes:

1. ai-bot (primary, robust to host timing): getCommandResults now also matches by commandRequestId in addition to m.relates_to.event_id. commandRequestIds are uuids generated per tool_call, so cross-message contamination isn't a real risk.
2. host (defense in depth): command-service.ts run task now sources the bot-message event_id from current room state via a newly added getCurrentEventIdForCommandRequest helper that walks roomResource.events newest-first, falling back to the snapshot when nothing matches.

Tests:
- ai-bot: three new prompt-construction tests including one that reproduces the production transcript shape exactly (switch-submode + write-text-file with drifted event_id + codePatchCorrectness with checkCorrectness).
- host: one new commands-test asserting commandResult's m.relates_to.event_id resolves to the latest bot-message event_id in room.events, not the streaming snapshot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
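
The host-side lookup in this commit message can be pictured roughly as follows. This is a sketch under stated assumptions, not the helper added to command-service.ts: `RoomEvent` is a hypothetical flattened shape, the events array is assumed to be ordered oldest to newest, and `currentEventIdForCommandRequest` is an illustrative name.

```typescript
// Hypothetical, simplified room event; real events carry full Matrix content.
interface RoomEvent {
  eventId: string;
  commandRequestIds: string[]; // ids of commandRequests carried by this event
}

// Walk the events newest-first and return the id of the latest event that
// carries the given commandRequestId, or undefined if none does.
function currentEventIdForCommandRequest(
  events: RoomEvent[], // assumed ordered oldest to newest
  commandRequestId: string,
): string | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const candidate = events[i];
    if (
      candidate !== undefined &&
      candidate.commandRequestIds.indexOf(commandRequestId) !== -1
    ) {
      return candidate.eventId;
    }
  }
  return undefined;
}

const roomEvents: RoomEvent[] = [
  { eventId: '$streaming', commandRequestIds: [] },
  { eventId: '$replace-1', commandRequestIds: ['req-1'] },
  { eventId: '$replace-2', commandRequestIds: ['req-1'] },
];
const snapshotEventId = '$streaming'; // id captured while streaming

// Latest owner wins; the snapshot is used only when nothing matches.
const resolvedId =
  currentEventIdForCommandRequest(roomEvents, 'req-1') ?? snapshotEventId; // '$replace-2'
const fallbackId =
  currentEventIdForCommandRequest(roomEvents, 'req-missing') ?? snapshotEventId; // '$streaming'
```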

github-actions · 2026-05-07T10:52:50Z

Preview deployments

Host staging preview

Host production preview

Host Test Results

1 files 1 suites 1h 51m 20s ⏱️
2 465 tests 2 448 ✅ 15 💤 1 ❌ 1 🔥
2 484 runs 2 466 ✅ 15 💤 2 ❌ 1 🔥

Results for commit 95fc0bb.

For more details on these errors, see this check.

Realm Server Test Results

1 files ±0 1 suites ±0 19m 1s ⏱️ + 1m 40s
1 262 tests +1 1 262 ✅ +1 0 💤 ±0 0 ❌ ±0
1 340 runs +1 1 340 ✅ +1 0 💤 ±0 0 ❌ ±0

Results for commit 95fc0bb. ± Comparison against earlier commit 928da82.

…t casts)

- Break long TOOL_CALL_CHECK_CORRECTNESS string per prettier.
- Cast each m.room.message event element with `as DiscreteMatrixEvent`. Without the cast, the inner `as CardMessageContent` widens content.msgtype to the union 'app.boxel.message' | 'app.boxel.codePatchCorrectness', and TS can no longer narrow the discriminated MatrixEvent union for the array element.

Pre-existing ../base/*.gts type errors are unrelated and unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The shipped scope is now captured by the PR description, the Linear ticket, and the code itself. The in-branch plan doc was a pre-impl scoping artifact; keeping it would just risk drifting out of sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95fc0bb171


chatgpt-codex-connector · 2026-05-07T12:37:31Z

+      this.getCurrentEventIdForCommandRequest(
+        command.message.roomId,
+        commandRequestId,
+      ) ?? command.eventId;


Keep commandResult linked to message cache effective event id

Using getCurrentEventIdForCommandRequest(... ) here can switch invokedToolFromEventId to the latest m.replace event id, but host message state is keyed by the effective/original id for replaced messages (getEffectiveEventId in room.ts resolves m.replace to the parent id, and updateMessageCommandResult then does _messageCache.get(effectiveEventId)). In the replace-flow this mismatch means the command result event no longer updates the corresponding MessageCommand to applied, so the UI can stay in a re-applicable state and lose persisted applied status after reload. This regression appears when commands come from streamed messages that later emit tool calls via m.replace.
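
The mismatch this comment describes can be sketched in isolation. The shapes and names below are hypothetical and only follow the comment's description of getEffectiveEventId and _messageCache; they are not taken from room.ts.

```typescript
// Hypothetical event shape: an edit points at the event it replaces.
interface ReplaceAwareEvent {
  eventId: string;
  replacesEventId?: string; // set when this event is an m.replace edit
}

// Resolve an edit to the id of the event it replaces, as the comment
// describes getEffectiveEventId doing.
function effectiveEventId(e: ReplaceAwareEvent): string {
  return e.replacesEventId ?? e.eventId;
}

const originalEvent: ReplaceAwareEvent = { eventId: '$original' };
const editEvent: ReplaceAwareEvent = {
  eventId: '$edit',
  replacesEventId: '$original',
};

// Message state keyed by the effective (original) id, per the comment.
const messageCache = new Map<string, { applied: boolean }>();
messageCache.set(effectiveEventId(originalEvent), { applied: false });

// Looking up by the raw m.replace id misses the cached message...
const missByRawId = messageCache.get(editEvent.eventId); // undefined
// ...while resolving to the effective id first finds it.
const hitByEffectiveId = messageCache.get(effectiveEventId(editEvent));
```

If the cache really is keyed this way, a commandResult that relates to the latest m.replace id would need to be resolved back to the effective id before the lookup, which is the regression risk the reviewer is pointing at.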


Copilot

Pull request overview

Fixes a production failure mode where Anthropic rejects the prompt (unexpected tool_use_id) when a commandResult can’t be paired to its originating tool call due to Matrix event_id drift between streaming/original events and normalized /messages output.

Changes:

runtime-common prompt construction: pair commandResult tool results by commandRequestId as a fallback when m.relates_to.event_id doesn’t match.
host command execution: derive the invokedToolFromEventId from current room state (latest event that “owns” the commandRequestId) instead of a potentially stale snapshot.
Adds regression tests covering both the ai-bot pairing behavior and the host event_id sourcing behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/runtime-common/ai/prompt.ts | Adds `commandRequestId`-based fallback matching for command results during prompt construction. |
| packages/host/app/services/command-service.ts | Looks up the current bot-message `event_id` for a `commandRequestId` at execution time and uses it when emitting command results. |
| packages/ai-bot/tests/prompt-construction-test.ts | Adds regression tests reproducing the production transcript shape and validating tool_call/tool_result pairing. |
| packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts | Adds a host-side regression test ensuring commandResult linkage uses the current room event_id (m.replace-aware). |


+    let eventId =
+      this.getCurrentEventIdForCommandRequest(
+        command.message.roomId,
+        commandRequestId,
+      ) ?? command.eventId;


FadhlanR and others added 2 commits May 7, 2026 18:01

FadhlanR marked this pull request as ready for review May 7, 2026 12:33

FadhlanR requested a review from a team May 7, 2026 12:35

chatgpt-codex-connector Bot reviewed May 7, 2026


habdelra requested a review from Copilot May 7, 2026 13:43

Copilot started reviewing on behalf of habdelra May 7, 2026 13:43

Copilot AI reviewed May 7, 2026


Comment thread packages/host/app/services/command-service.ts

Comment on lines +564 to +568


FadhlanR wants to merge 3 commits into main from cs-11045-host-event-id-drift
