CS-11045: fix Anthropic rejection from commandResult event_id drift #4705
… event_id from current room state

The ai-bot's prompt builder strictly filtered tool_results by m.relates_to.event_id matching the bot message's event_id. The host captured the bot message's streaming/original event_id when constructing MessageCommand; once a later m.replace event in room.events owned the commandRequest, that event's id is what the matrix server's /messages view (which ai-bot reads via getRoomEvents) showed. The result: a commandResult linked to the streaming id while ai-bot saw the bot message at the normalized id, the result got dropped, and Anthropic rejected the request with "unexpected tool_use_id", leaving the room unable to make further progress.

Two layered changes:
1. ai-bot (primary, robust to host timing): getCommandResults now also matches by commandRequestId in addition to m.relates_to.event_id. commandRequestIds are uuids generated per tool_call, so cross-message contamination isn't a real risk.
2. host (defense in depth): command-service.ts run task now sources the bot-message event_id from current room state via a newly added getCurrentEventIdForCommandRequest helper that walks roomResource.events newest-first, falling back to the snapshot when nothing matches.

Tests:
- ai-bot: three new prompt-construction tests including one that reproduces the production transcript shape exactly (switch-submode + write-text-file with drifted event_id + codePatchCorrectness with checkCorrectness).
- host: one new commands-test asserting commandResult's m.relates_to.event_id resolves to the latest bot-message event_id in room.events, not the streaming snapshot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
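The commandRequestId fallback described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in prompt.ts: the `CommandResultEvent` and `BotMessage` shapes and the `pairCommandResults` name are hypothetical simplifications of the real Matrix event types.

```typescript
// Hypothetical, simplified shapes; real Matrix events carry many more fields.
interface CommandResultEvent {
  relatesToEventId: string; // stands in for content['m.relates_to'].event_id
  commandRequestId: string; // uuid generated per tool_call
  output: string;
}

interface BotMessage {
  eventId: string; // id of the bot message as seen through /messages
  commandRequestIds: string[]; // tool calls carried by this message
}

// Collect the results that belong to `message`: a strict event_id match,
// or a commandRequestId match when the event_id has drifted.
// Results are de-duplicated by commandRequestId so that a result matching
// both ways does not produce two tool messages.
function pairCommandResults(
  message: BotMessage,
  results: CommandResultEvent[],
): CommandResultEvent[] {
  const wanted = new Set(message.commandRequestIds);
  const seen = new Set<string>();
  const paired: CommandResultEvent[] = [];
  for (const result of results) {
    const matches =
      result.relatesToEventId === message.eventId ||
      wanted.has(result.commandRequestId);
    if (!matches || seen.has(result.commandRequestId)) {
      continue;
    }
    seen.add(result.commandRequestId);
    paired.push(result);
  }
  return paired;
}
```

Because the request ids are per-tool_call uuids, matching on them alone should not pull in results from another message, which is the assumption the commit message relies on.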
…t casts) - Break long TOOL_CALL_CHECK_CORRECTNESS string per prettier. - Cast each m.room.message event element with `as DiscreteMatrixEvent`. Without the cast, the inner `as CardMessageContent` widens content.msgtype to the union 'app.boxel.message' | 'app.boxel.codePatchCorrectness', and TS can no longer narrow the discriminated MatrixEvent union for the array element. Pre-existing ../base/*.gts type errors are unrelated and unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shipped scope is now captured by the PR description, the Linear ticket, and the code itself. The in-branch plan doc was a pre-impl scoping artifact; keeping it would just risk drifting out of sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 95fc0bb171
      this.getCurrentEventIdForCommandRequest(
        command.message.roomId,
        commandRequestId,
      ) ?? command.eventId;
Keep commandResult linked to message cache effective event id
Using getCurrentEventIdForCommandRequest(... ) here can switch invokedToolFromEventId to the latest m.replace event id, but host message state is keyed by the effective/original id for replaced messages (getEffectiveEventId in room.ts resolves m.replace to the parent id, and updateMessageCommandResult then does _messageCache.get(effectiveEventId)). In the replace-flow this mismatch means the command result event no longer updates the corresponding MessageCommand to applied, so the UI can stay in a re-applicable state and lose persisted applied status after reload. This regression appears when commands come from streamed messages that later emit tool calls via m.replace.
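The mismatch the reviewer describes can be modeled with a small sketch. It assumes, as the comment states, that the host cache is keyed by the effective (parent) id of a replaced message; the `RoomEvent` shape and the `buildMessageCache` and `markApplied` helpers are hypothetical and are not the code in room.ts.

```typescript
// Hypothetical event shape: `replaces` stands in for the parent id carried
// by an m.replace event's m.relates_to.event_id.
interface RoomEvent {
  eventId: string;
  replaces?: string;
}

type CommandStatus = 'ready' | 'applied';

// An m.replace event resolves to the id of the message it replaces.
function effectiveEventId(event: RoomEvent): string {
  return event.replaces ?? event.eventId;
}

// Build a cache keyed by effective id, so an original message and its
// replacements share one entry.
function buildMessageCache(events: RoomEvent[]): Map<string, CommandStatus> {
  const cache = new Map<string, CommandStatus>();
  for (const event of events) {
    cache.set(effectiveEventId(event), 'ready');
  }
  return cache;
}

// Mark the command as applied if the id a command result points at is a
// cache key. Returns false on a cache miss.
function markApplied(
  cache: Map<string, CommandStatus>,
  relatesToEventId: string,
): boolean {
  if (!cache.has(relatesToEventId)) {
    return false;
  }
  cache.set(relatesToEventId, 'applied');
  return true;
}
```

Under this model, a command result that points at the id of the latest m.replace event misses the cache, while one that points at the original id hits it. Whether the real cache behaves this way is the reviewer's claim, not something this sketch verifies.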
Pull request overview
Fixes a production failure mode where Anthropic rejects the prompt (unexpected tool_use_id) when a commandResult can’t be paired to its originating tool call due to Matrix event_id drift between streaming/original events and normalized /messages output.
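To make the failure shape concrete, here is a minimal sketch of a check for tool calls that never receive a tool result. The `ChatMessage` type is a simplified, hypothetical stand-in for the prompt message format, and `findOrphanToolCalls` is an illustrative helper, not part of the PR. It only checks that each id is answered somewhere in the array and ignores ordering.

```typescript
// Hypothetical, simplified prompt message shapes.
interface ToolCall {
  id: string;
  name: string;
}

type ChatMessage =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string; tool_calls?: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// Return the ids of assistant tool calls that have no matching tool message.
function findOrphanToolCalls(messages: ChatMessage[]): string[] {
  const answered = new Set<string>();
  for (const message of messages) {
    if (message.role === 'tool') {
      answered.add(message.tool_call_id);
    }
  }
  const orphans: string[] = [];
  for (const message of messages) {
    if (message.role !== 'assistant') {
      continue;
    }
    for (const call of message.tool_calls ?? []) {
      if (!answered.has(call.id)) {
        orphans.push(call.id);
      }
    }
  }
  return orphans;
}
```

A transcript where a dropped commandResult leaves a write-text-file call unanswered would report that call's id here, which corresponds to the unpaired tool call this overview describes.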
Changes:
- runtime-common prompt construction: pair commandResult tool results by commandRequestId as a fallback when m.relates_to.event_id doesn’t match.
- host command execution: derive the invokedToolFromEventId from current room state (latest event that “owns” the commandRequestId) instead of a potentially stale snapshot.
- Adds regression tests covering both the ai-bot pairing behavior and the host event_id sourcing behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/runtime-common/ai/prompt.ts | Adds commandRequestId-based fallback matching for command results during prompt construction. |
| packages/host/app/services/command-service.ts | Looks up the current bot-message event_id for a commandRequestId at execution time and uses it when emitting command results. |
| packages/ai-bot/tests/prompt-construction-test.ts | Adds regression tests reproducing the production transcript shape and validating tool_call/tool_result pairing. |
| packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts | Adds a host-side regression test ensuring commandResult linkage uses the current room event_id (m.replace-aware). |
    let eventId =
      this.getCurrentEventIdForCommandRequest(
        command.message.roomId,
        commandRequestId,
      ) ?? command.eventId;
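The lookup-with-fallback pattern in this snippet can be sketched as a standalone function. This is an illustration under assumptions: the `RoomEventLike` shape and the `currentEventIdForCommandRequest` name are hypothetical, events are assumed to be ordered oldest to newest, and the real helper reads from roomResource.events rather than a plain array.

```typescript
// Hypothetical, simplified event shape: each event lists the
// commandRequestIds of the tool calls it carries.
interface RoomEventLike {
  eventId: string;
  commandRequestIds: string[];
}

// Walk the events newest-first and return the id of the first event that
// carries the given commandRequestId, or undefined when nothing matches.
function currentEventIdForCommandRequest(
  events: RoomEventLike[],
  commandRequestId: string,
): string | undefined {
  // Copy before reversing so the caller's array is left untouched.
  for (const event of [...events].reverse()) {
    if (event.commandRequestIds.includes(commandRequestId)) {
      return event.eventId;
    }
  }
  return undefined;
}

// Prefer the id from current room state; fall back to the snapshot
// captured when the command was constructed.
function resolveEventId(
  events: RoomEventLike[],
  commandRequestId: string,
  snapshotEventId: string,
): string {
  return (
    currentEventIdForCommandRequest(events, commandRequestId) ??
    snapshotEventId
  );
}
```

Returning `undefined` rather than throwing keeps the snapshot fallback a one-line `??` at the call site, which mirrors the shape of the diff above.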
Summary
- Anthropic rejects the request (unexpected tool_use_id) when the host's commandResult.m.relates_to.event_id disagrees with the bot message's event_id as ai-bot sees it via /messages. Once it fires, the room is dead until reset.
- Two layered changes: (1) ai-bot's getCommandResults now also pairs by commandRequestId (the primary fix, robust to the timing race where the host hasn't re-synced the normalized form yet); (2) host's command-service.ts run task sources the bot-message event_id from current room state instead of the snapshot captured during streaming (defense in depth).
- getCodePatchResults is intentionally left event_id-only (no analogous unique id available; once Phase B lands the strict match suffices). The patchCode path is intentionally not touched: it currently picks the right id post-sync via the dual-Message-cache effect, and touching it could regress the working path.

What was happening (verified against the production transcript)
Production room !djuQxlnuYUoIABVEme:boxel.ai, "create a recipe tracker":
- The write-text-file tool_call appears in /messages at event_id = $s7uXPT1Dw… with m.relates_to.event_id = $s7uXPT1Dw… (self-referential: the matrix server collapsed the original + N m.replace events into one normalized event).
- codePatchResult.m.relates_to.event_id = $s7uXPT1Dw… ✓
- commandResult.m.relates_to.event_id = $eeFRvu_QCWWtidrDmb2RwG5-FAQFWiQiW4KCJyy-iWU, the original streaming id, which doesn't appear anywhere else in the room.
- The ai-bot's strict m.relates_to.event_id filter drops the commandResult, leaving the write-text-file tool_use without a tool_result. The next bot message's checkCorrectness tool_use makes the messages array contain two adjacent assistant {tool_calls=…} entries, and Anthropic rejects.

Test plan
- cd packages/ai-bot && pnpm test: 160 pass, 2 unrelated pre-existing failures (AI Bot Locking). Three new CS-11045 tests pass:
  - getCommandResults pairs by commandRequestId when m.relates_to.event_id drifts
  - strict event_id match still works and does not duplicate tool messages
  - recipe-tracker production transcript shape: every tool_call has its tool result (mirrors the failing production transcript exactly)
- Confirmed the new tests fail on main for the right reasons (orphan tool_use shape; missing tool message for the drifted write-text-file id) before the prompt.ts fix.
- pnpm lint:js on touched files in host and runtime-common.
- host test suite (run in CI per AGENTS.md). Includes the new commands-test.gts regression covering the host fix.

Files
- packages/runtime-common/ai/prompt.ts: getCommandResults adds commandRequestId fallback.
- packages/host/app/services/command-service.ts: adds getCurrentEventIdForCommandRequest; run task uses lookup with snapshot fallback.
- packages/ai-bot/tests/prompt-construction-test.ts: three new regression tests.
- packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts: one new regression test.
- docs/cs-11045-host-event-id-drift-plan.md: research/scoping notes.

Out of scope (intentional, per the approved plan)
- buildPromptForModel: held back unless drift recurs in another shape.
- getAggregatedReplacement / getEffectiveEventId / streaming m.replace handling.
- _messageCache keying so a single Message represents the bot message across stream-vs-normalized identities (the "real" structural fix; much larger scope; B1 sidesteps it).
- getCodePatchResults (no unique id; remains event_id-only).
- executeReadyCodePatches / patchCode (currently picks the right id post-sync; touching it could regress the working path).

🤖 Generated with Claude Code