
CS-11045: fix Anthropic rejection from commandResult event_id drift #4705

Open
FadhlanR wants to merge 3 commits into main from
cs-11045-host-event-id-drift

Contributor

FadhlanR commented May 7, 2026

Summary

  • Fixes the ai-bot Anthropic API rejection (unexpected tool_use_id) that occurs when the host's commandResult.m.relates_to.event_id disagrees with the bot message's event_id as ai-bot sees it via /messages. Once the rejection occurs, the room cannot make further progress until it is reset.
  • Two layered changes: (1) ai-bot's getCommandResults now also pairs by commandRequestId (the primary fix — robust to the timing race where the host hasn't re-synced the normalized form yet); (2) host's command-service.ts run task sources the bot-message event_id from current room state instead of the snapshot captured during streaming (defense in depth).
  • getCodePatchResults is intentionally left event_id-only (no analogous unique id available; once Phase B lands the strict match suffices). The patchCode path is intentionally not touched — it currently picks the right id post-sync via the dual-Message-cache effect; touching it could regress the working path.
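
The pairing rule described in the second bullet can be sketched as follows. This is a minimal illustration, not the code in prompt.ts: the `BotMessage` and `CommandResultEvent` shapes, the `relatesToEventId` field (standing in for `m.relates_to.event_id`), and the function name `pairCommandResults` are all assumptions made for the example.

```typescript
// Hypothetical, simplified shapes; the real Matrix event types are richer.
interface BotMessage {
  eventId: string; // the bot message's event_id as ai-bot sees it
  commandRequestIds: string[]; // ids of the tool calls this message owns
}

interface CommandResultEvent {
  eventId: string;
  relatesToEventId: string; // stands in for m.relates_to.event_id
  commandRequestId: string;
}

// Keep a result when it points at the message's event_id OR when it answers
// one of the message's command requests. A single predicate means a result
// that matches both ways is still returned only once.
function pairCommandResults(
  message: BotMessage,
  results: CommandResultEvent[],
): CommandResultEvent[] {
  const ownedRequests = new Set(message.commandRequestIds);
  return results.filter(
    (result) =>
      result.relatesToEventId === message.eventId ||
      ownedRequests.has(result.commandRequestId),
  );
}

const botMessage: BotMessage = {
  eventId: "$normalized",
  commandRequestIds: ["req-1"],
};
const driftedResult: CommandResultEvent = {
  eventId: "$result",
  relatesToEventId: "$streaming", // drifted: does not equal "$normalized"
  commandRequestId: "req-1",
};
// Paired through commandRequestId even though the event_id drifted.
console.log(pairCommandResults(botMessage, [driftedResult]).length); // 1
```

Matching on either key keeps the strict event_id path working unchanged while tolerating the drifted case.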

What was happening (verified against the production transcript)

Production room !djuQxlnuYUoIABVEme:boxel.ai, "create a recipe tracker":

  • Bot message owning the write-text-file tool_call appears in /messages at event_id = $s7uXPT1Dw… with m.relates_to.event_id = $s7uXPT1Dw… (self-referential — the matrix server collapsed the original + N m.replace events into one normalized event).
  • Host emits codePatchResult.m.relates_to.event_id = $s7uXPT1Dw…
  • Host emits commandResult.m.relates_to.event_id = $eeFRvu_QCWWtidrDmb2RwG5-FAQFWiQiW4KCJyy-iWU — the original streaming id, which doesn't appear anywhere else in the room.
  • ai-bot's strict m.relates_to.event_id filter drops the commandResult, leaving the write-text-file tool_use without a tool_result. The next bot message's checkCorrectness tool_use then puts two adjacent assistant {tool_calls=…} messages into the messages array with no tool result between them, and Anthropic rejects the request.
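
The failure shape in the last bullet can be detected with a small check. The sketch below assumes an OpenAI-style messages array, as the `{tool_calls=…}` notation above suggests. The `ChatMessage` type, the ids, and `findOrphanToolCallIds` are hypothetical, and the check is simplified: it looks only for tool calls that have no result at all and does not model the provider's full validation rules.

```typescript
// Hypothetical, simplified message shapes for illustration only.
type ChatMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; tool_calls?: { id: string }[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Returns the ids of tool calls that never receive a tool result.
function findOrphanToolCallIds(messages: ChatMessage[]): string[] {
  // First pass: collect every tool_call_id that has a result.
  const answered = new Set<string>();
  for (const message of messages) {
    if (message.role === "tool") {
      answered.add(message.tool_call_id);
    }
  }
  // Second pass: report assistant tool calls with no matching result.
  const orphans: string[] = [];
  for (const message of messages) {
    if (message.role === "assistant") {
      for (const call of message.tool_calls ?? []) {
        if (!answered.has(call.id)) {
          orphans.push(call.id);
        }
      }
    }
  }
  return orphans;
}

// Two adjacent assistant messages with tool calls; only the second is answered.
const transcript: ChatMessage[] = [
  { role: "user", content: "create a recipe tracker" },
  { role: "assistant", content: "", tool_calls: [{ id: "call-write" }] },
  { role: "assistant", content: "", tool_calls: [{ id: "call-check" }] },
  { role: "tool", tool_call_id: "call-check", content: "ok" },
];
console.log(findOrphanToolCallIds(transcript)); // [ 'call-write' ]
```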

Test plan

  • cd packages/ai-bot && pnpm test — 160 pass, 2 unrelated pre-existing failures (AI Bot Locking). Three new CS-11045 tests pass:
    • getCommandResults pairs by commandRequestId when m.relates_to.event_id drifts
    • strict event_id match still works and does not duplicate tool messages
    • recipe-tracker production transcript shape — every tool_call has its tool result (mirrors the failing production transcript exactly)
  • All three new ai-bot tests fail on main for the right reasons (orphan tool_use shape; missing tool message for the drifted write-text-file id) before the prompt.ts fix.
  • Lint clean: pnpm lint:js on touched files in host and runtime-common.
  • CI: full host test suite (run in CI per AGENTS.md). Includes the new commands-test.gts regression covering the host fix.
  • Local repro after merge: fresh ai-bot room, "create a recipe tracker", let it emit SEARCH/REPLACE + tool_call, accept the patch, send a follow-up — conversation should continue normally instead of returning "There was an error processing your request".

Files

  • packages/runtime-common/ai/prompt.ts — getCommandResults adds commandRequestId fallback.
  • packages/host/app/services/command-service.ts — adds getCurrentEventIdForCommandRequest; run task uses lookup with snapshot fallback.
  • packages/ai-bot/tests/prompt-construction-test.ts — three new regression tests.
  • packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts — one new regression test.
  • docs/cs-11045-host-event-id-drift-plan.md — research/scoping notes.

Out of scope (intentional, per the approved plan)

  • Orphan-tool_use defensive sweep in buildPromptForModel — held back unless drift recurs in another shape.
  • Rewrite of getAggregatedReplacement / getEffectiveEventId / streaming m.replace handling.
  • Restructuring _messageCache keying so a single Message represents the bot message across stream-vs-normalized identities (the "real" structural fix; much larger scope; B1 sidesteps it).
  • getCodePatchResults (no unique id; remains event_id-only).
  • executeReadyCodePatches / patchCode (currently picks the right id post-sync; touching it could regress the working path).

🤖 Generated with Claude Code

… event_id from current room state

The ai-bot's prompt builder strictly filtered tool_results by
m.relates_to.event_id matching the bot message's event_id. The host
captured the bot message's streaming/original event_id when constructing
MessageCommand; once a later m.replace event in room.events owned the
commandRequest, that event's id is what the matrix server's /messages
view (which ai-bot reads via getRoomEvents) showed. The result: a
commandResult linked to the streaming id while ai-bot saw the bot
message at the normalized id, the result got dropped, and Anthropic
rejected the request with "unexpected tool_use_id" — leaving the room
unable to make further progress.

Two layered changes:

1. ai-bot (primary, robust to host timing): getCommandResults now also
   matches by commandRequestId in addition to m.relates_to.event_id.
   commandRequestIds are uuids generated per tool_call, so cross-message
   contamination isn't a real risk.

2. host (defense in depth): command-service.ts run task now sources the
   bot-message event_id from current room state via a newly added
   getCurrentEventIdForCommandRequest helper that walks roomResource.events
   newest-first, falling back to the snapshot when nothing matches.
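
The host-side lookup in change 2 can be sketched as below. This is an illustration under stated assumptions, not the helper in command-service.ts: the `RoomEvent` shape, the assumption that the events array is ordered oldest to newest, and the names `currentEventIdForCommandRequest` and `resolveEventId` are all hypothetical.

```typescript
// Hypothetical, simplified room event; assumed to be stored oldest-first.
interface RoomEvent {
  eventId: string;
  commandRequestIds: string[]; // command requests carried by this event
}

// Walk the events newest-first and return the id of the first event that
// carries the given command request, or undefined when none does.
function currentEventIdForCommandRequest(
  events: RoomEvent[],
  commandRequestId: string,
): string | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (event !== undefined && event.commandRequestIds.includes(commandRequestId)) {
      return event.eventId;
    }
  }
  return undefined;
}

// Prefer the id from current room state; fall back to the streaming snapshot.
function resolveEventId(
  events: RoomEvent[],
  commandRequestId: string,
  snapshotEventId: string,
): string {
  return currentEventIdForCommandRequest(events, commandRequestId) ?? snapshotEventId;
}

const roomEvents: RoomEvent[] = [
  { eventId: "$streaming", commandRequestIds: [] },
  { eventId: "$replace-1", commandRequestIds: ["req-1"] },
];
console.log(resolveEventId(roomEvents, "req-1", "$streaming")); // $replace-1
console.log(resolveEventId(roomEvents, "req-2", "$streaming")); // $streaming
```

Walking newest-first means that when several events carry the same command request, the most recent one wins.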

Tests:
- ai-bot: three new prompt-construction tests including one that
  reproduces the production transcript shape exactly (switch-submode +
  write-text-file with drifted event_id + codePatchCorrectness with
  checkCorrectness).
- host: one new commands-test asserting commandResult's
  m.relates_to.event_id resolves to the latest bot-message event_id in
  room.events, not the streaming snapshot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

github-actions Bot commented May 7, 2026

Preview deployments

Host Test Results

1 file, 1 suite, 1h 51m 20s ⏱️
2 465 tests: 2 448 ✅, 15 💤, 1 ❌, 1 🔥
2 484 runs: 2 466 ✅, 15 💤, 2 ❌, 1 🔥

Results for commit 95fc0bb.

For more details on these errors, see this check.

Realm Server Test Results

1 file (±0), 1 suite (±0), 19m 1s ⏱️ (+1m 40s)
1 262 tests (+1): 1 262 ✅ (+1), 0 💤 (±0), 0 ❌ (±0)
1 340 runs (+1): 1 340 ✅ (+1), 0 💤 (±0), 0 ❌ (±0)

Results for commit 95fc0bb. ± Comparison against earlier commit 928da82.

FadhlanR and others added 2 commits May 7, 2026 18:01
…t casts)

- Break long TOOL_CALL_CHECK_CORRECTNESS string per prettier.
- Cast each m.room.message event element with `as DiscreteMatrixEvent`.
  Without the cast, the inner `as CardMessageContent` widens content.msgtype
  to the union 'app.boxel.message' | 'app.boxel.codePatchCorrectness', and
  TS can no longer narrow the discriminated MatrixEvent union for the
  array element.

Pre-existing ../base/*.gts type errors are unrelated and unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shipped scope is now captured by the PR description, the Linear
ticket, and the code itself. The in-branch plan doc was a pre-impl
scoping artifact; keeping it would just risk drifting out of sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FadhlanR marked this pull request as ready for review May 7, 2026 12:33
FadhlanR requested a review from a team May 7, 2026 12:35

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 95fc0bb171

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +565 to +568
  this.getCurrentEventIdForCommandRequest(
    command.message.roomId,
    commandRequestId,
  ) ?? command.eventId;

P1: Keep commandResult linked to message cache effective event id

Using getCurrentEventIdForCommandRequest(... ) here can switch invokedToolFromEventId to the latest m.replace event id, but host message state is keyed by the effective/original id for replaced messages (getEffectiveEventId in room.ts resolves m.replace to the parent id, and updateMessageCommandResult then does _messageCache.get(effectiveEventId)). In the replace-flow this mismatch means the command result event no longer updates the corresponding MessageCommand to applied, so the UI can stay in a re-applicable state and lose persisted applied status after reload. This regression appears when commands come from streamed messages that later emit tool calls via m.replace.
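
One reading of this concern can be modelled with a small sketch. It is an illustration only: `MessageCacheSketch` and its methods are hypothetical and do not mirror the actual code in room.ts. It assumes a cache keyed by the original event id plus a mapping from each m.replace event id to its parent.

```typescript
// Hypothetical model of the reviewer's concern; not the host's real cache.
type CommandStatus = "ready" | "applied";

class MessageCacheSketch {
  // Messages are keyed by their original (effective) event id.
  private byOriginalId = new Map<string, { status: CommandStatus }>();
  // Each m.replace event id maps back to the id of the message it replaces.
  private replaceToParent = new Map<string, string>();

  addMessage(originalId: string): void {
    this.byOriginalId.set(originalId, { status: "ready" });
  }

  addReplacement(replaceId: string, parentId: string): void {
    this.replaceToParent.set(replaceId, parentId);
  }

  // Resolve a replace event id to its parent; other ids pass through.
  effectiveEventId(eventId: string): string {
    return this.replaceToParent.get(eventId) ?? eventId;
  }

  // Returns false when no cached message exists under the given key.
  markApplied(key: string): boolean {
    const entry = this.byOriginalId.get(key);
    if (entry === undefined) {
      return false;
    }
    entry.status = "applied";
    return true;
  }
}

const cache = new MessageCacheSketch();
cache.addMessage("$original");
cache.addReplacement("$replace-1", "$original");

// Looking up by the raw replace id misses the cached message.
console.log(cache.markApplied("$replace-1")); // false
// Resolving to the effective id first finds it.
console.log(cache.markApplied(cache.effectiveEventId("$replace-1"))); // true
```

Under this model, a command result linked to the latest m.replace id would update the cached command only if the lookup resolves that id back to the parent first.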



Copilot AI left a comment


Pull request overview

Fixes a production failure mode where Anthropic rejects the prompt (unexpected tool_use_id) when a commandResult can’t be paired to its originating tool call due to Matrix event_id drift between streaming/original events and normalized /messages output.

Changes:

  • runtime-common prompt construction: pair commandResult tool results by commandRequestId as a fallback when m.relates_to.event_id doesn’t match.
  • host command execution: derive the invokedToolFromEventId from current room state (latest event that “owns” the commandRequestId) instead of a potentially stale snapshot.
  • Adds regression tests covering both the ai-bot pairing behavior and the host event_id sourcing behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/runtime-common/ai/prompt.ts | Adds commandRequestId-based fallback matching for command results during prompt construction. |
| packages/host/app/services/command-service.ts | Looks up the current bot-message event_id for a commandRequestId at execution time and uses it when emitting command results. |
| packages/ai-bot/tests/prompt-construction-test.ts | Adds regression tests reproducing the production transcript shape and validating tool_call/tool_result pairing. |
| packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts | Adds a host-side regression test ensuring commandResult linkage uses the current room event_id (m.replace-aware). |


Comment on lines +564 to +568
let eventId =
  this.getCurrentEventIdForCommandRequest(
    command.message.roomId,
    commandRequestId,
  ) ?? command.eventId;