fix(router): toolCallId must only match when tool message is the last turn#148
Merged
… turn

Previously, `match.toolCallId` used `getLastMessageByRole(messages, "tool")`, which scans the entire history. Once a conversation contained any prior tool result, every subsequent request still had a "last tool message" buried in history, and a stale `toolCallId` fixture could win before any `userMessage` matcher was evaluated.

Concretely: in CopilotKit's beautiful-chat showcase, clicking the first suggestion (pie chart) succeeded; clicking a different suggestion afterward caused aimock to keep returning the prior chart's "Pie chart rendered above — …" content fixture instead of producing a new tool call, because the stale pie-chart `tool_call_id` from history shadowed the new `userMessage` match.

A `toolCallId` fixture answers the model's response to a tool result, which by API contract only happens when the conversation's LAST message is a tool result. Tighten the matcher to require exactly that, and add a regression test for the "new user turn after tool message" case.
- CHANGELOG entry for the toolCallId stale-history fix
- Bump version to 1.16.4 (patch)
- Additional regression test: toolCallId must not match when the last message is an assistant content reply (post-tool-result follow-up state)
- Update the integration multi-turn test's leading comment, which previously documented the buggy "fixture order matters" workaround as if it were the intended invariant. Under the fix, fixture order between toolCallId and userMessage entries no longer matters because they apply to disjoint turns.
jpr5 approved these changes on Apr 30, 2026
jpr5 added a commit to CopilotKit/CopilotKit that referenced this pull request on Apr 30, 2026
## Summary

Bumps `@copilotkit/aimock` to 1.16.4 to pick up the router fix from <CopilotKit/aimock#148>. After this lands and Railway rebuilds `ghcr.io/copilotkit/aimock:latest`, the showcase demos pick up the fix automatically; no fixture changes are required.

**The bug it fixes:** `match.toolCallId` was scanning the entire conversation for the most recent `tool` message, so once any prior tool result was in history, every subsequent request still had a "last tool message" buried in the array. A stale `toolCallId` fixture could win and shadow `userMessage` matchers for new user turns.

**User-visible symptom:** in `beautiful-chat`, clicking the first suggestion (e.g. pie chart) worked, but clicking any subsequent suggestion replayed the previous chart's "Pie chart rendered above — Electronics is the largest slice…" content fixture instead of producing a new tool call. This made the demos look broken to anyone clicking through them.

**Upstream fix:** the matcher now requires the tool message to be the **last** message in the request, which is the only state in which the LLM is being asked to respond to a tool result. Two regression tests in aimock cover the "new user turn after tool" and "assistant content reply after tool" cases.

## Changes

- `pnpm-lock.yaml`: refresh resolutions for `@copilotkit/runtime`'s `@copilotkit/aimock` devDep (`1.16.2` → `1.16.4`). Mostly mechanical; the lockfile is slightly more compact than before.
- `showcase/scripts/package-lock.json`: refresh to `1.16.4` (was `1.14.3`).
- `.github/workflows/test_e2e-showcase-on-demand.yml`: bump the known-good floor from `@copilotkit/aimock@^1.14.3` to `@copilotkit/aimock@^1.16.4` so the `/test-aimock <slug>` PR-comment workflow always installs a build that contains the fix.

`packages/runtime/package.json` and `showcase/scripts/package.json` keep their `"latest"` specifier per existing convention; the lockfile pins are the reproducibility layer.

## Test plan

- [x] `pnpm nx run @copilotkit/runtime:test`: 1412/1412 pass (covers `LLMock` / `MCPMock` imports from `@copilotkit/aimock` in the v2 MCP integration tests)
- [x] `npm test -- aimock-fixtures` in `showcase/scripts`: 18/18 pass (loadFixtureFile + validateFixtures schema validation)
- [x] Lefthook pre-commit (`check-binaries`, `sync-lockfile`, `lint-fix`, `test-and-check-packages`) green
- [x] commitlint conventional-commit format green
- [ ] After merge: confirm Railway picks up the new `ghcr.io/copilotkit/aimock:latest` image on next service restart and beautiful-chat suggestions work end-to-end

The 71 unrelated test failures in `showcase/scripts` are pre-existing Windows path-separator issues on `main` (audit CLI subprocess tests, integration registry tests); the counts were verified to be identical on origin/main without this branch's changes. Out of scope for this bump.

## Related

- aimock PR: <CopilotKit/aimock#148>
- aimock release: <https://www.npmjs.com/package/@copilotkit/aimock/v/1.16.4>
pull Bot pushed a commit to bhardwajRahul/CopilotKit that referenced this pull request on Apr 30, 2026
Picks up the router fix from CopilotKit/aimock#148: `toolCallId` matchers now only fire when the tool message is the *last* message in the request, preventing stale tool_call_ids from history shadowing `userMessage` matchers on new user turns.

Surfaced as: in beautiful-chat, clicking a second suggestion replayed the prior chart's "Pie chart rendered above…" content fixture instead of producing a new tool call.

Once Railway rebuilds `ghcr.io/copilotkit/aimock:latest` and restarts the service, demos will pick up the fix automatically.

- Refresh `pnpm-lock.yaml` resolutions (workspace `@copilotkit/runtime` devDep)
- Refresh `showcase/scripts/package-lock.json` to 1.16.4
- Bump the floor in `test_e2e-showcase-on-demand.yml` from `^1.14.3` → `^1.16.4` so the `/test-aimock` PR-comment workflow always installs a build that contains the fix
Summary
`match.toolCallId` was matching against the most recent `tool` role message anywhere in history, not the last message in the request. Once a conversation contained any prior tool result, every subsequent request still had a "last tool message" buried in history, and a stale `toolCallId` fixture could win before any `userMessage` matcher was even evaluated.

Reproduction (CopilotKit beautiful-chat showcase)
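`userMessage: "revenue distribution by category"` matches → returns tool call `call_fp_pie_chart_001` → frontend renders chart → tool result returns → fixture `toolCallId: "call_fp_pie_chart_001"` matches → content `"Pie chart rendered above — Electronics is the largest slice…"`. ✅

After a different suggestion is clicked, the request's message history is:

```
[user1, asst(tool_call), tool(call_fp_pie_chart_001), asst("Pie chart rendered above…"), user2]
```

The pie-chart tool result is still the most recent `tool` message in that history, so the stale `toolCallId` fixture matches again and shadows the `userMessage` match for `user2`.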
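To make the failure concrete, the sketch below replays that history against a role-based lookup. The `Message` type and this `getLastMessageByRole` are simplified stand-ins written for illustration, not aimock's actual implementation, and the `user2` text is a placeholder.

```ts
// Simplified message shape, assumed for illustration only.
type Message = {
  role: "user" | "assistant" | "tool";
  content?: string;
  tool_call_id?: string;
};

// Hypothetical stand-in for the helper: scan backwards for the newest
// message with the given role, regardless of where it sits in history.
function getLastMessageByRole(
  messages: Message[],
  role: Message["role"],
): Message | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === role) return m;
  }
  return undefined;
}

// The history from the reproduction, after the second suggestion is clicked.
const turns: Message[] = [
  { role: "user", content: "revenue distribution by category" },
  { role: "assistant" }, // carried the pie-chart tool call
  { role: "tool", tool_call_id: "call_fp_pie_chart_001" },
  { role: "assistant", content: "Pie chart rendered above…" },
  { role: "user", content: "a different suggestion" }, // placeholder text
];

// The role-based scan still finds the old tool result...
const staleTool = getLastMessageByRole(turns, "tool");
// ...even though the request actually ends with a fresh user turn.
const lastMessage = turns[turns.length - 1];
```

Here `staleTool` is the pie-chart tool result while `lastMessage` is the new user turn, which is the mismatch that let the old fixture win.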
Fix
A `toolCallId` fixture answers the model's response to a tool result, which by API contract only happens when the conversation's LAST message is a tool result. Tighten the matcher to require exactly that:
```ts
// before
const msg = getLastMessageByRole(effective.messages, "tool");
if (!msg || msg.tool_call_id !== match.toolCallId) continue;
// after
const last = effective.messages[effective.messages.length - 1];
if (!last || last.role !== "tool" || last.tool_call_id !== match.toolCallId) continue;
```
When the last message is `user` (or anything other than a fresh `tool` result), `toolCallId` matchers are now correctly skipped so `userMessage` matchers can run.
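As a rough illustration of the tightened rule, the sketch below runs a toy fixture selector over the three conversation states discussed in this PR. Everything here is an assumption for illustration: the `Message` and `Fixture` shapes, `pickFixture`, `latestUserText`, the substring comparison for `userMessage`, and the "bar chart" suggestion are all hypothetical and simplified, not aimock's real router.

```ts
// Simplified shapes, assumed for illustration; aimock's real types may differ.
type Message = {
  role: "user" | "assistant" | "tool";
  content?: string;
  tool_call_id?: string;
};
type Fixture = {
  match: { toolCallId?: string; userMessage?: string };
  reply: string;
};

// Hypothetical helper: text of the most recent user message, or "".
function latestUserText(messages: Message[]): string {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "user") return m.content ?? "";
  }
  return "";
}

// Returns the first fixture whose matchers all apply to the request.
function pickFixture(fixtures: Fixture[], messages: Message[]): Fixture | undefined {
  const last = messages[messages.length - 1];
  for (const fixture of fixtures) {
    const { toolCallId, userMessage } = fixture.match;
    // Tightened rule: only a trailing tool result can satisfy toolCallId.
    if (toolCallId !== undefined) {
      if (!last || last.role !== "tool" || last.tool_call_id !== toolCallId) continue;
    }
    // Assumption: userMessage is a substring test on the latest user text.
    if (userMessage !== undefined && !latestUserText(messages).includes(userMessage)) continue;
    return fixture;
  }
  return undefined;
}

const fixtures: Fixture[] = [
  { match: { toolCallId: "call_fp_pie_chart_001" }, reply: "Pie chart rendered above…" },
  { match: { userMessage: "bar chart" }, reply: "(new tool call)" },
];

const base: Message[] = [
  { role: "user", content: "revenue distribution by category" },
  { role: "assistant" }, // carried the pie-chart tool call
  { role: "tool", tool_call_id: "call_fp_pie_chart_001" },
];
const afterToolResult = base;
const afterAssistantReply: Message[] = [
  ...base,
  { role: "assistant", content: "Pie chart rendered above…" },
];
const afterNewUserTurn: Message[] = [
  ...afterAssistantReply,
  { role: "user", content: "show a bar chart" },
];
```

With these inputs, `pickFixture(fixtures, afterToolResult)` selects the `toolCallId` fixture, `pickFixture(fixtures, afterNewUserTurn)` falls through to the `userMessage` fixture even though the `toolCallId` fixture is listed first, and `pickFixture(fixtures, afterAssistantReply)` returns `undefined`.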
Test plan