fix(router): toolCallId must only match when tool message is the last turn #148

Merged
jpr5 merged 2 commits into main from fix/toolcall-id-stale-match
Apr 30, 2026

@AlemTuzlak
Contributor

Summary

`match.toolCallId` was matching against the most recent `tool`-role message anywhere in history, not the last message in the request. Once a conversation contained any prior tool result, every subsequent request still had a "last tool message" buried in history, so a stale `toolCallId` fixture could win before any `userMessage` matcher was even evaluated.

Reproduction (CopilotKit beautiful-chat showcase)

  1. Click suggestion 1 (pie chart) → fixture userMessage: "revenue distribution by category" matches → returns tool call `call_fp_pie_chart_001` → frontend renders chart → tool result returns → fixture `toolCallId: "call_fp_pie_chart_001"` matches → content `"Pie chart rendered above — Electronics is the largest slice…"`. ✅
  2. Click suggestion 2 (bar chart). Conversation is now:
    ```
    [user1, asst(tool_call), tool(call_fp_pie_chart_001), asst("Pie chart rendered above…"), user2]
    ```
    • aimock iterates fixtures top-down. `toolCallId` fixtures sit at the top of `feature-parity.json`.
    • `getLastMessageByRole(messages, "tool")` returns the stale tool result from suggestion 1.
    • `toolCallId: "call_fp_pie_chart_001"` matches and returns `"Pie chart rendered above — Electronics…"` again — the bar-chart `userMessage` matcher is never reached, no new tool call is made. ❌
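
The steps above can be sketched as a small stand-alone simulation of the old matching logic. This is an illustration under stated assumptions, not aimock's actual router: the `ChatMessage` and `Fixture` shapes, the `reply` labels, the substring-based `userMessage` check, the bar-chart prompt text, and the tool result content are all hypothetical; only `getLastMessageByRole`, the pie-chart prompt, and the `call_fp_pie_chart_001` id come from this PR.

```typescript
// Hypothetical, simplified shapes; aimock's real types may differ.
type ChatMessage = {
  role: "user" | "assistant" | "tool";
  content?: string;
  tool_call_id?: string;
};
type Fixture = {
  match: { toolCallId?: string; userMessage?: string };
  reply: string; // label standing in for the fixture's response
};

// Stand-in for the helper named in the PR: the most recent message
// with the given role, searched across the whole history.
function getLastMessageByRole(
  messages: ChatMessage[],
  role: ChatMessage["role"],
): ChatMessage | undefined {
  return [...messages].reverse().find((m) => m.role === role);
}

// Old behaviour: fixtures are scanned top-down and the first match wins.
function routeBefore(fixtures: Fixture[], messages: ChatMessage[]): string | undefined {
  for (const { match, reply } of fixtures) {
    if (match.toolCallId !== undefined) {
      const msg = getLastMessageByRole(messages, "tool");
      if (!msg || msg.tool_call_id !== match.toolCallId) continue;
      return reply;
    }
    if (match.userMessage !== undefined) {
      // Assumption: userMessage is treated as a substring match here.
      const user = getLastMessageByRole(messages, "user");
      if (!user || !(user.content ?? "").includes(match.userMessage)) continue;
      return reply;
    }
  }
  return undefined;
}

// toolCallId fixtures sit at the top, as in feature-parity.json.
const fixtures: Fixture[] = [
  { match: { toolCallId: "call_fp_pie_chart_001" }, reply: "pie-summary" },
  { match: { userMessage: "revenue distribution by category" }, reply: "pie-tool-call" },
  { match: { userMessage: "bar chart" }, reply: "bar-tool-call" }, // hypothetical prompt
];

// Conversation state after clicking suggestion 2.
const turnTwo: ChatMessage[] = [
  { role: "user", content: "revenue distribution by category" },
  { role: "assistant" }, // carried the pie-chart tool call
  { role: "tool", tool_call_id: "call_fp_pie_chart_001", content: "rendered" },
  { role: "assistant", content: "Pie chart rendered above" },
  { role: "user", content: "show a bar chart" },
];

console.log(routeBefore(fixtures, turnTwo)); // "pie-summary": the stale fixture wins
```

The bar-chart fixture is never reached because the first fixture matches the tool result left over from the previous turn.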

Fix

A `toolCallId` fixture supplies the model's response to a tool result, which by API contract is only requested when the conversation's last message is a tool result. Tighten the matcher to require exactly that:

```ts
// before
const msg = getLastMessageByRole(effective.messages, "tool");
if (!msg || msg.tool_call_id !== match.toolCallId) continue;

// after
const last = effective.messages[effective.messages.length - 1];
if (!last || last.role !== "tool" || last.tool_call_id !== match.toolCallId) continue;
```

When the last message is `user` (or anything other than a fresh `tool` result), `toolCallId` matchers are now correctly skipped so `userMessage` matchers can run.
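
To make the new condition concrete, here is a minimal sketch that pulls the "after" check into a pure function and runs it against the conversation states from the reproduction. The function name `toolCallIdMatches`, the `ChatMessage` shape, and the message contents are assumptions made for illustration; the real check lives inline in the router loop.

```typescript
// Hypothetical, simplified message shape; aimock's real type may differ.
type ChatMessage = {
  role: "user" | "assistant" | "tool";
  content?: string;
  tool_call_id?: string;
};

// True only when the final message is a tool result carrying the expected id.
function toolCallIdMatches(messages: ChatMessage[], toolCallId: string): boolean {
  const last = messages[messages.length - 1];
  return !!last && last.role === "tool" && last.tool_call_id === toolCallId;
}

const conversation: ChatMessage[] = [
  { role: "user", content: "revenue distribution by category" },
  { role: "assistant" }, // carried the pie-chart tool call
  { role: "tool", tool_call_id: "call_fp_pie_chart_001", content: "rendered" },
  { role: "assistant", content: "Pie chart rendered above" },
  { role: "user", content: "show a bar chart" }, // hypothetical prompt
];

const id = "call_fp_pie_chart_001";

// Fresh tool result is the last message: the fixture applies.
console.log(toolCallIdMatches(conversation.slice(0, 3), id)); // true

// Assistant content reply follows the tool result: skipped.
console.log(toolCallIdMatches(conversation.slice(0, 4), id)); // false

// New user turn follows: skipped, so userMessage matchers can run.
console.log(toolCallIdMatches(conversation, id)); // false
```

The last two states correspond to the two regression cases described in this PR: an assistant content reply after the tool result, and a new user turn after the tool result.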

Test plan

  • New regression test `does not match when a new user turn follows the tool message` — fails on `main`, passes with the fix
  • All 73 `router.test.ts` cases pass
  • Full suite: 2610 passed (the one unrelated failure on Windows is the pre-existing `defaultCacheRoot honors XDG_CACHE_HOME` path-separator test in `fixtures-remote.test.ts`)
  • `pnpm lint` clean
  • `pnpm format:check` clean

… turn

Previously, `match.toolCallId` used `getLastMessageByRole(messages, "tool")`,
which scans the entire history. Once a conversation contained any prior tool
result, every subsequent request still had a "last tool message" buried in
history, and a stale `toolCallId` fixture could win before any `userMessage`
matcher was evaluated.

Concretely: in CopilotKit's beautiful-chat showcase, clicking the first
suggestion (pie chart) succeeded; clicking a different suggestion afterward
caused aimock to keep returning the prior chart's "Pie chart rendered above —
…" content fixture instead of producing a new tool call, because the stale
pie-chart `tool_call_id` from history shadowed the new `userMessage` match.

A `toolCallId` fixture supplies the model's response to a tool result, which
by API contract is only requested when the conversation's last message is a
tool result. Tighten the matcher to require exactly that, and add a regression
test for the "new user turn after tool message" case.
- CHANGELOG entry for the toolCallId stale-history fix
- Bump version to 1.16.4 (patch)
- Additional regression test: toolCallId must not match when the last message
  is an assistant content reply (post-tool-result follow-up state)
- Update the integration multi-turn test's leading comment, which previously
  documented the buggy "fixture order matters" workaround as if it were the
  intended invariant — under the fix, fixture order between toolCallId and
  userMessage entries no longer matters because they apply to disjoint turns.
@pkg-pr-new

pkg-pr-new Bot commented Apr 30, 2026

npm i https://pkg.pr.new/@copilotkit/aimock@148

commit: 56955bc

@jpr5 jpr5 merged commit d6392c8 into main Apr 30, 2026
22 checks passed
@jpr5 jpr5 deleted the fix/toolcall-id-stale-match branch April 30, 2026 14:09
jpr5 added a commit to CopilotKit/CopilotKit that referenced this pull request Apr 30, 2026
## Summary

Bumps `@copilotkit/aimock` to 1.16.4 to pick up the router fix from
<CopilotKit/aimock#148>. After this lands and
Railway rebuilds `ghcr.io/copilotkit/aimock:latest`, the showcase demos
pick up the fix automatically — no fixture changes required.

**The bug it fixes:** `match.toolCallId` was scanning the entire
conversation for the most recent `tool` message, so once any prior tool
result was in history, every subsequent request still had a "last tool
message" buried in the array. A stale `toolCallId` fixture could win and
shadow `userMessage` matchers for new user turns.

**User-visible symptom:** in `beautiful-chat`, clicking the first
suggestion (e.g. pie chart) worked, but clicking any subsequent
suggestion replayed the previous chart's "Pie chart rendered above —
Electronics is the largest slice…" content fixture instead of producing
a new tool call. Looked broken to anyone clicking through demos.

**Upstream fix:** the matcher now requires the tool message to be the
**last** message in the request — the only state in which the LLM is
being asked to respond to a tool result. Two regression tests in aimock
cover the "new user turn after tool" and "assistant content reply after
tool" cases.

## Changes

- `pnpm-lock.yaml` — refresh resolutions for `@copilotkit/runtime`'s
`@copilotkit/aimock` devDep (`1.16.2` → `1.16.4`). Mostly mechanical;
lockfile is slightly more compact than before.
- `showcase/scripts/package-lock.json` — refresh to `1.16.4` (was
`1.14.3`).
- `.github/workflows/test_e2e-showcase-on-demand.yml` — bump the
known-good floor from `@copilotkit/aimock@^1.14.3` to
`@copilotkit/aimock@^1.16.4` so the `/test-aimock <slug>` PR-comment
workflow always installs a build that contains the fix.

`packages/runtime/package.json` and `showcase/scripts/package.json` keep
their `"latest"` specifier per existing convention; the lockfile pins
are the reproducibility layer.

## Test plan

- [x] `pnpm nx run @copilotkit/runtime:test` — 1412/1412 pass (covers
`LLMock` / `MCPMock` imports from `@copilotkit/aimock` in the v2 MCP
integration tests)
- [x] `npm test -- aimock-fixtures` in `showcase/scripts` — 18/18 pass
(loadFixtureFile + validateFixtures schema validation)
- [x] Lefthook pre-commit (`check-binaries`, `sync-lockfile`,
`lint-fix`, `test-and-check-packages`) green
- [x] commitlint conventional-commit format green
- [ ] After merge: confirm Railway picks up the new
`ghcr.io/copilotkit/aimock:latest` image on next service restart and
beautiful-chat suggestions work end-to-end

The 71 unrelated test failures in `showcase/scripts` are pre-existing
Windows path-separator issues on `main` (audit CLI subprocess tests,
integration registry tests) — verified identical counts on origin/main
without this branch's changes. Out of scope for this bump.

## Related

- aimock PR: <CopilotKit/aimock#148>
- aimock release:
<https://www.npmjs.com/package/@copilotkit/aimock/v/1.16.4>
pull Bot pushed a commit to bhardwajRahul/CopilotKit that referenced this pull request Apr 30, 2026
Picks up the router fix from CopilotKit/aimock#148 — `toolCallId` matchers
now only fire when the tool message is the *last* message in the request,
preventing stale tool_call_ids from history shadowing `userMessage`
matchers on new user turns.

Surfaced as: in beautiful-chat, clicking a second suggestion replayed the
prior chart's "Pie chart rendered above…" content fixture instead of
producing a new tool call. Once Railway rebuilds `ghcr.io/copilotkit/aimock:latest`
and restarts the service, demos will pick up the fix automatically.

- Refresh `pnpm-lock.yaml` resolutions (workspace `@copilotkit/runtime` devDep)
- Refresh `showcase/scripts/package-lock.json` to 1.16.4
- Bump the floor in `test_e2e-showcase-on-demand.yml` from `^1.14.3` → `^1.16.4`
  so the `/test-aimock` PR-comment workflow always installs a build that
  contains the fix