
fix: estimate input/output token split for copilot-cli provider#684

Merged
christso merged 5 commits into main from fix/copilot-cli-token-usage-estimation
Mar 20, 2026
Conversation


@christso christso commented Mar 19, 2026

Summary

  • Copilot CLI's ACP usage_update only reports cumulative context window tokens (used), not separate input/output counts — output was hardcoded to 0
  • Tracks characters flowing as input (prompt + tool results) vs output (agent message chunks) and pro-rates used tokens proportionally
  • Zero performance impact — just incrementing two counters on events already being processed

Closes #683
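The counter-and-split approach described above can be sketched as follows. This is a minimal illustrative sketch, not the actual provider code — the class and method names are hypothetical:

```typescript
// Hypothetical sketch: pro-rating a cumulative `used` token count into
// input/output using character counts observed in each direction.
class TokenSplitEstimator {
  private inputChars = 0;  // prompt text + tool result payloads
  private outputChars = 0; // agent message chunks

  addInput(text: string): void {
    this.inputChars += text.length;
  }

  addOutput(text: string): void {
    this.outputChars += text.length;
  }

  // Split the cumulative `used` count proportionally to observed chars.
  split(used: number): { input: number; output: number } {
    const total = this.inputChars + this.outputChars;
    if (total === 0) return { input: used, output: 0 };
    const output = Math.round((this.outputChars / total) * used);
    return { input: used - output, output };
  }
}
```

Because the counters only increment on events the provider already handles, the overhead claim above holds: the split itself is a single division at reporting time.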

Test plan

  • Full test suite passes (1472/1472, matches main)
  • Typecheck clean
  • Lint clean
  • Pre-push hooks all green (Build, Typecheck, Lint, Test)
  • Manual E2E test against a copilot CLI target

🤖 Generated with Claude Code

The ACP usage_update event only reports cumulative context window tokens
(used), not separate input/output counts. Previously output was hardcoded
to 0, making token_usage misleading for copilot-cli targets.

This change tracks characters flowing in each direction (prompt + tool
results as input, agent message chunks as output) and pro-rates the
total used tokens proportionally to estimate the split.

Closes #683

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages bot commented Mar 19, 2026

Deploying agentv with Cloudflare Pages

Latest commit: d02905e
Status: ✅  Deploy successful!
Preview URL: https://51667ae5.agentv.pages.dev
Branch Preview URL: https://fix-copilot-cli-token-usage.agentv.pages.dev


Copilot CLI does not currently emit usage_update events via ACP — the
usage data is tracked internally but marked ephemeral and not sent to
clients (see github/copilot-cli#1152). Previously this meant token_usage
was always undefined for copilot-cli targets.

This change estimates token usage from observed character counts:
- Input chars: prompt text + tool result payloads flowing to the agent
- Output chars: agent_message_chunk text flowing from the agent
- Both converted to tokens using a ~4 chars/token heuristic

When/if copilot CLI starts emitting usage_update events, the provider
will prefer those values (with char-based output estimation to split
the cumulative `used` count into input/output).

Closes #683

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
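The char-to-token estimation described in that commit message could look roughly like this. A sketch only, assuming the ~4 chars/token heuristic stated above; function and field names are illustrative, not the real provider API:

```typescript
// Rough average for English text and code output from LLMs.
const CHARS_PER_TOKEN = 4;

function estimateTokens(chars: number): number {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Input chars = prompt text + tool result payloads sent to the agent.
// Output chars = agent_message_chunk text received from the agent.
function estimateUsage(inputChars: number, outputChars: number) {
  return {
    input_tokens: estimateTokens(inputChars),
    output_tokens: estimateTokens(outputChars),
  };
}
```

As the follow-up comment notes, the input side is a lower bound: characters the client never observes (system prompt, tool definitions) cannot be counted this way.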
@christso christso marked this pull request as ready for review March 19, 2026 23:05
@christso
Collaborator Author

Note on estimation accuracy:

  • Output tokens: Reasonable estimate — ~4 chars/token holds well for English/code LLM output
  • Input tokens: Significantly underestimated — we only see the user prompt and tool results, not copilot's internal system prompt, tool definitions, and context window setup (easily 2,000-10,000+ tokens). The reported input value is a lower bound, not a true count

This is an inherent limitation of the ACP protocol not exposing token data. Accurate counts are blocked on github/copilot-cli#1152.

christso and others added 3 commits March 19, 2026 23:43
The ACP PromptResponse includes a Usage field with inputTokens,
outputTokens, thoughtTokens, and cachedReadTokens. Although copilot
CLI v1.0.9 doesn't populate this yet (marked @experimental/UNSTABLE
in the ACP spec), this positions the provider to use accurate token
counts as soon as copilot starts returning them.

Token usage resolution order:
1. PromptResponse.usage (accurate, from ACP — not yet populated)
2. usage_update session events (not yet emitted via ACP)
3. Character-based estimation (~4 chars/token heuristic)
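That three-tier resolution order might be expressed as the following fallback chain. This is an illustrative sketch, not the merged code — the `Usage` shape loosely follows the ACP field names mentioned above, and the function signature is hypothetical:

```typescript
// Loosely modeled on the ACP PromptResponse usage fields.
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

function resolveTokenUsage(
  promptUsage: Usage | undefined,  // 1. PromptResponse.usage (accurate)
  sessionUsed: number | undefined, // 2. cumulative count from usage_update
  inputChars: number,
  outputChars: number,
): { input: number; output: number } | undefined {
  if (promptUsage) {
    return { input: promptUsage.inputTokens, output: promptUsage.outputTokens };
  }
  if (sessionUsed !== undefined) {
    // Split the cumulative count by observed character proportions.
    const total = inputChars + outputChars;
    const output = total > 0 ? Math.round((outputChars / total) * sessionUsed) : 0;
    return { input: sessionUsed - output, output };
  }
  // 3. Pure char-based estimation (~4 chars/token heuristic).
  if (inputChars + outputChars > 0) {
    return {
      input: Math.ceil(inputChars / 4),
      output: Math.ceil(outputChars / 4),
    };
  }
  return undefined;
}
```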

Also makes raceWithTimeout generic so it preserves the PromptResponse
return value instead of discarding it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
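The generic raceWithTimeout mentioned in that commit could be sketched like this — an assumed implementation of the described behavior (preserving the resolved value's type), not the repository's actual code:

```typescript
// Race a promise against a timeout while preserving its resolved type T.
function raceWithTimeout<T>(
  promise: Promise<T>,
  ms: number,
  message = "operation timed out",
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  // Clear the timer on either outcome so it can't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Typing the timeout branch as `Promise<never>` lets `Promise.race` infer `Promise<T>`, so callers get the PromptResponse back instead of `Promise<void>`.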
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rces

Char-based estimation produced misleading values (e.g. input: 65 when
real input was 3000+ tokens) because we can't observe copilot's system
prompt and internal context. Better to report nothing than to mislead.

What remains:
- PromptResponse.usage capture (accurate when copilot populates it)
- usage_update event handler with cost accumulation
- Generic raceWithTimeout preserving PromptResponse return value
- Code comments documenting why token_usage is currently undefined

Blocked on github/copilot-cli#1152 for accurate token reporting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@christso christso merged commit 4c67ebc into main Mar 20, 2026
1 check passed
@christso christso deleted the fix/copilot-cli-token-usage-estimation branch March 20, 2026 00:02


Development

Successfully merging this pull request may close these issues.

copilot-cli provider: token_usage missing output tokens for target agent
