feat(runtime): surface cost, usage breakdown, and available commands on status events and getStatus #345
…on status events and getStatus

The ACP wire protocol carries richer context-window and command data than the runtime currently exposes. Three additive surface changes:

- usage_update events now carry the agent-reported cost and a typed per-turn breakdown (input/output/cachedRead/cachedWrite/thought/total tokens) normalized from the wire payload's _meta.usage. Previously only the top-level used and size numbers survived into AcpRuntimeEvent.
- available_commands_update events now carry the full availableCommands list (name, description, hasInput flag) instead of dropping it to a one-line summary, so clients can detect /compact, /clear, and similar agent-advertised commands.
- AcpRuntimeStatus.usage and AcpRuntimeStatus.availableCommands now expose the cumulative + per-request token breakdowns and command list that the session reducer already persists onto the record.

Pure addition: every new field on the AcpRuntimeEvent status variant and on AcpRuntimeStatus is optional. The text payloads on the existing events are unchanged for the empty / unknown cases, and the persisted record schema is untouched.

New types: AcpRuntimeUsageCost, AcpRuntimeUsageBreakdown, AcpRuntimeAvailableCommand, AcpRuntimeSessionUsage.
Codex review: needs real behavior proof before merge.

Reviewed May 26, 2026, 3:13 AM ET / 07:13 UTC.

Summary

Reproducibility: not applicable. This is an additive feature PR rather than a bug report. Source inspection confirms current main lacks the requested public runtime fields, while the PR body provides partial live-output proof for availableCommands.

Review metrics: 2 noteworthy metrics.
Review details

Best possible solution: Merge the focused runtime plumbing only after maintainers accept the public field shape and either obtain populated real-agent usage proof or explicitly accept unit coverage for adapters the contributor cannot run.

Do we have a high-confidence way to reproduce the issue? Not applicable: this is an additive feature PR rather than a bug report. Source inspection confirms current main lacks the requested public runtime fields, while the PR body provides partial live-output proof for availableCommands.

Is this the best way to solve the issue? Unclear: the patch is a narrow way to preserve structured ACP data for downstream consumers, but the normalized public field names and hasInput abstraction need maintainer acceptance before they become stable API.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against fea7ee6e1456.
… and drop CHANGELOG edit

Two follow-ups from ClawSweeper's review:

- Restore the legacy 'available commands updated' (no count) text when the wire list is missing or empty. The structured availableCommands: [] field is still attached either way, so text-matching consumers see no change while structured consumers still get the new field.
- Revert the direct CHANGELOG.md edit. Feature PRs in this repo leave release-note authoring to the release commit by convention.
@clawsweeper re-review

Added a folded "Real-agent verification" section to the PR description with the harness script, redacted codex output, and a field-by-field validation matrix covering each new surface and how it generalises to other adapters.
Add AcpRuntimeAvailableCommand, AcpRuntimeUsageBreakdown,
AcpRuntimeUsageCost, and AcpRuntimeSessionUsage to the export type {}
block in src/runtime.ts so downstream consumers can reach them via
'acpx/runtime' alongside the existing types. Without this they were
declared in contract.ts but only reachable via the deeper
'acpx/runtime/public/contract' import path.
Surfaces fields the wire protocol already carries but the runtime currently throws away. Pure addition — every new field is optional and existing event shapes are preserved.
Motivation
Downstream consumers building chat UIs on top of `acpx` can't render a live context-window bar or detect agent-advertised slash commands today, because the runtime drops the relevant data on the floor between the wire payload and the public event/status surfaces.

- `usage_update`: ACP carries `cost`, plus a per-turn token breakdown under `_meta.usage` (Claude Code populates this; Codex partially does). The runtime currently keeps only `used` and `size`.
- `available_commands_update`: ACP carries `AvailableCommand[]` with name/description/input. The runtime currently emits a one-line summary string and discards the list.
- `getStatus()`: the session reducer already persists `cumulative_token_usage`, `request_token_usage`, and `acpx.available_commands` onto the record, but the public `AcpRuntimeStatus` shape doesn't expose any of them.

This PR plumbs all three through.
What changed
`src/runtime/public/contract.ts`

New types:

- `AcpRuntimeUsageCost = { amount?, currency? }`
- `AcpRuntimeUsageBreakdown = { inputTokens?, outputTokens?, cachedReadTokens?, cachedWriteTokens?, thoughtTokens?, totalTokens? }`
- `AcpRuntimeAvailableCommand = { name, description?, hasInput }`
- `AcpRuntimeSessionUsage = { cumulative?, perRequest? }`

Extends the `status` variant of `AcpRuntimeEvent` with optional `cost`, `breakdown`, `availableCommands`. Extends `AcpRuntimeStatus` with optional `usage`, `availableCommands`.

`src/runtime/public/events.ts`

- `usageUpdateEvent` now reads `payload.cost` and `payload._meta.usage` defensively, normalizes both, and forwards them on the event. Both fields are optional and absent for adapters (e.g. gemini-cli today) that don't report them.
- `availableCommandsUpdateEvent` replaces the previous one-line-summary mapping. It normalizes each entry into `{ name, description?, hasInput }`, dropping entries that lack a usable `name`. The wire `input` payload (an `AvailableCommandInput` schema) is intentionally collapsed into the `hasInput` boolean: picker UIs only need the binary "does it want an argument?" signal, and we don't want to lock in the input schema here.

`src/runtime/engine/manager.ts`

- `getStatus()` now includes `usage` and `availableCommands` when the persisted record carries the underlying data. The session reducer already stashes both, so no reducer changes are needed.
- New helpers `tokenUsageToBreakdown`, `buildUsageField`, `buildAvailableCommandsField` next to the existing `buildModelsField`.

The persisted record format (`SessionTokenUsage` uses snake_case + Claude-style key names; the reducer at `conversation-model.ts:421` only retains 4 of the 6 SDK token fields) is preserved as-is. `tokenUsageToBreakdown` maps:

| Persisted field | Breakdown field |
| --- | --- |
| `input_tokens` | `inputTokens` |
| `output_tokens` | `outputTokens` |
| `cache_read_input_tokens` | `cachedReadTokens` |
| `cache_creation_input_tokens` | `cachedWriteTokens` |

`thoughtTokens` and `totalTokens` aren't persisted today (only seen on live events from `_meta.usage`). Live events from the new event path carry the full 6-field shape; status reads from the persisted record carry the 4-field subset. The asymmetry is documented in the type doc-comments.

`buildAvailableCommandsField` reads names from `record.acpx.available_commands` (which the reducer persists as `string[]`) and surfaces them as `{ name, hasInput: false }`. Live events from the new event path carry the richer `{ name, description, hasInput }` shape. Same asymmetry, same reason: the reducer's persisted shape is kept untouched to avoid a session-record migration.

Tests

- `test/runtime-events.test.ts`: new tests for `usage_update` with `cost` + `_meta.usage` round-trip, partial-cost handling, `_meta` without a `usage` record being ignored, and `availableCommands` enrichment (description, hasInput flag, dropped invalid entries). Existing tests updated to feed the realistic object form (the old test fed bare strings, which is non-spec).
- `test/runtime-manager.test.ts`: new tests covering `getStatus()` populating both fields when the record carries them, and omitting both when the record is empty.

`CHANGELOG.md`

Three bullets under "Unreleased / Changes". No breaking entries.
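The normalization described under "What changed" can be illustrated with a minimal sketch. This is not the code in this PR: the type definitions are local copies whose field types (`number`, `string`) are assumptions, `normalizeAvailableCommands` is a hypothetical name, and the body of `tokenUsageToBreakdown` is a guess at behaviour that matches the mapping table and the "never fabricate empty objects" rule.

```typescript
// Local, assumed copies of the public shapes named in this PR.
type AcpRuntimeUsageBreakdown = {
  inputTokens?: number;
  outputTokens?: number;
  cachedReadTokens?: number;
  cachedWriteTokens?: number;
  thoughtTokens?: number;
  totalTokens?: number;
};

type AcpRuntimeAvailableCommand = {
  name: string;
  description?: string;
  hasInput: boolean;
};

// Persisted snake_case record: the 4-field subset the reducer retains.
type SessionTokenUsage = {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
};

const isRecord = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

const finiteNumber = (v: unknown): number | undefined =>
  typeof v === "number" && Number.isFinite(v) ? v : undefined;

// Map the persisted record to the camelCase breakdown. Returns undefined
// (not an empty object) when no usable field is present.
function tokenUsageToBreakdown(
  usage: SessionTokenUsage | undefined,
): AcpRuntimeUsageBreakdown | undefined {
  if (!usage) return undefined;
  const out: AcpRuntimeUsageBreakdown = {};
  const set = (key: keyof AcpRuntimeUsageBreakdown, value: unknown): void => {
    const n = finiteNumber(value);
    if (n !== undefined) out[key] = n;
  };
  set("inputTokens", usage.input_tokens);
  set("outputTokens", usage.output_tokens);
  set("cachedReadTokens", usage.cache_read_input_tokens);
  set("cachedWriteTokens", usage.cache_creation_input_tokens);
  return Object.keys(out).length > 0 ? out : undefined;
}

// Hypothetical normalizer for the wire list: keep entries with a usable
// name, and collapse the wire `input` payload into a boolean.
function normalizeAvailableCommands(raw: unknown): AcpRuntimeAvailableCommand[] {
  if (!Array.isArray(raw)) return [];
  const out: AcpRuntimeAvailableCommand[] = [];
  for (const entry of raw) {
    if (!isRecord(entry)) continue; // bare strings and other non-objects are dropped
    const rawName = entry["name"];
    const name = typeof rawName === "string" ? rawName.trim() : "";
    if (name === "") continue;
    const input = entry["input"];
    const command: AcpRuntimeAvailableCommand = {
      name,
      hasInput: input !== undefined && input !== null,
    };
    const description = entry["description"];
    if (typeof description === "string") command.description = description;
    out.push(command);
  }
  return out;
}
```

Returning `undefined` instead of `{}` is what lets `getStatus()` omit the field entirely for adapters that report nothing, so existing consumers see the same shape as before.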
Test plan
- `pnpm typecheck` clean.
- `pnpm lint` clean.
- `pnpm format:check` clean (against tracked files; the `coverage/` and `reports/` artifacts that `pnpm check` produces are pre-existing repo behavior, not from this PR).
- `pnpm test`: 734 / 734 pass, 0 fail.
- `pnpm check` (format → typecheck → lint → build → viewer:typecheck → viewer:build → test:coverage → mutate) runs end-to-end with mutation score 91.07 ≥ 80 threshold.

Compatibility

- Consumers that read only `used`/`size` from `usage_update`, or read only the existing fields on `AcpRuntimeStatus`, continue to work unchanged.
- The text of the `available_commands_update` event changes from `"available commands updated (N)"` (or `"available commands updated"` for empty/missing list) to always include a count: `"available commands updated (N)"` with N >= 0. The two test cases that pinned the old text have been updated.

Out of scope (deliberate)

- The `AvailableCommandInput` schema isn't plumbed through; we only surface `hasInput: boolean`. If/when picker UIs need typed input args, that's a separate addition.
- No built-in `compact()` helper. Consumers can build a `compact()` convenience on top of the new event channel by sending the agent's reported `/compact` name as a `startTurn` text.
- The persisted record shape isn't migrated: the reducer stores command names only (no description or `hasInput`). That's why `getStatus().availableCommands` carries `description = undefined, hasInput = false` for entries sourced from the persisted record, while live events from the wire carry the full data. Migrating the persisted shape is a follow-up if downstream consumers need historical detail.

Real-agent verification
What was tested
A minimal harness that exercises every new surface this PR adds and dumps the result as JSON. The full script is reproducible from a clean checkout of this branch with `pnpm install && pnpm build`, then `node proof.mjs` with the file below.

Captured output (codex, redacted)
Ran against `npx -y @agentclientprotocol/codex-acp@^0.0.44` (resolved by the runtime's built-in registry under `agent: "codex"`), local codex CLI authenticated via ChatGPT. Session UUIDs, pids, tmpdir paths, and locally-installed skill names have been redacted; the four codex built-in commands (`mcp`, `skills`, `status`, `logout`) are kept verbatim as they are part of codex's public command surface.
```json
{
  "turnResult": { "status": "completed", "stopReason": "end_turn" },
  "usageUpdateEventCount": 1,
  "usageUpdateEventFirst": {
    "type": "status",
    "text": "usage updated: 19985/258400",
    "tag": "usage_update",
    "used": 19985,
    "size": 258400
  },
  "availableCommandsUpdateEventCount": 1,
  "availableCommandsUpdateEvents": [
    {
      "type": "status",
      "text": "available commands updated (38)",
      "tag": "available_commands_update",
      "availableCommands": [
        { "name": "mcp", "description": "List configured Model Context Protocol (MCP) tools.", "hasInput": false },
        { "name": "skills", "description": "List available skills.", "hasInput": false },
        { "name": "status", "description": "Display session configuration and token usage.", "hasInput": false },
        { "name": "logout", "description": "Sign out of Codex...", "hasInput": false },
        { "name": "$<redacted-local-skill>", "description": "<truncated>", "hasInput": false },
        { "name": "$<redacted-local-skill>", "description": "<truncated>", "hasInput": false },
        { "_redacted": "32 additional locally-installed skills omitted" }
      ]
    }
  ],
  "getStatusResponse": {
    "summary": "session=proof backendSessionId=<redacted> pid=<redacted> open",
    "models": {
      "currentModelId": "gpt-5.5[medium]",
      "availableModelIds": [
        "gpt-5.5[low]", "gpt-5.5[medium]", "gpt-5.5[high]", "gpt-5.5[xhigh]",
        "gpt-5.4[low]", "gpt-5.4[medium]", "gpt-5.4[high]", "gpt-5.4[xhigh]",
        "gpt-5.4-mini[low]", "...", "gpt-5.2[xhigh]"
      ]
    },
    "availableCommands": [
      { "name": "mcp", "hasInput": false },
      { "name": "skills", "hasInput": false },
      { "name": "status", "hasInput": false },
      { "name": "logout", "hasInput": false },
      { "_redacted": "34 locally-installed skills omitted" }
    ],
    "details": { "cwd": "<redacted-tmpdir>", "lastUsedAt": "...", "closed": false }
  }
}
```

What this validates
| Surface | Result against codex |
| --- | --- |
| `available_commands_update.availableCommands` rich list (the previously-dropped field) | Entries arrive as `{ name, description, hasInput }` |
| `AcpRuntimeStatus.availableCommands` on `getStatus()` | Entries are `{ name, hasInput: false }`, as the reducer only persists names |
| `usage_update.used` / `usage_update.size` (preserved) | `19985 / 258400` |
| `available_commands_update` fallback text when list is empty | Not exercised by this run; covered by the unit test `parsePromptEventLine covers status and tool summary fallbacks` |
| `usage_update.cost` | Not emitted by codex; `used`/`size`-only consumers see no change |
| `usage_update.breakdown` | No `_meta.usage` in this run; the unit test `parsePromptEventLine surfaces cost and _meta.usage breakdown on usage_update` covers the populated case |
| `AcpRuntimeStatus.usage` | Omitted, with no `_meta.usage` to persist |

The two new payload surfaces this PR adds (rich `availableCommands` on the event and on `getStatus()`) both round-trip correctly against a real codex turn. The two surfaces codex doesn't emit (`cost`, `breakdown`) demonstrate the normalizer's defensive-omit behavior: the runtime never fabricates empty objects, so consumers that read only `used`/`size` see byte-identical events to `main`.

How this generalises to other agents
The harness is agent-neutral: only the `agent: "codex"` string changes when pointing at a different adapter. Same script, same assertions, just a different built-in spawn command resolved by the agent registry.
- Claude Code (`@agentclientprotocol/claude-agent-acp`): Claude Code populates `_meta.usage` per-turn, so the same proof against `agent: "claude"` will surface `usage_update.breakdown` populated with `inputTokens / outputTokens / cachedReadTokens / cachedWriteTokens / thoughtTokens / totalTokens`. The `cost` field is also expected if the adapter emits it. The `availableCommands` list is shorter (Claude Code's command set) but the shape is identical.
- Gemini (`gemini --acp`): gemini-cli today doesn't implement `available_commands_update` or `usage_update` with rich payloads. The runtime correctly emits zero events on those surfaces and `getStatus().availableCommands` / `getStatus().usage` are both omitted. No crash, no synthetic empty payloads.
- Any other adapter that emits `UsageUpdate` and `AvailableCommandsUpdate` automatically benefits from the same plumbing: the changes are wire-level normalizers, not adapter-specific code paths.
Reviewers wanting to verify against a different adapter can drop in their own `agent:` string (any name the built-in registry resolves) or register a custom command via `createAgentRegistry({ overrides: { ... } })`.
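For reviewers weighing the consumer side, here is a rough sketch of how a chat UI might use the two event surfaces from the Motivation section. The `StatusEvent` type is a local stand-in modelled on the captured JSON above, not the real `AcpRuntimeEvent` export, and `contextWindowFraction` and `findCommand` are hypothetical helper names.

```typescript
// Local stand-in for the status event shape seen in the captured output.
type AvailableCommand = { name: string; description?: string; hasInput: boolean };

type StatusEvent = {
  type: "status";
  text: string;
  tag?: string;
  used?: number;
  size?: number;
  availableCommands?: AvailableCommand[];
};

// Fraction of the context window in use, clamped to [0, 1].
// Returns undefined for events that are not usable usage updates.
function contextWindowFraction(event: StatusEvent): number | undefined {
  if (event.tag !== "usage_update") return undefined;
  const { used, size } = event;
  if (typeof used !== "number" || typeof size !== "number" || size <= 0) {
    return undefined;
  }
  return Math.min(1, Math.max(0, used / size));
}

// Look up an agent-advertised command by name, tolerating a leading slash
// on either side (assumption: adapters may or may not include it).
function findCommand(event: StatusEvent, name: string): AvailableCommand | undefined {
  if (event.tag !== "available_commands_update") return undefined;
  const wanted = name.replace(/^\//, "");
  return event.availableCommands?.find(
    (command) => command.name.replace(/^\//, "") === wanted,
  );
}

// Example with the values from the captured codex run.
const usageEvent: StatusEvent = {
  type: "status",
  text: "usage updated: 19985/258400",
  tag: "usage_update",
  used: 19985,
  size: 258400,
};
const fraction = contextWindowFraction(usageEvent); // roughly 0.077, i.e. about 7.7% used
```

Because both helpers return `undefined` when the relevant fields are absent, the same UI code degrades quietly against adapters that emit neither surface.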