fix(provider): strip [1m]/[2m] suffix from model before sending to 3P APIs #630
moyu12-ae wants to merge 6 commits into
Implement 6 cache optimization improvements inspired by Reasonix:

1. **CCH attribution header**: Disabled by default to prevent per-turn cache invalidation. The `x-anthropic-billing-header` was embedded in `system[0]` of every prompt, and its xxHash value changed on every turn, causing full cache prefix misses. Users can re-enable it via the env var `CLAUDE_CODE_ATTRIBUTION_HEADER` or a GrowthBook flag.
2. **Multi-level percentage compaction thresholds**: Supplement the existing fixed 13K buffer with percentage-based levels (75%/78%/80%/90%) that work correctly across all context window sizes from 200K to 1M+. The fixed buffer remains as "final defense" at 93-98%.
3. **Turn-start token pre-estimation**: Feature-flagged checkpoint (`TURN_START_PRE_ESTIMATION`) before API calls to detect when context approaches 90% capacity before the passive check catches it.
4. **Cache-aligned compaction**: Already implemented (`CacheSafeParams` in `forkedAgent.ts`). No changes needed; verified.
5. **Token-based tool result truncation**: `truncateToolResultByTokens()` replaces char-based counting with rough token estimation for CJK-aware truncation at clean line boundaries.
6. **Minimum savings check**: `isCompactionWorthwhile()` skips compaction when the head portion is less than 30% of the context window, preventing waste when the summary costs nearly as much as it saves.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
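The threshold and minimum-savings rules described in this commit can be sketched roughly as follows. This is not the code from the PR: only the percentages (75/78/80/90), the 13K fixed buffer, and the 30% rule come from the commit message. The level names, the `compactionLevel` helper, and the signature of `isCompactionWorthwhile` are assumptions made for illustration.

```typescript
// Hypothetical sketch. Level names and function signatures are assumptions;
// only the numeric thresholds are taken from the commit message.
type CompactionLevel =
  | "none"
  | "level75"
  | "level78"
  | "level80"
  | "level90"
  | "finalDefense";

const FIXED_BUFFER_TOKENS = 13_000; // the pre-existing fixed buffer
const MIN_HEAD_FRACTION = 0.3; // head must be at least 30% of the window

function compactionLevel(usedTokens: number, contextWindow: number): CompactionLevel {
  // The fixed buffer stays as the last line of defense for every window size.
  if (usedTokens >= contextWindow - FIXED_BUFFER_TOKENS) return "finalDefense";
  const ratio = usedTokens / contextWindow;
  if (ratio >= 0.9) return "level90";
  if (ratio >= 0.8) return "level80";
  if (ratio >= 0.78) return "level78";
  if (ratio >= 0.75) return "level75";
  return "none";
}

// Skip compaction when the head to be summarized is under 30% of the window.
function isCompactionWorthwhile(headTokens: number, contextWindow: number): boolean {
  return headTokens >= MIN_HEAD_FRACTION * contextWindow;
}
```

With these numbers the fixed buffer triggers at 187K of a 200K window (93.5%) and at 987K of a 1M window (98.7%), which is why percentage levels are needed to get earlier, window-independent checkpoints.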
…isTurn
Three enhancements bringing cc-haha's cache optimization closer to Reasonix:
1. **Cache Economics tracking** (Reasonix SessionStats parity):
- `CacheMetrics` type: cacheHitTokens, cacheMissTokens, cacheWriteTokens,
totalPromptTokens, cacheHitRatio
- `computeCacheMetrics()`: pure function extracting metrics from API usage
- Integrated into `autoCompactIfNeeded` return and query.ts post-compact log
2. **Turn-start pre-fold upgraded** from observability-only:
- `needsTurnStartPreFold()` with 5% hysteresis buffer to prevent oscillation
- `shouldPreFold()` respecting `alreadyFoldedThisTurn`
- `forcePreFold` param on `autoCompactIfNeeded` — actually triggers pre-fold
before API call, not just logs it
3. **alreadyFoldedThisTurn** mechanism (Reasonix decideAfterUsage parity):
- Prevents double-fold when pre-fold already ran this turn
- Added to `AutoCompactTrackingState`, set on all compaction paths
- Post-response check skipped when true
Tests: 32 pass (up from 15), covering cache metrics, hysteresis, and
alreadyFoldedThisTurn edge cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
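One possible reading of the hysteresis and `alreadyFoldedThisTurn` behavior is sketched below. The commit only states a 5% buffer and the double-fold guard; the 90% trigger is borrowed from the earlier pre-estimation commit, and the latch semantics, the `PreFoldState` shape, and both function signatures are assumptions rather than the PR's actual implementation.

```typescript
// Hypothetical sketch. Signatures, state shape, and latch semantics are assumptions.
const PRE_FOLD_TRIGGER = 0.9; // assumed trigger point
const PRE_FOLD_RELEASE = 0.85; // trigger minus the 5% hysteresis buffer

interface PreFoldState {
  latched: boolean; // true after a trigger, until usage drops below the release point
  alreadyFoldedThisTurn: boolean;
}

function needsTurnStartPreFold(
  usageRatio: number,
  latched: boolean,
): { needed: boolean; latched: boolean } {
  if (latched) {
    // Stay quiet until usage has fallen clearly below the trigger.
    return { needed: false, latched: usageRatio >= PRE_FOLD_RELEASE };
  }
  const needed = usageRatio >= PRE_FOLD_TRIGGER;
  return { needed, latched: needed };
}

function shouldPreFold(
  usageRatio: number,
  state: PreFoldState,
): { fold: boolean; next: PreFoldState } {
  const { needed, latched } = needsTurnStartPreFold(usageRatio, state.latched);
  // Never fold twice in the same turn.
  const fold = needed && !state.alreadyFoldedThisTurn;
  return {
    fold,
    next: { latched, alreadyFoldedThisTurn: state.alreadyFoldedThisTurn || fold },
  };
}
```

In this reading, a caller would reset `alreadyFoldedThisTurn` to `false` at the start of each turn and carry `latched` across turns, so usage hovering between 85% and 90% does not cause repeated folds.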
Completes Cache Economics tracking by wiring the computed cache hit ratio into the existing GrowthBook analytics event, making it dashboard-trackable without code changes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…acheMetrics
TDD cycle results:
- 26 new integration tests covering:
  - Cross-window percentage threshold consistency (200K/500K/1M)
  - Decision chain verification (pre-fold → normal → aggressive → force)
  - alreadyFoldedThisTurn double-fold prevention
  - Hysteresis oscillation prevention
  - Cache metrics invariant (ratio ∈ [0,1])
  - CJK + emoji + mixed-language truncation edge cases
- Bug fix: `computeCacheMetrics` returned NaN when cacheHit+cacheMiss==0 (all tokens were writes). It now returns 0, keeping the ratio in the valid range.
- 58 cache optimization tests + 15 existing = 73 total. Zero failures.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
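The NaN fix amounts to guarding the division when there are no hit or miss tokens. A minimal sketch, assuming the metrics are derived from the Anthropic Messages API usage fields (`input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`): the `CacheMetrics` field names come from the commit message, while the `ApiUsage` type and the mapping from usage fields to metrics are assumptions.

```typescript
// Hypothetical sketch. The mapping from usage fields to metrics is an assumption.
interface ApiUsage {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

interface CacheMetrics {
  cacheHitTokens: number;
  cacheMissTokens: number;
  cacheWriteTokens: number;
  totalPromptTokens: number;
  cacheHitRatio: number;
}

function computeCacheMetrics(usage: ApiUsage): CacheMetrics {
  const cacheHitTokens = usage.cache_read_input_tokens ?? 0;
  const cacheWriteTokens = usage.cache_creation_input_tokens ?? 0;
  const cacheMissTokens = usage.input_tokens; // assumed: uncached input counts as a miss
  const denominator = cacheHitTokens + cacheMissTokens;
  // Guard the 0/0 case (all tokens were writes) so the ratio stays in [0, 1].
  const cacheHitRatio = denominator === 0 ? 0 : cacheHitTokens / denominator;
  return {
    cacheHitTokens,
    cacheMissTokens,
    cacheWriteTokens,
    totalPromptTokens: cacheHitTokens + cacheMissTokens + cacheWriteTokens,
    cacheHitRatio,
  };
}
```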
…ency
The change-policy job runs policy tests that transitively import modules requiring axios (`src/utils/proxy.ts`, `src/services/oauth/client.ts`). Without `bun install`, CI throws 'Cannot find package axios' errors in every PR workflow run.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… APIs
Model names with context-window suffixes (e.g., `mimo-v2.5-pro[1m]`) were sent unchanged to third-party APIs, causing 400 errors. The original Claude Code strips these suffixes at every API boundary via its `QB()` function (identical regex: `/\[(1|2)m\]/gi`).
Added `normalizeModelStringForAPI()` calls in:
- `providerService.ts`: `testConnectivity` and `testProxyPipeline`
- `handler.ts`: proxy request handler (before OpenAI transform)
Verified against MiMo API: `[1m]` suffix → 400, normalized → 200 OK.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
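For reference, the normalization described above can be sketched as below. The regex is the one quoted in the commit message; the function body is an assumption about what `normalizeModelStringForAPI()` does, and `withNormalizedModel` is a hypothetical helper that only illustrates applying it to a request body before forwarding.

```typescript
// Regex quoted in the commit message; the function body is an assumed sketch.
const CONTEXT_WINDOW_SUFFIX = /\[(1|2)m\]/gi;

function normalizeModelStringForAPI(model: string): string {
  // Remove client-side context-window markers such as [1m] or [2m].
  return model.replace(CONTEXT_WINDOW_SUFFIX, "");
}

// Hypothetical helper: returns a copy of the request body with a wire-safe model name.
function withNormalizedModel<T extends { model: string }>(body: T): T {
  return { ...body, model: normalizeModelStringForAPI(body.model) };
}
```

A proxy handler could then forward `withNormalizedModel(body)` in place of `body`, leaving the caller's original object untouched.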
PR quality triage
Changed areas: area:cli-core, area:release, area:server
CLI core policy: Blocked by policy until a maintainer applies
Missing-test policy: Blocked by policy until a maintainer applies
Coverage baseline policy: No coverage-baseline policy block detected.
CLI core files:
Coverage policy files:
Expected checks:
Test coverage signals:
Risk notes:
Hard merge gates still come from GitHub Actions, not AI review.
Dosu handoff: Dosu can be used as the AI reviewer for risk explanation, missing-test prompts, and maintainer Q&A. If it does not comment automatically from the PR template, ask:
@dosubot review this PR for changed-area risk, missing tests, docs impact, desktop startup risk, and CLI core impact.
Summary

- Model names with a `[1m]` suffix (e.g., `mimo-v2.5-pro[1m]`) now have it stripped before being sent to third-party APIs, matching the original Claude Code behavior
- The `[1m]`/`[2m]` suffix is an Anthropic client-side convention for context window size selection; third-party APIs don't understand it and return 400 errors

Root Cause

cc-haha already has `normalizeModelStringForAPI()` (identical to original CC's `QB()` function, same regex `/\[(1|2)m\]/gi`), but it wasn't called in two cc-haha-specific API boundaries:

- `providerService.ts` connectivity test
- `handler.ts` proxy forwarding

Changes (2 files)

- `src/server/services/providerService.ts`: Call `normalizeModelStringForAPI()` in `testConnectivity` and `testProxyPipeline` before building requests
- `src/server/proxy/handler.ts`: Normalize `body.model` after `ensureClaudeCodeAttribution()` as defense-in-depth

Verification

Reverse Engineering: Original Claude Code

Decompiled the original CC's `cli.js` (12MB minified) and confirmed:

Original CC calls `QB()` at all 6 API boundaries. This PR extends the same pattern to cc-haha's 3 additional API boundaries.

Real API Test: MiMo API

- `mimo-v2.5-pro`
- `mimo-v2.5-pro[1m]`
- `mimo-v2.5-pro` (normalized)

Why not strip at config layer?

The `[1m]` suffix has semantic value: it's used in `context.ts` and `model.ts` for context window detection. Stripping at the API boundary (last moment before request) is the original CC design.

🤖 Generated with Claude Code
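To illustrate why the suffix is kept in configuration and only removed at the last moment, here is a rough sketch of suffix-based context window detection. It is not taken from `context.ts` or `model.ts`: the `contextWindowForModel` name, the 200K default, and the reading of `[2m]` as 2M tokens are all assumptions.

```typescript
// Hypothetical sketch. The default window and the [2m] = 2M mapping are assumptions.
const DEFAULT_CONTEXT_WINDOW = 200_000;

function contextWindowForModel(configuredModel: string): number {
  const match = /\[(1|2)m\]/i.exec(configuredModel);
  if (match === null) return DEFAULT_CONTEXT_WINDOW;
  // "[1m]" selects a 1M-token window, "[2m]" a 2M-token window (assumed).
  return Number(match[1]) * 1_000_000;
}

// The configured name keeps its suffix for local decisions...
const configuredModel = "mimo-v2.5-pro[1m]";
const windowSize = contextWindowForModel(configuredModel);

// ...and only the name placed on the wire has it removed.
const wireModel = configuredModel.replace(/\[(1|2)m\]/gi, "");
```

If the suffix were stripped when the config is loaded, `contextWindowForModel` would fall back to the default and every percentage-based threshold would be computed against the wrong window size.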