feat(cache): multi-level compaction thresholds, CCH optimization, cache economics#628
feat(cache): multi-level compaction thresholds, CCH optimization, cache economics#628moyu12-ae wants to merge 5 commits into
Conversation
Implement 6 cache optimization improvements inspired by Reasonix: 1. **CCH attribution header**: Disabled by default to prevent per-turn cache invalidation. The `x-anthropic-billing-header` was embedded in `system[0]` of every prompt, and its xxHash value changed per-turn, causing full cache prefix misses. Users can re-enable via env var `CLAUDE_CODE_ATTRIBUTION_HEADER` or GrowthBook flag. 2. **Multi-level percentage compaction thresholds**: Supplement the existing fixed 13K buffer with percentage-based levels (75%/78%/80%/ 90%) that work correctly across all context window sizes from 200K to 1M+. The fixed buffer remains as "final defense" at 93-98%. 3. **Turn-start token pre-estimation**: Feature-flagged checkpoint (`TURN_START_PRE_ESTIMATION`) before API calls to detect when context approaches 90% capacity before the passive check catches it. 4. **Cache-aligned compaction**: Already implemented (CacheSafeParams in forkedAgent.ts). No changes needed — verified. 5. **Token-based tool result truncation**: `truncateToolResultByTokens()` replaces char-based counting with rough token estimation for CJK- aware truncation at clean line boundaries. 6. **Minimum savings check**: `isCompactionWorthwhile()` skips compaction when the head portion is less than 30% of the context window, preventing waste when the summary costs nearly as much as it saves. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…isTurn
Three enhancements bringing cc-haha's cache optimization closer to Reasonix:
1. **Cache Economics tracking** (Reasonix SessionStats parity):
- `CacheMetrics` type: cacheHitTokens, cacheMissTokens, cacheWriteTokens,
totalPromptTokens, cacheHitRatio
- `computeCacheMetrics()`: pure function extracting metrics from API usage
- Integrated into `autoCompactIfNeeded` return and query.ts post-compact log
2. **Turn-start pre-fold upgraded** from observability-only:
- `needsTurnStartPreFold()` with 5% hysteresis buffer to prevent oscillation
- `shouldPreFold()` respecting `alreadyFoldedThisTurn`
- `forcePreFold` param on `autoCompactIfNeeded` — actually triggers pre-fold
before API call, not just logs it
3. **alreadyFoldedThisTurn** mechanism (Reasonix decideAfterUsage parity):
- Prevents double-fold when pre-fold already ran this turn
- Added to `AutoCompactTrackingState`, set on all compaction paths
- Post-response check skipped when true
Tests: 32 pass (up from 15), covering cache metrics, hysteresis, and
alreadyFoldedThisTurn edge cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Completes Cache Economics tracking by wiring the computed cache hit ratio into the existing GrowthBook analytics event, making it dashboard-trackable without code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…acheMetrics TDD cycle results: - 26 new integration tests covering: • Cross-window percentage threshold consistency (200K/500K/1M) • Decision chain verification (pre-fold → normal → aggressive → force) • alreadyFoldedThisTurn double-fold prevention • Hysteresis oscillation prevention • Cache metrics invariant (ratio ∈ [0,1]) • CJK + emoji + mixed-language truncation edge cases - Bug fix: computeCacheMetrics returned NaN when cacheHit+cacheMiss==0 (all tokens were writes). Now returns 0 for valid range. - 58 cache optimization tests + 15 existing = 73 total. Zero failures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR quality triageChanged areas: area:cli-core, area:release CLI core policy: Blocked by policy until a maintainer applies Missing-test policy: Blocked by policy until a maintainer applies Coverage baseline policy: No coverage-baseline policy block detected. CLI core files:
Coverage policy files:
Expected checks:
Test coverage signals:
Risk notes:
Hard merge gates still come from GitHub Actions, not AI review. Dosu handoff: Dosu can be used as the AI reviewer for risk explanation, missing-test prompts, and maintainer Q&A. If it does not comment automatically from the PR template, ask: @dosubot review this PR for changed-area risk, missing tests, docs impact, desktop startup risk, and CLI core impact. |
…ency The change-policy job runs policy tests that transitively import modules requiring axios (src/utils/proxy.ts, src/services/oauth/client.ts). Without bun install, the CI throws 'Cannot find package axios' errors in every PR workflow run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
基于 #614 的系统化缓存优化方案,参考 Reasonix 架构设计,实现 7 项缓存改进:
CCH 归因头默认关闭(1 行改动)
isAttributionHeaderEnabled()默认值true→falsesystem[0]中每轮变化 → cache prefix 全量失效 → 每轮 cache miss多级百分比 Compaction 阈值
getCompactionLevel()多级决策函数tengu_multi_level_compact(默认 true)Turn-Start Token 预估算
estimateTurnStartUsage()+needsTurnStartPreFold()含 5% hysteresis 防振荡forcePreFold参数 → API 调用前真正触发 pre-foldTURN_START_PRE_ESTIMATIONalreadyFoldedThisTurn 防双折
AutoCompactTrackingState新增字段Token 粒度工具结果截断
truncateToolResultByTokens()使用 roughTokenCount + clean boundary 截断最小节省检查
isCompactionWorthwhile()<30% 头占比则跳过 compactionCache Economics 追踪
computeCacheMetrics()+CacheMetrics类型compactionCacheHitRatio加入tengu_auto_compact_succeeded事件测试
cacheSavingsEstimate.test.ts确定性缓存节约模拟Closes #614