feat(cache): multi-level compaction thresholds, CCH optimization, cache economics #628

Open
moyu12-ae wants to merge 5 commits into NanmiCoder:main from moyu12-ae:feat/cache-optimization

@moyu12-ae
Contributor

Summary

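A systematic cache optimization plan based on #614, modeled on the Reasonix architecture, implementing 7 cache improvements:

CCH attribution header disabled by default (1-line change)

  • isAttributionHeaderEnabled() default changes from true → false
  • The CCH header in system[0] changes every turn → the whole cache prefix is invalidated → a cache miss on every turn
  • With it disabled, system prompt + tools stay stable across turns → cache hit rate goes from 0% → 85-98%
  • Simulated estimate: a 20-turn conversation saves 66% of total cost (~$2.90/session)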
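
To illustrate why the default matters, here is a minimal sketch of a default-off toggle and its effect on the system blocks. It is not the PR's code: the accepted env values (`1`/`true`), the `buildSystemBlocks` helper, and the header text are all assumptions made for illustration.

```typescript
// Hypothetical sketch; accepted env values and header text are assumptions.
type Env = { [key: string]: string | undefined };

function isAttributionHeaderEnabled(env: Env): boolean {
  const raw = env["CLAUDE_CODE_ATTRIBUTION_HEADER"];
  if (raw === undefined) return false; // default after this PR: off
  const value = raw.trim().toLowerCase();
  return value === "1" || value === "true";
}

// Hypothetical helper: assembles the system blocks sent with each request.
function buildSystemBlocks(env: Env, perTurnHash: string, systemPrompt: string): string[] {
  const blocks: string[] = [];
  if (isAttributionHeaderEnabled(env)) {
    // Placeholder header text; it differs on every turn, so system[0] differs too.
    blocks.push("x-anthropic-billing-header: " + perTurnHash);
  }
  blocks.push(systemPrompt);
  return blocks;
}
```

With the header off, two turns produce identical system blocks, so a prefix cache can match them; with it on, the first block differs on every turn and nothing after it can be reused.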

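Multi-level percentage compaction thresholds

  • Adds four threshold levels at 75%/78%/80%/90% (supplementing the fixed 13K buffer, not replacing it)
  • getCompactionLevel() multi-level decision function
  • GrowthBook flag: tengu_multi_level_compact (default true)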
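
A multi-level decision of this kind could look roughly like the sketch below. The level names are borrowed from the decision chain in the integration tests (pre-fold → normal → aggressive → force); which name goes with which percentage, and the function signature, are assumptions rather than the PR's actual mapping.

```typescript
// Hypothetical sketch; the threshold-to-level mapping is an assumption.
type CompactionLevel = "none" | "pre-fold" | "normal" | "aggressive" | "force";

// Checked from the highest threshold down so the strongest matching level wins.
const LEVEL_THRESHOLDS: ReadonlyArray<[number, CompactionLevel]> = [
  [0.9, "force"],
  [0.8, "aggressive"],
  [0.78, "normal"],
  [0.75, "pre-fold"],
];

function getCompactionLevel(usedTokens: number, contextWindow: number): CompactionLevel {
  if (contextWindow <= 0) return "none";
  const ratio = usedTokens / contextWindow;
  for (let i = 0; i < LEVEL_THRESHOLDS.length; i++) {
    const threshold = LEVEL_THRESHOLDS[i][0];
    if (ratio >= threshold) return LEVEL_THRESHOLDS[i][1];
  }
  return "none";
}
```

Because the decision uses a ratio, the same usage share yields the same level on a 200K window and on a 1M window, which a fixed 13K buffer alone cannot do.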

Turn-start token pre-estimation

  • estimateTurnStartUsage() + needsTurnStartPreFold(), with 5% hysteresis to prevent oscillation
  • forcePreFold parameter → actually triggers a pre-fold before the API call
  • Feature flag: TURN_START_PRE_ESTIMATION
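
One way to read the 5% hysteresis is as a re-arm band: after a pre-fold fires at the 90% mark, it cannot fire again until estimated usage drops below 85%. The sketch below shows that interpretation only; the `PreFoldState` type, the signature, and the exact re-arm rule are assumptions and may differ from the PR's needsTurnStartPreFold().

```typescript
// Hypothetical sketch of a hysteresis band; not the PR's implementation.
const PRE_FOLD_THRESHOLD = 0.9; // trigger point (90% of the context window)
const HYSTERESIS = 0.05;        // must drop this far below the threshold to re-arm

interface PreFoldState {
  armed: boolean; // false right after a pre-fold has fired
}

interface PreFoldDecision {
  fold: boolean;
  next: PreFoldState;
}

function needsTurnStartPreFold(usageRatio: number, state: PreFoldState): PreFoldDecision {
  if (!state.armed) {
    // Stay quiet until usage falls clearly below the trigger point.
    const rearmed = usageRatio < PRE_FOLD_THRESHOLD - HYSTERESIS;
    return { fold: false, next: { armed: rearmed } };
  }
  if (usageRatio >= PRE_FOLD_THRESHOLD) {
    return { fold: true, next: { armed: false } };
  }
  return { fold: false, next: state };
}
```

Without the band, a context hovering around 90% could trigger a fold on every turn; with it, usage has to fall well below the trigger before another pre-fold is allowed.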

alreadyFoldedThisTurn double-fold prevention

  • New field on AutoCompactTrackingState
  • Pre-fold → set the flag → post-response check is skipped
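
The pre-fold → flag → skip flow can be sketched as follows. Only the alreadyFoldedThisTurn field name comes from the PR; the reduced `AutoCompactTrackingState` shape and the three helper functions are hypothetical and stand in for the real compaction paths.

```typescript
// Hypothetical sketch; the real AutoCompactTrackingState has more fields.
interface AutoCompactTrackingState {
  alreadyFoldedThisTurn: boolean;
}

// Hypothetical helper: reset the flag at the start of each turn.
function startTurn(): AutoCompactTrackingState {
  return { alreadyFoldedThisTurn: false };
}

// Hypothetical helper: any compaction path marks the turn as folded.
function markFolded(state: AutoCompactTrackingState): AutoCompactTrackingState {
  return { ...state, alreadyFoldedThisTurn: true };
}

// Hypothetical helper: the post-response check is skipped if a fold already ran.
function shouldRunPostResponseCompaction(
  state: AutoCompactTrackingState,
  overThreshold: boolean,
): boolean {
  return overThreshold && !state.alreadyFoldedThisTurn;
}
```

Resetting the flag per turn keeps the guard narrow: it blocks a second fold in the same turn but never suppresses compaction on later turns.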

Token-granularity tool result truncation

  • truncateToolResultByTokens() uses roughTokenCount + clean-boundary truncation
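
As an illustration of token-based truncation at clean boundaries, the sketch below uses a deliberately crude heuristic: one token per CJK character and a quarter token per other UTF-16 code unit. That heuristic, and both function bodies, are assumptions; the project's real roughTokenCount may estimate differently.

```typescript
// Hypothetical sketch; the 1 vs 0.25 token weights are an assumed heuristic.
function roughTokenCount(text: string): number {
  let tokens = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const isCjk =
      (code >= 0x3040 && code <= 0x30ff) || // hiragana and katakana
      (code >= 0x4e00 && code <= 0x9fff) || // CJK unified ideographs
      (code >= 0xac00 && code <= 0xd7af);   // hangul syllables
    tokens += isCjk ? 1 : 0.25;
  }
  return Math.ceil(tokens);
}

function truncateToolResultByTokens(text: string, maxTokens: number): string {
  if (roughTokenCount(text) <= maxTokens) return text;
  const kept: string[] = [];
  let used = 0;
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    // Cost is rounded up per line, so the kept text never exceeds the budget.
    const cost = roughTokenCount(lines[i] + "\n");
    if (used + cost > maxTokens) break;
    kept.push(lines[i]);
    used += cost;
  }
  return kept.join("\n");
}
```

Cutting only at line breaks means a truncated result never ends mid-line; the trade-off in this sketch is that a single oversized first line yields an empty result.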

Minimum savings check

  • isCompactionWorthwhile() skips compaction when the head share is <30%
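
The 30% rule reduces to one ratio comparison, sketched below. The parameter names and the choice to measure the head against the context window (as the first commit message describes it) are assumptions about the signature.

```typescript
// Hypothetical sketch; parameter names are assumptions.
const MIN_HEAD_RATIO = 0.3;

// headTokens: tokens in the older "head" portion that compaction would summarize.
function isCompactionWorthwhile(headTokens: number, contextWindow: number): boolean {
  if (contextWindow <= 0) return false;
  return headTokens / contextWindow >= MIN_HEAD_RATIO;
}
```

For example, on a 200K window a 40K head (20%) is skipped, while an 80K head (40%) is compacted.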

Cache economics tracking

  • computeCacheMetrics() + the CacheMetrics type
  • compactionCacheHitRatio added to the tengu_auto_compact_succeeded event
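
A pure metrics function along these lines might look like the sketch below. The `CacheMetrics` field names come from the commit message; the mapping from the API usage fields (`input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`) to hits, misses, and writes is an assumption, as is the reduced `ApiUsage` type.

```typescript
// Hypothetical sketch; the usage-to-metric mapping is an assumption.
interface ApiUsage {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

interface CacheMetrics {
  cacheHitTokens: number;
  cacheMissTokens: number;
  cacheWriteTokens: number;
  totalPromptTokens: number;
  cacheHitRatio: number; // always within [0, 1]
}

function computeCacheMetrics(usage: ApiUsage): CacheMetrics {
  const cacheHitTokens = usage.cache_read_input_tokens || 0;
  const cacheWriteTokens = usage.cache_creation_input_tokens || 0;
  const cacheMissTokens = usage.input_tokens;
  const denominator = cacheHitTokens + cacheMissTokens;
  return {
    cacheHitTokens,
    cacheMissTokens,
    cacheWriteTokens,
    totalPromptTokens: denominator + cacheWriteTokens,
    // Guard the all-writes case, where hit + miss is 0 and the ratio would be NaN.
    cacheHitRatio: denominator === 0 ? 0 : cacheHitTokens / denominator,
  };
}
```

The zero-denominator guard mirrors the bug fix noted in the integration-test commit: a request whose prompt tokens are all cache writes reports a ratio of 0 rather than NaN.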

Tests

  • 73 tests, 0 failures (32 unit + 26 integration + 15 existing tests)
  • cacheSavingsEstimate.test.ts: deterministic cache savings simulation

Closes #614

moyu12-ae and others added 4 commits May 27, 2026 14:40
Implement 6 cache optimization improvements inspired by Reasonix:

1. **CCH attribution header**: Disabled by default to prevent per-turn
   cache invalidation. The `x-anthropic-billing-header` was embedded in
   `system[0]` of every prompt, and its xxHash value changed per-turn,
   causing full cache prefix misses. Users can re-enable via env var
   `CLAUDE_CODE_ATTRIBUTION_HEADER` or GrowthBook flag.

2. **Multi-level percentage compaction thresholds**: Supplement the
   existing fixed 13K buffer with percentage-based levels (75%/78%/80%/
   90%) that work correctly across all context window sizes from 200K
   to 1M+. The fixed buffer remains as "final defense" at 93-98%.

3. **Turn-start token pre-estimation**: Feature-flagged checkpoint
   (`TURN_START_PRE_ESTIMATION`) before API calls to detect when
   context approaches 90% capacity before the passive check catches it.

4. **Cache-aligned compaction**: Already implemented (CacheSafeParams
   in forkedAgent.ts). No changes needed — verified.

5. **Token-based tool result truncation**: `truncateToolResultByTokens()`
   replaces char-based counting with rough token estimation for CJK-
   aware truncation at clean line boundaries.

6. **Minimum savings check**: `isCompactionWorthwhile()` skips compaction
   when the head portion is less than 30% of the context window,
   preventing waste when the summary costs nearly as much as it saves.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…isTurn

Three enhancements bringing cc-haha's cache optimization closer to Reasonix:

1. **Cache Economics tracking** (Reasonix SessionStats parity):
   - `CacheMetrics` type: cacheHitTokens, cacheMissTokens, cacheWriteTokens,
     totalPromptTokens, cacheHitRatio
   - `computeCacheMetrics()`: pure function extracting metrics from API usage
   - Integrated into `autoCompactIfNeeded` return and query.ts post-compact log

2. **Turn-start pre-fold upgraded** from observability-only:
   - `needsTurnStartPreFold()` with 5% hysteresis buffer to prevent oscillation
   - `shouldPreFold()` respecting `alreadyFoldedThisTurn`
   - `forcePreFold` param on `autoCompactIfNeeded` — actually triggers pre-fold
     before API call, not just logs it

3. **alreadyFoldedThisTurn** mechanism (Reasonix decideAfterUsage parity):
   - Prevents double-fold when pre-fold already ran this turn
   - Added to `AutoCompactTrackingState`, set on all compaction paths
   - Post-response check skipped when true

Tests: 32 pass (up from 15), covering cache metrics, hysteresis, and
alreadyFoldedThisTurn edge cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Completes Cache Economics tracking by wiring the computed cache hit ratio
into the existing GrowthBook analytics event, making it dashboard-trackable
without code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…acheMetrics

TDD cycle results:
- 26 new integration tests covering:
  • Cross-window percentage threshold consistency (200K/500K/1M)
  • Decision chain verification (pre-fold → normal → aggressive → force)
  • alreadyFoldedThisTurn double-fold prevention
  • Hysteresis oscillation prevention
  • Cache metrics invariant (ratio ∈ [0,1])
  • CJK + emoji + mixed-language truncation edge cases
- Bug fix: computeCacheMetrics returned NaN when cacheHit+cacheMiss==0
  (all tokens were writes). Now returns 0 for valid range.
- 58 cache optimization tests + 15 existing = 73 total. Zero failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@dosubot dosubot Bot added the size:XL and enhancement labels May 27, 2026
@github-actions

github-actions Bot commented May 27, 2026

PR quality triage

Changed areas: area:cli-core, area:release

CLI core policy: Blocked by policy until a maintainer applies allow-cli-core-change and approves the PR.

Missing-test policy: Blocked by policy until a maintainer applies allow-missing-tests or matching tests are added.

Coverage baseline policy: No coverage-baseline policy block detected.

CLI core files:

  • src/utils/toolResultStorage.ts

Coverage policy files:

  • none

Expected checks:

  • change-policy
  • server-checks
  • coverage-checks

Test coverage signals:

  • BLOCKING unless allow-missing-tests is applied: Agent/runtime product files changed without a tools/utils test file in the PR.
  • Agent/model runtime path changed: use mock/request-shape tests in PR and maintainer live-model smoke before release.

Risk notes:

  • CI/policy changed: inspect workflow behavior itself, not just application tests.

Hard merge gates still come from GitHub Actions, not AI review.

Dosu handoff: Dosu can be used as the AI reviewer for risk explanation, missing-test prompts, and maintainer Q&A. If it does not comment automatically from the PR template, ask:

@dosubot review this PR for changed-area risk, missing tests, docs impact, desktop startup risk, and CLI core impact.

…ency

The change-policy job runs policy tests that transitively import modules
requiring axios (src/utils/proxy.ts, src/services/oauth/client.ts).
Without bun install, the CI throws 'Cannot find package axios' errors
in every PR workflow run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Labels

area:cli-core, area:release, enhancement, needs-maintainer-approval, size:XL

Development

Successfully merging this pull request may close these issues.

An attempt to design a more comprehensive cache optimization roadmap for CC-haha

1 participant