feat(cache): multi-level compaction thresholds, CCH optimization, cache economics by moyu12-ae · Pull Request #628 · NanmiCoder/cc-haha

moyu12-ae · 2026-05-27T07:23:09Z

Summary

A systematic cache optimization plan based on #614, modeled on the Reasonix architecture, implementing 7 cache improvements:

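CCH attribution header disabled by default (1-line change)

isAttributionHeaderEnabled() default changes from true to false
The CCH header in system[0] changes every turn → the entire cache prefix is invalidated → a cache miss on every turn
With the header off, system prompt + tools stay stable across turns → cache hit rate goes from 0% to 85-98%
Simulated estimate: a 20-turn conversation saves 66% of total cost (~$2.90/session)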
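
To illustrate why a value that changes every turn at the front of the prompt defeats prefix caching, here is a minimal sketch using a simplified model in which a request is a list of blocks and only the shared leading run can be reused. The function name and the block strings are placeholders for illustration; they are not the real header format or code from this PR.

```typescript
// Counts how many leading blocks two consecutive requests have in common.
// In this simplified model, only that shared leading run is cacheable.
function sharedPrefixBlocks(prev: readonly string[], next: readonly string[]): number {
  let i = 0;
  while (i < prev.length && i < next.length && prev[i] === next[i]) i++;
  return i;
}

// Placeholder block contents (assumption): a per-turn hash sits in the first block.
const withHeader = sharedPrefixBlocks(
  ["header:hash-turn-1", "system prompt", "tools"],
  ["header:hash-turn-2", "system prompt", "tools"],
); // 0: nothing after the changed first block can be reused

// Without the per-turn header, the stable blocks line up across turns.
const withoutHeader = sharedPrefixBlocks(
  ["system prompt", "tools", "turn 1"],
  ["system prompt", "tools", "turn 1", "turn 2"],
); // 3
```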

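Multi-level percentage compaction thresholds

Adds four threshold levels at 75%/78%/80%/90% (supplementing the fixed 13K buffer, not replacing it)
getCompactionLevel(): multi-level decision function
GrowthBook flag: tengu_multi_level_compact (default true)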
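
A minimal sketch of what a multi-level decision over these percentages could look like. The signature and return type are assumptions (the PR description only names getCompactionLevel() and the four percentages), so this version returns the highest threshold crossed, or 0 if none.

```typescript
// Thresholds from the PR description, highest first.
const THRESHOLD_PCTS = [90, 80, 78, 75] as const;

type CompactionLevelSketch = 0 | (typeof THRESHOLD_PCTS)[number];

// Hypothetical signature: returns the highest percentage threshold that the
// current usage has reached, or 0 when usage is below all of them.
function getCompactionLevelSketch(usedTokens: number, contextWindow: number): CompactionLevelSketch {
  if (contextWindow <= 0) return 0;
  for (const pct of THRESHOLD_PCTS) {
    // Integer comparison avoids floating-point edge cases at exact boundaries.
    if (usedTokens * 100 >= pct * contextWindow) return pct;
  }
  return 0;
}
```

Because the comparison is a ratio, the same usage fraction maps to the same level for a 200K, 500K, or 1M window.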

Turn-start token pre-estimation

estimateTurnStartUsage() + needsTurnStartPreFold(), with 5% hysteresis to prevent oscillation
forcePreFold parameter → actually triggers a pre-fold before the API call
Feature flag: TURN_START_PRE_ESTIMATION
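
One way a 5% hysteresis band can prevent oscillation is a two-threshold check: enter at the trigger, stay active until usage drops below trigger minus 5%. The sketch below assumes a 90% trigger (the commit message mentions 90% capacity) and a hypothetical state object; the direction of the band and the signature are assumptions, not the PR's actual code.

```typescript
interface PreFoldStateSketch {
  active: boolean; // whether the previous check already asked for a pre-fold
}

const TRIGGER_PCT = 90;   // assumed trigger, taken from the commit message
const HYSTERESIS_PCT = 5; // hysteresis width stated in the PR description

// Hypothetical signature: updates state.active and returns it.
function needsTurnStartPreFoldSketch(
  estimatedTokens: number,
  contextWindow: number,
  state: PreFoldStateSketch,
): boolean {
  const scaled = estimatedTokens * 100;
  const enter = scaled >= TRIGGER_PCT * contextWindow;
  const stay = scaled >= (TRIGGER_PCT - HYSTERESIS_PCT) * contextWindow;
  // Once active, small dips inside the 85-90% band do not flip the decision.
  state.active = state.active ? stay : enter;
  return state.active;
}
```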

alreadyFoldedThisTurn double-fold guard

New field added to AutoCompactTrackingState
Pre-fold → set the flag → the post-response check is skipped
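
The pre-fold → flag → skip sequence can be sketched as follows. Only the field name alreadyFoldedThisTurn comes from the PR; the helper names and the assumption that the flag resets at the start of each turn are illustrative.

```typescript
interface TrackingStateSketch {
  alreadyFoldedThisTurn: boolean;
}

// Assumption: the flag is cleared when a new turn begins.
function startTurnSketch(state: TrackingStateSketch): void {
  state.alreadyFoldedThisTurn = false;
}

// Runs a pre-fold at most once per turn; returns whether it ran.
function tryPreFoldSketch(state: TrackingStateSketch, needed: boolean): boolean {
  if (!needed || state.alreadyFoldedThisTurn) return false;
  state.alreadyFoldedThisTurn = true; // mark so later checks can skip
  return true;
}

// The post-response compaction check is skipped if a fold already happened.
function shouldRunPostResponseCheckSketch(state: TrackingStateSketch): boolean {
  return !state.alreadyFoldedThisTurn;
}
```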

Token-granularity tool result truncation

truncateToolResultByTokens() truncates using roughTokenCount at a clean boundary
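
A rough sketch of token-budgeted truncation at line boundaries. The estimator here (about 4 ASCII characters per token, 1 token per non-ASCII code point) is an assumed heuristic standing in for roughTokenCount, and the "[truncated]" marker is a placeholder; neither is taken from the PR's code.

```typescript
// Assumed heuristic: ~4 ASCII chars per token, 1 token per non-ASCII code point.
function roughTokenCountSketch(text: string): number {
  let ascii = 0;
  let wide = 0;
  for (const ch of text) { // for...of iterates code points, so emoji are not split
    if (ch.codePointAt(0)! < 0x80) ascii++;
    else wide++;
  }
  return Math.ceil(ascii / 4) + wide;
}

// Keeps whole lines while they fit in the budget, then appends a marker.
function truncateByTokensSketch(text: string, maxTokens: number): string {
  if (roughTokenCountSketch(text) <= maxTokens) return text;
  const kept: string[] = [];
  let used = 0;
  for (const line of text.split("\n")) {
    const cost = roughTokenCountSketch(line + "\n");
    if (used + cost > maxTokens) break;
    kept.push(line);
    used += cost;
  }
  return kept.length > 0 ? kept.join("\n") + "\n[truncated]" : "[truncated]";
}
```

Counting by estimated tokens rather than characters means a CJK-heavy result is cut earlier than an ASCII result of the same character length.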

Minimum savings check

isCompactionWorthwhile() skips compaction when the head portion is below 30%
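
The check reduces to a single ratio comparison. A minimal sketch, assuming the head portion is measured in tokens against the context window (as the commit message describes); the signature is hypothetical.

```typescript
const MIN_HEAD_PCT = 30; // threshold stated in the PR description

// Hypothetical signature: headTokens is the portion that would be summarized.
function isCompactionWorthwhileSketch(headTokens: number, contextWindow: number): boolean {
  if (contextWindow <= 0) return false;
  // Below 30% of the window, the summary may cost nearly as much as it saves.
  return headTokens * 100 >= MIN_HEAD_PCT * contextWindow;
}
```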

Cache economics tracking

computeCacheMetrics() + CacheMetrics type
compactionCacheHitRatio added to the tengu_auto_compact_succeeded event
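
A sketch of how such metrics might be derived from an API usage object. The CacheMetrics field names come from the commit message; the mapping from usage fields (cache reads as hits, uncached input as misses, cache creation as writes) is an assumption, as is the exact shape of the input type. The zero-denominator guard mirrors the NaN fix described in the commits.

```typescript
// Assumed input shape, modeled on token usage fields reported by the API.
interface UsageSketch {
  input_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

interface CacheMetricsSketch {
  cacheHitTokens: number;
  cacheMissTokens: number;
  cacheWriteTokens: number;
  totalPromptTokens: number;
  cacheHitRatio: number; // always within [0, 1]
}

// Pure function: no I/O, same input always yields the same metrics.
function computeCacheMetricsSketch(usage: UsageSketch): CacheMetricsSketch {
  const cacheHitTokens = usage.cache_read_input_tokens ?? 0;
  const cacheWriteTokens = usage.cache_creation_input_tokens ?? 0;
  const cacheMissTokens = usage.input_tokens;
  const denominator = cacheHitTokens + cacheMissTokens;
  return {
    cacheHitTokens,
    cacheMissTokens,
    cacheWriteTokens,
    totalPromptTokens: denominator + cacheWriteTokens,
    // Guard against 0/0 when every prompt token was a cache write.
    cacheHitRatio: denominator === 0 ? 0 : cacheHitTokens / denominator,
  };
}
```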

Tests

73 tests, 0 failures (32 unit + 26 integration + 15 existing tests)
cacheSavingsEstimate.test.ts: deterministic cache-savings simulation

Closes #614

Implement 6 cache optimization improvements inspired by Reasonix:

1. **CCH attribution header**: Disabled by default to prevent per-turn cache invalidation. The `x-anthropic-billing-header` was embedded in `system[0]` of every prompt, and its xxHash value changed per-turn, causing full cache prefix misses. Users can re-enable via env var `CLAUDE_CODE_ATTRIBUTION_HEADER` or GrowthBook flag.
2. **Multi-level percentage compaction thresholds**: Supplement the existing fixed 13K buffer with percentage-based levels (75%/78%/80%/90%) that work correctly across all context window sizes from 200K to 1M+. The fixed buffer remains as "final defense" at 93-98%.
3. **Turn-start token pre-estimation**: Feature-flagged checkpoint (`TURN_START_PRE_ESTIMATION`) before API calls to detect when context approaches 90% capacity before the passive check catches it.
4. **Cache-aligned compaction**: Already implemented (CacheSafeParams in forkedAgent.ts). Verified; no changes needed.
5. **Token-based tool result truncation**: `truncateToolResultByTokens()` replaces char-based counting with rough token estimation for CJK-aware truncation at clean line boundaries.
6. **Minimum savings check**: `isCompactionWorthwhile()` skips compaction when the head portion is less than 30% of the context window, preventing waste when the summary costs nearly as much as it saves.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…isTurn

Three enhancements bringing cc-haha's cache optimization closer to Reasonix:

1. **Cache Economics tracking** (Reasonix SessionStats parity):
   - `CacheMetrics` type: cacheHitTokens, cacheMissTokens, cacheWriteTokens, totalPromptTokens, cacheHitRatio
   - `computeCacheMetrics()`: pure function extracting metrics from API usage
   - Integrated into `autoCompactIfNeeded` return and query.ts post-compact log
2. **Turn-start pre-fold upgraded** from observability-only:
   - `needsTurnStartPreFold()` with 5% hysteresis buffer to prevent oscillation
   - `shouldPreFold()` respecting `alreadyFoldedThisTurn`
   - `forcePreFold` param on `autoCompactIfNeeded`: actually triggers pre-fold before the API call, not just logs it
3. **alreadyFoldedThisTurn** mechanism (Reasonix decideAfterUsage parity):
   - Prevents double-fold when pre-fold already ran this turn
   - Added to `AutoCompactTrackingState`, set on all compaction paths
   - Post-response check skipped when true

Tests: 32 pass (up from 15), covering cache metrics, hysteresis, and alreadyFoldedThisTurn edge cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Completes Cache Economics tracking by wiring the computed cache hit ratio into the existing GrowthBook analytics event, making it dashboard-trackable without code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…acheMetrics

TDD cycle results:

- 26 new integration tests covering:
  - Cross-window percentage threshold consistency (200K/500K/1M)
  - Decision chain verification (pre-fold → normal → aggressive → force)
  - alreadyFoldedThisTurn double-fold prevention
  - Hysteresis oscillation prevention
  - Cache metrics invariant (ratio ∈ [0,1])
  - CJK + emoji + mixed-language truncation edge cases
- Bug fix: computeCacheMetrics returned NaN when cacheHit+cacheMiss==0 (all tokens were writes). Now returns 0 for valid range.
- 58 cache optimization tests + 15 existing = 73 total. Zero failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-27T07:23:28Z

PR quality triage

Changed areas: area:cli-core, area:release

CLI core policy: Blocked by policy until a maintainer applies allow-cli-core-change and approves the PR.

Missing-test policy: Blocked by policy until a maintainer applies allow-missing-tests or matching tests are added.

Coverage baseline policy: No coverage-baseline policy block detected.

CLI core files:

src/utils/toolResultStorage.ts

Coverage policy files:

none

Expected checks:

change-policy
server-checks
coverage-checks

Test coverage signals:

BLOCKING unless allow-missing-tests is applied: Agent/runtime product files changed without a tools/utils test file in the PR.
Agent/model runtime path changed: use mock/request-shape tests in PR and maintainer live-model smoke before release.

Risk notes:

CI/policy changed: inspect workflow behavior itself, not just application tests.

Hard merge gates still come from GitHub Actions, not AI review.

Dosu handoff: Dosu can be used as the AI reviewer for risk explanation, missing-test prompts, and maintainer Q&A. If it does not comment automatically from the PR template, ask:

@dosubot review this PR for changed-area risk, missing tests, docs impact, desktop startup risk, and CLI core impact.

…ency The change-policy job runs policy tests that transitively import modules requiring axios (src/utils/proxy.ts, src/services/oauth/client.ts). Without bun install, the CI throws 'Cannot find package axios' errors in every PR workflow run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

moyu12-ae and others added 4 commits May 27, 2026 14:40

dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels May 27, 2026

github-actions Bot added area:cli-core needs-maintainer-approval labels May 27, 2026

github-actions Bot added the area:release label May 27, 2026

moyu12-ae wants to merge 5 commits into NanmiCoder:main from moyu12-ae:feat/cache-optimization
