Add cost estimation and context budget forecasting #1

Merged: yanpgwang merged 2 commits into main from feature/cost-and-forecast on May 19, 2026

@yanpgwang
Owner

Summary

This PR adds two features that make context-profiler reports more actionable:

1. Cost Estimation (pricing.py)

Adds $/request cost estimates based on model pricing tables.

  • Supports Claude (Opus, Sonnet, Haiku), GPT-4o, GPT-4o-mini, GPT-4-turbo
  • Loose model name matching (e.g., "claude-3-5-sonnet-20240620" → Claude 3.5 Sonnet)
  • Cost appears in token_counter summary and diagnosis JSON
  • Gracefully skips if model is unknown
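The matching and skip behavior described above could be sketched roughly as follows. This is not the code in pricing.py: the names `PRICING`, `match_model`, and `estimate_input_cost` are hypothetical, the longest-substring matching rule is an assumption, and the prices are placeholder values chosen for illustration only.

```python
from typing import Optional

# Hypothetical table: substring key -> (display name, USD per 1M input tokens).
# The prices are PLACEHOLDERS for illustration, not real vendor pricing.
PRICING = {
    "claude-3-5-sonnet": ("Claude 3.5 Sonnet", 3.00),
    "gpt-4o-mini": ("GPT-4o-mini", 0.15),
    "gpt-4o": ("GPT-4o", 2.50),
}


def match_model(name: str) -> Optional[str]:
    """Loosely match a raw model name to a pricing key.

    Assumption: the longest key contained in the name wins, so that
    "gpt-4o-mini-..." resolves to "gpt-4o-mini" rather than "gpt-4o".
    """
    normalized = name.lower()
    candidates = [key for key in PRICING if key in normalized]
    return max(candidates, key=len) if candidates else None


def estimate_input_cost(model_name: str, input_tokens: int) -> Optional[dict]:
    """Return a cost dict, or None when the model is unknown (no error raised)."""
    key = match_model(model_name)
    if key is None:
        return None
    label, usd_per_million = PRICING[key]
    cost = input_tokens * usd_per_million / 1_000_000
    return {"estimated_input_cost_usd": round(cost, 6), "estimated_model": label}


print(estimate_input_cost("claude-3-5-sonnet-20240620", 1000))
print(estimate_input_cost("some-unknown-model", 1000))
```

Returning `None` for an unknown model lets the caller omit the cost field entirely, which matches the "gracefully skips" behavior listed above.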

2. Budget Forecasting (session_insights.py)

Predicts when a session will overflow the context window.

  • Linear extrapolation from turn-over-turn token growth
  • Model-aware context window sizes (Claude 200K, GPT-4o 128K)
  • New budget_forecast field in session_insights:
    • growth_rate_per_turn
    • current_utilization
    • estimated_overflow_turn
    • context_window_tokens
  • New diagnostic issue: CONTEXT_OVERFLOW_RISK
    • Triggers when overflow is projected within 2x current turn count
    • Severity: critical (>80% of the context window utilized), warning (>50%)
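A minimal sketch of the forecasting logic described above, under stated assumptions. It is not the code in session_insights.py: the function names are hypothetical, turns are assumed to be numbered from 1, growth is taken as the average change between the first and latest turn, and sessions at or below 50% utilization are assumed to raise no issue. The actual implementation may differ on all of these.

```python
import math
from typing import Optional

# Context window sizes quoted in the PR description.
CONTEXT_WINDOWS = {"claude": 200_000, "gpt-4o": 128_000}


def forecast_budget(tokens_by_turn: list[int], context_window: int) -> Optional[dict]:
    """Linearly extrapolate per-turn token totals to the first overflowing turn."""
    if len(tokens_by_turn) < 2:
        return None  # growth is undefined with fewer than two turns
    current_turn = len(tokens_by_turn)  # assumption: turns are numbered from 1
    current = tokens_by_turn[-1]
    # Average turn-over-turn growth between the first and the latest turn.
    growth = (current - tokens_by_turn[0]) / (current_turn - 1)
    overflow_turn = None
    if growth > 0:
        # Smallest number of further turns after which the total exceeds the window.
        turns_left = max(0, math.floor((context_window - current) / growth) + 1)
        overflow_turn = current_turn + turns_left
    return {
        "growth_rate_per_turn": growth,
        "current_utilization": round(current / context_window, 4),
        "estimated_overflow_turn": overflow_turn,
        "context_window_tokens": context_window,
    }


def overflow_risk(forecast: dict, current_turn: int) -> Optional[str]:
    """Return a severity for CONTEXT_OVERFLOW_RISK, or None if it does not trigger."""
    overflow_turn = forecast["estimated_overflow_turn"]
    if overflow_turn is None or overflow_turn > 2 * current_turn:
        return None  # no overflow projected within 2x the current turn count
    utilization = forecast["current_utilization"]
    if utilization > 0.8:
        return "critical"
    if utilization > 0.5:
        return "warning"
    return None  # assumption: no issue at or below 50% utilization


history = [60_000, 80_000, 100_000, 110_000]
forecast = forecast_budget(history, CONTEXT_WINDOWS["gpt-4o"])
print(forecast, overflow_risk(forecast, current_turn=len(history)))
```

A non-positive growth rate leaves `estimated_overflow_turn` as `None`, since a flat or shrinking session never reaches the limit under linear extrapolation.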

Testing

  • 92 tests passing
  • New test file: tests/test_budget_forecast.py (20 test cases)
  • All existing tests unaffected (no regressions)

Example output

{
  "budget_forecast": {
    "growth_rate_per_turn": 10000.0,
    "current_utilization": 0.3906,
    "estimated_overflow_turn": 12,
    "context_window_tokens": 128000,
    "model": "gpt-4o"
  },
  "cost": {
    "estimated_input_cost_usd": 0.001978,
    "estimated_model": "GPT-4o"
  }
}
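As a rough reading aid, the fields in the example can be combined to recover the current token count and the remaining headroom. The sketch below only does arithmetic on the values shown above; the variable names are hypothetical, and it assumes that `current_utilization` is the fraction of `context_window_tokens` currently in use.

```python
import json

# The budget_forecast portion of the example output above.
report = json.loads("""
{
  "budget_forecast": {
    "growth_rate_per_turn": 10000.0,
    "current_utilization": 0.3906,
    "estimated_overflow_turn": 12,
    "context_window_tokens": 128000,
    "model": "gpt-4o"
  }
}
""")

forecast = report["budget_forecast"]
window = forecast["context_window_tokens"]

# Assumption: utilization = current tokens / context window.
current_tokens = forecast["current_utilization"] * window
headroom_tokens = window - current_tokens
turns_of_headroom = headroom_tokens / forecast["growth_rate_per_turn"]

print(f"current tokens:  ~{round(current_tokens)}")
print(f"headroom tokens: ~{round(headroom_tokens)}")
print(f"turns of headroom at current growth: {turns_of_headroom:.1f}")
```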

yanpgwang added 2 commits May 19, 2026 18:46
Introduce per-request and per-session cost estimates based on model
pricing. Adds pricing.py with a lookup table for Claude and GPT models,
integrates cost into the token_counter analyzer summary, CLI reporter
output, and diagnosis JSON under a top-level "cost" key.

Unknown models are silently skipped (no errors, no cost field).

Predict when a session will hit the context window limit based on
turn-over-turn token growth rate. Adds CONTEXT_OVERFLOW_RISK diagnostic
issue when overflow is projected within 2x the current turn count.

New fields in session_insights.budget_forecast:
- growth_rate_per_turn
- current_utilization
- estimated_overflow_turn
- context_window_tokens
- model (matched family)
yanpgwang merged commit 1f7d131 into main on May 19, 2026
4 checks passed