feat: environment-aware model routing with fallback for restricted environments #37

@bdfinst

Problem

The plugin's model routing table (in plugins/agentic-dev-team/CLAUDE.md) assigns specific tier aliases (haiku, sonnet, opus) to each agent. The harness resolves those to snapshot IDs like claude-haiku-4-5-20251001. This works when an API key can reach all three tiers, but fails silently in two real scenarios:

  1. Corporate proxies with restricted model allowlists. Some users access Anthropic through a company proxy that only exposes a subset of models. When a haiku-tier subagent is dispatched, the proxy returns model_not_available and the dispatch fails. The user sees an opaque error or, worse, a fallback that loses the work entirely.
  2. Anthropic snapshot deprecation. When Anthropic retires claude-haiku-4-5-20251001 (or any pinned snapshot), every agent whose frontmatter declares model: haiku either breaks or silently changes behavior, depending on how the harness resolves stale IDs.

This blocks OSS users behind any restrictive proxy and creates a maintenance landmine across all haiku-tier review agents (naming-review, complexity-review, claude-setup-review, token-efficiency-review, a11y-review, svelte-review, js-fp-review, progress-guardian).

Constraints

The plugin is OSS, so the fix must meet these requirements:

  • Zero-config by default. Fresh install on a standard Anthropic API key works without any setup file.
  • Self-discovering. Detect what's available rather than requiring users to maintain a list.
  • Graceful degradation. Never hard-fail when a model is missing; fall back transparently.
  • No leaked corporate config. Internal proxy URLs and allowlists cannot ship in plugin defaults.

Spec

1. knowledge/model-routing.json — ship defaults only

Single source of truth for tier-to-snapshot resolution. Replaces what's currently scattered across agent frontmatter and CLAUDE.md prose. When Anthropic rotates a snapshot ID, this is the one file that changes.

{
  "haiku":  "claude-haiku-4-5-20251001",
  "sonnet": "claude-sonnet-4-6-20260201",
  "opus":   "claude-opus-4-8"
}

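Audit step: grep the codebase for any pinned snapshot ID (for example, with the regex claude-.*-\d{8}) and migrate each one to read from this file. IDs without a date suffix, such as claude-opus-4-8, do not match that pattern and need a separate search. The net result is one place to update when a snapshot is deprecated.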
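The audit step could be sketched as a small scanner. This is only an illustration: the function name find_pinned_ids and the exact regex are assumptions, not part of the plugin, and the pattern assumes dated snapshot IDs end in an 8-digit date.

```python
import re

# Assumption: dated snapshot IDs look like "claude-<family>-<version>-<YYYYMMDD>".
PINNED_ID = re.compile(r"claude-[a-z0-9-]+-\d{8}")


def find_pinned_ids(text: str) -> list[tuple[int, str]]:
    """Return (line_number, snapshot_id) for every dated snapshot ID in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PINNED_ID.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical agent file content, used only to exercise the scanner.
sample = """---
name: naming-review
model: claude-haiku-4-5-20251001
---
"""
print(find_pinned_ids(sample))
```

Undated IDs such as claude-opus-4-8 are not caught by this pattern, so an audit would need a second pass for those.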

2. .claude/model-overrides.json — per-user, gitignored, generated

Never edited by hand. Acts as a cache of the user's environment, not as user-facing config. The plugin's .gitignore includes this path so corporate config never leaks into PRs.

{
  "tier_aliases": { "haiku": "sonnet" },
  "generated_at": "<iso8601>",
  "available_models": ["claude-sonnet-4-6-20260201", "claude-opus-4-8"],
  "reason": "haiku tier not in /v1/models response"
}
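One way the two files might combine at resolution time is sketched below. The helper names load_json and resolve_tier are hypothetical, and following alias chains (rather than a single hop) is an assumption this spec does not state.

```python
import json
from pathlib import Path

# Defaults copied from knowledge/model-routing.json in this spec.
DEFAULTS = {
    "haiku": "claude-haiku-4-5-20251001",
    "sonnet": "claude-sonnet-4-6-20260201",
    "opus": "claude-opus-4-8",
}


def load_json(path: Path) -> dict:
    """Return parsed JSON, or {} when the file is absent (the zero-config case)."""
    return json.loads(path.read_text()) if path.exists() else {}


def resolve_tier(tier: str, defaults: dict, overrides: dict) -> str:
    """Map a tier alias to a snapshot ID, applying tier_aliases if present."""
    aliases = overrides.get("tier_aliases", {})
    seen = set()
    # Follow aliases (haiku -> sonnet -> ...) but stop if they form a cycle.
    while tier in aliases and tier not in seen:
        seen.add(tier)
        tier = aliases[tier]
    return defaults[tier]


overrides = {"tier_aliases": {"haiku": "sonnet"}}
print(resolve_tier("haiku", DEFAULTS, overrides))
```

With no override file present, load_json returns an empty dict and resolve_tier falls through to the shipped defaults.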

3. Dispatch-side fallback — primary mechanism, no user config needed

This is the change that matters most for OSS users. Inside any code path that passes model: <tier> to a subagent dispatch:

resolve T → snapshot via routing.json (and overrides if present)
try dispatch
  on model_not_available:
    if T == haiku  → retry with sonnet
    if T == sonnet → retry with opus
    if T == opus   → fail with actionable error naming every attempted ID
  on success: record which tier actually ran for telemetry

Users with full Anthropic API access never trigger this. Users with restricted access get bumped up a tier automatically, with no prompt and no config. Each retry is logged so users can see when a tier bump is happening.
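A minimal sketch of the retry chain above, under stated assumptions: ModelNotAvailable, dispatch_with_fallback, and the dispatch callable are hypothetical names standing in for whatever the orchestrator actually uses, and the loop walks the whole chain (haiku, then sonnet, then opus) before failing.

```python
from typing import Callable


class ModelNotAvailable(Exception):
    """Hypothetical error: the endpoint answered model_not_available."""


ROUTING = {
    "haiku": "claude-haiku-4-5-20251001",
    "sonnet": "claude-sonnet-4-6-20260201",
    "opus": "claude-opus-4-8",
}
FALLBACK = {"haiku": "sonnet", "sonnet": "opus"}  # opus has nowhere to go


def dispatch_with_fallback(
    tier: str,
    routing: dict,
    dispatch: Callable[[str], str],
    log: list,
) -> tuple:
    """Try the tier, bumping one tier up on each model_not_available."""
    attempted = []
    current = tier
    while current is not None:
        model_id = routing[current]
        attempted.append(model_id)
        try:
            result = dispatch(model_id)
        except ModelNotAvailable:
            nxt = FALLBACK.get(current)
            if nxt is not None:
                log.append(f"tier bump: {current} -> {nxt}")
            current = nxt
            continue
        # Return the tier that actually ran so the caller can record it.
        return current, result
    raise RuntimeError("no available model; attempted: " + ", ".join(attempted))


def fake_dispatch(model_id: str) -> str:
    """Stand-in for a proxy that does not expose the haiku tier."""
    if "haiku" in model_id:
        raise ModelNotAvailable(model_id)
    return "ok:" + model_id


bumps = []
print(dispatch_with_fallback("haiku", ROUTING, fake_dispatch, bumps), bumps)
```

Injecting dispatch as a callable keeps the chain testable with a mock, which is the same shape the Bats coverage in the acceptance criteria asks for.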

4. /init-dev-team — opt-in probe step

A new sub-step (not automatic) that asks:

Probe $ANTHROPIC_BASE_URL/v1/models to detect which model tiers are available and cache the result? (y/N)

Useful if you're behind a corporate proxy or using AWS Bedrock / Vertex AI with restricted model access. Skip if you're using the standard Anthropic API.

On accept, it runs the probe and writes .claude/model-overrides.json. On decline, dispatch fallback (#3) still handles missing models; the probe just spares the first dispatch its failed attempt.

Opt-in matters because some users have read-only API keys that can't hit /v1/models, some hit different endpoints (Bedrock, Vertex), and some don't want unnecessary network calls.
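The probe step might look roughly like the sketch below. The function names are hypothetical. The request headers and the data/id response shape follow the standard Anthropic API; a proxy, Bedrock, or Vertex endpoint may well differ, which is part of why the step is opt-in.

```python
import json
import urllib.request
from datetime import datetime, timezone

FALLBACK = {"haiku": "sonnet", "sonnet": "opus"}


def fetch_model_ids(base_url: str, api_key: str) -> list:
    """GET <base_url>/v1/models and return the model IDs it lists."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v1/models",
        headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload.get("data", [])]


def build_overrides(defaults: dict, available: list) -> dict:
    """Build the model-overrides.json content from a probe result."""
    aliases, reasons = {}, []
    for tier in defaults:
        target = tier
        # Walk up the chain until a tier's snapshot is actually available.
        while target is not None and defaults[target] not in available:
            target = FALLBACK.get(target)
        if target is not None and target != tier:
            aliases[tier] = target
            reasons.append(f"{tier} tier not in /v1/models response")
    return {
        "tier_aliases": aliases,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "available_models": list(available),
        "reason": "; ".join(reasons),
    }
```

Keeping build_overrides free of network calls means the alias logic can be tested with a canned model list, and only fetch_model_ids needs a reachable endpoint.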

5. /model-routing-check slash command

Diagnostic-only, idempotent, no side effects. Prints:

  • Current resolved tier → snapshot map (defaults + any overrides applied)
  • Whether .claude/model-overrides.json exists and what it contains
  • Result of probing /v1/models (if reachable)
  • The most-recent dispatch tier-bumps from the session log (if any)

When something breaks on a user's machine, this one command surfaces everything an issue triage needs to know.
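As a rough illustration of the diagnostic output, the report could be assembled by a pure function such as the hypothetical render_report below. The layout and wording of the report are assumptions; only the four items it prints come from the list above.

```python
from typing import Optional


def render_report(
    defaults: dict,
    overrides: Optional[dict],
    probe_ids: Optional[list],
    bumps: list,
) -> str:
    """Format the diagnostic as text; reads its inputs and writes nothing."""
    aliases = (overrides or {}).get("tier_aliases", {})
    lines = ["Resolved tiers:"]
    for tier in defaults:
        effective = aliases.get(tier, tier)
        note = f" (aliased to {effective})" if effective != tier else ""
        lines.append(f"  {tier} -> {defaults[effective]}{note}")
    lines.append(
        "Overrides file: " + ("absent" if overrides is None else str(overrides))
    )
    lines.append(
        "Probe: " + ("unreachable" if probe_ids is None else ", ".join(probe_ids))
    )
    lines.append("Recent tier bumps: " + (", ".join(bumps) if bumps else "none"))
    return "\n".join(lines)


report = render_report(
    {"haiku": "claude-haiku-4-5-20251001", "sonnet": "claude-sonnet-4-6-20260201"},
    {"tier_aliases": {"haiku": "sonnet"}},
    None,
    ["haiku -> sonnet"],
)
print(report)
```

Because the function only formats values handed to it, running it twice yields the same text, which matches the idempotent, no-side-effect requirement.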

6. Documentation

plugins/agentic-dev-team/docs/model-routing.md explains:

  • The contract (tier aliases → snapshots → dispatch → fallback)
  • When and why the fallback fires
  • How to interpret .claude/model-overrides.json
  • How to add a new model tier if Anthropic ships one
  • Troubleshooting for Bedrock / Vertex / proxy users

What ships vs. stays local

| Component | Ships in plugin | Per-user |
|---|---|---|
| knowledge/model-routing.json | ✅ defaults only | |
| Dispatch fallback logic | ✅ in orchestrator | |
| /init-dev-team probe step | ✅ opt-in prompt | |
| /model-routing-check command | ✅ diagnostic-only | |
| docs/model-routing.md | | |
| .claude/model-overrides.json | ❌ gitignored | ✅ generated locally |
| Proxy URLs / allowlists | ❌ never | ✅ user env vars only |

Acceptance criteria

  • No pinned snapshot IDs in agent frontmatter or CLAUDE.md; all resolution flows through knowledge/model-routing.json.
  • Plugin installs and runs with zero configuration on a standard Anthropic API key.
  • When a tier is unavailable, dispatch transparently retries one tier up; only fails after exhausting the chain.
  • Failed dispatches log which tier was attempted and what was tried as fallback (visible via /model-routing-check).
  • .claude/model-overrides.json is created only when explicitly requested (probe step) or when the user writes it.
  • .claude/model-overrides.json is in the plugin's gitignore.
  • /model-routing-check works as a no-side-effect diagnostic.
  • Bats coverage for the fallback chain (mock dispatch returning model_not_available for haiku, assert it retries sonnet, etc.).
  • Documentation covers Bedrock, Vertex, corporate proxy, and standard API scenarios.

OSS principle (made concrete)

The plugin should work for three populations without environment-specific config in the repo:

  1. Home user on personal API key — full Anthropic model list. No override file. Routing table works as-shipped.
  2. Corporate user on proxy with restricted models — dispatch fallback handles missing haiku transparently; optional probe makes the first call faster.
  3. AWS Bedrock / Vertex AI user — different ANTHROPIC_BASE_URL, different available IDs. Probe + override handles it the same way.

Same code, three environments, no environment-specific config in the repo.

Related work

Effort

Roughly the same scope as the CodeGraph integration work: 4-6 days of build following the same pattern, with about 3 of those days in dispatch wiring and tests and the rest in docs and the diagnostic command.
