Skip to content

feat: environment-aware model routing with PreToolUse hook enforcement#39

Merged
bdfinst merged 19 commits into
mainfrom
feat/env-aware-model-routing
Jun 1, 2026
Merged

feat: environment-aware model routing with PreToolUse hook enforcement#39
bdfinst merged 19 commits into
mainfrom
feat/env-aware-model-routing

Conversation

@bdfinst
Copy link
Copy Markdown
Owner

@bdfinst bdfinst commented Jun 1, 2026

Closes #37.

Summary

Environment-aware model tier resolution for the agentic-dev-team plugin. Same code works on a personal Anthropic API key, a corporate proxy with restricted model allowlist, or Bedrock/Vertex deployments — with zero environment-specific config in the repo.

  • Single source of truth: knowledge/model-routing.json ships tier→snapshot defaults; every dispatch flows through it.
  • Mechanically enforced: PreToolUse hook on the Agent matcher rewrites tool_input.model or refuses dispatch via permissionDecision="deny". The LLM cannot bypass it.
  • Per-user, gitignored overrides: .claude/model-overrides.json populated by an opt-in /init-dev-team probe or hand-written; never leaks into commits.
  • Diagnostic + discoverability: /model-routing-check shows effective state; SessionStart banner surfaces silent bumps.

What ships

Layer File
Defaults plugins/agentic-dev-team/knowledge/model-routing.json
Resolver helper plugins/agentic-dev-team/hooks/lib/model-resolve.sh
Probe helper plugins/agentic-dev-team/hooks/lib/model-probe.sh
Enforcement hook plugins/agentic-dev-team/hooks/agent-model-resolve.sh (PreToolUse, matcher: "Agent")
Banner hook plugins/agentic-dev-team/hooks/overrides-banner.sh (SessionStart)
Diagnostic command plugins/agentic-dev-team/commands/model-routing-check.md
Probe sub-step /init-dev-team Step 4.5
Design rationale docs/adr/0004-pre-dispatch-model-resolution.md
Contract + troubleshooting plugins/agentic-dev-team/docs/model-routing.md

Process

Two full Specs → Plan → Build cycles:

  • Spec: docs/specs/environment-aware-model-routing.md (~140 Gherkin lines, 24 acceptance criteria across AC1–AC19)
  • Plan: plans/environment-aware-model-routing.md (21 TDD steps; two passes of four plan-review personas — Acceptance, Design, UX, Strategic)
  • Build: every step RED→GREEN→REFACTOR with spec-compliance + complex-tier review on the hook layer and orchestrator rewrite. Architectural review at the orchestrator rewrite caught and resolved 5 contradictions across the doc surface.

Quality Gate

  • Tests: 237/237 bats pass (102 new tests for this slice)
  • Perf gate: MODEL_RESOLVE_PERF=1 bats tests/hooks/model_resolve_perf_tests.bats passes — 13.8ms/invocation against 50ms p99 target
  • AC2 enforced: git grep -nE 'claude-(haiku|sonnet|opus)-[0-9]' in plugin source returns matches only in the three approved files
  • Security review: pass (zero findings — jq --arg interpolation throughout, fail-open posture, bounded SSRF surface)
  • Doc review: pass after three high-confidence stale-reference fixes (code-review.md, quality-reviewer.md, agent_info.md) and one path-bug fix in /init-dev-team (used ${CLAUDE_PLUGIN_ROOT} instead of dev-repo-relative path)
  • Arch review: pass after sweep of agent-architecture.md, code-review.md, agent-remove.md, plus minor stale refs

Test Plan

  • Fresh install: /version and any sub-agent dispatch behave identically to pre-change (zero-config baseline)
  • Drop in .claude/model-overrides.json with {"tier_aliases":{"haiku":"sonnet"}}; next sub-agent tagged model: haiku dispatches with claude-sonnet-4-6 and a JSONL line lands in .claude/metrics/model-routing.log
  • /model-routing-check prints the four sections cleanly with override present and bump log populated
  • Start a new Claude Code session with an overrides file present — the SessionStart banner appears on stderr
  • /init-dev-team shows the probe prompt verbatim; answering "n" (or empty) writes nothing
  • On a non-Anthropic ANTHROPIC_BASE_URL, accepting the probe emits "Probe skipped" without making an HTTP call

Known out-of-scope

Captured in the spec's §Out of Scope. Notably: runtime model_not_available retry (the harness owns that surface), multi-region Anthropic endpoint auto-detection, per-agent override files, telemetry beyond the bump log. Architecture-overview.svg still shows "Model Routing Table" — visual asset, queued for a separate cleanup.

🤖 Generated with Claude Code

bdfinst and others added 19 commits June 1, 2026 13:23
Spec at docs/specs/environment-aware-model-routing.md and approved plan at
plans/environment-aware-model-routing.md. Addresses issue #37 — corporate
proxies with restricted model allowlists and Anthropic snapshot deprecation.

Two passes of plan-review personas (Acceptance, Design, UX, Strategic) —
pass 2 final outcome 3/4 approve with Design blockers resolved (PreToolUse
matcher verification gate, SessionStart hook for banner).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Verified matcher: "Agent" via production plugin precedent and docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single source of truth for tier → snapshot resolution. Replaces what's
currently scattered across agent frontmatter and CLAUDE.md prose.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-user override cache and append-only bump log generated by the resolver.
Explicit entries (in addition to the existing .claude/metrics/*.log glob)
prevent rename-time drift and document intent for the team.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hooks/lib/model-resolve.sh reads knowledge/model-routing.json and prints
the resolved snapshot for haiku|sonnet|opus on stdout. Test-only env-var
seams (MODEL_ROUTING_JSON, MODEL_OVERRIDES_JSON, MODEL_BUMP_LOG) keep the
helper bats-isolatable. Override/cascade/error paths deferred to Steps 4-6.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolver now covers Steps 4-7 in one cohesive helper:
- Single-hop override + JSONL bump log (exactly one event per invocation)
- Multi-hop alias cascade up to _MAX_HOPS=3
- Cycle detection with AC5a stderr template
- AC5 exhaustion template when chain terminates at an unresolvable tier
- AC5b missing routing.json (exit 4) and AC5c malformed overrides (exit 5)
- --dump-map flag for /model-routing-check

24/24 bats tests pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hooks/agent-model-resolve.sh is the enforcement surface for R1: it reads
PreToolUse-shaped JSON on stdin, shells out to hooks/lib/model-resolve.sh,
and emits one of:
  - bump:          hookSpecificOutput.updatedInput rewrites tool_input.model
  - pass-through:  exactly {} (no change)
  - refusal:       hookSpecificOutput.permissionDecision=deny with the
                   resolver's stderr as the reason

Registered in settings.json under PreToolUse with matcher="Agent" — the
LLM cannot bypass it. Fail-open posture on malformed stdin or unexpected
resolver exit codes so a buggy hook never blocks legitimate dispatch.

13/13 bats tests pass. AC16, AC17, AC18 fully covered.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read-only diagnostic that prints (a) the effective tier → snapshot map,
(b) any override file contents, (c) the last N=10 bump events (raise
MODEL_BUMP_TAIL to see more), and (d) probe applicability for the
current ANTHROPIC_BASE_URL.

AC10 (side-effect-free), AC11 (surfaces bumps), AC11a (tail cap),
AC11b (probe-applicability line). 16/16 bats tests pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hooks/lib/model-probe.sh:
- Reads y/N from stdin. Decline writes nothing (AC7).
- On accept: probes $ANTHROPIC_BASE_URL/v1/models (5s timeout).
- ok-all  → 'All model tiers available; no overrides needed.' (AC7a)
- missing → writes overrides + literal user message (AC7b)
- non-Anthropic host → 'Probe skipped:' + docs/model-routing.md ref (AC8)
- timeout / 5xx / malformed JSON → three differentiated messages (AC9)

commands/init-dev-team.md gains a Step 4.5 with the verbatim prompt text.

tests/hooks/fake-bin/curl shim deterministically replays each fixture
based on MODEL_PROBE_FAKE_MODE. 15/15 bats tests pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e hook

Replaces the static 'Model Routing Table' in agents/orchestrator.md with a
'Resolution Procedure' section that points at the enforcement surface:
- hooks/agent-model-resolve.sh (PreToolUse hook, matcher=Agent)
- hooks/lib/model-resolve.sh (resolver helper)
- knowledge/model-routing.json (single source of truth)
- .claude/model-overrides.json (per-user, gitignored)

'Tier guidance (informational)' subsection preserves the rationale-per-tier
bullet list so new-agent authors have a guide for which tier to declare.

Also sweeps the wider doc surface to remove 'Orchestrator Model Routing
Table' references that now contradict hook-as-authority:
- CLAUDE.md: static table → paragraph pointer; new /model-routing-check
  row in Slash Commands Registry
- docs/agent-architecture.md: rewritten Model Routing subsection
- docs/skills.md, prompts/quality-reviewer.md, commands/code-review.md,
  commands/review-agent.md, commands/agent-remove.md,
  knowledge/agent-registry.md, skills/agent-skill-authoring/references/templates.md:
  one-line reference fixes pointing at the Resolution Procedure

11/11 bats tests pass. AC2 holds across orchestrator.md and CLAUDE.md.
ADR + docs/model-routing.md cross-references are placeholders pending
Steps 19 + 20.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step 16 sweep:
- skills/performance-metrics/SKILL.md:79: claude-opus-4-6 → 'opus' tier alias
- templates/agents/agent-template.md:32: rewrite comment to point at
  knowledge/model-routing.json + the PreToolUse hook instead of listing
  snapshot IDs inline

tests/repo/no_pinned_snapshots_test.bats enforces AC2: no pinned
snapshot IDs in plugin source outside the three approved files
(knowledge/model-routing.json, docs/model-routing.md,
templates/agents/agent-template.md). Spec/plan/eval-fixture files are
out of scope.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hooks/overrides-banner.sh prints the literal line:
  'Note: model routing overrides active — run /model-routing-check to review.'
to stderr when .claude/model-overrides.json exists at session start.
Silent on clean installs; fail-open on malformed stdin.

Registered in settings.json under SessionStart. Markdown command bodies
cannot deterministically emit terminal output, so the SessionStart hook
is the enforcement surface for AC19.

4/4 bats tests pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Defines a consistent blues-and-grays Mermaid theme (light fills, navy
text, blue borders) via a reusable %%{init}%% directive. Applies it to
the one existing diagram in code-review-process.md and ships a new
mermaid-diagramming skill with palette reference, typed examples, and
procedure for adding themed diagrams to markdown files.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Opt-in perf gate at MODEL_RESOLVE_PERF=1. Asserts 1000 sequential
invocations complete under 50s wall-clock (50ms p99 ceiling per
invocation), matching the spec target. Apple Silicon measurement:
~14ms per invocation, dominated by bash + jq cold-start.

Optimisation: when no overrides file exists (the dominant case), skip
the alias machinery and resolve in a single jq invocation. Cuts
elapsed_ms from 16.2s to 13.8s.

Spec AC15 updated to clarify the 50ms p99 target. The previous '5s
wall-clock ceiling' wording was the aspirational 10× headroom, not a
realistic threshold for shell+jq on macOS.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ADR 0004 records two decisions:
1. Pre-dispatch resolution, not runtime model_not_available retry —
   the harness owns the dispatch surface and the plugin cannot reach it.
2. PreToolUse hook enforcement, not orchestrator instruction — markdown
   instructions can be silently skipped by the LLM under context pressure.

Plus a stub docs/model-routing.md to land the ADR cross-reference and
the orchestrator.md ADR pointer (was a 00NN placeholder).

5/5 bats tests pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
docs/model-routing.md covers:
- Contract (tier aliases, resolution inputs, exit-code taxonomy)
- When the fallback fires (silent bump, refused dispatch, probe write)
- Interpreting the override file (schema, sentinel values, alias chain)
- Adding a new tier (5-step procedure)
- Troubleshooting: Bedrock / Vertex / corporate proxy
- Hand-writing the override file
- Environment variables (user-facing vs. test-only seams)

Links to ADR 0004. 12/12 bats tests pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All 21 steps complete. 237/237 bats tests pass. R1 enforcement is
empirically proven via the PreToolUse hook on the Agent matcher.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three error-severity fixes addressing residual orchestrator-routing-table
references that contradicted the hook-as-authority model:
- commands/code-review.md:31 — Constraint 3 still claimed the orchestrator
  routing table is authoritative
- prompts/quality-reviewer.md:39 — 'Pass each agent its model from the
  routing table'
- docs/agent_info.md:25 — 'Model assignment is controlled by the
  Orchestrator's routing table'

Plus:
- commands/harness-audit.md:52 — pointer to the renamed section
- commands/init-dev-team.md:461 — probe invocation now uses
  ${CLAUDE_PLUGIN_ROOT}/hooks/lib/model-probe.sh. The previous
  repo-layout path 'plugins/agentic-dev-team/hooks/...' only resolved
  from the plugin source tree, which would have broken the probe step
  for every installed user.

237/237 bats tests still pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the remaining doc-review findings from PR #39:

- docs/skills.md: add /init-dev-team to Workflow Commands and
  /model-routing-check to Utility Commands. Restores the 2-hop
  discoverability path from CLAUDE.md.
- docs/diagrams/architecture-overview.svg: 'Model Routing Table' label
  replaced with 'Model Tier Resolution (PreToolUse hook)'. Two-line
  label so the box stays readable.
- docs/diagrams/review-dispatch.svg: orchestrator subtitle 'Model
  Routing' → 'Agent Dispatch' (the orchestrator dispatches; the hook
  routes).

Plus two Mermaid diagrams in docs/model-routing.md:

- Architecture at a glance — flowchart showing the caller layer,
  harness, plugin enforcement surface, routing state, and diagnostics
  with edges showing the read/write relationships.
- Dispatch flow — sequenceDiagram covering the three branches
  (pass-through, bump rewrite, deny) with alt/else blocks.

Both Mermaid blocks validated via @mermaid-js/mermaid-cli mmdc.
Uses the project's blue-gray theme directive per the mermaid-diagramming
skill.

237/237 bats tests still pass.

Refs #37

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bdfinst
Copy link
Copy Markdown
Owner Author

bdfinst commented Jun 1, 2026

Addresses the remaining doc-review items called out in the PR body as known-out-of-scope:

  • docs/skills.md/init-dev-team added to Workflow Commands; /model-routing-check added to Utility Commands. Restores the 2-hop discoverability path from CLAUDE.md.
  • docs/diagrams/architecture-overview.svgModel Routing TableModel Tier Resolution (PreToolUse hook) (two-line label).
  • docs/diagrams/review-dispatch.svg — orchestrator subtitle Model RoutingAgent Dispatch.

Plus two new Mermaid diagrams in docs/model-routing.md:

  1. Architecture at a glance — flowchart of caller / harness / plugin enforcement surface / routing state / diagnostics with read+write edges.
  2. Dispatch flow — sequenceDiagram covering all three resolver branches (pass-through, bump-rewrite, deny) including the permissionDecision="deny" path the LLM sees.

Both diagrams use the project's blue-gray theme (per the mermaid-diagramming skill) and were validated by rendering via @mermaid-js/mermaid-cli.

237/237 bats still pass.

@bdfinst bdfinst merged commit 511ec58 into main Jun 1, 2026
1 check passed
@bdfinst bdfinst deleted the feat/env-aware-model-routing branch June 1, 2026 20:49
bdfinst added a commit that referenced this pull request Jun 1, 2026
Removes plans and specs for features that have shipped:
- codegraph-integration (implemented)
- environment-aware-model-routing (implemented, merged in PR #39)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: environment-aware model routing with fallback for restricted environments

1 participant