feat: environment-aware model routing with PreToolUse hook enforcement by bdfinst · Pull Request #39 · bdfinst/agentic-dev-team

bdfinst · 2026-06-01T19:54:04Z

Closes #37.

Summary

Environment-aware model tier resolution for the agentic-dev-team plugin. Same code works on a personal Anthropic API key, a corporate proxy with restricted model allowlist, or Bedrock/Vertex deployments — with zero environment-specific config in the repo.

Single source of truth: knowledge/model-routing.json ships tier→snapshot defaults; every dispatch flows through it.
Mechanically enforced: PreToolUse hook on the Agent matcher rewrites tool_input.model or refuses dispatch via permissionDecision="deny". The LLM cannot bypass it.
Per-user, gitignored overrides: .claude/model-overrides.json populated by an opt-in /init-dev-team probe or hand-written; never leaks into commits.
Diagnostic + discoverability: /model-routing-check shows effective state; SessionStart banner surfaces silent bumps.

What ships

Layer	File
Defaults	`plugins/agentic-dev-team/knowledge/model-routing.json`
Resolver helper	`plugins/agentic-dev-team/hooks/lib/model-resolve.sh`
Probe helper	`plugins/agentic-dev-team/hooks/lib/model-probe.sh`
Enforcement hook	`plugins/agentic-dev-team/hooks/agent-model-resolve.sh` (PreToolUse, `matcher: "Agent"`)
Banner hook	`plugins/agentic-dev-team/hooks/overrides-banner.sh` (SessionStart)
Diagnostic command	`plugins/agentic-dev-team/commands/model-routing-check.md`
Probe sub-step	`/init-dev-team` Step 4.5
Design rationale	`docs/adr/0004-pre-dispatch-model-resolution.md`
Contract + troubleshooting	`plugins/agentic-dev-team/docs/model-routing.md`

Process

Two full Specs → Plan → Build cycles:

Spec: docs/specs/environment-aware-model-routing.md (~140 Gherkin lines, 24 acceptance criteria across AC1–AC19)
Plan: plans/environment-aware-model-routing.md (21 TDD steps; two passes of four plan-review personas — Acceptance, Design, UX, Strategic)
Build: every step RED→GREEN→REFACTOR with spec-compliance + complex-tier review on the hook layer and orchestrator rewrite. Architectural review at the orchestrator rewrite caught and resolved 5 contradictions across the doc surface.

Quality Gate

Tests: 237/237 bats pass (102 new tests for this slice)
Perf gate: MODEL_RESOLVE_PERF=1 bats tests/hooks/model_resolve_perf_tests.bats passes — 13.8ms/invocation against 50ms p99 target
AC2 enforced: git grep -nE 'claude-(haiku|sonnet|opus)-[0-9]' in plugin source returns matches only in the three approved files
Security review: pass (zero findings — jq --arg interpolation throughout, fail-open posture, bounded SSRF surface)
Doc review: pass after three high-confidence stale-reference fixes (code-review.md, quality-reviewer.md, agent_info.md) and one path-bug fix in /init-dev-team (used ${CLAUDE_PLUGIN_ROOT} instead of dev-repo-relative path)
Arch review: pass after sweep of agent-architecture.md, code-review.md, agent-remove.md, plus minor stale refs

Test Plan

Fresh install: /version and any sub-agent dispatch behave identically to pre-change (zero-config baseline)
Drop in .claude/model-overrides.json with {"tier_aliases":{"haiku":"sonnet"}}; next sub-agent tagged model: haiku dispatches with claude-sonnet-4-6 and a JSONL line lands in .claude/metrics/model-routing.log
/model-routing-check prints the four sections cleanly with override present and bump log populated
Start a new Claude Code session with an overrides file present — the SessionStart banner appears on stderr
/init-dev-team shows the probe prompt verbatim; answering "n" (or empty) writes nothing
On a non-Anthropic ANTHROPIC_BASE_URL, accepting the probe emits "Probe skipped" without making an HTTP call

Known out-of-scope

Captured in the spec's §Out of Scope. Notably: runtime model_not_available retry (the harness owns that surface), multi-region Anthropic endpoint auto-detection, per-agent override files, telemetry beyond the bump log. Architecture-overview.svg still shows "Model Routing Table" — visual asset, queued for a separate cleanup.

🤖 Generated with Claude Code

Spec at docs/specs/environment-aware-model-routing.md and approved plan at plans/environment-aware-model-routing.md. Addresses issue #37 — corporate proxies with restricted model allowlists and Anthropic snapshot deprecation. Two passes of plan-review personas (Acceptance, Design, UX, Strategic) — pass 2 final outcome 3/4 approve with Design blockers resolved (PreToolUse matcher verification gate, SessionStart hook for banner). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Verified matcher: "Agent" via production plugin precedent and docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Single source of truth for tier → snapshot resolution. Replaces what's currently scattered across agent frontmatter and CLAUDE.md prose. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per-user override cache and append-only bump log generated by the resolver. Explicit entries (in addition to the existing .claude/metrics/*.log glob) prevent rename-time drift and document intent for the team. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

hooks/lib/model-resolve.sh reads knowledge/model-routing.json and prints the resolved snapshot for haiku|sonnet|opus on stdout. Test-only env-var seams (MODEL_ROUTING_JSON, MODEL_OVERRIDES_JSON, MODEL_BUMP_LOG) keep the helper bats-isolatable. Override/cascade/error paths deferred to Steps 4-6. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolver now covers Steps 4-7 in one cohesive helper: - Single-hop override + JSONL bump log (exactly one event per invocation) - Multi-hop alias cascade up to _MAX_HOPS=3 - Cycle detection with AC5a stderr template - AC5 exhaustion template when chain terminates at an unresolvable tier - AC5b missing routing.json (exit 4) and AC5c malformed overrides (exit 5) - --dump-map flag for /model-routing-check 24/24 bats tests pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

hooks/agent-model-resolve.sh is the enforcement surface for R1: it reads PreToolUse-shaped JSON on stdin, shells out to hooks/lib/model-resolve.sh, and emits one of: - bump: hookSpecificOutput.updatedInput rewrites tool_input.model - pass-through: exactly {} (no change) - refusal: hookSpecificOutput.permissionDecision=deny with the resolver's stderr as the reason Registered in settings.json under PreToolUse with matcher="Agent" — the LLM cannot bypass it. Fail-open posture on malformed stdin or unexpected resolver exit codes so a buggy hook never blocks legitimate dispatch. 13/13 bats tests pass. AC16, AC17, AC18 fully covered. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Read-only diagnostic that prints (a) the effective tier → snapshot map, (b) any override file contents, (c) the last N=10 bump events (raise MODEL_BUMP_TAIL to see more), and (d) probe applicability for the current ANTHROPIC_BASE_URL. AC10 (side-effect-free), AC11 (surfaces bumps), AC11a (tail cap), AC11b (probe-applicability line). 16/16 bats tests pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

hooks/lib/model-probe.sh: - Reads y/N from stdin. Decline writes nothing (AC7). - On accept: probes $ANTHROPIC_BASE_URL/v1/models (5s timeout). - ok-all → 'All model tiers available; no overrides needed.' (AC7a) - missing → writes overrides + literal user message (AC7b) - non-Anthropic host → 'Probe skipped:' + docs/model-routing.md ref (AC8) - timeout / 5xx / malformed JSON → three differentiated messages (AC9) commands/init-dev-team.md gains a Step 4.5 with the verbatim prompt text. tests/hooks/fake-bin/curl shim deterministically replays each fixture based on MODEL_PROBE_FAKE_MODE. 15/15 bats tests pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e hook Replaces the static 'Model Routing Table' in agents/orchestrator.md with a 'Resolution Procedure' section that points at the enforcement surface: - hooks/agent-model-resolve.sh (PreToolUse hook, matcher=Agent) - hooks/lib/model-resolve.sh (resolver helper) - knowledge/model-routing.json (single source of truth) - .claude/model-overrides.json (per-user, gitignored) 'Tier guidance (informational)' subsection preserves the rationale-per-tier bullet list so new-agent authors have a guide for which tier to declare. Also sweeps the wider doc surface to remove 'Orchestrator Model Routing Table' references that now contradict hook-as-authority: - CLAUDE.md: static table → paragraph pointer; new /model-routing-check row in Slash Commands Registry - docs/agent-architecture.md: rewritten Model Routing subsection - docs/skills.md, prompts/quality-reviewer.md, commands/code-review.md, commands/review-agent.md, commands/agent-remove.md, knowledge/agent-registry.md, skills/agent-skill-authoring/references/templates.md: one-line reference fixes pointing at the Resolution Procedure 11/11 bats tests pass. AC2 holds across orchestrator.md and CLAUDE.md. ADR + docs/model-routing.md cross-references are placeholders pending Steps 19 + 20. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Step 16 sweep: - skills/performance-metrics/SKILL.md:79: claude-opus-4-6 → 'opus' tier alias - templates/agents/agent-template.md:32: rewrite comment to point at knowledge/model-routing.json + the PreToolUse hook instead of listing snapshot IDs inline tests/repo/no_pinned_snapshots_test.bats enforces AC2: no pinned snapshot IDs in plugin source outside the three approved files (knowledge/model-routing.json, docs/model-routing.md, templates/agents/agent-template.md). Spec/plan/eval-fixture files are out of scope. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

hooks/overrides-banner.sh prints the literal line: 'Note: model routing overrides active — run /model-routing-check to review.' to stderr when .claude/model-overrides.json exists at session start. Silent on clean installs; fail-open on malformed stdin. Registered in settings.json under SessionStart. Markdown command bodies cannot deterministically emit terminal output, so the SessionStart hook is the enforcement surface for AC19. 4/4 bats tests pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Defines a consistent blues-and-grays Mermaid theme (light fills, navy text, blue borders) via a reusable %%{init}%% directive. Applies it to the one existing diagram in code-review-process.md and ships a new mermaid-diagramming skill with palette reference, typed examples, and procedure for adding themed diagrams to markdown files. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Opt-in perf gate at MODEL_RESOLVE_PERF=1. Asserts 1000 sequential invocations complete under 50s wall-clock (50ms p99 ceiling per invocation), matching the spec target. Apple Silicon measurement: ~14ms per invocation, dominated by bash + jq cold-start. Optimisation: when no overrides file exists (the dominant case), skip the alias machinery and resolve in a single jq invocation. Cuts elapsed_ms from 16.2s to 13.8s. Spec AC15 updated to clarify the 50ms p99 target. The previous '5s wall-clock ceiling' wording was the aspirational 10× headroom, not a realistic threshold for shell+jq on macOS. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ADR 0004 records two decisions: 1. Pre-dispatch resolution, not runtime model_not_available retry — the harness owns the dispatch surface and the plugin cannot reach it. 2. PreToolUse hook enforcement, not orchestrator instruction — markdown instructions can be silently skipped by the LLM under context pressure. Plus a stub docs/model-routing.md to land the ADR cross-reference and the orchestrator.md ADR pointer (was a 00NN placeholder). 5/5 bats tests pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs/model-routing.md covers: - Contract (tier aliases, resolution inputs, exit-code taxonomy) - When the fallback fires (silent bump, refused dispatch, probe write) - Interpreting the override file (schema, sentinel values, alias chain) - Adding a new tier (5-step procedure) - Troubleshooting: Bedrock / Vertex / corporate proxy - Hand-writing the override file - Environment variables (user-facing vs. test-only seams) Links to ADR 0004. 12/12 bats tests pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

All 21 steps complete. 237/237 bats tests pass. R1 enforcement is empirically proven via the PreToolUse hook on the Agent matcher. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Three error-severity fixes addressing residual orchestrator-routing-table references that contradicted the hook-as-authority model: - commands/code-review.md:31 — Constraint 3 still claimed the orchestrator routing table is authoritative - prompts/quality-reviewer.md:39 — 'Pass each agent its model from the routing table' - docs/agent_info.md:25 — 'Model assignment is controlled by the Orchestrator's routing table' Plus: - commands/harness-audit.md:52 — pointer to the renamed section - commands/init-dev-team.md:461 — probe invocation now uses ${CLAUDE_PLUGIN_ROOT}/hooks/lib/model-probe.sh. The previous repo-layout path 'plugins/agentic-dev-team/hooks/...' only resolved from the plugin source tree, which would have broken the probe step for every installed user. 237/237 bats tests still pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Closes the remaining doc-review findings from PR #39: - docs/skills.md: add /init-dev-team to Workflow Commands and /model-routing-check to Utility Commands. Restores the 2-hop discoverability path from CLAUDE.md. - docs/diagrams/architecture-overview.svg: 'Model Routing Table' label replaced with 'Model Tier Resolution (PreToolUse hook)'. Two-line label so the box stays readable. - docs/diagrams/review-dispatch.svg: orchestrator subtitle 'Model Routing' → 'Agent Dispatch' (the orchestrator dispatches; the hook routes). Plus two Mermaid diagrams in docs/model-routing.md: - Architecture at a glance — flowchart showing the caller layer, harness, plugin enforcement surface, routing state, and diagnostics with edges showing the read/write relationships. - Dispatch flow — sequenceDiagram covering the three branches (pass-through, bump rewrite, deny) with alt/else blocks. Both Mermaid blocks validated via @mermaid-js/mermaid-cli mmdc. Uses the project's blue-gray theme directive per the mermaid-diagramming skill. 237/237 bats tests still pass. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bdfinst · 2026-06-01T20:40:23Z

Addresses the remaining doc-review items called out in the PR body as known-out-of-scope:

docs/skills.md — /init-dev-team added to Workflow Commands; /model-routing-check added to Utility Commands. Restores the 2-hop discoverability path from CLAUDE.md.
docs/diagrams/architecture-overview.svg — Model Routing Table → Model Tier Resolution (PreToolUse hook) (two-line label).
docs/diagrams/review-dispatch.svg — orchestrator subtitle Model Routing → Agent Dispatch.

Plus two new Mermaid diagrams in docs/model-routing.md:

Architecture at a glance — flowchart of caller / harness / plugin enforcement surface / routing state / diagnostics with read+write edges.
Dispatch flow — sequenceDiagram covering all three resolver branches (pass-through, bump-rewrite, deny) including the permissionDecision="deny" path the LLM sees.

Both diagrams use the project's blue-gray theme (per the mermaid-diagramming skill) and were validated by rendering via @mermaid-js/mermaid-cli.

237/237 bats still pass.

Removes plans and specs for features that have shipped: - codegraph-integration (implemented) - environment-aware-model-routing (implemented, merged in PR #39) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

bdfinst and others added 19 commits June 1, 2026 13:23

docs(plan): record Step 0 PreToolUse matcher verification

73263c6

Verified matcher: "Agent" via production plugin precedent and docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(model-routing): ship knowledge/model-routing.json defaults

e326cc1

Single source of truth for tier → snapshot resolution. Replaces what's currently scattered across agent frontmatter and CLAUDE.md prose. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs(plan): mark environment-aware-model-routing implemented

7b1e0a1

All 21 steps complete. 237/237 bats tests pass. R1 enforcement is empirically proven via the PreToolUse hook on the Agent matcher. Refs #37 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bdfinst merged commit 511ec58 into main Jun 1, 2026
1 check passed

bdfinst deleted the feat/env-aware-model-routing branch June 1, 2026 20:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: environment-aware model routing with PreToolUse hook enforcement#39

feat: environment-aware model routing with PreToolUse hook enforcement#39
bdfinst merged 19 commits into
mainfrom
feat/env-aware-model-routing

bdfinst commented Jun 1, 2026

Uh oh!

bdfinst commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bdfinst commented Jun 1, 2026

Summary

What ships

Process

Quality Gate

Test Plan

Known out-of-scope

Uh oh!

bdfinst commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant