Merged
19 commits
7ea2c49
docs: spec + plan for environment-aware model routing
bdfinst Jun 1, 2026
73263c6
docs(plan): record Step 0 PreToolUse matcher verification
bdfinst Jun 1, 2026
e326cc1
feat(model-routing): ship knowledge/model-routing.json defaults
bdfinst Jun 1, 2026
b9b33f5
chore: gitignore .claude/model-overrides.json and routing bump log
bdfinst Jun 1, 2026
7affb2b
feat(model-resolve): happy-path tier→snapshot resolution
bdfinst Jun 1, 2026
3557378
feat(model-resolve): overrides, cascade, cycle, exhaustion, dump-map
bdfinst Jun 1, 2026
ff937a2
feat(hooks): PreToolUse Agent hook enforces pre-dispatch resolution
bdfinst Jun 1, 2026
5aa05fc
feat(commands): add /model-routing-check diagnostic
bdfinst Jun 1, 2026
d3fc9ec
feat(init-dev-team): opt-in probe of /v1/models with three failure modes
bdfinst Jun 1, 2026
66bca9f
refactor(orchestrator): relocate model routing authority to PreToolUs…
bdfinst Jun 1, 2026
1d5f133
chore: remove pinned snapshot IDs outside routing.json
bdfinst Jun 1, 2026
9150c43
feat(ux): SessionStart hook surfaces routing overrides banner
bdfinst Jun 1, 2026
f909895
feat(skills): add mermaid-diagramming skill with blue-gray theme
bdfinst Jun 1, 2026
069cfb6
feat(model-resolve): perf gate + happy-path fast-path
bdfinst Jun 1, 2026
aa52c37
docs(adr): pre-dispatch model resolution + hook enforcement decisions
bdfinst Jun 1, 2026
dc4bd03
docs: model routing contract and troubleshooting guide
bdfinst Jun 1, 2026
7b1e0a1
docs(plan): mark environment-aware-model-routing implemented
bdfinst Jun 1, 2026
0701731
docs: complete the hook-as-authority sweep + fix probe invocation path
bdfinst Jun 1, 2026
2ab3725
docs: complete the routing-doc cleanup + add architecture diagrams
bdfinst Jun 1, 2026
6 changes: 6 additions & 0 deletions .gitignore
@@ -9,6 +9,12 @@ node_modules/
# Performance logs (generated at runtime)
.claude/metrics/*.json
.claude/metrics/*.log

# Environment-aware model routing (per-user, never checked in)
# - override cache: generated by /init-dev-team probe or hand-written by users behind restricted endpoints
# - bump log: append-only JSONL recording tier bumps from the resolver
.claude/model-overrides.json
.claude/metrics/model-routing.log
UPDATE.md

sync-to-aci.sh
129 changes: 129 additions & 0 deletions docs/adr/0004-pre-dispatch-model-resolution.md
@@ -0,0 +1,129 @@
# 4. Pre-dispatch model tier resolution enforced by a PreToolUse hook

Date: 2026-06-01

## Status

Accepted

## Context

The plugin's agents declare a tier alias (`haiku`, `sonnet`, `opus`) in their
`model:` frontmatter. The Claude Code harness resolves each alias to a fixed
snapshot ID such as `claude-haiku-4-5-20251001`. This works on a personal
Anthropic API key but fails for two real user populations:

1. **Corporate proxies with restricted model allowlists** — `haiku` may not
be reachable, and the dispatch fails opaquely.
2. **Anthropic snapshot deprecation** — pinned snapshot IDs scattered across
agent frontmatter, CLAUDE.md, and the orchestrator's routing table all
need updates whenever Anthropic retires a snapshot.

Two related design questions had non-obvious answers and shape this ADR.

### Q1. Pre-dispatch resolution vs. runtime `model_not_available` retry

The original issue (#37) suggested a "dispatch-side fallback" that catches
`model_not_available` at runtime and retries up the haiku → sonnet → opus
chain. Plan-review (pass 1) rejected this:

- The harness owns the dispatch surface. The plugin runs as hooks and
markdown commands — neither can intercept the harness's `Agent` tool
error path. There is no `OnToolError` hook contract; `PostToolUse` fires
on completion, not on failure-with-retry.
- A "transparent" runtime fallback would have to live inside the harness
itself, which the plugin cannot modify.

Two alternatives remained:

- **Pre-dispatch resolution.** Resolve the tier alias to a concrete snapshot
*before* the `Agent` tool is invoked. The resolver walks the cascade using a
per-user override cache populated either by a probe or by hand.
- **Document-only.** Ship clearer error messages and recovery docs, but no
automatic resolution.

### Q2. Where does pre-dispatch resolution live?

Two locations were considered:

- **Orchestrator markdown instruction.** Add a "Resolution Procedure"
section to `agents/orchestrator.md` that instructs the LLM to shell out
to a resolver helper before each `Agent` call.
- **PreToolUse hook on the `Agent` matcher.** Register a hook in
`settings.json` that intercepts every `Agent` tool call, reads
`tool_input.model`, shells out to the resolver, and rewrites the model
field (or refuses dispatch) via `hookSpecificOutput`.

Plan-review (pass 2) flagged the markdown-instruction approach as a
re-statement of the R1 problem the change was meant to solve: a model
under context pressure may skip the procedure, and the override silently
becomes a no-op.
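
To make the hook option concrete, here is a minimal sketch of the two JSON payloads such a hook might print on stdout. It is not the shipped `hooks/agent-model-resolve.sh`: the function names `json_escape`, `emit_deny`, and `emit_rewrite` are hypothetical, and the shape of `updatedInput` is an assumption. In particular, this ADR does not confirm whether the harness merges `updatedInput` into the original input or replaces it, so a real hook may need to echo the full `tool_input` back.

```bash
#!/usr/bin/env bash
# Sketch only: illustrates the hook's stdout payloads, not the shipped hook.

# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Assumes the input has no control characters; a real hook would use jq.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# emit_deny REASON: refuse dispatch and surface an actionable message.
emit_deny() {
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"%s"}}\n' \
    "$(json_escape "$1")"
}

# emit_rewrite SNAPSHOT: allow the call with a rewritten model field.
# ASSUMPTION: updatedInput carrying only "model" is accepted by the harness.
emit_rewrite() {
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"allow","updatedInput":{"model":"%s"}}}\n' \
    "$(json_escape "$1")"
}
```

A hook built this way would call `emit_rewrite` when the resolver succeeds and `emit_deny` when it does not, so the outcome never depends on the LLM following an instruction.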

## Decision

1. **Pre-dispatch resolution, not runtime retry.** Resolution happens
before any `Agent` tool call reaches the harness. The plugin does not
attempt to catch `model_not_available` at runtime — that error surface
belongs to the harness, which the plugin cannot reach. When the
resolver cannot satisfy a request (exhausted cascade, cycle, missing
routing.json, malformed overrides), the hook refuses dispatch with
`permissionDecision: "deny"` and an actionable
`permissionDecisionReason`.

2. **Enforce via a PreToolUse hook, not orchestrator instruction.**
`hooks/agent-model-resolve.sh` is registered in `settings.json` under
`matcher: "Agent"`. It runs on every `Agent` tool call regardless of
what the LLM is doing or whether the LLM remembers the procedure. The
orchestrator markdown becomes documentation of behavior, not the
enforcement surface.

3. **Single source of truth in `knowledge/model-routing.json`.** All tier
→ snapshot mappings live in one JSON file shipped with the plugin. Per-user overrides
live in `.claude/model-overrides.json` (gitignored, populated by an
opt-in probe in `/init-dev-team` or hand-written for restricted
endpoints).

The resolver helper `hooks/lib/model-resolve.sh` is the single bash + jq
implementation called by both the PreToolUse hook and the
`/model-routing-check` diagnostic command.
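
The cascade walk can be sketched in a few lines. This is a pure-shell stand-in, not the shipped `hooks/lib/model-resolve.sh` (which is bash + jq and reads its mappings from JSON). The function names are hypothetical, reachability is passed as plain arguments rather than read from the override cache, and only the haiku snapshot ID comes from this ADR; the other two values are placeholders. Override redirects and cycle detection are omitted.

```bash
#!/usr/bin/env bash
# Sketch only: cascade walk haiku -> sonnet -> opus with exhaustion handling.

# next_tier TIER: print the next tier up the cascade; prints nothing at the top.
next_tier() {
  case $1 in
    haiku)  echo sonnet ;;
    sonnet) echo opus ;;
  esac
}

# snapshot_for TIER: map a tier to a snapshot ID.
# Only the haiku ID appears in the ADR; the others are placeholders.
snapshot_for() {
  case $1 in
    haiku)  echo "claude-haiku-4-5-20251001" ;;
    sonnet) echo "sonnet-snapshot-placeholder" ;;
    opus)   echo "opus-snapshot-placeholder" ;;
    *)      return 1 ;;
  esac
}

# resolve_model TIER REACHABLE...: start at TIER and print the snapshot of the
# first tier listed as reachable. Returns 1 when the cascade is exhausted,
# which is the case where the hook would deny dispatch.
resolve_model() {
  local tier=$1
  shift
  local reachable=" $* "
  while [ -n "$tier" ]; do
    case $reachable in
      *" $tier "*) snapshot_for "$tier"; return ;;
    esac
    tier=$(next_tier "$tier")
  done
  return 1
}
```

For example, `resolve_model haiku sonnet opus` skips the unreachable haiku tier and prints the sonnet placeholder, while `resolve_model opus haiku` returns 1 because nothing at or above opus is reachable.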

## Consequences

**Positive.**

- Restricted-endpoint users get transparent tier resolution once the
override cache is populated: the same plugin code works on a personal
Anthropic key, a corporate proxy, or Bedrock/Vertex (where the cache is
written by hand).
- Future snapshot deprecations are a one-file change
(`knowledge/model-routing.json`).
- Enforcement is mechanical; the LLM cannot bypass it under context
pressure.
- The resolver is bats-testable in isolation via env-var path overrides
(`MODEL_ROUTING_JSON`, `MODEL_OVERRIDES_JSON`, `MODEL_BUMP_LOG`).
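
As an illustration of why env-var path overrides make the resolver testable, the sketch below resolves each path with a `${VAR:-default}` fallback so a test can point at fixtures in a temp directory. The default paths come from this ADR and the `.gitignore` change. The helper names and the JSONL field names (`from`, `to`) are hypothetical, since the real log schema is not shown here.

```bash
#!/usr/bin/env bash
# Sketch only: path lookup with env-var overrides, plus an append-only log.

# Each helper prints the override if set, else the default location.
routing_json_path()   { echo "${MODEL_ROUTING_JSON:-knowledge/model-routing.json}"; }
overrides_json_path() { echo "${MODEL_OVERRIDES_JSON:-.claude/model-overrides.json}"; }
bump_log_path()       { echo "${MODEL_BUMP_LOG:-.claude/metrics/model-routing.log}"; }

# log_bump FROM TO: append one JSONL record describing a tier bump.
# ASSUMPTION: field names are illustrative; tier names need no JSON escaping.
log_bump() {
  local log
  log=$(bump_log_path)
  mkdir -p "$(dirname "$log")"
  printf '{"from":"%s","to":"%s"}\n' "$1" "$2" >> "$log"
}
```

A test can then run `MODEL_BUMP_LOG="$tmpdir/bump.log" log_bump haiku sonnet` and assert on the file contents without touching the user's real `.claude/` directory.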

**Negative.**

- Adds bash + jq cold-start overhead to every sub-agent dispatch
(~10-15ms on Apple Silicon). The AC15 perf gate caps this at 50ms p99.
- The PreToolUse hook contract for `updatedInput` is not fully spelled out
in the public Claude Code docs (the relevant section was truncated when
we fetched it during Step 0 verification). We rely on the established
pattern used by two other production plugins
(`agentic-security-assessment`, `agentic-security-review`).
- The harness's runtime `model_not_available` path is still uncaught — if
routing.json points at a snapshot the harness can't reach AND no
override redirects the tier, the dispatch still fails. We accept this
because: (a) plugin defaults track current Anthropic snapshots, and
(b) corporate-proxy users have the override-cache escape hatch.
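
For reference, a 50ms p99 check over a set of latency samples can be sketched as below. This is not the AC15 gate itself, whose measurement method is not described in this ADR. The names `p99` and `perf_gate` are hypothetical, and the nearest-rank percentile definition is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: nearest-rank p99 over integer millisecond samples.

# p99 SAMPLES...: print the 99th percentile (nearest-rank method).
p99() {
  local n=$#
  [ "$n" -gt 0 ] || return 1
  # rank = ceil(0.99 * n), computed with integer arithmetic
  local rank=$(( (n * 99 + 99) / 100 ))
  printf '%s\n' "$@" | sort -n | sed -n "${rank}p"
}

# perf_gate LIMIT_MS SAMPLES...: succeed when p99 is at or under LIMIT_MS.
# Returns 2 when no samples were supplied.
perf_gate() {
  local limit=$1
  shift
  local observed
  observed=$(p99 "$@") || return 2
  [ "$observed" -le "$limit" ]
}
```

With 100 samples, the nearest-rank p99 is the 99th smallest value, so a single slow outlier does not fail the gate but two of them do.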

**Out of scope (recorded as future work).**

- Runtime retry inside the harness — would require harness changes
outside the plugin's control.
- Probe support for non-Anthropic-shape endpoints (Bedrock, Vertex).
Users on those endpoints write `.claude/model-overrides.json` by hand;
documented in `docs/model-routing.md`.

See: `agents/orchestrator.md` → Resolution Procedure for the algorithm,
`docs/model-routing.md` for the contract and troubleshooting,
`commands/model-routing-check.md` for the diagnostic.