bdfinst · Jun 1, 2026
diff --git a/plugins/agentic-dev-team/CLAUDE.md b/plugins/agentic-dev-team/CLAUDE.md
@@ -43,11 +43,11 @@ Full registry tables with token counts, model tiers, and used-by mappings are in
 
 **Review agents** (19): spec-compliance-review, a11y-review, arch-review, claude-setup-review, complexity-review, concurrency-review, doc-review, domain-review, js-fp-review, naming-review, performance-review, security-review, structure-review, svelte-review, test-review, token-efficiency-review, refactoring-review, progress-guardian, data-flow-tracer
 
-**Skills** (33): Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Quality Gate Pipeline, Governance & Compliance, Agent & Skill Authoring, Hexagonal Architecture, Domain-Driven Design, Domain Analysis, Specs, Threat Modeling, API Design, Legacy Code, Mutation Testing, Test-Driven Development, Systematic Debugging, Design Doc, Branch Workflow, CI Debugging, Test Design Reviewer, Browser Testing, Competitive Analysis, Design Interrogation, Design It Twice, Static Analysis Integration, Feature File Validation, Docker Image Create, Docker Image Audit, Performance Benchmark, ADR Tools, Mermaid Diagramming
+**Skills** (34): Context Loading Protocol, Context Summarization, Feedback & Learning, Human Oversight Protocol, Performance Metrics, Quality Gate Pipeline, Governance & Compliance, Agent & Skill Authoring, Hexagonal Architecture, Domain-Driven Design, Domain Analysis, Specs, Threat Modeling, API Design, Legacy Code, Mutation Testing, Test-Driven Development, Systematic Debugging, Design Doc, Branch Workflow, CI Debugging, Test Design Reviewer, Browser Testing, Competitive Analysis, Design Interrogation, Design It Twice, Static Analysis Integration, Feature File Validation, Docker Image Create, Docker Image Audit, Performance Benchmark, ADR Tools, Mermaid Diagramming, Ubiquitous Language
 
 **Subagent prompt templates** (8): `prompts/implementer.md`, `prompts/spec-reviewer.md`, `prompts/quality-reviewer.md`, `prompts/plan-reviewer.md`, `prompts/plan-review-acceptance.md`, `prompts/plan-review-design.md`, `prompts/plan-review-ux.md`, `prompts/plan-review-strategic.md`
 
-**Knowledge files** (6): agent-registry, review-template, review-rubric, owasp-detection, domain-modeling, architecture-assessment
+**Knowledge files** (11): agent-registry, review-template, review-rubric, owasp-detection, domain-modeling, architecture-assessment, exploratory-testing-field-guide, adversarial-review-protocol, design-smells, object-calisthenics, testability-patterns
 
 **Agent templates** (9): ts-enforcer, esm-enforcer, react-testing, front-end-testing, twelve-factor-audit, python-quality, go-quality, csharp-quality, angular-testing (in `templates/agents/`, scaffolded by `/setup`)
 

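The counts in parentheses are maintained by hand alongside the lists, so the two can drift apart when an entry is added. A minimal sketch of a consistency check follows, assuming each registry line has the shape `**Label** (N): item, item, ...`. The function name `check_registry_line` is hypothetical and not part of the plugin.

```python
import re

# Assumed shape of a registry line: **Label** (N): item, item, ...
REGISTRY_LINE = re.compile(
    r"^\*\*(?P<label>[^*]+)\*\*\s+\((?P<count>\d+)\):\s+(?P<items>.+)$"
)


def check_registry_line(line: str) -> tuple:
    """Return (label, declared_count, actual_count) for one registry line."""
    match = REGISTRY_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a registry line")
    items = match.group("items")
    # Drop a trailing parenthetical note such as "(in `templates/agents/`, ...)".
    items = re.sub(r"\s+\([^)]*\)\s*$", "", items)
    actual = len([item for item in items.split(",") if item.strip()])
    return match.group("label"), int(match.group("count")), actual
```

A mismatch between the second and third values of the returned tuple would indicate that the declared count needs updating.
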
diff --git a/plugins/agentic-dev-team/agents/arch-review.md b/plugins/agentic-dev-team/agents/arch-review.md
@@ -96,6 +96,10 @@ Grep for patterns that architecture documentation explicitly bans:
 - Direct `fetch`/`axios`/`HttpClient` calls outside designated HTTP adapter layer
 - Direct DB client calls outside designated repository layer
 
+## Self-Challenge
+
+After producing findings, run the adversarial challenge pass from `knowledge/adversarial-review-protocol.md` (arch-review challenge questions). Append confidence level (High/Medium/Low) to the `summary` field.
+
 ## Ignore
 
 Code style, naming conventions, test coverage, domain modeling correctness (handled by other agents)

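The Self-Challenge step ends by appending a confidence level to the `summary` field of the agent result. One way this could look is sketched below. The `Confidence: <level>` suffix format and the helper name `append_confidence` are assumptions for illustration; the actual wording is defined by `knowledge/adversarial-review-protocol.md`, which is not shown in this diff.

```python
VALID_LEVELS = ("High", "Medium", "Low")


def append_confidence(result: dict, level: str) -> dict:
    """Return a copy of an agent result with the confidence level appended to `summary`."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown confidence level: {level!r}")
    summary = result.get("summary", "").rstrip()
    # Assumed suffix format; the protocol file may prescribe a different one.
    suffix = f"Confidence: {level}"
    updated = dict(result)  # shallow copy so the caller's dict is not mutated
    updated["summary"] = f"{summary} {suffix}".strip()
    return updated
```
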
diff --git a/plugins/agentic-dev-team/agents/complexity-review.md b/plugins/agentic-dev-team/agents/complexity-review.md
@@ -20,6 +20,10 @@ Confidence: high=threshold violation (function >N lines, nesting >N levels); med
 Model tier: small
 Context needs: full-file
 
+## Knowledge Files
+
+Read `knowledge/object-calisthenics.md` before analysis. The nine rules provide design pressure thresholds that complement the numeric limits below (especially rule 1: one indentation level, rule 7: small entities, rule 2: no else).
+
 ## Skip
 
 Return `{"status": "skip", "issues": [], "summary": "No code files in target"}` when:
@@ -61,6 +65,10 @@ Cognitive load:
 - Too many concepts per function
 - Non-obvious control flow
 
+## Self-Challenge
+
+After producing findings, run the adversarial challenge pass from `knowledge/adversarial-review-protocol.md`. Use the structure-review challenge questions (the nearest applicable section — no complexity-specific section exists). Append confidence level (High/Medium/Low) to the `summary` field.
+
 ## Ignore
 
 Domain modeling, naming, tests (handled by other agents)
diff --git a/plugins/agentic-dev-team/agents/domain-review.md b/plugins/agentic-dev-team/agents/domain-review.md
@@ -1,7 +1,7 @@
 ---
 name: domain-review
 description: Domain boundaries, abstraction leaks, business logic placement
-tools: Read, Grep, Glob
+tools: Read, Grep, Glob, Skill
 model: opus
 ---
 
@@ -81,6 +81,14 @@ Anemic domain model:
 - Entities or aggregates that are pure data holders (only getters/setters, no behavior) while all logic lives in services — suggest moving invariant enforcement and state transitions onto the entity
 - Entities that allow external callers to set internal state directly instead of through intention-revealing methods (e.g., setting a status field directly rather than calling a method like `markPaid()` or `Submit()`)
 
+## Skills
+
+- [Ubiquitous Language](../skills/ubiquitous-language/SKILL.md) — invoke when the user asks to "build the glossary", "extract domain terms", or "document the ubiquitous language". Also invoke when domain-review findings show pervasive terminology inconsistency (3+ different names for the same concept across the codebase).
+
+## Self-Challenge
+
+After producing findings, run the adversarial challenge pass from `knowledge/adversarial-review-protocol.md` (domain-review challenge questions). Append confidence level (High/Medium/Low) to the `summary` field.
+
 ## Ignore
 
 Code structure, naming style, tests (handled by other agents)
diff --git a/plugins/agentic-dev-team/agents/naming-review.md b/plugins/agentic-dev-team/agents/naming-review.md
@@ -20,6 +20,10 @@ Confidence: high=mechanical (add is/has prefix, extract magic value to constant)
 Model tier: small
 Context needs: diff-only
 
+## Knowledge Files
+
+Read the "Naming Offender Catalog" section of `knowledge/design-smells.md` before analysis. It contains: abbreviation anti-patterns with fix pairs, generic verb offenders, misleading name patterns, and type-encoded name examples — as well as the "What NOT to flag" list to avoid false positives.
+
 ## Skip
 
 Return `{"status": "skip", "issues": [], "summary": "No code files with nameable symbols"}` when:

diff --git a/plugins/agentic-dev-team/agents/security-review.md b/plugins/agentic-dev-team/agents/security-review.md
@@ -97,7 +97,7 @@ Return `{"status": "skip", "issues": [], "summary": "No source files with securi
 
 Every review run examines these file classes in addition to the primary source tree, because security-relevant content in them often escapes the `src/` tree walk:
 
-- CI/CD workflow files: `.github/workflows/*.{yml,yaml}`, `.gitlab-ci.yml`, `.gitlab/**/*.{yml,yaml}`, `.circleci/config.yml`, `azure-pipelines.yml`, `bitbucket-pipelines.yml`, `Jenkinsfile`, `jenkinsfile.d/**`. Check each for: `printenv` / `env | ` in `run:` blocks, `continue-on-error: true` on security-scanning steps, excessive `permissions:` (especially `contents: write` + `id-token: write` combined), hardcoded PAT / API-key patterns, `npm audit` / `pip audit` behind `continue-on-error`, auto-version commit steps with write permissions.
+- CI/CD workflow files: `.github/workflows/*.{yml,yaml}`, `.gitlab-ci.yml`, `.gitlab/**/*.{yml,yaml}`, `.circleci/config.yml`, `azure-pipelines.yml`, `bitbucket-pipelines.yml`, `Jenkinsfile`, `jenkinsfile.d/**`. Check each for: `printenv` / `env |` in `run:` blocks, `continue-on-error: true` on security-scanning steps, excessive `permissions:` (especially `contents: write` + `id-token: write` combined), hardcoded PAT / API-key patterns, `npm audit` / `pip audit` behind `continue-on-error`, auto-version commit steps with write permissions.
 - Dockerfiles: `Dockerfile`, `Dockerfile.*`, `*.dockerfile`. Check for: final-stage `USER` directive absent, unpinned base images (no `@sha256:` or `:<version>`), secrets COPYed from build context, `--trusted-host *` in pip invocations, apt-get / curl pipelines running as root.
 - Infrastructure manifests: `docker-compose*.yml`, `helm/**/*.yaml`, `k8s/**/*.yaml`, `terraform/**/*.tf`. Check for: hardcoded credentials, overly permissive RBAC, missing resource limits, missing NetworkPolicy, container security context (privileged, allowPrivilegeEscalation).
 
@@ -166,6 +166,10 @@ Input:
 - Insecure deserialization
 - Open redirects
 
+## Self-Challenge
+
+After producing findings, run the adversarial challenge pass from `knowledge/adversarial-review-protocol.md` (security-review challenge questions). Append confidence level (High/Medium/Low) to the `summary` field.
+
 ## Ignore
 
 Code style, naming, tests, complexity (handled by other agents)
diff --git a/plugins/agentic-dev-team/agents/spec-compliance-review.md b/plugins/agentic-dev-team/agents/spec-compliance-review.md
@@ -18,21 +18,25 @@ This agent answers one question: **does the code do what the spec says?** It run
 ## Detection Patterns
 
 ### Unmet acceptance criteria
+
 - Read acceptance criteria from the design doc and/or feature file
 - For each criterion, locate the implementation that satisfies it
 - For each criterion, locate the test that validates it
 - Flag criteria with no implementation or no test
 
 ### Uncovered scenarios
+
 - Read BDD scenarios from feature files
 - For each scenario, locate the corresponding test
 - Flag scenarios with no test or with a test that doesn't match the scenario steps
 
 ### Scope violations
+
 - Identify code changes not traceable to any acceptance criterion
 - Flag unrequested features, refactoring, or behavior changes beyond spec
 
 ### Plan deviation
+
 - Compare the implementation to the plan's file-change list
 - Flag files modified that aren't in the plan (unless trivially related)
 - Flag planned changes that weren't made
@@ -61,6 +65,13 @@ This agent answers one question: **does the code do what the spec says?** It run
 }
 ```
 
+## Skip
+
+Return `{"status": "skip", "issues": [], "summary": "No spec artifacts found"}` when:
+
+- No plan file, feature file, design doc, or acceptance criteria can be located for the target
+- Target is a standalone script or utility with no associated specification
+
 ## Severity Rules
 
 - Unmet acceptance criterion → `error` (always)

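The new Skip section gives spec-compliance-review an early exit when there is nothing to compare the code against. A minimal sketch of that decision follows, assuming the caller has already collected the paths of any plan file, feature file, or design doc it could locate. The function name `skip_if_no_spec` and the example path are hypothetical.

```python
import json
from typing import List, Optional

# Payload copied from the Skip section of the agent definition.
SKIP_RESULT = {"status": "skip", "issues": [], "summary": "No spec artifacts found"}


def skip_if_no_spec(artifacts: List[str]) -> Optional[str]:
    """Return the skip payload as JSON when no spec artifacts were located, else None."""
    if artifacts:
        return None  # at least one artifact exists, so the review should run
    return json.dumps(SKIP_RESULT)
```

For example, `skip_if_no_spec([])` yields the skip payload, while `skip_if_no_spec(["plans/example.md"])` returns `None` and the review proceeds.
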
diff --git a/plugins/agentic-dev-team/agents/structure-review.md b/plugins/agentic-dev-team/agents/structure-review.md
@@ -20,6 +20,10 @@ Confidence: high=mechanical extraction (duplicate block → shared function); me
 Model tier: mid
 Context needs: full-file
 
+## Knowledge Files
+
+Read `knowledge/design-smells.md` and `knowledge/object-calisthenics.md` before analysis.
+
 ## Skip
 
 Return `{"status": "skip", "issues": [], "summary": "No multi-module code to analyze"}` when:
@@ -61,6 +65,15 @@ Organization:
   images, fonts) shipped in projects that serve only JSON/XML API
   responses with no UI
 
+Design smells:
+
+- For SRP violations and coupling issues, map to the smell → pattern table in `knowledge/design-smells.md`. Every finding should name the smell, quote the code, and include a refactor sketch.
+- For method-level issues (nesting, long methods, flag arguments), check Object Calisthenics rules 1-2 and 7 in `knowledge/object-calisthenics.md`.
+
+## Self-Challenge
+
+After producing findings, run the adversarial challenge pass from `knowledge/adversarial-review-protocol.md` (structure-review challenge questions). Append confidence level (High/Medium/Low) to the `summary` field.
+
 ## Ignore
 
 Test quality, naming, domain modeling (handled by other agents)
diff --git a/plugins/agentic-dev-team/agents/test-review.md b/plugins/agentic-dev-team/agents/test-review.md
@@ -20,6 +20,10 @@ Confidence: high=mechanical fix (add missing await, stub clock, extract constant
 Model tier: mid
 Context needs: full-file
 
+## Knowledge Files
+
+Read `knowledge/testability-patterns.md` before analysis. When flagging untestable code (missing interfaces, static factories, concrete class coupling), use the decision flow and anti-patterns table to specify the required production code change — never recommend a test workaround (reflection, InternalsVisibleTo, mocking concrete classes).
+
 ## Skills
 
 - [Feature File Validation](../skills/feature-file-validation/SKILL.md) - invoke when `.feature` files or step definition files are in the target; validates Gherkin quality, determinism, implementation independence, and test automation coverage
@@ -82,6 +86,16 @@ Test code quality:
 - Magic literal values in assertions with no explanation of their significance
 - Dead test utilities or helpers that are defined but never called
 
+Testability blockers:
+
+- Code under test that cannot be constructed with known values (static factories, singletons, no injectable constructor) — flag as error; per `knowledge/testability-patterns.md`, the production code must change, not the test approach
+- Mocking of concrete classes (not interfaces) — flag as warning; extract an interface for the dependency
+- Tests using reflection into private members as primary strategy — flag as warning; the public API surface needs expanding
+
+## Self-Challenge
+
+After producing findings, run the adversarial challenge pass from `knowledge/adversarial-review-protocol.md` (test-review challenge questions). Append confidence level (High/Medium/Low) to the `summary` field.
+
 ## Ignore
 
 Code style, naming conventions (handled by other agents)

diff --git a/plugins/agentic-dev-team/commands/build.md b/plugins/agentic-dev-team/commands/build.md
@@ -45,11 +45,13 @@ Read the plan file. If the status is not `approved`, ask the user: "This plan ha
 Before implementation begins, dispatch a spec-compliance-review subagent in **criteria verification mode** (see `prompts/spec-reviewer.md` § Pre-build criteria verification mode). Pass the plan's acceptance criteria and per-step test expectations.
 
 The reviewer evaluates each criterion for:
+
 - **Specificity**: Could two developers independently verify this criterion and agree on pass/fail?
 - **Testability**: Can this criterion be validated with a test or observable output?
 - **Completeness**: Are edge cases and error conditions addressed?
 
 If any criteria are flagged:
+
 1. Present the findings to the user with the reviewer's suggested improvements
 2. Ask: "Revise these criteria before building, or proceed anyway?"
 3. If the user overrides, log the override in the build output and continue
@@ -68,7 +70,11 @@ For each step in the plan, dispatch implementation following the implementer tem
    - **complex**: Run `/review-agent spec-compliance-review`, then the full quality agent suite including opus-tier agents (security-review, domain-review, arch-review). Same review-fix loop applies.
    - If no complexity is specified, default to **standard**.
    - **UI changes (any complexity)**: After quality review passes, run browser verification via `/browse` in automated smoke test mode. Skip with warning if the dev server is not running. See `agents/orchestrator.md` Stage 3.
-5. **Mark step done** — Update the plan file: check off the step's acceptance criteria, set the step as completed.
+5. **Mark step done** — Use the Edit tool to update the plan file's `## Build Progress` section on disk:
+   - Change `- [ ] Step N: <title>` to `- [x] Step N: <title>` for the completed step.
+   - For each acceptance criterion verified by this step, change `- [ ]` to `- [x]` in the Build Progress `### Acceptance Criteria` subsection.
+   - After all steps are `[x]`, change `**Status**: approved` to `**Status**: in-progress`.
+   - This disk write is the durable commit. If a `/clear` occurs, `/continue` reads `## Build Progress` to determine the resume point without needing conversation history.
 
 ### 5. Run full test suite
 
@@ -80,11 +86,12 @@ Run `/code-review` against all files modified during the build.
 
 ### 7. Update plan status
 
-Update the plan status to `implemented`. Briefly confirm completion and direct the user to `/pr`.
+Use the Edit tool to change `**Status**: in-progress` to `**Status**: implemented` in the plan file. Briefly confirm completion and direct the user to `/pr`.
 
 ## Escalation
 
 Stop and ask the user when:
+
 - A test fails for an unexpected reason after 3 attempts
 - The plan requires architectural decisions not covered by the plan
 - A review checkpoint fails after 2 correction iterations

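The "Mark step done" change relies on checkbox lines in `## Build Progress` being the durable record of progress. The sketch below shows the two operations this implies: flipping a step's checkbox and finding the resume point. It assumes step lines have the exact form `- [ ] Step N: <title>`. The helper names are hypothetical, and in the plugin itself the update is performed with the Edit tool rather than a script.

```python
import re
from typing import Iterator, Optional, Tuple

STEP_LINE = re.compile(r"^- \[( |x)\] Step (\d+): (.*)$")


def _progress_lines(plan_text: str) -> Iterator[Tuple[int, str]]:
    """Yield (index, line) for lines inside the `## Build Progress` section."""
    inside = False
    for index, line in enumerate(plan_text.splitlines()):
        if line.startswith("## "):
            # Any level-2 heading opens or closes the section; `###` does not.
            inside = line.strip() == "## Build Progress"
            continue
        if inside:
            yield index, line


def mark_step_done(plan_text: str, step: int) -> str:
    """Flip `- [ ] Step N: <title>` to `- [x] Step N: <title>` in Build Progress."""
    lines = plan_text.splitlines()
    for index, line in _progress_lines(plan_text):
        match = STEP_LINE.match(line)
        if match and int(match.group(2)) == step:
            lines[index] = f"- [x] Step {step}: {match.group(3)}"
    result = "\n".join(lines)
    return result + "\n" if plan_text.endswith("\n") else result


def resume_point(plan_text: str) -> Optional[int]:
    """Return the first unchecked step number, or None when every step is checked."""
    for _, line in _progress_lines(plan_text):
        match = STEP_LINE.match(line)
        if match and match.group(1) == " ":
            return int(match.group(2))
    return None
```

Scoping both helpers to the `## Build Progress` section means similarly formatted lines elsewhere in the plan are left untouched.
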
diff --git a/plugins/agentic-dev-team/commands/plan.md b/plugins/agentic-dev-team/commands/plan.md
@@ -108,11 +108,26 @@ When in doubt, classify up (standard rather than trivial, complex rather than st
 ## Risks & Open Questions
 
 - <Risk or question, with mitigation or who should answer>
+
+## Build Progress
+
+This section is the machine-parseable recovery handle. `/build` updates checkboxes here via Edit tool so progress survives a `/clear` or session restart. `/continue` reads this section to determine the resume point.
+
+### Steps
+
+- [ ] Step 1: <title>
+- [ ] Step 2: <title>
+
+### Acceptance Criteria
+
+- [ ] <Criterion 1 — mirrors the Acceptance Criteria section above>
+- [ ] <Criterion 2>
+- [ ] <Criterion 3>
 ```
 
 ### 4. Create the plans directory
 
-Create `plans/` if it doesn't exist.
+Create `plans/` if it doesn't exist. When writing the plan file, populate the `## Build Progress` section by copying step titles from `## Steps` and criteria from `## Acceptance Criteria`. These are the checkboxes `/build` will update on disk as each step completes.
 
 ### 5. Run plan review personas