Rewrite /deepen-plan with context-managed map-reduce (v3) #178
Drewx-Design wants to merge 6 commits into EveryInc:main
…re (v3) Replace unbounded v1 agent output with phased file-based map-reduce pattern that keeps parent context under ~12k tokens. Adds plan manifest analysis, validation, judge phase with source attribution priority, and preservation checking. Aligns with plugin ecosystem conventions.
…mplementing The compound-docs skill already has a validated YAML schema and 7-step process. Instead of reimplementing it inside deepen-plan, offer the user the option to run /workflows:compound themselves.
The deepen-plan command deepens decisions but never challenges them. Real reviewer feedback showed it misses redundant tool params, YAGNI violations built despite being flagged, and misplaced business logic. Adds two new always-run agents:
- agent-native-architecture-reviewer: routes to skill checklist, anti-patterns, and reference files (not generic prompt)
- project-architecture-challenger: reads CLAUDE.md and challenges every decision against project-specific principles
Also injects PROJECT ARCHITECTURE CONTEXT into all review/research agent prompts so they evaluate against project conventions.
…ents Validated across two real-world pipeline runs. Key changes:
- Batched agent launches (max 4 pending) to prevent context overflow crashes
- 200-char return cap on agent messages (all analysis in JSON files)
- Version grounding: lockfile > package.json > plan text priority
- Per-section judge parallelization (~21 min -> ~8-10 min)
- Two-part output: Decision Record (reviewers) + Implementation Spec (developers)
- Quality review phase (CoVe pattern) catches self-contradictions and code gaps
- Enhancer resolves conditionals, verifies API versions, checks accessibility
- fast_follow classification bucket for ticketable items
- Convergence signals with [Strong Signal] markers
- Task() failure recovery (retry once on infrastructure errors)
- truncated_count field for judge convergence weighting
- Pipeline checkpoint logging for diagnostics
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
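The version-grounding priority listed above (lockfile > package.json > plan text) can be sketched as a small lookup. This is only an illustration: the `groundVersion` name and the shape of `sources` are assumptions, not the command's actual implementation.

```javascript
// Hypothetical helper: each source maps a package name to a version string.
// Sources are consulted in priority order: lockfile, package.json, plan text.
function groundVersion(pkg, sources) {
  const order = ["lockfile", "packageJson", "planText"];
  for (const name of order) {
    const version = sources[name] ? sources[name][pkg] : undefined;
    if (version !== undefined) {
      return { version, source: name };
    }
  }
  // Nothing known about this package in any source.
  return { version: null, source: null };
}

// Sample data, invented for the example.
const sources = {
  lockfile: { react: "18.3.1" },
  packageJson: { react: "^18.2.0", zod: "^3.22.0" },
  planText: { react: "17", zod: "3", lodash: "4" },
};

console.log(groundVersion("react", sources));  // resolved from the lockfile
console.log(groundVersion("zod", sources));    // falls back to package.json
console.log(groundVersion("lodash", sources)); // falls back to plan text
```

The point of the ordering is that the plan text is consulted only when nothing more concrete is available.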
Three fixes from v3.4 test run:
1. Split merge judge into Data Prep Agent (haiku, mechanical I/O) + Merge Judge (reasoning only). Data prep reads 20+ files and compiles MERGE_INPUT.json. Merge judge reads one file, focuses on cross-section analysis. Fixes OOM/timeout failures where merge judge was spending half its budget on file reads.
2. Replace all ! operators in bash-embedded node -e scripts with === false and == null patterns. Bash history expansion escapes ! as \! which Node.js rejects as SyntaxError.
3. Add dash normalization to preservation check: em-dashes and en-dashes are normalized before comparing section titles. Prevents false positives when enhancer normalizes typography.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
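A minimal sketch of the dash-normalization idea from fix 3, written in the `=== false` style from fix 2 so it would also survive being embedded in a bash `node -e` string. The function names are hypothetical; the PR's actual preservation check is not shown in this conversation.

```javascript
// Hypothetical helper: map en dash (U+2013) and em dash (U+2014) to a
// plain hyphen so typography changes do not look like missing sections.
function normalizeTitle(title) {
  return title.replace(/[\u2013\u2014]/g, "-").trim();
}

// Return the original titles that have no match in the enhanced plan.
// Uses `=== false` rather than `!` on purpose (see fix 2 above).
function findMissingSections(originalTitles, enhancedTitles) {
  const enhanced = new Set(enhancedTitles.map(normalizeTitle));
  return originalTitles.filter(
    (title) => enhanced.has(normalizeTitle(title)) === false
  );
}

// Sample titles, invented for the example.
const missing = findMissingSections(
  ["Phase 1 - Types", "Phase 2 - Store"],
  ["Phase 1 \u2014 Types", "Phase 2 \u2013 Store"]
);
console.log(missing.length === 0 ? "all sections preserved" : missing);
```

Without the normalization step both titles would be reported as missing, which is the false positive the commit describes.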
Log analysis revealed learnings-researcher used string section_ids
("Phase-1-Types-Store") while manifest uses numeric ids (1, 2, 3).
Section judges filter on numeric equality, so 3 of 5 learnings recs
were silently dropped -- including high-value documented-learning
source recs for debounce, frame budgeting, and checkpoint handling.
Fixes:
- OUTPUT RULES now explicitly say "must be a numeric id like 1, 2, 3.
NOT a string like Phase-1"
- Instruction #5 warns string IDs will be silently dropped
- Validation script warns on non-numeric section_ids
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
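To illustrate the failure mode described in this commit, here is a sketch under assumed data shapes: the `{ section_id, text }` record and both function names are hypothetical. A strict-equality filter drops string IDs without any error, and a validation pass can warn about them up front.

```javascript
// Hypothetical validation pass: warn on any section_id that is not an integer.
function checkSectionIds(recs) {
  const warnings = [];
  for (const rec of recs) {
    const id = rec.section_id;
    if (typeof id !== "number" || Number.isInteger(id) === false) {
      warnings.push("non-numeric section_id: " + JSON.stringify(id));
    }
  }
  return warnings;
}

// A section judge that filters on strict numeric equality, as described above.
function recsForSection(recs, sectionId) {
  return recs.filter((rec) => rec.section_id === sectionId);
}

// Sample recommendations, invented for the example.
const recs = [
  { section_id: 1, text: "debounce input handlers" },
  { section_id: "Phase-1-Types-Store", text: "frame budgeting" },
];

console.log(recsForSection(recs, 1).length); // 1: the string-id rec is silently dropped
console.log(checkSectionIds(recs));          // one warning, for the string id
```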
This is cool, still deciding how to incorporate since it does add a new pattern but will try it out a bit
kieranklaassen left a comment
Code Review: PR #178 — Rewrite /deepen-plan with context-managed map-reduce (v3)
Reviewers: architecture-strategist, code-simplicity-reviewer, pattern-recognition-specialist, learnings-researcher + manual analysis
Summary
The core idea — file-based map-reduce where agents write JSON to .deepen/ and return only a completion signal — is a genuine architectural improvement that solves a real context overflow problem. The cross-platform improvements (.deepen/ instead of /tmp/, Glob instead of find, Node.js instead of Python) are also welcome.
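For readers unfamiliar with the pattern, a minimal sketch of file-based map-reduce follows. It assumes a hypothetical JSON shape and helper names, and uses a temp directory as a stand-in for .deepen/. In the real command the "agents" are Task() sub-agents, not functions.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const RETURN_CAP = 200; // cap on the message an agent returns to the parent

// "Map" step: write the full analysis to disk, return only a short signal.
function runAgent(outDir, agentName, analysis) {
  const file = path.join(outDir, agentName + ".json");
  fs.writeFileSync(file, JSON.stringify(analysis, null, 2));
  const count = analysis.recommendations.length;
  return ("DONE " + agentName + ": " + count + " recs").slice(0, RETURN_CAP);
}

// "Reduce" step: a later phase reads the files, so the parent context
// never has to hold the full agent output.
function collect(outDir) {
  return fs
    .readdirSync(outDir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => JSON.parse(fs.readFileSync(path.join(outDir, name), "utf8")));
}

// Demo with invented sample data.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "deepen-"));
const signal = runAgent(dir, "security-sentinel", {
  agent: "security-sentinel",
  recommendations: [{ section_id: 1, text: "sample recommendation" }],
});
console.log(signal);              // short completion signal only
console.log(collect(dir).length); // number of agent files found on disk
```

The parent only ever sees `signal`; everything else stays on disk until a downstream phase asks for it.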
However, the PR wraps this good idea in significant overengineering. At 1,191 lines, it is 2x the next-largest command in the plugin (workflows:plan at 557 lines). An estimated ~670 lines (56%) could be removed while preserving the core innovation. Additionally, the PR needs a rebase — the version numbers are stale.
🔴 P1 — Critical (Blocks Merge)
1. Version Regression — Needs Rebase
Main is at v2.35.2. This PR bumps to v2.34.0 in both plugin.json and marketplace.json. The CHANGELOG adds a ## [2.34.0] entry, but that version already exists on main (Gemini CLI target from PR #190).
Fix: Rebase on main, bump to v2.36.0, and update the CHANGELOG entry accordingly.
2. Significant Overengineering (~670 removable lines)
The simplicity review identified 7 areas of unnecessary complexity that could be simplified without losing the core map-reduce innovation:
| Section | Lines | Why Remove |
|---|---|---|
| Judge pipeline (6a-6e) | ~230 | Enhancer agent can read agent JSON files directly and deduplicate as it goes. Three sub-agent types, two JSON schemas, and a validation script exist to produce input that the enhancer reads anyway. |
| Quality Review / CoVe (7b) | ~120 | This is a plan-deepening command, not code review. /plan_review already exists for plan quality. Checking "defensive stacking" in a plan doc is solving a problem the output format created. |
| Inline Node.js scripts | ~80 | Claude can read JSON and verify structure natively. Validation scripts add cross-platform concerns for no benefit over "Read each JSON file, verify required fields." |
| Checkpoint logging | ~60 | Diagnostic infrastructure for debugging the command itself. Users see failures in real-time. Not needed in the main flow. |
| Two-part output (Decision Record + Impl Spec) | ~100 | Downstream /workflows:work expects the simple ### Research Insights format from v1. The two-audience split has no consumer. |
| Batching boilerplate | ~30 | The 200-char return cap already solves the problem. One sentence ("Launch in batches of 4-6 if many agents matched") replaces 30 lines. |
| Over-specified manifest | ~30 | 8 boolean flags per section (has_code_examples, has_ui_components, etc.) are never referenced downstream. Drop them. |
Recommendation: Simplify to 5 phases (Parse → Discover → Research → Enhance → Present) matching the ~500-550 line range of other workflow commands. Merge validation/judging/quality review into the enhancer prompt.
For comparison:
| Command | Lines | JSON Schemas | Validation Scripts |
|---|---|---|---|
| workflows:plan | 557 | 0 | 0 |
| workflows:review | 528 | 0 | 0 |
| deepen-plan v1 | 546 | 0 | 0 |
| deepen-plan v3 | 1,191 | 5 | 3 |
🟡 P2 — Important (Should Fix)
3. Duplicate Step 8 Numbering (lines 319-320)
8. **truncated_count is REQUIRED (default 0).**
8. **CRITICAL — YOUR RETURN MESSAGE TO PARENT MUST BE UNDER 200 CHARACTERS.**
Two items numbered "8" in the OUTPUT_RULES block. The executing agent may deprioritize one.
Fix: Renumber to 8 and 9.
4. model: haiku Not Supported (line 690)
", model: haiku)
Claude Code's Task() tool does not accept a model: parameter. This will either be ignored or error. No other command in the plugin uses this.
Fix: Remove the parameter or add a comment that it's aspirational.
5. Step 5a/5b Behavior Inconsistency
- Step 5a says: "Re-launch missing agents before proceeding"
- Step 5b silently deletes invalid files with fs.unlinkSync(fp) without re-launching
These should follow the same strategy. Currently, missing files get retried but corrupt files get silently dropped.
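One way to unify the two steps, sketched with hypothetical names and an assumed `recommendations` array in each agent file: treat missing and corrupt outputs the same way and return both as re-launch candidates instead of unlinking the corrupt ones.

```javascript
// Hypothetical: `outputs` maps agent name -> raw file contents,
// or null/undefined when the agent produced no file at all.
function agentsToRelaunch(expectedAgents, outputs) {
  const relaunch = [];
  for (const agent of expectedAgents) {
    const raw = outputs[agent];
    if (raw === null || raw === undefined) {
      relaunch.push({ agent, reason: "missing" });
      continue;
    }
    try {
      const parsed = JSON.parse(raw);
      // Assumed required field; the real schema may differ.
      if (Array.isArray(parsed.recommendations) === false) {
        relaunch.push({ agent, reason: "invalid" });
      }
    } catch (err) {
      // Unparseable JSON is queued for retry rather than silently deleted.
      relaunch.push({ agent, reason: "invalid" });
    }
  }
  return relaunch;
}

// Sample inputs, invented for the example.
const result = agentsToRelaunch(
  ["security-sentinel", "performance-oracle", "architecture-strategist"],
  {
    "security-sentinel": '{"recommendations": []}',
    "performance-oracle": "{truncated",
  }
);
console.log(result);
```

With this shape, Step 5a and Step 5b would share a single retry list, so a corrupt file is never dropped without a second attempt.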
6. + SHARED_CONTEXT + OUTPUT_RULES Pseudocode
Lines 373, 383, 393, 410, 437, 461 use string concatenation syntax:
" + SHARED_CONTEXT + OUTPUT_RULES)
This is pseudocode — Claude Code doesn't support string concatenation in Task() calls. The executing agent must mentally expand these references, which adds cognitive load and risk of omission. Consider inlining a shortened version or adding explicit expansion instructions.
🔵 P3 — Nice-to-Have
7. Token Budget Inconsistency
Line 30: "Parent context stays under ~15k tokens"
Appendix: "Total parent from agents: ~8,500-13,000"
Pick one source of truth. The appendix is more detailed and credible.
8. Missing AskUserQuestion Tool Reference
The post-action options (line 1165) don't mention using the AskUserQuestion tool, breaking the pattern used by workflows:plan and the old deepen-plan.
9. No compound-engineering.local.md Integration
Unlike workflows:review which reads review agents from settings, this command has a fixed agent list. Consider whether it should respect the same config.
10. Consider Command-to-Skill Conversion
Per project memory: "Commands do NOT support progressive disclosure." A 1,191-line prompt loads entirely on every invocation. If restructured as a skill with SKILL.md + references/ files for each phase's agent templates, the initial context load would be much smaller. The command becomes a thin wrapper that invokes the skill.
✅ What the PR Does Well
- Core innovation is sound: File-based map-reduce (.deepen/ directory) genuinely solves context overflow
- Cross-platform safety: .deepen/ instead of /tmp/, Glob instead of find, Node.js instead of Python
- Eliminates v1 conflict: Old file had contradictory "run ALL agents" (Section 5) vs. selective matching (Sections 1-4). PR resolves this entirely
- Version grounding: Reading lockfiles for actual versions instead of trusting plan text is a good idea
- Manifest-matched agent selection: Intelligent agent matching replaces brute-force "run 40+ agents"
- Frontmatter and naming conventions: Correctly preserved, follows all plugin patterns
- PR description quality: Thorough alignment audit, architecture comparison table, breaking changes analysis
Recommended Path Forward
- Rebase on main and fix version to 2.36.0
- Simplify to ~500-550 lines by removing judge pipeline, quality review, inline scripts, and checkpoint logging
- Fix the three P2 items (duplicate numbering, model:haiku, step inconsistency)
- The result: the core map-reduce innovation at a complexity level consistent with the rest of the plugin
🤖 Generated with Claude Code
This one is really good and works. It consumes significantly fewer tokens on the main agent.
Summary
- Rewrites /deepen-plan with a phased file-based map-reduce architecture (same pattern as the review command v2 in "Rewrite workflows:review with context-managed map-reduce architecture" #157)
- Agents write JSON to .deepen/ on disk and return only ~100 token summaries to parent
- Cross-platform: .deepen/ instead of /tmp/
Architecture Changes
- find + head (bash, Windows-broken)
- .deepen/, return 1 sentence
- docs/solutions/ files via compound-docs skill
Alignment Audit: v3 vs Plugin Ecosystem
Performed a full audit of v3 against the compound-engineering plugin (v2.33.0) -- all agents, skills, commands, MCP servers, and conventions.
1. Agent Discovery Completeness -- Aligned
- ~/.claude/plugins/cache/**/agents/**/*.md matches every-marketplace/compound-engineering/2.33.0/agents/review/ (14), research/ (4), design/ (3), docs/ (1). SKIP: workflow/ (5 orchestrators)
- security-sentinel.md, architecture-strategist.md, performance-oracle.md all verified
- code-simplicity-reviewer, agent-native-reviewer, pattern-recognition-specialist, language-specific reviewers
Fixed from v1: v1 said "run ALL agents" (40+ parallel). v3 uses intelligent selection: 3 always-run + manifest-matched, reducing from ~30 agents to 10-15 targeted ones.
2. Skill Discovery -- Aligned (with fixes applied)
- ~/.claude/plugins/cache/**/skills/*/SKILL.md matches all 14 skills
- kieran-rails-style (doesn't exist as a skill -- it's an agent). Added: andrew-kane-gem-writer, dspy-ruby, every-style-editor, create-agent-skills
- references/, assets/, templates/ subdirectories
Fixed from v1: v1 only read SKILL.md. v3 agents also read references/*.md, assets/*, templates/* -- critical for skills like agent-native-architecture (14 reference files) and create-agent-skills (11 references + templates + workflows).
3. Command Pipeline -- Aligned (with fixes applied)
- /workflows:plan to /deepen-plan: plans/ directory.
- /deepen-plan to /workflows:work: // ENHANCED: comments. Work command reads it fine.
- /deepen-plan to /plan_review: /workflows:review (code review, not plan review). v3 correctly offers /plan_review.
- /lfg chain: name: deepen-plan (not workflows:deepen-plan) so /lfg and /slfg references to /compound-engineering:deepen-plan still work.
- compound-docs skill and its YAML schema for properly validated learning files.
4. MCP Server Integration -- Aligned
- resolve-library-id and query-docs verified against plugin.json mcpServers config
5. Naming and Convention Alignment -- Aligned (with fixes applied)
- name: deepen-plan (in commands/ not commands/workflows/). Matches existing plugin structure and /lfg / /slfg references.
- docs/solutions/[category]/ with compound-docs schema.
6. Philosophy Alignment -- Aligned
- skill > documented-learning > official-docs > community-web matches best-practices-researcher Phase 1-2-3 order
7. Edge Cases -- Compatible
- .claude/agents/*.md glob catches project-local agents
- references/, assets/, templates/
- ~/.claude/plugins/cache/**/agents/**/*.md catches all
- .deepen/, Node.js for validation (no Python/bash dependency)
Key Fixes Applied (v1 to v3)
- v1 only read SKILL.md. v3 reads full skill tree including references, assets, templates.
- kieran-rails-style ghost skill -- v1 referenced a skill that doesn't exist. v3 maps to actual skill names.
- /workflows:review vs /plan_review -- v1 offered code review for plan feedback. v3 correctly offers plan review.
- deepen-plan (not workflows:deepen-plan) for /lfg / /slfg compatibility.
- compound-docs YAML schema.
- .deepen/ instead of /tmp/. Node.js instead of Python3. Glob/Read instead of find/head.
- tools_used flagged, judge downweights confidence by 0.2.
Test plan
- /deepen-plan plans/test-plan.md on a Rails plan -- verify .deepen/ directory created with expected JSON files
- /lfg end-to-end to confirm /compound-engineering:deepen-plan still resolves correctly
- compound-docs YAML schema
- .deepen/ path and Node.js validation work cross-platform
- docs/solutions/ to confirm sparse discovery handles gracefully
Breaking Changes Risk
If the plugin updates these, v3 would need updating:
- security-sentinel). If agents are renamed, the always-run tier and manifest-matched lists need updating.
These are the same risks the existing /workflows:review and /workflows:plan commands face -- nothing unique to /deepen-plan.