Clean up the agent's reading material #4753
Open
jurgenwerk wants to merge 2 commits into cs-11034-software-factory-replace-openrouter-backend-with-opencode from
…ools
Three references in `.agents/skills/boxel-development/references/`
are pre-loaded into every implementation-issue system prompt
(`factory-skill-loader.ts:106-113`). They still talked about
factory tools that have been retired:
- `dev-spec-usage.md` told the agent to call `create_catalog_spec`
and `write_file`. Now it tells the agent to call
`get_card_schema({module:'https://cardstack.com/base/spec', name:'Spec'})`
for the live schema and write the JSON natively. Added a
complete required-shape JSON example, a reminder about the
dotted `linkedExamples.0` key form, and an explicit "don't run
`run_instantiate` on the Spec itself — the prerender refuses
cross-origin module loads" note (the trap we hit in earlier runs).
- `dev-realm-search.md` framed everything around the retired
`search_realm` tool. Rewrote the intro to point at
`boxel search --realm <target-realm-url>` (target realm only)
and added a reinforcing "do not query other realms" line to back
up PR #4653's prompt-level rule. The "Discovering Available
Fields" section's `run_command` JSON payload became a one-line
`get_card_schema(module, name)` call.
- `dev-qunit-testing.md` had a single `read_file` mention for
TestRun inspection — swapped for native `Read` + `Glob`.
All query-format / spec-type / QUnit-pattern content is preserved
unchanged. Only the tool names changed.
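To illustrate the dotted `linkedExamples.0` key form mentioned in the first bullet, here is a minimal sketch. The helper name `toDottedRelationships` and the `{ links: { self } }` entry shape are assumptions made for illustration; they are not taken from the factory code, and the live Spec schema from `get_card_schema` remains the authority on the real shape.

```typescript
// Hypothetical helper, not part of the factory codebase.
// The entry shape below is an assumption for illustration only.
type RelationshipEntry = { links: { self: string } };

// Expand a list of example card URLs into dotted relationship keys
// (`linkedExamples.0`, `linkedExamples.1`, ...) rather than a JSON array.
function toDottedRelationships(
  field: string,
  urls: string[],
): Record<string, RelationshipEntry> {
  const relationships: Record<string, RelationshipEntry> = {};
  urls.forEach((url, index) => {
    relationships[`${field}.${index}`] = { links: { self: url } };
  });
  return relationships;
}

// Example paths are placeholders.
const relationships = toDottedRelationships('linkedExamples', [
  '../StickyNote/example-1',
  '../StickyNote/example-2',
]);
console.log(Object.keys(relationships));
```

The point of the sketch is only the key naming: one key per example, indexed with a dot, in place of a single array-valued `linkedExamples` key.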
Closes CS-10520's Step 6 / Skills-aligned acceptance item via
CS-10613 Phase C step 8 (audit always-loaded references). Full
CS-10613 scope (Phase A: new boxel-api skill; Phase B: dedupe
across factory / boxel-cli / skills-realm; Phase D: loader
keyword-map updates) is follow-up work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaced while auditing CS-10883 / CS-10613 leftovers. None of the
following had a live consumer in src/ or tests/:

- `FILE_ACTION_TYPES` (factory-agent/types.ts): declared but never
  imported.
- `FactoryAgent` interface (the original "declarative" agent shape):
  only implementer was `MockFactoryAgent`, which itself had no callers
  outside its own definition and the barrel re-export.
- `MockFactoryAgent` class (factory-agent/mocks.ts) and its re-export
  from factory-agent/index.ts.

Removing those also lets the surrounding doc-comment in types.ts stop
referencing `factory-agent.ts` / `factory-agent-tool-use.ts`, neither
of which exist anymore.

Also touched a few historical comments that name retired tools:

- types.ts: model-pin docblock said "broke every `write_file`". The pin
  still matters, but the tool that hit the truncation is now native
  `Write`.
- types.ts: `ClaudeCodeAgentConfig.workspaceDir` mentioned realm I/O
  going through `search_realm` / `run_command` MCP tools; both are
  retired. Replaced with the current list (`get_card_schema`, the five
  validators, the control signals).
- claude-code.ts: two block comments mentioning `read_file` /
  `write_file` / `search_realm` as the "old shims"; rewrote in terms of
  the current native fs + MCP split.

Still vestigially wired (not touched here): `VALID_ACTION_TYPES` /
`VALID_REALMS` / `AgentActionType` / `ActionRealm` / `AgentAction` are
all still consumed by the iterate prompt's `previousActions` slot. The
slot is always passed `[]` today, but unwiring it touches
`factory-prompt-loader`, the ticket-iterate.md template, and several
tests, which is out of scope for a small dead-code sweep. Worth its own
pass later.

`pnpm test:node`: 330/330.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
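A sweep for comments and docs that still name retired tools could be double-checked with a small scan like the sketch below. The function name `findRetiredToolMentions` is hypothetical and the tool list is copied from the names this PR calls retired; nothing here is part of the factory codebase.

```typescript
// Tool names this PR describes as retired.
const RETIRED_TOOLS = [
  'create_catalog_spec',
  'write_file',
  'read_file',
  'search_realm',
  'run_command',
] as const;

type Mention = { tool: string; line: number };

// Hypothetical helper: report every line (1-based) on which a retired
// tool name appears as a whole identifier.
function findRetiredToolMentions(text: string): Mention[] {
  const mentions: Mention[] = [];
  text.split('\n').forEach((lineText, index) => {
    for (const tool of RETIRED_TOOLS) {
      // `\b` keeps `read_file` from matching inside e.g. `thread_file`,
      // because `_` and letters are all word characters.
      const pattern = new RegExp(`\\b${tool}\\b`);
      if (pattern.test(lineText)) {
        mentions.push({ tool, line: index + 1 });
      }
    }
  });
  return mentions;
}

const sample = 'Call `write_file` to save.\nUse native `Read` instead.';
console.log(findRetiredToolMentions(sample));
```

Run over a reference doc or a source file, an empty result would suggest no retired tool name is still mentioned by its exact identifier; it would not catch paraphrased references.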
5 tasks
lukemelia approved these changes on May 11, 2026
Summary
The factory agent reads a handful of "reference" docs at the start of every implementation issue; they're injected into the system prompt automatically. Three of those docs were giving the agent outdated advice: "call the `create_catalog_spec` tool," "use `write_file`," "shell out via `run_command` to fetch a schema." Those tools were all retired by CS-10883 / PR #4578. Some of them never existed in this PR's tool list at all.

The agent has been quietly working around the bad advice (checking the actual tool list, ignoring the docs, asking the right factory tool by hand), but that costs tokens, and in one earlier run we watched the contradiction send it off into other realms hunting for example cards before it figured out what to do.
This PR rewrites the three always-loaded docs to match the agent's actual world: native filesystem tools, the `boxel` CLI, and the `get_card_schema` factory tool. All the genuinely useful content (query syntax, the Spec shape, QUnit testing patterns) stays. Only the wrong tool names change.

What's in here
- `dev-spec-usage.md`: used to start with "call the `create_catalog_spec` tool." Now it points the agent at `get_card_schema({module: 'https://cardstack.com/base/spec', name: 'Spec'})` for the live shape, and shows a complete required-form JSON example so the agent doesn't have to reconstruct it from memory. Two new reminders that came out of real failure modes from earlier runs:
  - The `linkedExamples` relationship has to use dotted keys (`linkedExamples.0`, not an array).
  - Don't run `run_instantiate` on the Spec itself: its module lives in the base realm and the prerender refuses cross-origin module loads. The right move is to call `run_instantiate` without a path, which discovers Specs and exercises the linked examples instead.
- `dev-realm-search.md`: was titled "how to use the `search_realm` tool." Now it's "how to construct a query for `boxel search --realm <target-realm-url>`": same JSON query shape, just delivered via the shell instead of a retired tool. The intro also reinforces the "stay in your target realm" rule from PR #4653. The bottom "how to discover available fields" section used to have the agent build a `run_command` payload by hand; now it just says "call `get_card_schema(module, name)`."
- `dev-qunit-testing.md`: had one bullet about reading TestRun cards via `read_file`. Swapped for the native `Read` + `Glob` equivalent.

Plus a small dead-code sweep in `factory-agent/` (commit ce931df90a):

- Removed `FILE_ACTION_TYPES`, `MockFactoryAgent`, and the `FactoryAgent` (declarative model) interface: no live consumers.
- Rewrote historical comments in `types.ts` and `claude-code.ts` that still named retired tools (`read_file` / `write_file` / `search_realm` / `run_command`).
- `pnpm test:node`: 330/330 pass after the sweep.
- `VALID_ACTION_TYPES` / `VALID_REALMS` / `AgentAction` and friends are still vestigially wired through the iterate prompt's `previousActions` slot (always passed `[]`); unwiring that one touches a few more files and tests, so it's deliberately not in this PR.

What's NOT in here
CS-10613's full scope is much bigger: moving boxel-development out of the factory entirely, deduplicating against the boxel-cli and skills-realm copies, building a new `boxel-api` skill, and rewriting the loader. That work belongs in its own dedicated PRs.

This is just the targeted fix that closes CS-10520's "skills aligned" acceptance item and stops the agent from reading bad advice every iteration.
Why target this branch instead of main
Validating the change against a healthy baseline is much easier when the system prompt rule from PR #4653 ("stay in your target realm, the skills you've loaded are authoritative") is already in place. That rule is what closes the gap between the rewritten skills and the LLM's residual "compare against existing examples" reflex. Testing on main would show a worse baseline that doesn't reflect production.
Once PR #4653 merges and its branch is deleted, GitHub will automatically retarget this PR to main.
Test plan
- `pnpm factory:go --agent claude --brief-url … --target-realm … --debug` completes a sticky-note-style implementation issue.
- `pnpm factory:go --agent openrouter --openrouter-api-key sk-or-… --brief-url … --target-realm … --debug` likewise.
- The agent never tries to call `create_catalog_spec`, `write_file`, `read_file`, `search_realm`, or `run_command` (none exist).
- The agent never runs `boxel file ls` / `boxel search` against any realm other than its target realm.
- `pnpm test:node` 330/330 (after the dead-code sweep).

🤖 Generated with Claude Code