Clean up the agent's reading material by jurgenwerk · Pull Request #4753 · cardstack/boxel

jurgenwerk · 2026-05-11T08:41:02Z

Summary

The factory agent reads a handful of "reference" docs at the start of every implementation issue — they're injected into the system prompt automatically. Three of those docs were giving the agent outdated advice: "call the create_catalog_spec tool," "use write_file," "shell out via run_command to fetch a schema." Those tools were all retired by CS-10883 / PR #4578. Some of them never existed in this PR's tool list at all.

The agent has been quietly working around the bad advice — checking the actual tool list, ignoring the docs, asking the right factory tool by hand — but that costs tokens, and in one earlier run we watched the contradiction send it off into other realms hunting for example cards before it figured out what to do.

This PR rewrites the three always-loaded docs to match the agent's actual world: native filesystem tools, the boxel CLI, and the get_card_schema factory tool. All the genuinely useful content (query syntax, the Spec shape, QUnit testing patterns) stays. Only the wrong tool names change.

What's in here

dev-spec-usage.md — used to start with "call the create_catalog_spec tool." Now it points the agent at get_card_schema({module: 'https://cardstack.com/base/spec', name: 'Spec'}) for the live shape, and shows a complete required-form JSON example so the agent doesn't have to reconstruct it from memory. Two new reminders that came out of real failure modes from earlier runs:

the linkedExamples relationship has to use dotted keys (linkedExamples.0, not an array)
don't pass the Spec card path to run_instantiate — its module lives in the base realm and the prerender refuses cross-origin module loads; the right move is to call run_instantiate without a path, which discovers Specs and exercises the linked examples instead

dev-realm-search.md — was titled "how to use the search_realm tool." Now it's "how to construct a query for boxel search --realm <target-realm-url>" — same JSON query shape, just delivered via the shell instead of a retired tool. The intro also reinforces the "stay in your target realm" rule from PR #4653. The bottom "how to discover available fields" section used to have the agent build a run_command payload by hand — now it just says "call get_card_schema(module, name)."

dev-qunit-testing.md — one bullet about reading TestRun cards via read_file. Swapped for the native Read + Glob equivalent.

Plus a small dead-code sweep in factory-agent/ (commit ce931df90a):

Deleted FILE_ACTION_TYPES, MockFactoryAgent, and the FactoryAgent (declarative model) interface — no live consumers.
Updated three historical comments in types.ts and claude-code.ts that still named retired tools (read_file / write_file / search_realm / run_command).
pnpm test:node: 330/330 pass after the sweep.

VALID_ACTION_TYPES / VALID_REALMS / AgentAction and friends are still vestigially wired through the iterate prompt's previousActions slot (always passed []); unwiring that one touches a few more files and tests, so it's deliberately not in this PR.

What's NOT in here

CS-10613's full scope is much bigger — moving boxel-development out of the factory entirely, deduplicating against the boxel-cli and skills-realm copies, building a new boxel-api skill, rewriting the loader. That work is its own dedicated PRs.

This is just the targeted fix that closes CS-10520's "skills aligned" acceptance item and stops the agent from reading bad advice every iteration.

Why target this branch instead of main

Validating the change against a healthy baseline is much easier when the system prompt rule from PR #4653 ("stay in your target realm, the skills you've loaded are authoritative") is already in place. That rule is what closes the gap between the rewritten skills and the LLM's residual "compare against existing examples" reflex. Testing on main would show a worse baseline that doesn't reflect production.

Once PR #4653 merges, GitHub will automatically rebase this onto main.

Test plan

pnpm factory:go --agent claude --brief-url … --target-realm … --debug completes a sticky-note-style implementation issue.
pnpm factory:go --agent openrouter --openrouter-api-key sk-or-… --brief-url … --target-realm … --debug likewise.
Agent does not attempt to call create_catalog_spec, write_file, read_file, search_realm, or run_command (none exist).
Agent does not run boxel file ls / boxel search against any realm other than its target realm.
No regression in validation pipeline outcomes vs. the runs we did on PR Replace OpenRouter HTTP backend with opencode agent #4653.
pnpm test:node 330/330 (after the dead-code sweep).

🤖 Generated with Claude Code

…ools Three references in `.agents/skills/boxel-development/references/` are pre-loaded into every implementation-issue system prompt (`factory-skill-loader.ts:106-113`). They still talked about factory tools that have been retired: - `dev-spec-usage.md` told the agent to call `create_catalog_spec` and `write_file`. Now it tells the agent to call `get_card_schema({module:'https://cardstack.com/base/spec', name:'Spec'})` for the live schema and write the JSON natively. Added a complete required-shape JSON example, a reminder about the dotted `linkedExamples.0` key form, and an explicit "don't run `run_instantiate` on the Spec itself — the prerender refuses cross-origin module loads" note (the trap we hit in earlier runs). - `dev-realm-search.md` framed everything around the retired `search_realm` tool. Rewrote the intro to point at `boxel search --realm <target-realm-url>` (target realm only) and added a reinforcing "do not query other realms" line to back up PR #4653's prompt-level rule. The "Discovering Available Fields" section's `run_command` JSON payload became a one-line `get_card_schema(module, name)` call. - `dev-qunit-testing.md` had a single `read_file` mention for TestRun inspection — swapped for native `Read` + `Glob`. All query-format / spec-type / QUnit-pattern content is preserved unchanged. Only the tool names changed. Closes CS-10520's Step 6 / Skills-aligned acceptance item via CS-10613 Phase C step 8 (audit always-loaded references). Full CS-10613 scope (Phase A: new boxel-api skill; Phase B: dedupe across factory / boxel-cli / skills-realm; Phase D: loader keyword-map updates) is follow-up work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Surfaced while auditing CS-10883 / CS-10613 leftovers — none of the following had a live consumer in src/ or tests/: - `FILE_ACTION_TYPES` (factory-agent/types.ts) — declared but never imported. - `FactoryAgent` interface (the original "declarative" agent shape) — only implementer was `MockFactoryAgent`, which itself had no callers outside its own definition and the barrel re-export. - `MockFactoryAgent` class (factory-agent/mocks.ts) and its re-export from factory-agent/index.ts. Removing those also lets the surrounding doc-comment in types.ts stop referencing `factory-agent.ts` / `factory-agent-tool-use.ts`, neither of which exist anymore. Also touched a few historical comments that name retired tools: - types.ts: model-pin docblock said "broke every `write_file`" — the pin still matters but the tool that hit the truncation is now native `Write`. - types.ts: `ClaudeCodeAgentConfig.workspaceDir` mentioned realm I/O going through `search_realm` / `run_command` MCP tools; both are retired. Replaced with the current list (`get_card_schema`, the five validators, the control signals). - claude-code.ts: two block comments mentioning `read_file` / `write_file` / `search_realm` as the "old shims" — rewrote in terms of the current native fs + MCP split. Still vestigially wired (not touched here): `VALID_ACTION_TYPES` / `VALID_REALMS` / `AgentActionType` / `ActionRealm` / `AgentAction` are all still consumed by the iterate prompt's `previousActions` slot. The slot is always passed `[]` today, but unwiring it touches `factory-prompt-loader`, the ticket-iterate.md template, and several tests — out of scope for a small dead-code sweep. Worth its own pass later. `pnpm test:node`: 330/330. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jurgenwerk marked this pull request as ready for review May 11, 2026 09:05

jurgenwerk requested a review from a team May 11, 2026 09:05

jurgenwerk mentioned this pull request May 11, 2026

factory:go: stop marking issues 'done' when the agent wrote nothing #4754

Draft

jurgenwerk mentioned this pull request May 11, 2026

Software factory: relocate skills + cleanup + rewrite for current factory tools #4756

Open

5 tasks

lukemelia approved these changes May 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up the agent's reading material#4753

Clean up the agent's reading material#4753
jurgenwerk wants to merge 2 commits intocs-11034-software-factory-replace-openrouter-backend-with-opencodefrom
cs-10613-align-skill-loading

jurgenwerk commented May 11, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jurgenwerk commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's in here

What's NOT in here

Why target this branch instead of main

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jurgenwerk commented May 11, 2026 •

edited

Loading