Skip to content

Clean up the agent's reading material#4753

Open
jurgenwerk wants to merge 2 commits intocs-11034-software-factory-replace-openrouter-backend-with-opencodefrom
cs-10613-align-skill-loading
Open

Clean up the agent's reading material#4753
jurgenwerk wants to merge 2 commits intocs-11034-software-factory-replace-openrouter-backend-with-opencodefrom
cs-10613-align-skill-loading

Conversation

@jurgenwerk
Copy link
Copy Markdown
Contributor

@jurgenwerk jurgenwerk commented May 11, 2026

Summary

The factory agent reads a handful of "reference" docs at the start of every implementation issue — they're injected into the system prompt automatically. Three of those docs were giving the agent outdated advice: "call the create_catalog_spec tool," "use write_file," "shell out via run_command to fetch a schema." Those tools were all retired by CS-10883 / PR #4578. Some of them never existed in this PR's tool list at all.

The agent has been quietly working around the bad advice — checking the actual tool list, ignoring the docs, asking the right factory tool by hand — but that costs tokens, and in one earlier run we watched the contradiction send it off into other realms hunting for example cards before it figured out what to do.

This PR rewrites the three always-loaded docs to match the agent's actual world: native filesystem tools, the boxel CLI, and the get_card_schema factory tool. All the genuinely useful content (query syntax, the Spec shape, QUnit testing patterns) stays. Only the wrong tool names change.

What's in here

dev-spec-usage.md — used to start with "call the create_catalog_spec tool." Now it points the agent at get_card_schema({module: 'https://cardstack.com/base/spec', name: 'Spec'}) for the live shape, and shows a complete required-form JSON example so the agent doesn't have to reconstruct it from memory. Two new reminders that came out of real failure modes from earlier runs:

  • the linkedExamples relationship has to use dotted keys (linkedExamples.0, not an array)
  • don't pass the Spec card path to run_instantiate — its module lives in the base realm and the prerender refuses cross-origin module loads; the right move is to call run_instantiate without a path, which discovers Specs and exercises the linked examples instead

dev-realm-search.md — was titled "how to use the search_realm tool." Now it's "how to construct a query for boxel search --realm <target-realm-url>" — same JSON query shape, just delivered via the shell instead of a retired tool. The intro also reinforces the "stay in your target realm" rule from PR #4653. The bottom "how to discover available fields" section used to have the agent build a run_command payload by hand — now it just says "call get_card_schema(module, name)."

dev-qunit-testing.md — one bullet about reading TestRun cards via read_file. Swapped for the native Read + Glob equivalent.

Plus a small dead-code sweep in factory-agent/ (commit ce931df90a):

  • Deleted FILE_ACTION_TYPES, MockFactoryAgent, and the FactoryAgent (declarative model) interface — no live consumers.
  • Updated three historical comments in types.ts and claude-code.ts that still named retired tools (read_file / write_file / search_realm / run_command).
  • pnpm test:node: 330/330 pass after the sweep.

VALID_ACTION_TYPES / VALID_REALMS / AgentAction and friends are still vestigially wired through the iterate prompt's previousActions slot (always passed []); unwiring that one touches a few more files and tests, so it's deliberately not in this PR.

What's NOT in here

CS-10613's full scope is much bigger — moving boxel-development out of the factory entirely, deduplicating against the boxel-cli and skills-realm copies, building a new boxel-api skill, rewriting the loader. That work is its own dedicated PRs.

This is just the targeted fix that closes CS-10520's "skills aligned" acceptance item and stops the agent from reading bad advice every iteration.

Why target this branch instead of main

Validating the change against a healthy baseline is much easier when the system prompt rule from PR #4653 ("stay in your target realm, the skills you've loaded are authoritative") is already in place. That rule is what closes the gap between the rewritten skills and the LLM's residual "compare against existing examples" reflex. Testing on main would show a worse baseline that doesn't reflect production.

Once PR #4653 merges, GitHub will automatically rebase this onto main.

Test plan

  • pnpm factory:go --agent claude --brief-url … --target-realm … --debug completes a sticky-note-style implementation issue.
  • pnpm factory:go --agent openrouter --openrouter-api-key sk-or-… --brief-url … --target-realm … --debug likewise.
  • Agent does not attempt to call create_catalog_spec, write_file, read_file, search_realm, or run_command (none exist).
  • Agent does not run boxel file ls / boxel search against any realm other than its target realm.
  • No regression in validation pipeline outcomes vs. the runs we did on PR Replace OpenRouter HTTP backend with opencode agent #4653.
  • pnpm test:node 330/330 (after the dead-code sweep).

🤖 Generated with Claude Code

…ools

Three references in `.agents/skills/boxel-development/references/`
are pre-loaded into every implementation-issue system prompt
(`factory-skill-loader.ts:106-113`). They still talked about
factory tools that have been retired:

- `dev-spec-usage.md` told the agent to call `create_catalog_spec`
  and `write_file`. Now it tells the agent to call
  `get_card_schema({module:'https://cardstack.com/base/spec', name:'Spec'})`
  for the live schema and write the JSON natively. Added a
  complete required-shape JSON example, a reminder about the
  dotted `linkedExamples.0` key form, and an explicit "don't run
  `run_instantiate` on the Spec itself — the prerender refuses
  cross-origin module loads" note (the trap we hit in earlier runs).

- `dev-realm-search.md` framed everything around the retired
  `search_realm` tool. Rewrote the intro to point at
  `boxel search --realm <target-realm-url>` (target realm only)
  and added a reinforcing "do not query other realms" line to back
  up PR #4653's prompt-level rule. The "Discovering Available
  Fields" section's `run_command` JSON payload became a one-line
  `get_card_schema(module, name)` call.

- `dev-qunit-testing.md` had a single `read_file` mention for
  TestRun inspection — swapped for native `Read` + `Glob`.

All query-format / spec-type / QUnit-pattern content is preserved
unchanged. Only the tool names changed.

Closes CS-10520's Step 6 / Skills-aligned acceptance item via
CS-10613 Phase C step 8 (audit always-loaded references). Full
CS-10613 scope (Phase A: new boxel-api skill; Phase B: dedupe
across factory / boxel-cli / skills-realm; Phase D: loader
keyword-map updates) is follow-up work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jurgenwerk jurgenwerk marked this pull request as ready for review May 11, 2026 09:05
@jurgenwerk jurgenwerk requested a review from a team May 11, 2026 09:05
Surfaced while auditing CS-10883 / CS-10613 leftovers — none of the
following had a live consumer in src/ or tests/:

- `FILE_ACTION_TYPES` (factory-agent/types.ts) — declared but never
  imported.
- `FactoryAgent` interface (the original "declarative" agent shape) —
  only implementer was `MockFactoryAgent`, which itself had no callers
  outside its own definition and the barrel re-export.
- `MockFactoryAgent` class (factory-agent/mocks.ts) and its
  re-export from factory-agent/index.ts.

Removing those also lets the surrounding doc-comment in types.ts
stop referencing `factory-agent.ts` / `factory-agent-tool-use.ts`,
neither of which exist anymore.

Also touched a few historical comments that name retired tools:
- types.ts: model-pin docblock said "broke every `write_file`" — the
  pin still matters but the tool that hit the truncation is now
  native `Write`.
- types.ts: `ClaudeCodeAgentConfig.workspaceDir` mentioned realm I/O
  going through `search_realm` / `run_command` MCP tools; both are
  retired. Replaced with the current list (`get_card_schema`, the
  five validators, the control signals).
- claude-code.ts: two block comments mentioning `read_file` /
  `write_file` / `search_realm` as the "old shims" — rewrote in
  terms of the current native fs + MCP split.

Still vestigially wired (not touched here):
`VALID_ACTION_TYPES` / `VALID_REALMS` / `AgentActionType` /
`ActionRealm` / `AgentAction` are all still consumed by the iterate
prompt's `previousActions` slot. The slot is always passed `[]`
today, but unwiring it touches `factory-prompt-loader`, the
ticket-iterate.md template, and several tests — out of scope for a
small dead-code sweep. Worth its own pass later.

`pnpm test:node`: 330/330.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants