@@ -134,7 +134,7 @@ Add `data-test-*` attributes to card templates for stable test selectors:

When tests fail, the orchestrator feeds test failure details back to the agent. For more detail:

- **TestRun cards** live in the target realm's `Validations/` folder with a `test_` prefix (e.g., `Validations/test_issue-slug-1.json`). To find all test runs, search by the TestRun card type in the target realm. Each TestRun has a `sequenceNumber` that increases with each iteration. Use `read_file` on a specific TestRun for full details.
- **TestRun cards** live in the target realm's `Validations/` folder with a `test_` prefix (e.g., `Validations/test_issue-slug-1.json`). To find all test runs, run `Glob` over `Validations/test_*.json` or shell out via `Bash` to `boxel search --realm <url>` filtered on the TestRun card type. Each TestRun has a `sequenceNumber` that increases with each iteration. Use native `Read` on a specific TestRun for full details — paths are workspace-relative.
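
As a rough illustration of using `sequenceNumber` to pick the most recent run, the sketch below selects the latest TestRun from documents that have already been read and parsed. Only the field name `sequenceNumber` comes from the text above; its location under `data.attributes`, the `TestRunFile` type, and the `latestTestRun` helper are assumptions made for the example.

```typescript
// Hypothetical shape: the text only names `sequenceNumber`; placing it
// under `data.attributes` is an assumption for this sketch.
interface TestRunDoc {
  data: { attributes: { sequenceNumber: number; [key: string]: unknown } };
}

// A parsed TestRun file plus its workspace-relative path.
interface TestRunFile {
  path: string;
  doc: TestRunDoc;
}

function sequenceOf(file: TestRunFile): number {
  return file.doc.data.attributes.sequenceNumber;
}

// Returns the file with the highest sequenceNumber, or undefined if none.
function latestTestRun(files: TestRunFile[]): TestRunFile | undefined {
  let latest: TestRunFile | undefined;
  for (const file of files) {
    if (latest === undefined || sequenceOf(file) > sequenceOf(latest)) {
      latest = file;
    }
  }
  return latest;
}
```

The inputs would come from the `Glob` and `Read` steps described above; the helper itself performs no I/O.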

## Rules

@@ -1,6 +1,12 @@
# Realm Search Query Reference

How to use the `search_realm` tool to query cards in a realm. The query object follows the Boxel realm search API format.
How to construct queries for the Boxel realm search index. Run them from `Bash` against your **target realm** with:

```
boxel search --realm <target-realm-url> --query '<json-query>'
```

The query JSON below is what goes into `--query`. Do not query other realms (base, software-factory, experiments, catalog) — the skills you've loaded are authoritative for patterns; cross-realm exploration burns tokens without helping.
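
A minimal sketch of assembling that command line from a query object, so the JSON survives shell quoting. The `boxel search --realm ... --query ...` form is taken from the text above; the helper names `shellQuote` and `buildSearchCommand`, the example realm URL, and the example query shape (a `filter` with `on` and `eq`) are illustrative assumptions, not a definitive description of the CLI or the query format.

```typescript
// Quote a string for a POSIX shell: wrap it in single quotes and
// escape any embedded single quote as '\''.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Build the command string described above. The query is serialized
// with JSON.stringify and passed as the --query argument.
function buildSearchCommand(realmUrl: string, query: unknown): string {
  return [
    'boxel search',
    '--realm', shellQuote(realmUrl),
    '--query', shellQuote(JSON.stringify(query)),
  ].join(' ');
}

// Hypothetical query: the filter shape and field value are assumptions.
const exampleQuery = {
  filter: {
    on: {
      module: 'http://localhost:4201/software-factory/darkfactory',
      name: 'Issue',
    },
    eq: { status: 'in_progress' },
  },
};

// Hypothetical target realm URL, used only for illustration.
const exampleCommand = buildSearchCommand(
  'http://localhost:4201/my-target-realm/',
  exampleQuery,
);
```

Quoting the JSON in one place avoids hand-escaping quotes each time a query is pasted into `Bash`.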

## Basic Structure

@@ -216,27 +222,18 @@ Descending order:

## Discovering Available Fields

You can only filter/sort on fields that exist on the card type. To find which fields a card type has:
You can only filter/sort on fields that exist on the card type. To find which fields a card type has, call the `get_card_schema` factory tool:

1. Use `run_command` to fetch the JSON schema for a card type:

```json
{
  "command": "@cardstack/boxel-host/commands/get-card-type-schema/default",
  "commandInput": {
    "codeRef": {
      "module": "http://localhost:4201/software-factory/darkfactory",
      "name": "Issue"
    }
  }
}
```

```
get_card_schema({
  module: 'http://localhost:4201/software-factory/darkfactory',
  name: 'Issue'
})
```

2. The result contains `attributes.properties` listing all searchable fields (e.g., `status`, `summary`, `priority`).

3. Use those field names in your `eq`, `contains`, `range`, or `sort` with the matching `on` type.
The result contains `schema.attributes.properties` listing all searchable fields (e.g., `status`, `summary`, `priority`) plus their types and any enum values. Use those field names in your `eq`, `contains`, `range`, or `sort` with the matching `on` type.
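
One way to apply this is to check a planned filter against the schema before running it. The sketch below assumes the result has the `schema.attributes.properties` object mentioned above; the per-property fields (`type`, `enum`) and the helper names `searchableFields` and `unknownFields` are assumptions for illustration, not the tool's documented return type.

```typescript
// Assumed per-field entry; only the existence of types and enum values
// is stated in the text, the exact keys are a guess.
interface SchemaProperty {
  type?: string;
  enum?: unknown[];
}

// Assumed result shape, based on `schema.attributes.properties`.
interface CardSchemaResult {
  schema: { attributes: { properties: Record<string, SchemaProperty> } };
}

// List the field names that can be used in eq / contains / range / sort.
function searchableFields(result: CardSchemaResult): string[] {
  return Object.keys(result.schema.attributes.properties).sort();
}

// Return the requested field names that the card type does not define.
function unknownFields(result: CardSchemaResult, requested: string[]): string[] {
  const known = new Set(searchableFields(result));
  return requested.filter((field) => !known.has(field));
}
```

An empty array from `unknownFields` means every field in the planned query exists on the card type.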

The card tools (`update_project`, `update_issue`, `create_knowledge`, `create_catalog_spec`) also have dynamic JSON schemas in their parameters that list available fields.
`get_card_schema` is also how you learn the shape for writing tracker (Project / Issue / KnowledgeArticle) and Spec card JSON files — call it before writing the JSON so what you write matches the live `CardDef`.

### Inheritance

@@ -1,24 +1,52 @@
# Catalog Spec Card Instances

For each top-level card definition, create a Catalog Spec card instance in the target realm's `Spec/` folder using the `create_catalog_spec` tool. This makes the card discoverable in the Boxel catalog.
For each top-level card definition, write a Catalog Spec card instance in the target realm's `Spec/` folder. This makes the card discoverable in the Boxel catalog.

The `create_catalog_spec` tool has the authoritative JSON schema for Spec card fields — use its parameter definitions to know which attributes and relationships are available. The tool auto-constructs the document with the correct `adoptsFrom` (`https://cardstack.com/base/spec#Spec`).
Specs adopt from `https://cardstack.com/base/spec#Spec` — that module lives in the base realm, not your target realm. Fetch the authoritative schema by calling the `get_card_schema` factory tool:

## Usage
```
get_card_schema({ module: 'https://cardstack.com/base/spec', name: 'Spec' })
```

Use the `create_catalog_spec` tool to create a Spec card. The tool's parameters define the available fields dynamically from the card definition — consult the tool schema for the exact field names and types.
The result gives you the exact `attributes` and `relationships` shape. Write the JSON file with native `Write` (paths are workspace-relative, e.g. `Spec/sticky-note.json`); `boxel sync` pushes it to the realm between iterations.

## Required Shape

```json
{
  "data": {
    "type": "card",
    "attributes": {
      "specType": "card",
      "ref": { "module": "../sticky-note", "name": "StickyNote" },
      "readMe": "...",
      "cardInfo": { "name": "Sticky Note", "summary": "..." }
    },
    "relationships": {
      "linkedExamples.0": { "links": { "self": "../StickyNote/welcome-note" } }
    },
    "meta": {
      "adoptsFrom": {
        "module": "https://cardstack.com/base/spec",
        "name": "Spec"
      }
    }
  }
}
```

Key concepts:

- `ref` — a CodeRef pointing to the card definition (module path + exported class name). The module path is relative from the Spec card to the `.gts` file (e.g., `../sticky-note` from `Spec/sticky-note.json`).
- `specType` — `"card"` for CardDef, `"field"` for FieldDef, `"component"` for standalone components.
- `linkedExamples` — a relationship pointing to sample card instances. Create at least one sample instance and link it here.
- `linkedExamples` — a `linksToMany` relationship pointing to sample card instances. Use dotted keys (`linkedExamples.0`, `linkedExamples.1`, …) — the array form is rejected by the indexer. Create at least one sample instance and link it here.
- **Do NOT call `run_instantiate` on the Spec file itself.** Spec's module lives in the base realm; the prerender enforces same-origin module loads and the call always fails. To validate Specs, call `run_instantiate` WITHOUT a `path`; it discovers Specs in the target realm and exercises their `linkedExamples` against the card classes you wrote.
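
The dotted-key form for `linkedExamples` can be generated rather than typed by hand. The following is a minimal sketch under that reading of the rules above; the `Relationship` type and the `linkedExampleRelationships` helper are hypothetical names, and the `.json` stripping assumes links are written without the file suffix.

```typescript
// One relationship entry, matching the shape in the JSON above.
type Relationship = { links: { self: string } };

// Turn a list of sample-instance paths (relative to the Spec file) into
// dotted relationship keys: linkedExamples.0, linkedExamples.1, ...
function linkedExampleRelationships(paths: string[]): Record<string, Relationship> {
  const relationships: Record<string, Relationship> = {};
  paths.forEach((path, index) => {
    // Links omit the `.json` suffix, so strip it if present.
    const self = path.replace(/\.json$/, '');
    relationships[`linkedExamples.${index}`] = { links: { self } };
  });
  return relationships;
}

// Example: two sample instances for a StickyNote card. The second path
// is a made-up file name used only to show the index incrementing.
const exampleRelationships = linkedExampleRelationships([
  '../StickyNote/welcome-note.json',
  '../StickyNote/second-note',
]);
```

The returned object can be placed directly under `relationships` in the Spec document.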

## Sample Card Instances

Create at least one sample instance with realistic data for each top-level card. Sample instances serve as both catalog examples and test fixtures.

Place sample instances in a folder named after the card type (e.g., `StickyNote/welcome-note.json`). Use `write_file` to create them. The `linkedExamples` relationship in the Spec card points to these using a relative path (e.g., `../StickyNote/welcome-note`).
Place sample instances in a folder named after the card type (e.g., `StickyNote/welcome-note.json`) and write them with native `Write`. The `linkedExamples` relationship in the Spec card points to these using a relative path without the `.json` suffix (e.g., `../StickyNote/welcome-note`).

---

21 changes: 9 additions & 12 deletions packages/software-factory/src/factory-agent/claude-code.ts
@@ -66,11 +66,10 @@ const MAX_TOOL_USE_TURNS = 50;

/**
 * Built-in Claude Code tools the factory exposes to the model on the
 * Claude backend. These replace the custom `read_file` / `write_file`
 * factory tools — they operate on the SDK query's `cwd` (the factory
 * workspace), so the model uses native semantics for fs work and we
 * keep MCP focused on operations that genuinely need realm runtime
 * access (search_realm, validators, structured updates, signals).
 * Claude backend. They operate on the SDK query's `cwd` (the factory
 * workspace), so the model handles workspace files natively while MCP
 * stays focused on what needs realm runtime access (`get_card_schema`,
 * validators, control signals).
 */
const NATIVE_FS_TOOLS = ['Read', 'Write', 'Edit', 'Bash', 'Glob', 'Grep'];

@@ -298,14 +297,12 @@ export class ClaudeCodeFactoryAgent implements LoopAgent {
// Two tool surfaces are visible to the model on the Claude backend:
// 1. Native Claude Code tools (Read / Write / Edit / Bash / Glob /
// Grep) — anchored to the factory workspace via the SDK query's
// `cwd`. These replace the factory's old `read_file` /
// `write_file` shims; the model works on the local mirror of the
// target realm directly.
// `cwd`. The model works on the local mirror of the target realm
// directly; `boxel sync` pushes between iterations.
// 2. Factory tools exposed via an in-process MCP server, prefixed
// with `mcp__<server>__`. Used for everything that needs realm
// runtime access (search, validators, host commands, structured
// updates) and for control signals (signal_done /
// request_clarification).
// with `mcp__<server>__`. Used for realm-runtime operations
// (`get_card_schema`, the five validators) and for control
// signals (`signal_done`, `request_clarification`).
//
// The shared prompt template / skills reference factory operations by
// their plain names (e.g. `signal_done`). Append a short rename map
2 changes: 1 addition & 1 deletion packages/software-factory/src/factory-agent/index.ts
@@ -16,4 +16,4 @@ export * from './types';
export { OpencodeFactoryAgent } from './opencode';
export type { OpencodeAgentConfig } from './opencode';
export { ClaudeCodeFactoryAgent } from './claude-code';
export { MockFactoryAgent, MockLoopAgent } from './mocks';
export { MockLoopAgent } from './mocks';
45 changes: 4 additions & 41 deletions packages/software-factory/src/factory-agent/mocks.ts
@@ -1,51 +1,14 @@
/**
 * Mock agent implementations for testing.
 *
 * These are deterministic agents that return pre-scripted responses,
 * used by unit tests and smoke tests to verify orchestration logic
 * without calling a real LLM.
 * Deterministic agents that return pre-scripted responses, used by unit
 * tests and smoke tests to verify orchestration logic without calling
 * a real LLM.
 */

import type { AgentAction, AgentContext, FactoryAgent } from './types';
import type { LoopAgent, AgentRunResult } from './types';
import type { AgentContext, LoopAgent, AgentRunResult } from './types';
import type { FactoryTool } from '../factory-tool-builder';

// ---------------------------------------------------------------------------
// MockFactoryAgent — deterministic FactoryAgent for declarative model tests
// ---------------------------------------------------------------------------

export class MockFactoryAgent implements FactoryAgent {
  private responses: AgentAction[][];
  private callIndex = 0;

  /** All AgentContext inputs received, in order. */
  readonly receivedContexts: AgentContext[] = [];

  constructor(responses: AgentAction[][]) {
    this.responses = responses;
  }

  async plan(context: AgentContext): Promise<AgentAction[]> {
    this.receivedContexts.push(context);

    if (this.callIndex >= this.responses.length) {
      throw new Error(
        `MockFactoryAgent exhausted: called ${this.callIndex + 1} times ` +
          `but only ${this.responses.length} response(s) were configured`,
      );
    }

    let response = this.responses[this.callIndex];
    this.callIndex++;
    return response;
  }

  /** Number of times plan() has been called. */
  get callCount(): number {
    return this.callIndex;
  }
}

// ---------------------------------------------------------------------------
// MockLoopAgent — deterministic LoopAgent for tool-use model tests
// ---------------------------------------------------------------------------
30 changes: 8 additions & 22 deletions packages/software-factory/src/factory-agent/types.ts
@@ -1,9 +1,10 @@
/**
 * Shared types, interfaces, and constants for the factory agent system.
 *
 * This module contains all the data types used across the declarative agent
 * (factory-agent.ts), the tool-use agent (factory-agent-tool-use.ts), and
 * their consumers (loop, context builder, prompt loader, etc.).
 * The runtime agents (`ClaudeCodeFactoryAgent` in `claude-code.ts`,
 * `OpencodeFactoryAgent` in `opencode.ts`) implement the `LoopAgent`
 * interface declared here; orchestration consumers (issue loop, context
 * builder, prompt loader) share the data types declared below.
 */

// ---------------------------------------------------------------------------
@@ -17,7 +18,7 @@
 * Pinned to `claude-opus-4-7` rather than the unversioned `claude-opus-4`
 * alias. The alias route exhibited a deterministic mid-stream truncation
 * on large tool-call arguments (`finish_reason: null`, `completion=1`)
 * that broke every `write_file` for full `.gts` card definitions. Opus
 * that broke every native `Write` for full `.gts` card definitions. Opus
 * 4.7 on the pinned route returned clean `finish_reason: tool_calls`
 * responses with completions up to ~4.7K tokens in a single turn, and
 * ran an end-to-end factory loop to `outcome=all_issues_done` with no
@@ -54,14 +55,6 @@ export const VALID_ACTION_TYPES = [

export const VALID_REALMS = ['target', 'test'] as const;

// Action types that require path + content
export const FILE_ACTION_TYPES: ReadonlySet<string> = new Set([
  'create_file',
  'update_file',
  'create_test',
  'update_test',
]);

// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
@@ -93,8 +86,9 @@ export interface ClaudeCodeAgentConfig {
   * query's `cwd` so the model's native Read / Write / Edit / Bash / Glob /
   * Grep tools operate against the factory workspace by default — paths like
   * `sticky-note.gts` resolve inside the workspace, with no surprise hits
   * against the user's filesystem. Realm I/O still goes through factory
   * MCP tools (search_realm, run_command, validators, …).
   * against the user's filesystem. Realm-runtime operations go through the
   * factory MCP tools (get_card_schema, run_lint / run_parse / run_evaluate
   * / run_instantiate / run_tests, signal_done, request_clarification).
   */
  workspaceDir?: string;
}
@@ -259,14 +253,6 @@ export interface AgentAction {
  toolArgs?: Record<string, unknown>;
}

// ---------------------------------------------------------------------------
// FactoryAgent interface (declarative model)
// ---------------------------------------------------------------------------

export interface FactoryAgent {
  plan(context: AgentContext): Promise<AgentAction[]>;
}

// ---------------------------------------------------------------------------
// Message types (for LLM communication)
// ---------------------------------------------------------------------------