19 changes: 14 additions & 5 deletions docs/development/e2e-assertions.md
@@ -158,25 +158,34 @@ This is important: we don't need to match the entire stdout, just one line.
| Config dir exists | `container.exec('test -d /root/.codex')` | Exit code `0` |
| Config file exists | `container.fileExists('/root/.codex/config.toml')` | `true` |
| Config: model_provider | parsed TOML | `"poe"` |
| Config: model | parsed TOML | Default: `gpt-5.2-codex` |
| Config: model | parsed TOML | Default: `gpt-5.5` |
| Config: model_reasoning_effort | parsed TOML | Default: `medium` |
| Config: model_verbosity | parsed TOML | Default: `medium` |
| Config: profiles.gpt-5.5.model | parsed TOML | `gpt-5.5` |
| Config: model_providers.poe.base_url | parsed TOML | Non-empty URL |
| Config: model_providers.poe.experimental_bearer_token | parsed TOML | Non-empty string (API key) |

**Expected config structure (TOML):**
```toml
model_provider = "poe"
model = "gpt-5.2-codex"
model = "gpt-5.5"
model_reasoning_effort = "medium"
model_verbosity = "medium"

[profiles."gpt-5.5"]
model = "gpt-5.5"
model_provider = "poe"
model_reasoning_effort = "medium"
model_verbosity = "medium"

[model_providers.poe]
name = "poe"
base_url = "https://api.poe.com"
base_url = "https://api.poe.com/v1"
wire_api = "responses"
experimental_bearer_token = "<api-key>"
```

**Default model:** `openai/gpt-5.2-codex` (stripped to `gpt-5.2-codex`)
**Default model:** `openai/gpt-5.5` (stripped to `gpt-5.5`)
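
The default-model line implies that the provider prefix is dropped before the value is written to the TOML config. A minimal TypeScript sketch of that stripping, assuming the prefix is everything up to the first `/`. `stripProviderPrefix` is a hypothetical name used for illustration, not the project's actual function:

```ts
// Hypothetical helper: drops a leading "<provider>/" segment from a model id.
// Assumption: the prefix ends at the first "/"; ids without "/" pass through unchanged.
function stripProviderPrefix(modelId: string): string {
  const slash = modelId.indexOf("/");
  return slash === -1 ? modelId : modelId.slice(slash + 1);
}

console.log(stripProviderPrefix("openai/gpt-5.5")); // prints "gpt-5.5"
```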

#### opencode

@@ -246,7 +255,7 @@ max_context_size = 256000

[providers.poe]
type = "openai_legacy"
base_url = "https://api.poe.com"
base_url = "https://api.poe.com/v1"
api_key = "<api-key>"
```

25 changes: 23 additions & 2 deletions packages/acp-telemetry/README.md
@@ -1,10 +1,31 @@
# @poe-code/acp-telemetry

Pure ACP -> trace converters plus Braintrust/OTEL emitters.
Pure ACP event-to-trace conversion plus Braintrust and OpenTelemetry emitters.

## Public Exports

- `@poe-code/acp-telemetry`
- `acpToTrace(ctx)` converts an `@poe-code/agent-spawn` ACP spawn context into an `AcpTrace`.
- `emitToBraintrust(trace, parent)` writes the trace as nested Braintrust task/tool spans.
- `emitToOtel(trace, tracer)` writes the trace as OpenTelemetry-style spans and attributes.
- `redact(value)` removes sensitive prompt, tool, and metadata fields before emission.
- Types: `AcpTrace`, `AcpTraceSpan`, `BraintrustSpanLike`, `OtelSpanLike`, `OtelTracerLike`.

## Trace shape

`acpToTrace` creates one root `agent:<agent>:<model>` span with redacted prompt input, accumulated assistant output, token/cost/duration metrics when present, session/thread metadata, and one child span per ACP tool call. Tool child spans include redacted inputs, assembled tool outputs, tool call metadata, and start/end timestamps when the ACP event metadata includes them.
Contributor Author

acpToTrace does not currently include any cost metric from ctx.usage (it only maps token counts/cache counts and durationMs in buildMetrics), so documenting cost here is misleading.

Suggested change
`acpToTrace` creates one root `agent:<agent>:<model>` span with redacted prompt input, accumulated assistant output, token/cost/duration metrics when present, session/thread metadata, and one child span per ACP tool call. Tool child spans include redacted inputs, assembled tool outputs, tool call metadata, and start/end timestamps when the ACP event metadata includes them.
`acpToTrace` creates one root `agent:<agent>:<model>` span with redacted prompt input, accumulated assistant output, token/duration metrics when present, session/thread metadata, and one child span per ACP tool call. Tool child spans include redacted inputs, assembled tool outputs, tool call metadata, and start/end timestamps when the ACP event metadata includes them.

Contributor Author

acpToTrace still does not emit a cost metric: buildMetrics() maps token counts/cache counts and durationMs, but not costUsd/cost. Keeping cost here makes the trace contract inaccurate.

Suggested change
`acpToTrace` creates one root `agent:<agent>:<model>` span with redacted prompt input, accumulated assistant output, token/cost/duration metrics when present, session/thread metadata, and one child span per ACP tool call. Tool child spans include redacted inputs, assembled tool outputs, tool call metadata, and start/end timestamps when the ACP event metadata includes them.
`acpToTrace` creates one root `agent:<agent>:<model>` span with redacted prompt input, accumulated assistant output, token/duration metrics when present, session/thread metadata, and one child span per ACP tool call. Tool child spans include redacted inputs, assembled tool outputs, tool call metadata, and start/end timestamps when the ACP event metadata includes them.


```ts
import { acpToTrace, emitToOtel } from "@poe-code/acp-telemetry";

const trace = acpToTrace(spawnContext);
emitToOtel(trace, tracer);
```

## Emitters

Braintrust emission expects a parent span-like object with `startSpan`, `log`, and `end`. The root is emitted as a `task`; children are emitted as `tool` spans.

OpenTelemetry emission expects a tracer-like object with `startSpan`. Agent spans set `gen_ai.system`, request model, agent name, token usage, and Poe Code session/thread attributes. Tool spans set tool name and tool-call id attributes. Non-primitive inputs and outputs are serialized as JSON attributes.
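
The rule that non-primitive inputs and outputs become JSON attributes can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `toAttributeValue` and `AttributeValue` are hypothetical names, and the fallback for values that `JSON.stringify` cannot represent is a guess.

```ts
// Hypothetical attribute type: the primitive kinds a span attribute can hold directly.
type AttributeValue = string | number | boolean;

// Hypothetical helper: pass primitives through, serialize everything else as JSON.
function toAttributeValue(value: unknown): AttributeValue {
  if (
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean"
  ) {
    return value;
  }
  // JSON.stringify yields undefined for undefined and functions,
  // so fall back to String(value) in that case (an assumption).
  return JSON.stringify(value) ?? String(value);
}

console.log(toAttributeValue({ path: "a.ts", lines: [1, 2] }));
```

Values that `JSON.stringify` rejects, such as `bigint` or circular structures, would still throw in this sketch and need separate handling.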

## Configuration

8 changes: 8 additions & 0 deletions packages/agent-harness-tools/README.md
@@ -64,6 +64,14 @@ Notes:
- Cancellation is only possible in the interactive prompt path. If `select(...)` returns a value that `isCancel(...)` recognizes, the function returns `{ cancelled: true }` and does not throw. Callers are responsible for showing the cancellation message and stopping the command cleanly.
- `pipeline`, `experiment`, `ralph`, and `superintendent` all route their single-agent loop selection through this function so the precedence stays aligned across commands.

## Plan document helpers

`openPlanList`, `discoverPlans`, and `archivePlan` expose numbered Markdown plan folders through `@poe-code/task-list`. They resolve the configured plan directory with the same cwd/home rules as workflow docs, open it as a `markdown-dir` single-list named `plans`, and use `frontmatterMode: "passthrough"` so plan-specific metadata survives task updates.

- `discoverPlans({ cwd, homeDir, planDirectory, kinds? })` returns plan ids, names, kinds, absolute paths, and display paths. Numeric filename prefixes such as `04-api-shape-providers.md` are stripped from ids.
- `archivePlan({ cwd, homeDir, planDirectory, id })` fires the task-list `archive` event, moves the document under `archive/`, and repacks active plan prefixes.
Contributor Author

archivePlan() does not repack active prefixes; the current test explicitly expects archiving to leave 01-first.md and 03-third.md in place.

Suggested change
- `archivePlan({ cwd, homeDir, planDirectory, id })` fires the task-list `archive` event, moves the document under `archive/`, and repacks active plan prefixes.
- `archivePlan({ cwd, homeDir, planDirectory, id })` fires the task-list `archive` event and moves the document under `archive/` without renumbering active plan prefixes.

- `openPlanList(...)` returns the underlying `TaskList` for commands that need direct task operations.
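
The id rule for numbered plan files can be sketched as below. `planIdFromFilename` is a hypothetical helper, and two details are assumptions: that a prefix is one or more digits followed by `-`, and that the `.md` extension is also dropped from the id.

```ts
// Hypothetical helper illustrating how a numbered plan filename could map to an id.
// Assumptions: prefix = digits followed by "-", and the ".md" extension is removed.
function planIdFromFilename(filename: string): string {
  return filename.replace(/\.md$/, "").replace(/^\d+-/, "");
}

console.log(planIdFromFilename("04-api-shape-providers.md")); // prints "api-shape-providers"
```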

## Configuration

This package does not read config files directly, but Poe Code callers commonly pass `configuredDefaultAgent` from merged `core.defaultAgent`.
33 changes: 30 additions & 3 deletions packages/agent-harness/README.md
@@ -1,13 +1,40 @@
# @poe-code/agent-harness

Shared package for agent harness types and runtime orchestration APIs.
Shared harness loader, template, schema, and runtime orchestration APIs for `.md` + `.ajs` agent-script harness pairs.

This package is currently a scaffold. Loader, module, template, codegen, and CLI behavior will be added in later tasks.
## Public API

- `runHarnessPair(mdPath, options)` resolves the matching `.ajs` file, validates frontmatter against any exported `schema`, lints the script, and runs it through `@poe-code/agent-script`.
- `listBuiltinTemplates()` returns bundled template pairs: `ralph-demo`, `coverage-demo`, `experiment-demo`, `pipeline-demo`, and `superintendent-demo`.
- `extractSchema(source, filename)` reads a harness script's exported schema for frontmatter validation.
- `resolvePair(mdPath)` resolves the Markdown/script pair for a harness document.
- `LintError` wraps lint diagnostics raised before execution.
Contributor Author

LintError is declared in loader/run.ts, but packages/agent-harness/src/index.ts does not re-export it and the package only exports dist/index.js. Either export it from the public entrypoint or drop this public API bullet.

Contributor Author

LintError is still not exported from packages/agent-harness/src/index.ts, and the package only exposes that entrypoint. Please either export it or remove it from the documented public API.


## Harness pairs

A harness is a Markdown document plus a sibling `.ajs` script. The Markdown frontmatter configures the run; the body is passed to the harness import metadata. The `.ajs` file must export a default entry point and may export `schema` to validate frontmatter before execution.

`runHarnessPair` locks the Markdown file while it runs, injects the `schema` module, wraps host modules for deterministic replay across resumes, writes snapshots, and cleans up completed snapshots after successful runs.

## Snapshots and resume

Pass `snapshotPath` to control where snapshots are read and written. `resume` defaults to `true`; set `resume: false` to remove a completed snapshot and force a fresh run. If a snapshot exists, the underlying agent-script source hash must still match the `.ajs` source.

The CLI mirrors these options:

```sh
poe-code harness run harness.md --snapshot-path .poe-code/harnesses/demo/snapshot.json --resume
poe-code harness new coverage-demo coverage.md
```

## Built-in templates

`listBuiltinTemplates()` exposes template metadata with `kind`, `mdPath`, and `ajsPath`. `poe-code harness new <kind> <path>` copies both files into a new harness pair.

## Environment Variables

This package does not read any environment variables.

## Configuration

This package does not read any configuration options.
This package does not read package-level configuration. Runtime behavior is supplied through `runHarnessPair` options: `modulesFor`, `allowedGlobals`, `resume`, `signal`, and `snapshotPath`.
57 changes: 37 additions & 20 deletions packages/agent-spawn/README.md
@@ -13,8 +13,8 @@ const result = await spawn("codex", {
mode: "edit",
model: "openai/gpt-5.5",
mcpServers: {
fs: { command: "node", args: ["./mcp/fs.js"], timeout: 30 },
},
fs: { command: "node", args: ["./mcp/fs.js"], timeout: 30 }
}
});

console.log(result.exitCode, result.stdout);
@@ -23,11 +23,11 @@ console.log(listMcpSupportedAgents());

## Spawn modes

| Mode | Purpose |
|------|---------|
| `yolo` | Full automation for trusted tasks. |
| Mode | Purpose |
| ------ | ------------------------------------------------------------- |
| `yolo` | Full automation for trusted tasks. |
| `edit` | File-editing mode when the agent supports scoped permissions. |
| `read` | Read-only/research mode when the agent supports it. |
| `read` | Read-only/research mode when the agent supports it. |

Mode-specific args and env vars are declared in each agent config. Goose uses `GOOSE_MODE` internally for mode selection; callers do not need to set it manually.

@@ -39,6 +39,22 @@ Pass `mcpServers` as a map of server names to `{ command, args?, env?, timeout?

`spawnAutonomous(streamSpawn, options)` drives a streaming ACP spawn to completion, renders events through the design-system ACP writer, and retries activity timeouts. It is shared by SDK autonomous spawn flows and loop runners.

## ACP middlewares

Pass `middlewares` to `spawnStreaming` or `spawnAcp` to wrap the ACP session lifecycle. A middleware receives a mutable `SpawnContext` with session id, agent id, prompt/model/mode/cwd, accumulated events, usage, optional event stream, and any log file selected by the middleware. Middlewares must call `next()` at most once.

```ts
import { spawnStreaming, type AcpMiddleware } from "@poe-code/agent-spawn";

const telemetry: AcpMiddleware = async (ctx, next) => {
  await next();
  console.log(ctx.agent, ctx.threadId, ctx.usage);
};

const run = spawnStreaming({ agentId: "codex", prompt: "Summarize", middlewares: [telemetry] });
await run.done;
```
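
The "call `next()` at most once" rule can be illustrated with a small self-contained composer. This is a sketch, not the package's implementation: `Ctx`, `Middleware`, and `compose` are hypothetical stand-ins for the real `SpawnContext` and `AcpMiddleware` types, and rejecting on a second `next()` call is an assumed enforcement strategy.

```ts
// Hypothetical stand-ins; the real SpawnContext carries many more fields.
type Ctx = { events: string[] };
type Middleware = (ctx: Ctx, next: () => Promise<void>) => Promise<void>;

// Wrap a core step with middlewares; a second next() call rejects (assumed behavior).
function compose(middlewares: Middleware[], core: (ctx: Ctx) => Promise<void>) {
  return async (ctx: Ctx): Promise<void> => {
    const dispatch = async (index: number): Promise<void> => {
      const middleware = middlewares[index];
      if (!middleware) return core(ctx); // past the last middleware: run the session
      let called = false;
      await middleware(ctx, () => {
        if (called) return Promise.reject(new Error("next() called more than once"));
        called = true;
        return dispatch(index + 1);
      });
    };
    await dispatch(0);
  };
}

// Usage: the middleware observes the context before and after the wrapped step.
const runSession = compose(
  [
    async (ctx, next) => {
      ctx.events.push("before");
      await next();
      ctx.events.push("after");
    },
  ],
  async (ctx) => {
    ctx.events.push("session");
  },
);
```

Tracking the `called` flag per middleware keeps the check local, so one misbehaving middleware cannot re-run the steps after it.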

## Testing helper

The `./testing` export provides a Vitest helper for code that depends on `spawn`:
@@ -48,7 +64,7 @@ import { createSpawnMock } from "@poe-code/agent-spawn/testing";

const spawnMock = createSpawnMock({
spawnResult: { stdout: "ok" },
autonomousResult: { text: "done" },
autonomousResult: { text: "done" }
});

vi.mock("@poe-code/agent-spawn", spawnMock.factory);
@@ -58,19 +74,20 @@ vi.mock("@poe-code/agent-spawn", spawnMock.factory);

## Config options

| Option | Type | Description |
|--------|------|-------------|
| `prompt` | `string` | Prompt sent to the agent. |
| `cwd` | `string` | Working directory. Defaults to the caller's process cwd. |
| `model` | `string` | Optional model override. Provider prefixes are stripped or preserved per agent config. |
| `mode` | `"yolo" \| "edit" \| "read"` | Permission mode. Defaults are chosen by the caller. |
| `args` | `string[]` | Extra args forwarded to the agent process. |
| `mcpServers` | `Record<string, McpSpawnServer>` | MCP servers injected into the spawned agent. |
| `useStdin` | `boolean` | Send the prompt through stdin when the agent supports it. |
| `interactive` | `boolean` | Spawn the agent in interactive TUI mode. |
| `activityTimeoutMs` | `number` | Kill/retry inactive streaming processes after this many milliseconds. |
| `logPath` / `logDir` / `logFileName` | `string` | Persist spawn logs. `logPath` takes precedence. |
| Option | Type | Description |
| ------------------------------------ | -------------------------------- | -------------------------------------------------------------------------------------- |
| `prompt` | `string` | Prompt sent to the agent. |
| `cwd` | `string` | Working directory. Defaults to the caller's process cwd. |
| `model` | `string` | Optional model override. Provider prefixes are stripped or preserved per agent config. |
| `mode` | `"yolo" \| "edit" \| "read"` | Permission mode. Defaults are chosen by the caller. |
| `args` | `string[]` | Extra args forwarded to the agent process. |
| `mcpServers` | `Record<string, McpSpawnServer>` | MCP servers injected into the spawned agent. |
| `middlewares` | `AcpMiddleware[]` | Wrap `spawnStreaming`/`spawnAcp` execution for telemetry, logging, or post-processing. |
| `useStdin` | `boolean` | Send the prompt through stdin when the agent supports it. |
| `interactive` | `boolean` | Spawn the agent in interactive TUI mode. |
| `activityTimeoutMs` | `number` | Kill/retry inactive streaming processes after this many milliseconds. |
| `logPath` / `logDir` / `logFileName` | `string` | Persist spawn logs. `logPath` takes precedence. |

## Environment variables

This package does not expose public environment variables. It inherits `process.env` for child processes and may add agent-specific env overrides from declarative spawn config, such as `GOOSE_MODE` for Goose modes or `OPENCODE_CONFIG_CONTENT` for OpenCode MCP injection.
This package does not expose public environment variables. It inherits `process.env` for child processes and may add agent-specific env overrides from declarative spawn config, such as `GOOSE_MODE` for Goose modes, `GOOSE_DISABLE_KEYRING=1` for Goose file-backed credentials, or `OPENCODE_CONFIG_CONTENT` for OpenCode MCP injection.
16 changes: 14 additions & 2 deletions packages/config-extends/README.md
@@ -1,10 +1,22 @@
# @poe-code/config-extends

Shared document-inheritance types for layered config resolution.
Shared document-inheritance utilities for layered config resolution.

## API

Currently this package exposes shared TypeScript types only.
- `resolve(chain, options)`: resolves exactly one document layer with surrounding data and base layers.
- `findBase(name, basePaths, fs)`: discovers a base document by name across configured base paths.
- `parseDocument(content, filePath)`: parses a document and separates inheritance metadata from data.
- `mergeLayers(layers)`: merges data layers and tracks the source of each resolved key.
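
Merging layers while tracking where each key came from might look like the sketch below. The shapes are hypothetical (`Layer`, `Merged`, and `mergeWithSources` are illustration-only names), and it assumes a shallow merge in which later layers override earlier ones; the real `mergeLayers` signature and precedence may differ.

```ts
// Hypothetical shapes; the real mergeLayers signature and return type may differ.
type Layer = { source: string; data: Record<string, unknown> };
type Merged = { data: Record<string, unknown>; sources: Record<string, string> };

// Shallow merge where later layers win (an assumption),
// recording which layer supplied each final value.
function mergeWithSources(layers: Layer[]): Merged {
  const merged: Merged = { data: {}, sources: {} };
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer.data)) {
      merged.data[key] = value;
      merged.sources[key] = layer.source;
    }
  }
  return merged;
}

const mergedExample = mergeWithSources([
  { source: "base.md", data: { model: "a", mode: "read" } },
  { source: "doc.md", data: { model: "b" } },
]);
console.log(mergedExample.sources);
```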

## Resolution behavior

A chain must contain exactly one document layer. Data layers before and after the document are merged around the resolved document, and base layers define directories that can be inherited from.

- Documents that set `extends: true` must resolve a base; circular inheritance is still reported as an error.
- With `autoExtend: true`, documents that do not set `extends` try to inherit from matching bases automatically.
- Optional auto-extend discovery is ignored when it finds the document itself, so a document can safely live in a configured base directory without creating a circular extends error.
- Prompt values can compose with the `{{yield}}` token across resolved base layers.

## Environment variables
