Skip to content

feat(agents): drive coder via Claude/Codex SDKs instead of CLI binaries#4263

Open
kevin-dp wants to merge 7 commits intomainfrom
kevin/coder-sdk-runners
Open

feat(agents): drive coder via Claude/Codex SDKs instead of CLI binaries#4263
kevin-dp wants to merge 7 commits intomainfrom
kevin/coder-sdk-runners

Conversation

@kevin-dp
Copy link
Copy Markdown
Contributor

@kevin-dp kevin-dp commented May 4, 2026

The coder entity used to spawn the claude / codex binaries and mirror events by tailing JSONL files on disk. Replace that with two SDK-driven runners that iterate the official SDK async generators and forward each event through agent-session-protocol's per-event normalisers, so the entity no longer depends on global CLI installs and stops touching the filesystem to discover sessions.

  • Claude runner adapts each SDKMessage to ClaudeEntry and routes it through normalizeClaudeEvent.
  • Codex runner translates each completed ThreadItem directly into NormalizedEvents — Codex's SDK envelope is higher-level than the rollout JSONL, so the JSONL normaliser doesn't apply.
  • CodingSessionCliRunner.run() gains onEvent and onSessionId callbacks; the orchestrator loses its file-watcher / pre+post diff plumbing and just forwards events live.

Note: changeset is set as minor, lmk if it needs to be patch instead.

kevin-dp and others added 2 commits May 4, 2026 11:41
The coder entity used to spawn the `claude` / `codex` binaries and
mirror events by tailing JSONL files on disk. Replace that with two
SDK-driven runners that iterate the official SDK async generators and
forward each event through `agent-session-protocol`'s per-event
normalisers, so the entity no longer depends on global CLI installs
and stops touching the filesystem to discover sessions.

- Claude runner adapts each `SDKMessage` to `ClaudeEntry` and routes
  it through `normalizeClaudeEvent`.
- Codex runner translates each completed `ThreadItem` directly into
  `NormalizedEvent`s — Codex's SDK envelope is higher-level than the
  rollout JSONL, so the JSONL normaliser doesn't apply.
- `CodingSessionCliRunner.run()` gains `onEvent` and `onSessionId`
  callbacks; the orchestrator loses its file-watcher / pre+post diff
  plumbing and just forwards events live.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@netlify
Copy link
Copy Markdown

netlify Bot commented May 4, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit ede5956
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/69f86ae6d77a9c00087540e9
😎 Deploy Preview https://deploy-preview-4263--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 48.71795% with 100 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.83%. Comparing base (38f550c) to head (b9d4702).
⚠️ Report is 6 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
packages/agents/src/agents/runners/codex-sdk.ts 57.37% 52 Missing ⚠️
packages/agents/src/agents/runners/claude-sdk.ts 38.77% 29 Missing and 1 partial ⚠️
packages/agents/src/agents/coding-session.ts 33.33% 11 Missing and 1 partial ⚠️
packages/agents/src/agents/runners/env.ts 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4263      +/-   ##
==========================================
+ Coverage   64.61%   64.83%   +0.22%     
==========================================
  Files         144      147       +3     
  Lines       19308    19373      +65     
  Branches     4747     4785      +38     
==========================================
+ Hits        12476    12561      +85     
+ Misses       6829     6807      -22     
- Partials        3        5       +2     
Flag Coverage Δ
packages/agents 57.00% <48.71%> (+2.59%) ⬆️
packages/agents-runtime 78.61% <ø> (ø)
packages/agents-server 66.03% <ø> (ø)
packages/agents-server-ui 0.00% <ø> (ø)
packages/electric-ax 38.59% <ø> (ø)
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 94.27% <ø> (-0.04%) ⬇️
packages/y-electric 56.05% <ø> (ø)
typescript 64.83% <48.71%> (+0.22%) ⬆️
unit-tests 64.83% <48.71%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 4, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 0c3e6f6
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/69f86b34d077f00008e7f7a3
😎 Deploy Preview https://deploy-preview-4263--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@kevin-dp kevin-dp added the claude label May 4, 2026
@claude
Copy link
Copy Markdown

claude Bot commented May 4, 2026

Claude Code Review

Summary

This PR replaces CLI binary spawning with SDK-driven runners for both Claude and Codex. The latest commit adds env-stripping so the coder subprocess falls back to user-configured credentials (claude login / codex login) rather than inheriting the parent process's API keys. The implementation is clean and well-motivated.

What's Working Well

  • subprocessEnvWithoutKey is correctly implemented: It spreads all of process.env, skips the named key, and filters out undefined values — covering the edge case where an env key was set then deleted. The JSDoc comment explaining why the full spread is necessary (SDKs replace the env, not merge) is exactly the right thing to document here.
  • Symmetric design between runners: Both runners call subprocessEnvWithoutKey inside run(), so each invocation snapshots the current env independently. This is consistent across Claude and Codex despite using different SDK call sites (options object vs. constructor).
  • Comment quality in claude-sdk.ts: The inline comment explaining the ANTHROPIC_API_KEY stripping and why Horton still needs it in its own process is clear and removes any ambiguity about the intent.

Issues Found

Suggestions (Nice to Have)

subprocessEnvWithoutKey has no unit test

File: packages/agents/src/agents/runners/env.ts

The function is simple but touches security-relevant behavior: accidentally including a key that should be stripped (or stripping one that shouldn't be) would silently change auth behavior. A 5-line test covering the key-removal case and the undefined-value-skipping case would give future maintainers a safety net.

it('strips the named key and preserves everything else', () => {
  const original = process.env
  process.env = { HOME: '/home/user', ANTHROPIC_API_KEY: 'secret', PATH: '/usr/bin' }
  const result = subprocessEnvWithoutKey('ANTHROPIC_API_KEY')
  expect(result).toEqual({ HOME: '/home/user', PATH: '/usr/bin' })
  process.env = original
})

agent-session-protocol version jump still undocumented (from iteration 2)

The bump from ^0.0.2 to ^0.0.8 spans six pre-1.0 minor versions. No explanation was added to the PR description or changeset. A single sentence noting what the new exports (normalizeClaudeEvent, ClaudeEntry, normalizeToolName) require would help future reviewers assess risk on this dependency.


file_change tool_call+tool_result asymmetry still unexplained (from iteration 2)

File: packages/agents/src/agents/runners/codex-sdk.ts:260-263

The current comment explains why the same id is used for both events, but not why threadItemStartedToEvents emits nothing for file_change while threadItemCompletedToEvents synthesizes a paired tool_call+tool_result. Adding one line to fileChangeToEvents (analogous to the web_search comment) would close this gap:

// Codex doesn't emit a `file_change` item.started event, so we
// synthesise both tool_call and tool_result at completion time.
// Synthesise a tool_call + tool_result pair for the patch as a whole.

Behavioral change worth calling out in changeset

Stripping ANTHROPIC_API_KEY / OPENAI_API_KEY is a breaking behavioral change for any deployment that configured coder auth exclusively via env vars (no claude login or codex login configured). The changeset description describes the feature but doesn't mention that API-key-in-env auth for the coder subprocess no longer works. A sentence noting this — and that operators should configure credentials via the CLIs' own auth flows instead — would prevent confusion for existing users.


Issue Conformance

No linked issue — same warning as previous iterations.

Previous Review Status

All items from iteration 2 have been addressed or remain as suggestions:

  • startedItems dead code, web_search tool_result, unit tests, permission flags comment — all resolved in earlier commits
  • The new commit (env key stripping) is a clean addition
  • agent-session-protocol version bump explanation: still open (no description update added)
  • file_change asymmetry comment: still open (comment explains ID sharing, not the started→empty asymmetry)

Review iteration: 3 | 2026-05-04

kevin-dp and others added 5 commits May 4, 2026 12:02
The set was populated on item.started but never read — neither guarded
duplicate processing nor drove downstream logic, just leaked memory
across long sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously web_search items emitted a tool_call on item.started but
nothing on item.completed, leaving the lifecycle dangling — UIs that
render tool calls would show a perpetually-pending web search.
Codex's WebSearchItem doesn't expose result content (only `query`),
so emit an empty tool_result to honour the contract every other
tool-style item already follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer flagged permissionMode + allowDangerouslySkipPermissions as
possibly redundant. They aren't — the SDK requires both: the mode
selects bypass and the boolean is the explicit acknowledgement gate
(SDK throws if missing when the bypass mode is used). Document that
in the call site so the next reader doesn't try to drop one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer flagged the SDK→NormalizedEvent conversion helpers as the
most likely place for silent regressions when the underlying SDKs
evolve. They are pure functions and don't need the SDK to be mocked,
so cover each branch directly:

- `sdkMessageToClaudeEntry` for system/init, system/compact_boundary,
  user, assistant (with usage + stop_reason), result (renames
  `duration_ms`), and SDK-only types that should return null.
- `threadItemStartedToEvents` and `threadItemCompletedToEvents` for
  every ThreadItem variant — including the new web_search lifecycle
  pairing, file_change kind classification (add → file_write, mixed
  → file_edit), MCP success/failure, and command exit-code → isError.

Helpers are exported with a doc comment marking them as test-only
entry points so it's clear they're not part of the runner's public
surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strip ANTHROPIC_API_KEY / OPENAI_API_KEY from the env passed to the
spawned `claude` / `codex` subprocesses so they fall back to whatever
the user has set up via `claude login` / `codex login` (OAuth tokens
in ~/.claude/.credentials.json and ~/.codex/auth.json). The parent
process can still hold those keys for other uses — Horton talks to
the Anthropic API directly via @anthropic-ai/sdk and depends on the
key being in scope — they just don't leak into the coder's CLI
subprocess.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant