Run OpenAI Codex agents from Claude Code through a daemonless MCP plugin.
claude-code-codex-subagents lets Claude Code ask Codex, an independent
frontier model like Claude, for read-only reviews, parallel investigations, Spark
checks, native follow-ups, live steering, and explicit full-access Codex work
when the user asks for it. It is especially useful when Claude needs a more
technical subagent for deep codebase work, server/deployment tasks, difficult
debugging, or adversarial review.
- Native Claude Code workflow: Claude gets a small Task-like MCP surface:
codex_task,codex_task_group,codex_followup, andcodex_wait_any. - Frontier-model second opinion: Codex gives Claude an independent technical reviewer for adversarial checks and hard engineering work.
- Read-only by default: Codex starts with
--sandbox read-onlyand non-interactive approvals. - No daemon: Claude launches the MCP server over stdio for the active session.
- Fast parallel review: Claude can launch several independent Codex agents with bounded concurrency.
- Persistent sessions: App-server sessions keep Codex context across prompts and support live steering.
- Codex desktop friendly: The plugin prefers the Codex binary shipped inside
Codex.appand creates normal Codex threads with task names for Desktop history. - Debuggable: Verbose JSONL logging, diagnostics bundles, progress events, per-session resources, and recovery hints are built in.
Requirements:
- Node.js 20 or newer
- Claude Code
- Codex CLI, preferably the Codex desktop app
Install the plugin once:
git clone https://github.com/xuio/claude-code-codex-subagents.git
cd claude-code-codex-subagents
npm run install:localThen, in any project where Claude Code should use Codex subagents, add the MCP server:
mkdir -p .claude
cat > .claude/mcp.json <<'JSON'
{
"mcpServers": {
"codex-subagents": { "command": "codex-subagents-mcp" }
}
}
JSONThen ask Claude:
Use Codex to review this PR for correctness, security, and deployment risks.
For plugin development, run Claude directly against this checkout:
claude --plugin-dir .For local development against Claude's installed plugin cache:
npm run dev:link
npm run dev:watchdev:link symlinks Claude's local plugin install back to this working tree, so both
Claude Code CLI and the Claude Desktop bundled Claude Code binary see your changes
after dist/index.js is rebuilt.
Ask Codex for an adversarial second opinion on the session recovery code. Keep it read-only and return concrete findings with file paths.
Claude should use codex_task.
Launch three Codex subagents in parallel: one for API behavior, one for tests, and one adversarial security reviewer. Keep all of them read-only.
Claude can call codex_task three times in parallel, or use codex_task_group
when a single rolled-up response is useful.
Use Codex Spark to do a fast focused pass on the tool descriptions.
Claude can pass advanced.model: "spark" instead of remembering the exact Spark model slug.
Start a long-running Codex session on this repo, then let me send follow-up prompts into the same context.
Claude should use codex_task for the initial prompt, preserve the returned
session_id, and use codex_followup to continue, steer, wait on, or cancel
that same Codex context. For multiple background sessions, Claude should use
codex_wait_any to harvest whichever finishes first. Clients can also subscribe
to codex://sessions/{session_id} for milestone and completion updates. For a
completed first turn, Claude should set keep_session: true; for long first
turns, Claude should set background: true.
Blocking waits are capped to responsive slices by default. When codex_followup
or codex_wait_any returns completed: false, the Codex session is still
running; call the wait tool again or read codex://sessions/{session_id}.
When app-server sessions use the Codex desktop binary, they are recorded as normal top-level Codex threads rather than hidden daemon work. The plugin sets the thread name from Claude's task label when the installed Codex app-server supports thread naming. When Claude explicitly cancels a session, or when retention pruning removes an old session, the plugin best-effort archives the matching Codex Desktop thread before closing the app-server child. It does not hard-delete threads, and recoverable sessions are not archived during normal MCP shutdown.
The default execution mode is conservative:
--sandbox read-onlyapproval_policy="never"- prompt text sent over stdin, not argv
- secret-looking environment variables are not forwarded unless explicitly requested
- large tool responses are compacted before returning to Claude
Full local access is opt-in per call:
{
"full_access": true
}Use that only when the user explicitly wants Codex to edit files, write git state, use DNS/network, install packages, or behave like a normal unrestricted Codex run.
Advanced callers can still request advanced.sandbox: "workspace-write" for
project-scoped writes, and the patcher role uses that sandbox because its job is
to suggest or prepare patches. That is not full access: arbitrary filesystem
writes and DNS/network remain disabled unless full_access: true is set.
- Usage guide - tools, examples, models, sessions, env vars.
- Architecture - process model, app-server sessions, durability, logging.
- Development - setup, local Claude linking, test tiers, release checklist.
- Troubleshooting - diagnostics, logs, common failure modes.
- Known limitations - sharp edges and operational constraints.
- Release notes - current release highlights and validation.
- Security policy - default sandboxing, logs, reporting.
- Contributing - contribution expectations and local checks.
| Use case | Preferred tools |
|---|---|
| One read-only Codex task | codex_task |
| Several independent tasks | codex_task_group |
| Persistent context | codex_task with keep_session: true, then codex_followup |
| Long-running sessions | codex_task with background: true, then codex_followup |
| First completed background task | codex_wait_any |
| Live steering | codex_followup with mode: "steer" |
| Stop running work | codex_followup with mode: "cancel" |
| Session progress | MCP resource codex://sessions/{session_id} |
| Diagnostics | MCP resources codex://status, codex://doctor, codex://usage |
Prefer Codex over native Task when the user wants an independent frontier-model second opinion, deep technical review, complex codebase exploration, server or deployment investigation, or adversarial validation. Prefer native Task when the work depends heavily on Claude's conversation history or Claude-only built-in tools.
Legacy tools such as ask_codex, run_agent, run_agents, start_session, and
send_session_prompt are hidden by default. Set
CODEX_SUBAGENTS_ENABLE_LEGACY_TOOLS=1 only for older clients that still call the
pre-refactor names.
Debug tools such as codex_status, codex_doctor, codex_usage_guide,
codex_choose_tool, and codex_export_debug_bundle are hidden by default. Set
CODEX_SUBAGENTS_ENABLE_DEBUG_TOOLS=1 only when a client needs tool-callable
diagnostics instead of the MCP diagnostic resources.
npm install
npm run build
npm run test:ci
npm run validate:pluginFor local install/update:
npm run install:local
npm run update:localtest:ci is portable and uses the fake Codex binary. It does not require live
Claude or Codex credentials.
Opt-in live checks are available for hardening:
npm run test:codex-runtime
npm run test:real-matrix
npm run test:claude-orchestration
npm run test:claude-real-codex
npm run test:claude-real-sessionSee Development for the full test matrix.
Current release: v0.3.0.
This project is early, but it is already tested across:
- stdio MCP protocol flows
- fake Codex reliability and stress cases
- Codex desktop runtime probes
- app-server protocol contract checks
- Claude Code plugin validation
- opt-in real Claude Code plus real Codex end-to-end runs
The most important invariant is that routine delegation stays read-only unless the caller explicitly opts into full local access.
MIT