---
title: Code Mode
pcx_content_type: concept
tags:
  - AI
  - MCP
sidebar:
  order: 19
---

import {
  TypeScriptExample,
  WranglerConfig,
  PackageManagers,
  LinkCard,
} from "~/components";

Code Mode is a more token-efficient way to use [MCP](/agents/model-context-protocol/) tools with AI agents, and it works with your existing MCP servers without any changes to them. Instead of presenting each MCP tool directly to the LLM, Code Mode converts the server's tool schemas into a TypeScript API and asks the LLM to write code that calls that API. Because one compact typed interface replaces the per-tool JSON schemas in the prompt, an MCP server can expose hundreds or thousands of tools without blowing up the agent's context window. The generated code runs in a secure, sandboxed [V8 isolate](/workers/reference/how-workers-works/) powered by the [Dynamic Worker Loader API](/workers/runtime-apis/bindings/worker-loader/).

:::note[Dynamic Worker Loading is in closed beta]

Code Mode relies on the [Dynamic Worker Loader API](/workers/runtime-apis/bindings/worker-loader/), which is available in local development with Wrangler and workerd. To run dynamic Workers on Cloudflare in production, you must [sign up for the closed beta](https://forms.gle/MoeDxE9wNiqdf8ri9).

:::

## Why Code Mode?
Traditional MCP tool calling sends every tool schema — name, description, and full parameter definitions — as part of the LLM prompt on every request. When an MCP server exposes tens or hundreds of tools, those schemas consume a large share of the context window and drive up token usage. Each tool invocation also requires a full round-trip through the model, so multi-step workflows multiply that cost further.

Code Mode drastically reduces token consumption by replacing all of those individual tool schemas with a compact TypeScript API definition. Instead of repeating verbose JSON schemas on every request, the LLM receives a single typed interface and writes code against it. The result is fewer input tokens per request and fewer round-trips overall.

LLMs have extensive training data covering real-world TypeScript code, but far less exposure to tool-calling conventions. When tools are presented as a TypeScript API instead of tool schemas, LLMs can:

- **Use far fewer tokens** — a compact TypeScript interface replaces the full JSON schema for every tool, significantly shrinking prompt size. MCP servers can expose hundreds or thousands of tools without blowing up the context window
- **Handle more tools with higher accuracy** — a familiar TypeScript interface is easier for the model to reason about than abstract tool schemas
- **Chain multiple calls in a single execution** — instead of round-tripping through the model between each tool call, the LLM writes a single script that calls multiple tools in sequence, eliminating per-step token overhead
- **Return only final results** — intermediate values stay within the sandbox, and only the data the LLM needs is passed back, keeping response tokens minimal

## How it works

1. When you connect to an MCP server with Code Mode enabled, the Agents SDK fetches the server's tool schema and converts it into a TypeScript API with doc comments
2. Instead of exposing each MCP tool individually, the agent receives a single tool: **execute TypeScript code** (sketched after this list)
3. The LLM generates code that calls the TypeScript API to accomplish the task
4. The code runs in an isolated V8 sandbox with no direct Internet access — the only way for the code to interact with the outside world is through the provided API bindings
5. API calls from the sandbox are dispatched back to the agent via [RPC](/workers/runtime-apis/bindings/service-bindings/rpc/), which routes them to the appropriate MCP server
6. Results are collected via `console.log()` and returned to the agent when the script finishes
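
To make step 2 concrete, the sketch below shows roughly what a single model turn looks like. The tool name and payload shape here are illustrative assumptions, not the exact wire format the Agents SDK uses:

```ts
// Illustrative only: the tool name and argument shape are assumptions,
// not the exact format used by the Agents SDK.
const modelToolCall = {
  tool: "execute_code",
  input: {
    code: `
      // This script runs inside the sandboxed isolate (steps 4 and 5).
      // \`codemode\` is the generated TypeScript API from step 1.
      const docs = await codemode.search_documentation({ query: "worker loader" });
      console.log(docs); // console.log output is returned to the agent (step 6)
    `,
  },
};
```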

## Usage

### Before: traditional tool calling

With traditional tool calling, each MCP tool is presented directly to the LLM as a separate tool:

<TypeScriptExample>

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await generateText({
  model: openai("gpt-4o"),
  system: "You are a helpful assistant",
  messages,
  tools: this.mcp.getAITools(), // Each MCP tool exposed individually
});
```

</TypeScriptExample>

### After: with Code Mode

With Code Mode, the `codemode()` wrapper transforms your tools and system prompt so the LLM writes code instead of making individual tool calls:

<TypeScriptExample>

```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { experimental_codemode as codemode } from "@cloudflare/codemode/ai";

// Wrap your existing tools and system prompt with codemode
const { prompt, tools: wrappedTools } = await codemode({
  model: openai("gpt-4.1"),
  prompt: "You are a helpful assistant",
  tools: this.mcp.getAITools(),
  loader: this.env.LOADER,
  proxy: this.ctx.exports.CodeModeProxy({
    props: {
      binding: "MyAgent",
      name: this.name,
      callback: "callTool",
    },
  }),
  globalOutbound: null, // Block all direct Internet access
});

const result = await generateText({
  model: openai("gpt-4.1"),
  system: prompt,
  messages,
  tools: wrappedTools, // Single "execute code" tool
});
```

</TypeScriptExample>

The `codemode()` function returns a modified `prompt` (with the generated TypeScript API documentation injected) and a `tools` object containing a single tool that executes the LLM-generated code in a sandbox.
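
Both returned values are plain data that you can inspect before handing them to the model, which is a quick way to sanity-check what the LLM will actually see:

```ts
// Assumes the `prompt` and `wrappedTools` values from the example above.
console.log(prompt); // system prompt with the generated TypeScript API docs appended
console.log(Object.keys(wrappedTools)); // a single code-execution tool, not one entry per MCP tool
```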

## Configuration

### Install dependencies

<PackageManagers pkg="@cloudflare/codemode ai @ai-sdk/openai" />

### Add the Worker Loader binding

Code Mode requires a [Dynamic Worker Loader](/workers/runtime-apis/bindings/worker-loader/) binding to create sandboxed isolates:

<WranglerConfig>

```jsonc
{
  // ...
  "worker_loaders": [
    {
      "binding": "LOADER",
    },
  ],
  // ...
}
```

</WranglerConfig>

### Export the CodeModeProxy

The `CodeModeProxy` entrypoint handles RPC calls from the sandbox back to your agent. Export it alongside your agent class:

<TypeScriptExample>

```ts
import { Agent } from "agents";
export { CodeModeProxy } from "@cloudflare/codemode/ai";

export class MyAgent extends Agent<Env> {
  // Your agent implementation
}
```

</TypeScriptExample>
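
The `callback: "callTool"` option passed to `CodeModeProxy` in the usage example above names a method on your agent that the proxy invokes for each API call made from the sandbox. The exact signature the proxy expects is not shown on this page, so treat the following as a hypothetical sketch rather than the definitive contract:

```ts
import { Agent } from "agents";
export { CodeModeProxy } from "@cloudflare/codemode/ai";

export class MyAgent extends Agent<Env> {
  // Hypothetical sketch: the method named by `callback` receives a sandboxed
  // API call and forwards it to the matching MCP tool. The parameters that
  // CodeModeProxy actually passes (and any server identifier your MCP client
  // needs) may differ from what is shown here.
  async callTool(name: string, args: Record<string, unknown>) {
    return await this.mcp.callTool({ name, arguments: args });
  }
}
```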

## Sandboxing and security

Code Mode runs LLM-generated code in a fully isolated [V8 isolate](/workers/reference/how-workers-works/). The sandbox provides strong security guarantees:

- **No Internet access** — setting `globalOutbound: null` blocks all `fetch()` and `connect()` calls from the sandbox (see the sketch after this list). The only way for sandboxed code to communicate with the outside world is through the provided TypeScript API bindings
- **API keys are never exposed** — bindings provide already-authorized interfaces to MCP servers. Access tokens are held by the parent agent and injected into requests at the RPC layer, so the generated code never sees credentials
- **Disposable isolates** — each code execution gets a fresh isolate that is discarded after the script finishes. There is no shared state between executions
- **Controlled access** — the sandbox can only call the MCP tools you explicitly provide. You control what capabilities are available through the bindings you pass in
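
As a sketch of that first guarantee from inside the sandbox, here is the kind of code an LLM might generate, assuming a `search_documentation` binding like the one shown later on this page:

```ts
// Inside the sandboxed isolate (illustrative):
try {
  await fetch("https://example.com"); // rejects: globalOutbound: null blocks all egress
} catch {
  console.log("direct network access is blocked");
}

// The only way out is the generated API, which the agent mediates over RPC:
const results = await codemode.search_documentation({ query: "rate limits" });
console.log(results);
```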

For more details on how isolate sandboxing works, refer to the [Dynamic Worker Loader API documentation](/workers/runtime-apis/bindings/worker-loader/).

## Generated TypeScript API

When Code Mode processes an MCP server's tool schema, it generates a TypeScript interface with doc comments derived from the tool descriptions. For example, an MCP server that provides documentation search tools might produce:

```ts
interface SearchDocumentationInput {
  /** The search query to find relevant documentation */
  query: string;
}

interface SearchDocumentationOutput {
  [key: string]: any;
}

declare const codemode: {
  /**
   * Semantically search within the fetched documentation.
   * Useful for specific queries.
   */
  search_documentation: (
    input: SearchDocumentationInput,
  ) => Promise<SearchDocumentationOutput>;
};
```

This generated API is injected into the system prompt so the LLM knows what functions are available and how to call them. The LLM then writes code using these typed functions rather than making raw tool calls.

## Example: multi-step MCP workflow

Code Mode is particularly useful when a task requires chaining multiple MCP operations. Instead of multiple round-trips through the LLM, the model generates a single script:

```js
// LLM-generated code that runs in the sandbox
const files = await codemode.list_files({ path: "/projects" });

const recentProject = files
  .filter((f) => f.type === "directory")
  .sort((a, b) => new Date(b.modified) - new Date(a.modified))[0];

const status = await codemode.get_project_status({
  name: recentProject.name,
});

if (status.state === "needs_review") {
  await codemode.create_task({
    title: `Review: ${recentProject.name}`,
    priority: "high",
  });
  console.log(`Created review task for ${recentProject.name}`);
} else {
  console.log(`${recentProject.name} is up to date`);
}
```

All MCP calls execute within a single sandbox invocation. The LLM reads back only the `console.log()` output.

## Current limitations

- **Experimental** — Code Mode is experimental and may have breaking changes in future releases
- **Closed beta for production** — The underlying [Dynamic Worker Loader API](/workers/runtime-apis/bindings/worker-loader/) is available locally but requires [closed beta access](https://forms.gle/MoeDxE9wNiqdf8ri9) for production deployment on Cloudflare
- **JavaScript only** — Sandbox execution is limited to JavaScript (Python support is planned)

## Next steps

<LinkCard
  title="Dynamic Worker Loader API"
  href="/workers/runtime-apis/bindings/worker-loader/"
  description="Learn about the underlying API that powers Code Mode sandboxing."
/>

<LinkCard
  title="McpClient"
  href="/agents/api-reference/mcp-client-api/"
  description="Connect your agent to MCP servers."
/>

<LinkCard
  title="MCP Tools"
  href="/agents/model-context-protocol/tools/"
  description="Design and add tools to your MCP server."
/>

<LinkCard
  title="Using AI Models"
  href="/agents/api-reference/using-ai-models/"
  description="Call AI models from your agent."
/>