2 changes: 1 addition & 1 deletion .env.example
@@ -27,7 +27,7 @@ MODEL_ID=claude-sonnet-4-6
# ---- International ----

# MiniMax https://www.minimax.io
# ANTHROPIC_BASE_URL=https://api.minimax.io/anthropic
# ANTHROPIC_BASE_URL=https://api.minimax.com/anthropic
# MODEL_ID=MiniMax-M2.5

# GLM (Zhipu) https://z.ai
43 changes: 41 additions & 2 deletions README-zh.md
@@ -2,6 +2,44 @@

[English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md)

## The Model IS the Agent

Before we talk about code, let's get one thing straight.

**An agent is a model. Not a framework. Not a flowchart. Not a prompt chain.**

The word "agent" has been hijacked. Somewhere along the way, an entire industry decided that wiring together prompt nodes, if-else branches, and LLM API calls in a directed acyclic graph constitutes "building an agent." It doesn't. What they built is a Rube Goldberg machine -- an over-engineered, brittle pipeline of hardcoded rules in which the LLM is just a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur.

**Let's be blunt: prompt-flow "agents" are the fantasy of programmers who don't train models.** They try to brute-force intelligence by stacking procedural logic -- massive if-else trees, node graphs, chain-of-prompt waterfalls -- and pray that enough glue code will make autonomous behavior emerge. It won't. You cannot engineer your way to agency. Agency is learned, not programmed. Those systems are dead on arrival: fragile, unscalable, and fundamentally incapable of generalization. They are the modern reincarnation of GOFAI (Good Old-Fashioned AI) -- symbolic rule systems the field abandoned decades ago, back on stage with a fresh coat of LLM paint.

### What an Agent Actually Is

Long before LLMs existed, the AI community had a precise definition: **an agent is a model that perceives its environment, makes decisions, and takes actions to achieve goals.** The emphasis is on *model* -- a learned function, not a scripted procedure.

The proof is written in history:

- **2013 -- DeepMind DQN plays Atari.** A neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- surpassing all prior algorithms and beating human experts on 3 of them. By 2015 the same architecture scaled to [49 games at the level of professional human testers](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. No decision trees. Just a model, learning from experience. That model was the agent.

- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks played [45,000 years of Dota 2](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) against themselves in 10 months, then beat **OG** -- the TI8 world champions -- 2-0 at a San Francisco live event. In the public arena that followed, the AI won 99.4% of 42,729 games. No scripted strategies. No meta-programmed team coordination logic. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.

- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in a closed-door match, then reached [Grandmaster rank](https://www.nature.com/articles/d41586-019-03298-6) on European servers -- the top 0.15% of 90,000 players. A game of imperfect information, real-time decisions, and a combinatorial action space that dwarfs chess and Go. The agent? A model. Trained. Not scripted.

- **2019 -- Tencent Juewu dominates Honor of Kings.** Tencent AI Lab's "Juewu" (绝悟) [defeated KPL professional players in a 5v5 match](https://www.jiemian.com/article/3371171.html) at the World Champion Cup semifinal on August 2, 2019. In 1v1 mode, pros [won only 1 of 15 games and never lasted past 8 minutes](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. By 2021, Juewu surpassed KPL professional level in best-of-five matches across the full hero pool. No handcrafted hero matchup tables. No scripted team compositions. A model that learned the entire game from scratch through self-play.

Every one of these milestones shares the same architecture: **a trained model, placed in an environment, given the ability to perceive and act.** The "agent" was never the outer shell. The agent is always the model itself.

### Two Meanings of "Developing an Agent"

When someone says "I'm developing an agent," they can only mean one of two things:

1. **Training the model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense -- you are shaping the agent's capabilities at the root.

2. **Building the harness.** Writing code that gives the model an environment to operate in -- tools (file I/O, shell, network), knowledge bases (product docs, domain references), observation channels (git diff, error logs, browser state), and action interfaces (CLI, API calls). This is what this repository teaches. It is valuable, necessary, real engineering. But it is not "building the agent." It is **building the world the agent lives in.**

The model decides. The harness executes. The model reasons. The harness provides context. The model is the pilot. The harness is the cockpit.

This repo teaches you to build cockpits. Great cockpits matter -- they determine what the agent can see and do. But never confuse the cockpit with the pilot.
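The harness half of this split can be made concrete. Below is a minimal Python sketch of the harness side: tool schemas the model can see, and an executor the code runs on the model's behalf. The tool names (`read_file`, `run_shell`) and the schema shape are illustrative assumptions, not this repository's actual tool set.

```python
import subprocess

# Tool schemas: this is all the MODEL sees. It picks a tool and arguments;
# the harness never decides anything on its own.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a file from the workspace.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
]

def execute_tool(name: str, args: dict) -> str:
    """The harness executes; the model chose `name` and `args`."""
    if name == "read_file":
        with open(args["path"], "r", encoding="utf-8") as f:
            return f.read()
    if name == "run_shell":
        result = subprocess.run(
            args["command"], shell=True, capture_output=True, text=True, timeout=30
        )
        return result.stdout + result.stderr
    return f"unknown tool: {name}"
```

Note that everything here is plumbing: the schemas describe capabilities, and the executor is a dumb dispatcher. All the deciding happens inside the model.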

```
THE AGENT PATTERN
=================
@@ -18,7 +56,8 @@


That's the minimal loop. Every AI coding agent needs this loop.
Production agents add policy, permissions, and lifecycle layers.
The MODEL decides when to call tools and when to stop.
The CODE just executes what the model asks for.
```
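The loop in the diagram can be sketched as runnable Python. The model call is stubbed with a canned function so the control flow stands on its own; in a real harness that stub would be an LLM API call. All names here are illustrative.

```python
def fake_model(messages):
    """Stand-in for the LLM: the model decides the next action."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "list_files", "input": {}}
    return {"type": "text", "text": "Done: the workspace contains main.py."}

def list_files(_input):
    return "main.py"

TOOL_EXECUTORS = {"list_files": list_files}

def agent_loop(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        action = fake_model(messages)           # the MODEL decides
        if action["type"] == "text":            # the model chose to stop
            return action["text"]
        result = TOOL_EXECUTORS[action["name"]](action["input"])  # the CODE executes
        messages.append({"role": "tool", "content": result})

print(agent_loop("What files are in the workspace?"))
# -> Done: the workspace contains main.py.
```

The only control flow in the harness is "loop until the model stops asking for tools" -- everything else is the model's call.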

**12 progressive sessions, from a simple loop to isolated autonomous execution.**
@@ -229,4 +268,4 @@ MIT

---

**The model is the agent. Our job is to give it tools and stay out of the way.**
**The model is the agent. The code is the harness. Know which one you're building.**
45 changes: 42 additions & 3 deletions README.md
@@ -1,6 +1,44 @@
[English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md)
[English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md)
# Learn Claude Code -- A nano Claude Code-like agent, built from 0 to 1

## The Model IS the Agent

Before we talk about code, let's get one thing straight.

**An agent is a model. Not a framework. Not a flowchart. Not a prompt chain.**

The word "agent" has been hijacked. Somewhere along the way, an entire cottage industry decided that wiring together prompt nodes, if-else branches, and LLM API calls in a directed acyclic graph constitutes "building an agent." It doesn't. What they built is a Rube Goldberg machine -- an over-engineered, brittle pipeline of hardcoded rules dressed up with an LLM as a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur.

**Let's be blunt: prompt-flow "agents" are the fantasy of programmers who don't train models.** They attempt to brute-force intelligence by stacking procedural logic -- massive if-else trees, node graphs, chain-of-prompt waterfalls -- and praying that enough glue code will somehow emergently produce autonomous behavior. It won't. You cannot engineer your way to agency. Agency is learned, not programmed. Those systems are dead on arrival: fragile, unscalable, and fundamentally incapable of generalization. They are the modern equivalent of GOFAI (Good Old-Fashioned AI) -- symbolic rule systems that the field abandoned decades ago, now resurrected with a coat of LLM paint.

### What an Agent Actually Is

Long before LLMs existed, the AI community had a precise definition: **an agent is a model that perceives its environment, makes decisions, and takes actions to achieve goals.** The emphasis is on *model* -- a learned function, not a scripted procedure.

The proof is written in history:

- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- surpassing all prior algorithms and beating human experts on 3 of them. By 2015, the same architecture scaled to [49 games and matched professional human testers](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. No decision trees. Just a model, learning from experience. That model was the agent.

- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks, having played [45,000 years of Dota 2](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) against themselves in 10 months, defeated **OG** -- the reigning TI8 world champions -- 2-0 on a San Francisco livestream. In a subsequent public arena, the AI won 99.4% of 42,729 games against all comers. No scripted strategies. No meta-programmed team coordination logic. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.

- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in a closed-door match, and later achieved [Grandmaster status](https://www.nature.com/articles/d41586-019-03298-6) on European servers -- top 0.15% of 90,000 players. A game with imperfect information, real-time decisions, and a combinatorial action space that dwarfs chess and Go. The agent? A model. Trained. Not scripted.

- **2019 -- Tencent Juewu dominates Honor of Kings.** Tencent AI Lab's "Juewu" (绝悟) [defeated KPL professional players](https://www.jiemian.com/article/3371171.html) in a full 5v5 match on August 2, 2019 at the World Champion Cup. In 1v1 mode, pros won only [1 out of 15 games and never survived past 8 minutes](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. By 2021, Juewu surpassed KPL pros across the full hero pool. No handcrafted hero matchup tables. No scripted team compositions. A model that learned the game from scratch through self-play.

Every one of these milestones shares the same architecture: **a trained model, placed in an environment, given the ability to perceive and act.** The "agent" is never the harness. The agent is always the model.

### Two Meanings of "Developing an Agent"

When someone says "I'm developing an agent," they can only mean one of two things:

1. **Training the model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense -- you are literally shaping the agent's capabilities.

2. **Building the harness.** Writing the code that gives the model an environment to operate in -- tools (file I/O, shell, network), knowledge bases (product docs, domain references), observation channels (git diff, error logs, browser state), and action interfaces (CLI, API calls). This is what this repository teaches. It is valuable, necessary, and real engineering. But it is not "building the agent." It is **building the world the agent lives in.**
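The harness half of this split can be made concrete. Below is a minimal Python sketch of the harness side: tool schemas the model can see, and an executor the code runs on the model's behalf. The tool names (`read_file`, `run_shell`) and the schema shape are illustrative assumptions, not this repository's actual tool set.

```python
import subprocess

# Tool schemas: this is all the MODEL sees. It picks a tool and arguments;
# the harness never decides anything on its own.
TOOLS = [
    {
        "name": "read_file",
        "description": "Read a file from the workspace.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
]

def execute_tool(name: str, args: dict) -> str:
    """The harness executes; the model chose `name` and `args`."""
    if name == "read_file":
        with open(args["path"], "r", encoding="utf-8") as f:
            return f.read()
    if name == "run_shell":
        result = subprocess.run(
            args["command"], shell=True, capture_output=True, text=True, timeout=30
        )
        return result.stdout + result.stderr
    return f"unknown tool: {name}"
```

Note that everything here is plumbing: the schemas describe capabilities, and the executor is a dumb dispatcher. All the deciding happens inside the model.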

The model decides. The harness executes. The model reasons. The harness provides context. The model is the pilot. The harness is the cockpit.

This repo teaches you to build cockpits. Great cockpits matter -- they determine what the agent can see and do. But never confuse the cockpit with the pilot.

```
THE AGENT PATTERN
=================
@@ -17,7 +55,8 @@


That's the minimal loop. Every AI coding agent needs this loop.
Production agents add policy, permissions, and lifecycle layers.
The MODEL decides when to call tools and when to stop.
The CODE just executes what the model asks for.
```
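The diagram above can be sketched as runnable Python. The model call is stubbed with a canned function so the control flow stands on its own; in a real harness that stub would be an LLM API call. All names here are illustrative.

```python
def fake_model(messages):
    """Stand-in for the LLM: the model decides the next action."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_use", "name": "list_files", "input": {}}
    return {"type": "text", "text": "Done: the workspace contains main.py."}

def list_files(_input):
    return "main.py"

TOOL_EXECUTORS = {"list_files": list_files}

def agent_loop(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        action = fake_model(messages)           # the MODEL decides
        if action["type"] == "text":            # the model chose to stop
            return action["text"]
        result = TOOL_EXECUTORS[action["name"]](action["input"])  # the CODE executes
        messages.append({"role": "tool", "content": result})

print(agent_loop("What files are in the workspace?"))
# -> Done: the workspace contains main.py.
```

The only control flow in the harness is "loop until the model stops asking for tools" -- everything else is the model's call.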

**12 progressive sessions, from a simple loop to isolated autonomous execution.**
@@ -234,4 +273,4 @@ MIT

---

**The model is the agent. Our job is to give it tools and stay out of the way.**
**The model is the agent. The code is the harness. Know which one you're building.**