Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
319 changes: 163 additions & 156 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,156 +1,163 @@
<div align="center" id="opentinker">
<img src="assets/reallogo.png" alt="logo" width="500"/>

<p style="margin-top: 6px; font-size: 18px;">
<em>Democratizing Agentic Reinforcement Learning as a Service</em>
</p>

<p>
<a href="https://open-tinker.github.io/opentinker-page/">Project Page</a> ยท
<a href="https://deepwiki.com/open-tinker/OpenTinker">DeepWiki</a> ยท
<a href="https://join.slack.com/t/opentinker/shared_invite/zt-3lnxuxkfr-QZpzObNvW0DtcQUWzvKtQg">Slack</a> ยท
<a href="https://raw.githubusercontent.com/open-tinker/Assets/c94528fbb429d32bbd9896eec0ffc7e14172afd3/wechat_group.JPG">Wechat</a>
</p>
</div>

## ๐Ÿš€ Quick Start

Choose an example below to get started. Each example includes step-by-step instructions for setup, training, and inference.

| Task | Description | Performance |
| ------------------------------------------------ | ------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------- |
| **[LLM Single-Turn Math](docs/math_singleturn.md)** | Mathematical problem solving | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/bwkq1wl8?nw=nwuserzhusq20) |
| **[LLM Multi-Turn Math](docs/math_multiturn.md)** | Multi-turn mathematical problem solving with tool calling | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/f5pt6gcw?nw=nwuserzhusq20) |
| **[LLM Single-LoRA Single-Turn Math](docs/math_lora_singleturn.md)** | Math single-turn Trained With LoRA | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/cl1w5l07?nw=nwuserzhusq20) |
| **[VLM Single-Turn Math](docs/vlm_geo3k_singleturn.md)** | geometry 3k math problem solving | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/aidfc2y1?nw=nwuserzhusq20) |
| **[VLM Multi-Turn Math](docs/vlm_geo3k_multiturn.md)** | geometry 3k math problem solving with tool calling | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/r39htm2o?nw=nwuserzhusq20) |
| **[LLM Gomoku Agent](docs/gomoku_multiturn.md)** | A multi-turn gomoku agent | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/7a7ggkw3?nw=nwuserzhusq20) |
| **[LLM AlfWorld Agent](docs/alfworld_multiturn.md)** | A multi-turn alfworld agent | [wandb](https://wandb.ai/1125027232/opentinker-public/runs/3jrlolk7?nw=nwuser1125027232) |


## ๐Ÿ“ฆ Installation

### ๐Ÿ”น Common Setup (Client and Server)

#### Clone the Repository

```bash
git clone --recurse-submodules https://github.com/open-tinker/OpenTinker.git
cd OpenTinker
```

#### Install OpenTinker

```bash
pip install -e .
```

#### Install verl (core package)

```bash
cd verl
pip install -e .
cd ..
```

### ๐Ÿ’ป Client Setup

After completing the Common Setup, no additional steps are needed.

> **Note**
> The client currently relies on a small subset of functions from `verl`. This dependency is transitional. In future releases, the client will be fully decoupled from `verl`, allowing it to remain completely lightweight and independent of training-related code.

### ๐Ÿง  Server Setup

In addition to the Common Setup, it must install verl dependencies.

You can choose one of the following two approaches.

#### Option 1: Docker Installation (Recommended)

```bash
# Pull the verl Docker image
docker pull verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

# Create and run container
docker run -dit \
--gpus all \
--restart=no \
--entrypoint /bin/bash \
--net=host \
--shm-size=10g \
--cap-add=SYS_ADMIN \
-v .:/workspace/dev \
--name tinker \
verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d
```

#### Option 2: Manual Installation

you can install verl dependencies manually. After completing the Common Setup, run:

```bash
cd verl
pip install -r requirements.txt
cd ..
```

This installs all GPU and training-related dependencies required by the server.

โš ๏ธ **Warning**
Manual installation may introduce version conflicts. For better stability and reproducibility, we recommend using the Docker-based setup whenever possible.

## ๐Ÿ” Authentication

OpenTinker includes a built-in authentication system to secure access to the scheduler API.

### Configuration

Edit `opentinker/scheduler/config/scheduler.yaml`:

```yaml
enable_auth: true # Set to true to enable authentication, false to disable authentication.
user_db_path: "scheduler_users.db"
```

### Quick Registration

Run the interactive script to register a user and get an API key:

```bash
python opentinker/scheduler/register_user_example.py
```

For advanced usage (REST API registration, using the key) and detailed configuration, see the [Scheduler & Dashboard Guide](opentinker/scheduler/SCHEDULER_GUIDE.md#authentication).

## ๐ŸŽฎ Environments

OpenTinker provides a flexible environment design framework that supports diverse training scenarios. Our architecture accommodates two orthogonal dimensions:

- **Data Source**: _Data-Dependent_ environments load structured datasets (e.g., parquet files) to provide prompts, while _Data-Free_ environments generate prompts dynamically from simulators or game engines.
- **Interaction Mode**: _Single-Turn_ environments involve one-shot model responses, while _Multi-Turn_ environments enable iterative interactions with tool calls and feedback loops.

This 2ร—2 design space enables four distinct paradigms, each suited to different learning objectives:

| Paradigm | Data Source | Interaction | Example Use Case |
| -------------------------------- | ----------- | ----------- | ------------------------------------- |
| **Data-Dependent ร— Single-Turn** | Dataset | One-shot | Math reasoning, QA tasks |
| **Data-Dependent ร— Multi-Turn** | Dataset | Iterative | Tool-assisted problem solving |
| **Data-Free ร— Single-Turn** | Simulator | One-shot | Bandit |
| **Data-Free ร— Multi-Turn** | Simulator | Iterative | Complex game playing, dialogue agents |

## ๐Ÿ“š Documentation

- [Scheduler & Dashboard Guide](opentinker/scheduler/SCHEDULER_GUIDE.md) - Configuration, Usage, and Web Dashboard

## ๐Ÿ“– Citation

```
@misc{opentinker2025,
title = {OpenTinker: Democratizing Agentic Reinforcement Learning as a Service},
author = {Siqi Zhu and Jiaxuan You},
year = {2025},
howpublished = {\url{https://github.com/open-tinker/OpenTinker}},
note = {GitHub repository}
}
```
<div align="center" id="opentinker">
<img src="assets/reallogo.png" alt="logo" width="500"/>

<p style="margin-top: 6px; font-size: 18px;">
<em>Democratizing Agentic Reinforcement Learning as a Service</em>
</p>

<p>
<a href="https://open-tinker.github.io/opentinker-page/">Project Page</a> ยท
<a href="https://deepwiki.com/open-tinker/OpenTinker">DeepWiki</a> ยท
<a href="https://join.slack.com/t/opentinker/shared_invite/zt-3lnxuxkfr-QZpzObNvW0DtcQUWzvKtQg">Slack</a> ยท
<a href="https://raw.githubusercontent.com/open-tinker/Assets/c94528fbb429d32bbd9896eec0ffc7e14172afd3/wechat_group.JPG">Wechat</a>
</p>
</div>

## ๐Ÿš€ Quick Start

Choose an example below to get started. Each example includes step-by-step instructions for setup, training, and inference.

| Task | Description | Performance |
| ------------------------------------------------ | ------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------- |
| **[LLM Single-Turn Math](docs/math_singleturn.md)** | Mathematical problem solving | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/bwkq1wl8?nw=nwuserzhusq20) |
| **[LLM Multi-Turn Math](docs/math_multiturn.md)** | Multi-turn mathematical problem solving with tool calling | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/f5pt6gcw?nw=nwuserzhusq20) |
| **[LLM Single-LoRA Single-Turn Math](docs/math_lora_singleturn.md)** | Math single-turn Trained With LoRA | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/cl1w5l07?nw=nwuserzhusq20) |
| **[VLM Single-Turn Math](docs/vlm_geo3k_singleturn.md)** | geometry 3k math problem solving | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/aidfc2y1?nw=nwuserzhusq20) |
| **[VLM Multi-Turn Math](docs/vlm_geo3k_multiturn.md)** | geometry 3k math problem solving with tool calling | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/r39htm2o?nw=nwuserzhusq20) |
| **[LLM Gomoku Agent](docs/gomoku_multiturn.md)** | A multi-turn gomoku agent | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/7a7ggkw3?nw=nwuserzhusq20) |
| **[LLM AlfWorld Agent](docs/alfworld_multiturn.md)** | A multi-turn alfworld agent | [wandb](https://wandb.ai/1125027232/opentinker-public/runs/3jrlolk7?nw=nwuser1125027232) |


## ๐Ÿ“ฆ Installation

### ๐Ÿ”น Common Setup (Client and Server)

#### Clone the Repository

```bash
git clone --recurse-submodules https://github.com/open-tinker/OpenTinker.git
cd OpenTinker
```

#### Install OpenTinker

```bash
pip install -e .
```

#### Install verl (core package)

```bash
cd verl
pip install -e .
cd ..
```

### ๐Ÿ’ป Client Setup

After completing the Common Setup, no additional steps are needed.

> **Note**
> The client currently relies on a small subset of functions from `verl`. This dependency is transitional. In future releases, the client will be fully decoupled from `verl`, allowing it to remain completely lightweight and independent of training-related code.

### ๐Ÿง  Server Setup

In addition to the Common Setup, it must install verl dependencies.

You can choose one of the following two approaches.

#### Option 1: Docker Installation (Recommended)

```bash
# Pull the verl Docker image
docker pull verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d

# Create and run container
docker run -dit \
--gpus all \
--restart=no \
--entrypoint /bin/bash \
--net=host \
--shm-size=10g \
--cap-add=SYS_ADMIN \
-v .:/workspace/dev \
--name tinker \
verlai/verl@sha256:3ce56ff018516b28ab9c4f4fc09d3aa67589074495ace75e2674b720aa4d0e5d
```

#### Option 2: Manual Installation

you can install verl dependencies manually. After completing the Common Setup, run:

```bash
cd verl
pip install -r requirements.txt
cd ..
```

This installs all GPU and training-related dependencies required by the server.

โš ๏ธ **Warning**
Manual installation may introduce version conflicts. For better stability and reproducibility, we recommend using the Docker-based setup whenever possible.

## ๐Ÿ” Authentication

OpenTinker includes a built-in authentication system to secure access to the scheduler API.

### Configuration

Edit `opentinker/scheduler/config/scheduler.yaml`:

```yaml
enable_auth: true # Set to true to enable authentication, false to disable authentication.
user_db_path: "scheduler_users.db"
```

### Quick Registration

Run the server to set up the backend:

```bash
chmod +x opentinker/scripts/launch_scheduler.sh
opentinker/scripts/launch_scheduler.sh
```

Run the interactive script to register a user and get an API key:

```bash
python opentinker/scheduler/register_user_example.py
```

For advanced usage (REST API registration, using the key) and detailed configuration, see the [Scheduler & Dashboard Guide](opentinker/scheduler/SCHEDULER_GUIDE.md#authentication).

## ๐ŸŽฎ Environments

OpenTinker provides a flexible environment design framework that supports diverse training scenarios. Our architecture accommodates two orthogonal dimensions:

- **Data Source**: _Data-Dependent_ environments load structured datasets (e.g., parquet files) to provide prompts, while _Data-Free_ environments generate prompts dynamically from simulators or game engines.
- **Interaction Mode**: _Single-Turn_ environments involve one-shot model responses, while _Multi-Turn_ environments enable iterative interactions with tool calls and feedback loops.

This 2ร—2 design space enables four distinct paradigms, each suited to different learning objectives:

| Paradigm | Data Source | Interaction | Example Use Case |
| -------------------------------- | ----------- | ----------- | ------------------------------------- |
| **Data-Dependent ร— Single-Turn** | Dataset | One-shot | Math reasoning, QA tasks |
| **Data-Dependent ร— Multi-Turn** | Dataset | Iterative | Tool-assisted problem solving |
| **Data-Free ร— Single-Turn** | Simulator | One-shot | Bandit |
| **Data-Free ร— Multi-Turn** | Simulator | Iterative | Complex game playing, dialogue agents |

## ๐Ÿ“š Documentation

- [Scheduler & Dashboard Guide](opentinker/scheduler/SCHEDULER_GUIDE.md) - Configuration, Usage, and Web Dashboard

## ๐Ÿ“– Citation

```
@misc{opentinker2025,
title = {OpenTinker: Democratizing Agentic Reinforcement Learning as a Service},
author = {Siqi Zhu and Jiaxuan You},
year = {2025},
howpublished = {\url{https://github.com/open-tinker/OpenTinker}},
note = {GitHub repository}
}
```