Commit 0b31a0c

feat(env): support android_world env per-turn training (#30)
* Android_world init

* docs: complete android_world_multiturn.md and update android_world configurations

  Summary of changes:
  - Completed android_world_multiturn.md with setup, usage, and reward structure.
  - Updated android_world_param.yaml to set default env_shards to 1.
  - Added ADB and port environment variables to android_world_game.py.
  - Adjusted launch_scheduler.sh for local environment paths and GPU configuration.
  - Implemented prompt truncation in generic_agent_loop.py to prevent tensor size mismatch (replacing submodule modification).

* feat: Android World multi-emulator training support with per-turn training

  Core Android World environment:
  - AndroidWorldGame with multi-emulator shard support
  - AndroidWorldServer with emulator-to-worker binding
  - Multimodal VL prompt templates (INITIAL/ACTION split)
  - gym_environment_interaction with worker-to-endpoint binding

  Android agent loop:
  - AndroidAgentLoop (1190 lines) with multimodal VL support
  - Per-turn training mode with expansion_index
  - <obs> separator based generation context optimization
  - Agent registry entry in agent.yaml

  Per-turn training system:
  - PerTurnAgentLoopManager expanding multi-turn episodes into per-turn samples
  - per_turn_agent_loop.py with expansion logic and reward gamma discounting
  - Backend patches: ray_trainer.py (PerTurnAgentLoopManager import), rollout.py (per_turn_training/per_turn_reward_gamma config)
  - http_training_server.py expansion of batch tensors via expansion_index

  Infrastructure improvements:
  - base_game.py/base_game_environment.py: agent_loop_name class attribute
  - job_scheduler.py: ROLLOUT_TRACE_JOB_ID env var + KL divergence forwarding
  - generic_agent_loop.py: per-job trace subdirectory isolation, JSONL fix
  - actor.yaml: ppo_max_token_len_per_gpu 32768

  Scripts and config:
  - run_android.sh with env var configuration (no hardcoded paths)
  - launch_scheduler.sh with generic trace dir
  - launch_http_server.py with generic checkpoint dir
  - scheduler.yaml and android_world_param.yaml cleaned

* docs: add Android World agent to README quick start table
1 parent ff6ec4f commit 0b31a0c

27 files changed

Lines changed: 3381 additions & 32 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ Choose an example below to get started. Each example includes step-by-step instr
 | **[VLM Multi-Turn Math](docs/vlm_geo3k_multiturn.md)** | geometry 3k math problem solving with tool calling | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/r39htm2o?nw=nwuserzhusq20) |
 | **[LLM Gomoku Agent](docs/gomoku_multiturn.md)** | A multi-turn gomoku agent | [wandb](https://wandb.ai/zsqzz/Open-Tinker/runs/7a7ggkw3?nw=nwuserzhusq20) |
 | **[LLM AlfWorld Agent](docs/alfworld_multiturn.md)** | A multi-turn alfworld agent | [wandb](https://wandb.ai/1125027232/opentinker-public/runs/3jrlolk7?nw=nwuser1125027232) |
+| **[LLM Android World Agent](docs/android_world_multiturn.md)** | A multi-turn android world agent | |


 ## 📦 Installation

docs/android_world_multiturn.md

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
# LLM Game Agent (AndroidWorld Multi-Turn)

This example shows how to train a language model to complete tasks on the Android operating system using AndroidWorld.

## Overview

**AndroidWorld** is a dynamic benchmarking environment in which autonomous agents interact with the Android operating system. The agent perceives the screen via a list of UI elements and interacts by performing actions like clicking, typing, and scrolling.

Tasks include:

- Adding contacts
- Managing settings
- Browsing information
- Sending messages
- And more...

## Prerequisites

1. Complete the [Installation](../README.md#-installation) steps.
2. **Environment Setup**: Install the Android SDK and run an emulator. See the **[Detailed Environment Setup](#detailed-environment-setup)** section below for instructions.
3. Find the server's IP address with `hostname -I`.

## Step 1: Start the Scheduler (Server Side)

```bash
bash opentinker/scripts/launch_scheduler.sh --scheduler-port <scheduler_port>
```

## Step 2: Start the AndroidWorld Environment (Server Side)

Before starting the environment server, make sure your Android emulator is running (see the setup section below).

```bash
python -m opentinker.environment.android_world.android_world_server \
    --port 8092 \
    --max_steps 50 \
    --split train
```

**Server Options:**

- `--port`: Server port (default: 8082; 8092 is recommended to match the client config)
- `--max_steps`: Maximum steps per episode (default: 50)
- `--split`: Dataset split (`train`, `eval_in_distribution`, `eval_out_of_distribution`)
- `--shards`: Number of parallel server instances (for parallel training)

## Step 3: Run Training

```bash
python opentinker/client/android_world_rl.py \
    tokenizer_path=Qwen/Qwen2.5-3B-Instruct \
    batch_size=4 \
    val_batch_size=50 \
    num_steps=1000 \
    save_freq=20000 \
    test_freq=10 \
    scheduler_url=http://<server_endpoint>:<scheduler_port> \
    interaction.config.env_port=8092 \
    interaction.config.env_host=<env_server_endpoint>
```

**Training Parameters:**

- `num_steps`: Total number of training steps (alternatively, set `num_epochs`)
- `batch_size`: Training batch size
- `val_batch_size`: Number of validation samples per evaluation
- `test_freq`: Validation frequency (every N steps)
- `adv_estimator`: Advantage estimator (`gae`, `grpo`, `grpo_per_step`)
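
Before launching the client, it can help to confirm that the scheduler and environment server ports accept connections. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper (not part of OpenTinker), and the hosts and ports are placeholders to replace with your own `scheduler_url` and `interaction.config.env_host`/`env_port` values.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Placeholder endpoints: substitute your scheduler and environment server addresses.
    endpoints = [("scheduler", "127.0.0.1", 9780), ("env server", "127.0.0.1", 8092)]
    for name, host, port in endpoints:
        state = "reachable" if port_is_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the service behind it is the scheduler or the AndroidWorld server.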

## Reward Structure

| Event            | Reward |
| :--------------- | ------ |
| Task Success     | +10.0  |
| Task Failure     | -1.0   |
| Per Step Penalty | -0.01  |
| Invalid Action   | -0.1   |
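
To get a feel for the scale of these values, the sketch below totals the reward for a single episode. How the environment actually combines the entries is not spelled out here, so the code assumes they are simply additive: one step penalty per step, one extra penalty per invalid action, and one terminal reward. `episode_return` is a hypothetical helper for illustration only.

```python
# Reward values copied from the table above.
SUCCESS_REWARD = 10.0
FAILURE_REWARD = -1.0
STEP_PENALTY = -0.01
INVALID_ACTION_PENALTY = -0.1


def episode_return(success: bool, num_steps: int, num_invalid: int = 0) -> float:
    """Sum the rewards for one episode, ASSUMING they are purely additive."""
    terminal = SUCCESS_REWARD if success else FAILURE_REWARD
    return terminal + num_steps * STEP_PENALTY + num_invalid * INVALID_ACTION_PENALTY


if __name__ == "__main__":
    # Success after 12 steps with one invalid action: 10.0 - 0.12 - 0.1
    print(round(episode_return(True, 12, 1), 2))  # 9.78
    # Failure after 50 steps with no invalid actions: -1.0 - 0.5
    print(round(episode_return(False, 50), 2))  # -1.5
```

Under this assumption the terminal reward dominates: even a 50-step success still returns about 9.5, while the step penalty mainly separates short successful episodes from long ones.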

## Example Actions

The agent acts by emitting JSON commands, some of which reference UI elements by index:

- **Click**: `{"action_type": "click", "index": 4}`
- **Type**: `{"action_type": "input_text", "text": "Alice", "index": 2}`
- **Scroll**: `{"action_type": "scroll", "direction": "down"}`
- **Open App**: `{"action_type": "open_app", "app_name": "Settings"}`
- **Navigate Home**: `{"action_type": "navigate_home"}`
- **Navigate Back**: `{"action_type": "navigate_back"}`
- **Answer Question**: `{"action_type": "answer", "text": "It is 5 PM."}`
- **Finish Task**: `{"action_type": "status", "goal_status": "complete"}`
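
When inspecting rollouts it can be handy to check whether a model output is a well-formed action. The sketch below is one way to do that; the required-field table is inferred only from the examples above and is an assumption, not the environment's actual schema, and `parse_action` is a hypothetical helper.

```python
import json

# Required fields per action type, inferred ONLY from the examples above (assumption).
REQUIRED_FIELDS = {
    "click": {"index"},
    "input_text": {"text", "index"},
    "scroll": {"direction"},
    "open_app": {"app_name"},
    "navigate_home": set(),
    "navigate_back": set(),
    "answer": {"text"},
    "status": {"goal_status"},
}


def parse_action(raw: str) -> dict:
    """Parse a JSON action string and check it against REQUIRED_FIELDS.

    Raises ValueError for malformed JSON, unknown action types, or missing fields.
    """
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    action_type = action.get("action_type")
    if not isinstance(action_type, str) or action_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown action_type: {action_type!r}")
    missing = REQUIRED_FIELDS[action_type] - set(action)
    if missing:
        raise ValueError(f"{action_type} is missing fields: {sorted(missing)}")
    return action


if __name__ == "__main__":
    print(parse_action('{"action_type": "click", "index": 4}'))
```

Rejecting malformed output early makes it easier to tell how often the policy emits actions that would likely be counted as invalid.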

## Configuration Reference

See [`opentinker/client/client_config/android_world_param.yaml`](../opentinker/client/client_config/android_world_param.yaml) for full configuration options.

---

## Detailed Environment Setup

### 1. Android SDK & Command Line Tools

If you do not have Android Studio installed, you can set up the command-line tools manually.

1. **Create Directory Structure:**
   ```bash
   # Writing under /usr/local may require sudo
   mkdir -p /usr/local/android-sdk/cmdline-tools
   cd /usr/local/android-sdk/cmdline-tools
   ```

2. **Download Command Line Tools:**
   ```bash
   wget https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O cmdline-tools.zip
   unzip cmdline-tools.zip
   mv cmdline-tools latest
   rm cmdline-tools.zip
   ```

3. **Install SDK Components:**
   ```bash
   export ANDROID_HOME=/usr/local/android-sdk
   export PATH=$ANDROID_HOME/cmdline-tools/latest/bin:$PATH

   # Accept licenses
   yes | sdkmanager --licenses --sdk_root=$ANDROID_HOME

   # Install Platform Tools (adb), the Android 33 platform, Build Tools, and the Emulator
   sdkmanager "platform-tools" "platforms;android-33" "build-tools;34.0.0" "emulator" --sdk_root=$ANDROID_HOME
   ```

4. **Configure Environment Variables:**
   Add the following to your shell configuration file (`~/.bashrc` or `~/.zshrc`):
   ```bash
   export JAVA_HOME="/usr/local/android-studio/jbr"  # JDK bundled with Android Studio; otherwise use your own JDK path
   export ANDROID_HOME="/usr/local/android-sdk"
   export PATH="$JAVA_HOME/bin:$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:$ANDROID_HOME/emulator:$PATH"
   ```

### 2. Create Android Virtual Device (AVD)

Create an AVD named `AndroidWorldAvd` targeting Android 13 (Tiramisu, API 33).

1. **Install System Image:**
   * For x86_64 (Standard PC):
     ```bash
     sdkmanager "system-images;android-33;google_apis;x86_64" --sdk_root=$ANDROID_HOME
     ```
   * For ARM64 (Apple Silicon or Software Emulation on x86):
     ```bash
     sdkmanager "system-images;android-33;google_apis;arm64-v8a" --sdk_root=$ANDROID_HOME
     ```

2. **Create AVD:**
   ```bash
   echo "no" | avdmanager create avd --name AndroidWorldAvd --package "system-images;android-33;google_apis;x86_64" --device "pixel_6"
   ```
   *(Replace `x86_64` with `arm64-v8a` if you installed the ARM64 image.)*

### 3. Launch Emulator

Start the emulator in a separate terminal or as a background process. The `sg kvm` wrapper runs the command with the `kvm` group so that the emulator can access `/dev/kvm`.

* **Standard Launch (with GUI):**
  ```bash
  sg kvm -c "emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554"
  ```

* **Headless Launch (Server/Docker):**
  ```bash
  sg kvm -c "emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554 -no-window -no-audio"
  ```

* **Software Emulation (No KVM):**
  If hardware acceleration is unavailable, add `-accel off`. **Warning: software emulation is very slow.**
  ```bash
  emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554 -no-window -no-audio -accel off
  ```
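
Because the environment server should only be started once the emulator has finished booting, a small polling script can help. The sketch below polls the standard Android property `sys.boot_completed` through `adb`; `boot_completed` and `wait_for_boot` are hypothetical helpers, and the timeout values are arbitrary choices rather than recommendations from this project.

```python
import subprocess
import time


def boot_completed(serial: str, run=subprocess.run) -> bool:
    """Return True if the device reports sys.boot_completed == 1 over adb."""
    try:
        result = run(
            ["adb", "-s", serial, "shell", "getprop", "sys.boot_completed"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # adb is missing, the device is unreachable, or the call timed out.
        return False
    return result.returncode == 0 and result.stdout.strip() == "1"


def wait_for_boot(serial: str, timeout_s: float = 600.0, poll_s: float = 5.0,
                  run=subprocess.run) -> bool:
    """Poll until the emulator has booted or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if boot_completed(serial, run=run):
            return True
        time.sleep(poll_s)
    return False
```

For example, `wait_for_boot("emulator-5554")` waits for the first emulator, whose serial is usually `emulator-5554`; run `adb devices` to confirm the serials on your machine. The `run` parameter exists only so the check can be exercised without a real device.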

## Quick Start with `run_android.sh`

For multi-emulator parallel training, we provide an all-in-one launcher script, [`opentinker/scripts/run_android.sh`](../opentinker/scripts/run_android.sh), that automates AVD creation and emulator startup and launches the environment server and training client.

### Usage

Run each step in a **separate terminal**:

```bash
# Step 0 (one-time): Create N AVDs for parallel training
bash opentinker/scripts/run_android.sh setup-avds

# Step 1: Start the scheduler
bash opentinker/scripts/run_android.sh scheduler

# Step 2: Start N Android emulators in parallel
bash opentinker/scripts/run_android.sh simulator

# Step 3: Start the sharded environment server (after emulators fully boot)
bash opentinker/scripts/run_android.sh env

# Step 4: Launch RL training
bash opentinker/scripts/run_android.sh client
```

### Environment Variables

All settings are configurable via environment variables:

| Variable | Default | Description |
| :------- | :------ | :---------- |
| `NUM_EMULATORS` | `4` | Number of parallel emulators |
| `NUM_GPUS` | `4` | Number of GPUs for model parallelism |
| `GPUS` | `[0,1,2,3]` | GPU device list |
| `MODEL_PATH` | `Qwen/Qwen2.5-3B-Instruct` | Model path or HuggingFace ID |
| `AVD_NAME` | `AndroidWorldAvd` | AVD name prefix (creates `{AVD_NAME}_0`, `{AVD_NAME}_1`, ...) |
| `EMULATOR_HEADLESS` | `1` | Set `0` to show emulator GUI |
| `EMULATOR_NO_KVM` | `0` | Set `1` for software emulation (slow) |
| `SCHEDULER_PORT` | `9780` | Scheduler listen port |
| `ENV_PORT` | `9092` | Environment server base port |

**Example** (scaling to 8 emulators on 8 GPUs):

```bash
NUM_EMULATORS=8 NUM_GPUS=8 GPUS="[0,1,2,3,4,5,6,7]" bash opentinker/scripts/run_android.sh setup-avds
# Then run scheduler / simulator / env / client with the same env vars
```
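
To preview what a given set of variables would produce, the sketch below resolves them against the defaults in the table. It is an illustration, not a port of `run_android.sh`: `load_settings` is a hypothetical helper, the AVD names follow the `{AVD_NAME}_<index>` pattern from the table, and the per-shard port list assumes one port per emulator counting up from `ENV_PORT`, which the table's "base port" wording suggests but does not confirm.

```python
import os


def load_settings(env=os.environ) -> dict:
    """Read the launcher variables from env, falling back to the table defaults."""
    num_emulators = int(env.get("NUM_EMULATORS", "4"))
    avd_prefix = env.get("AVD_NAME", "AndroidWorldAvd")
    env_port = int(env.get("ENV_PORT", "9092"))
    return {
        "num_emulators": num_emulators,
        "num_gpus": int(env.get("NUM_GPUS", "4")),
        "model_path": env.get("MODEL_PATH", "Qwen/Qwen2.5-3B-Instruct"),
        "scheduler_port": int(env.get("SCHEDULER_PORT", "9780")),
        # AVD names follow the {AVD_NAME}_<index> pattern from the table.
        "avd_names": [f"{avd_prefix}_{i}" for i in range(num_emulators)],
        # ASSUMPTION: one port per shard, counting up from the base port.
        "env_ports": [env_port + i for i in range(num_emulators)],
    }


if __name__ == "__main__":
    for key, value in load_settings().items():
        print(f"{key}: {value}")
```

Check the script itself for the actual port layout before opening firewall rules or pointing clients at specific shards.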

---

## Troubleshooting

* **"KVM is not found"**: Ensure virtualization is enabled in your BIOS or hypervisor. On Linux, check the permissions on `/dev/kvm`. If running in a container, start it with `--device /dev/kvm`.
* **Emulator crashes immediately**: Check the logs. Running an x86_64 image on an ARM host, or vice versa, makes the emulator fail, so use the system image that matches your host architecture.
* **"ADB command not found"**: Ensure `platform-tools` is in your `$PATH`.
* **"Process system isn't responding"**: Common in software emulation (`-accel off`). Wait for the system to stabilize or dismiss the dialog.
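
For the KVM case, a quick check can tell a missing device apart from a permissions problem. This is a minimal sketch with a hypothetical helper, `kvm_status`; it only inspects the device node and does not prove that the emulator will actually use hardware acceleration.

```python
import os


def kvm_status(path: str = "/dev/kvm") -> str:
    """Classify KVM availability as 'missing', 'no-permission', or 'ok'."""
    if not os.path.exists(path):
        return "missing"  # virtualization disabled, or device not passed into the container
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission"  # current user lacks read/write access (e.g. not in the kvm group)
    return "ok"


if __name__ == "__main__":
    print(f"/dev/kvm: {kvm_status()}")
```

A result of `no-permission` usually points at group membership, which is the situation the `sg kvm` wrapper in the launch commands is meant to address.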

opentinker/backend_patch/verl/experimental/__init__.py

Whitespace-only changes.

opentinker/backend_patch/verl/experimental/agent_loop/__init__.py

Whitespace-only changes.
