|
| 1 | +# LLM Game Agent (AndroidWorld Multi-Turn) |
| 2 | + |
| 3 | +This example demonstrates how to train a language model to complete tasks on the Android operating system using AndroidWorld. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +**AndroidWorld** is a dynamic benchmarking environment in which autonomous agents interact with the Android operating system. The agent perceives the screen as a list of UI elements and acts on it through actions such as clicking, typing, and scrolling. |
| 8 | + |
| 9 | +Tasks include: |
| 10 | +- Adding contacts |
| 11 | +- Managing settings |
| 12 | +- Browsing information |
| 13 | +- Sending messages |
| 14 | +- And more... |
| 15 | + |
| 16 | +## Prerequisites |
| 17 | + |
| 18 | +1. Complete the [Installation](../README.md#-installation) steps. |
| 19 | +2. **Environment Setup**: You must install the Android SDK and run an Emulator. See the **[Detailed Environment Setup](#detailed-environment-setup)** section below for instructions. |
| 20 | +3. Find the server's IP address with `hostname -I`; the client uses it as `<server_endpoint>` and `<env_server_endpoint>` in Step 3. |
| 21 | + |
| 22 | +## Step 1: Start the Scheduler (Server Side) |
| 23 | + |
| 24 | +```bash |
| 25 | +bash opentinker/scripts/launch_scheduler.sh --scheduler-port <scheduler_port> |
| 26 | +``` |
| 27 | + |
| 28 | +## Step 2: Start the AndroidWorld Environment (Server Side) |
| 29 | + |
| 30 | +Before starting the environment server, ensure your Android Emulator is running (see setup below). |
| 31 | + |
| 32 | +```bash |
| 33 | +python -m opentinker.environment.android_world.android_world_server \ |
| 34 | + --port 8092 \ |
| 35 | + --max_steps 50 \ |
| 36 | + --split train |
| 37 | +``` |
| 38 | + |
| 39 | +**Server Options:** |
| 40 | + |
| 41 | +- `--port`: Server port (default: 8082; 8092 is recommended to match the client config) |
| 42 | +- `--max_steps`: Max steps per episode (default: 50) |
| 43 | +- `--split`: Dataset split (`train`, `eval_in_distribution`, `eval_out_of_distribution`) |
| 44 | +- `--shards`: Number of parallel server instances (for parallel training) |
| 45 | + |
| 46 | +## Step 3: Run Training |
| 47 | + |
| 48 | +```bash |
| 49 | +python opentinker/client/android_world_rl.py \ |
| 50 | + tokenizer_path=Qwen/Qwen2.5-3B-Instruct \ |
| 51 | + batch_size=4 \ |
| 52 | + val_batch_size=50 \ |
| 53 | + num_steps=1000 \ |
| 54 | + save_freq=20000 \ |
| 55 | + test_freq=10 \ |
| 56 | + scheduler_url=http://<server_endpoint>:<scheduler_port> \ |
| 57 | + interaction.config.env_port=8092 \ |
| 58 | + interaction.config.env_host=<env_server_endpoint> |
| 59 | +``` |
| 60 | + |
| 61 | +**Training Parameters:** |
| 62 | + |
| 63 | +- `num_steps`: Total training steps (alternative: use `num_epochs`) |
| 64 | +- `batch_size`: Training batch size |
| 65 | +- `val_batch_size`: Validation samples per evaluation |
| 66 | +- `test_freq`: Validation frequency (every N steps) |
| 67 | +- `adv_estimator`: Advantage estimator (`gae`, `grpo`, `grpo_per_step`) |
| 68 | + |
| 69 | +## Reward Structure |
| 70 | + |
| 71 | +| Event | Reward | |
| 72 | +| :--------------- | ------ | |
| 73 | +| Task Success | +10.0 | |
| 74 | +| Task Failure | -1.0 | |
| 75 | +| Per Step Penalty | -0.01 | |
| 76 | +| Invalid Action | -0.1 | |
| 77 | + |
| 78 | +## Example Actions |
| 79 | + |
| 80 | +The agent interacts with the environment by outputting JSON commands referencing UI element indices: |
| 81 | + |
| 82 | +- **Click**: `{"action_type": "click", "index": 4}` |
| 83 | +- **Type**: `{"action_type": "input_text", "text": "Alice", "index": 2}` |
| 84 | +- **Scroll**: `{"action_type": "scroll", "direction": "down"}` |
| 85 | +- **Open App**: `{"action_type": "open_app", "app_name": "Settings"}` |
| 86 | +- **Navigate Home**: `{"action_type": "navigate_home"}` |
| 87 | +- **Navigate Back**: `{"action_type": "navigate_back"}` |
| 88 | +- **Answer Question**: `{"action_type": "answer", "text": "It is 5 PM."}` |
| 89 | +- **Finish Task**: `{"action_type": "status", "goal_status": "complete"}` |
| 90 | + |
| 91 | +## Configuration Reference |
| 92 | + |
| 93 | +See [`opentinker/client/client_config/android_world_param.yaml`](../opentinker/client/client_config/android_world_param.yaml) for full configuration options. |
| 94 | + |
| 95 | +--- |
| 96 | + |
| 97 | +## Detailed Environment Setup |
| 98 | + |
| 99 | +### 1. Android SDK & Command Line Tools |
| 100 | + |
| 101 | +If you do not have Android Studio installed, you can set up the command-line tools manually. |
| 102 | + |
| 103 | +1. **Create Directory Structure:** |
| 104 | + ```bash |
| 105 | + mkdir -p /usr/local/android-sdk/cmdline-tools |
| 106 | + cd /usr/local/android-sdk/cmdline-tools |
| 107 | + ``` |
| 108 | + |
| 109 | +2. **Download Command Line Tools:** |
| 110 | + ```bash |
| 111 | + wget https://dl.google.com/android/repository/commandlinetools-linux-11076708_latest.zip -O cmdline-tools.zip |
| 112 | + unzip cmdline-tools.zip |
| 113 | + mv cmdline-tools latest |
| 114 | + rm cmdline-tools.zip |
| 115 | + ``` |
| 116 | + |
| 117 | +3. **Install SDK Components:** |
| 118 | + ```bash |
| 119 | + export ANDROID_HOME=/usr/local/android-sdk |
| 120 | + export PATH=$ANDROID_HOME/cmdline-tools/latest/bin:$PATH |
| 121 | + |
| 122 | + # Accept licenses |
| 123 | + yes | sdkmanager --licenses --sdk_root=$ANDROID_HOME |
| 124 | + |
| 125 | + # Install Platform Tools (adb), the Android 33 platform, Build Tools, and the Emulator |
| 126 | + sdkmanager "platform-tools" "platforms;android-33" "build-tools;34.0.0" "emulator" --sdk_root=$ANDROID_HOME |
| 127 | + ``` |
| 128 | + |
| 129 | +4. **Configure Environment Variables:** |
| 130 | + Add the following to your shell configuration file (`~/.bashrc` or `~/.zshrc`): |
| 131 | + ```bash |
| 132 | + export JAVA_HOME="/usr/local/android-studio/jbr" # Android Studio's bundled JDK; replace with your own JDK path if Android Studio is not installed |
| 133 | + export ANDROID_HOME="/usr/local/android-sdk" |
| 134 | + export PATH="$JAVA_HOME/bin:$ANDROID_HOME/cmdline-tools/latest/bin:$ANDROID_HOME/platform-tools:$ANDROID_HOME/emulator:$PATH" |
| 135 | + ``` |
| 136 | + |
| 137 | +### 2. Create Android Virtual Device (AVD) |
| 138 | + |
| 139 | +Create an AVD named `AndroidWorldAvd` targeting Android 13 (Tiramisu, API 33). |
| 140 | + |
| 141 | +1. **Install System Image:** |
| 142 | + * For x86_64 (Standard PC): |
| 143 | + ```bash |
| 144 | + sdkmanager "system-images;android-33;google_apis;x86_64" --sdk_root=$ANDROID_HOME |
| 145 | + ``` |
| 146 | + * For ARM64 (Apple Silicon or Software Emulation on x86): |
| 147 | + ```bash |
| 148 | + sdkmanager "system-images;android-33;google_apis;arm64-v8a" --sdk_root=$ANDROID_HOME |
| 149 | + ``` |
| 150 | + |
| 151 | +2. **Create AVD:** |
| 152 | + ```bash |
| 153 | + echo "no" | avdmanager create avd --name AndroidWorldAvd --package "system-images;android-33;google_apis;x86_64" --device "pixel_6" |
| 154 | + ``` |
| 155 | + *(Replace `x86_64` with `arm64-v8a` if applicable)* |
| 156 | + |
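The choice between the two system images depends on the host CPU architecture. A minimal sketch of that selection is shown below; the helper name `system_image_package` is hypothetical, and the mapping only covers the two images listed above.

```python
import platform
from typing import Optional


def system_image_package(machine: Optional[str] = None) -> str:
    """Return the sdkmanager package ID matching the host CPU architecture."""
    machine = (machine or platform.machine()).lower()
    if machine in {"x86_64", "amd64"}:
        abi = "x86_64"
    elif machine in {"arm64", "aarch64"}:
        abi = "arm64-v8a"
    else:
        raise ValueError(f"unsupported host architecture: {machine}")
    return f"system-images;android-33;google_apis;{abi}"


if __name__ == "__main__":
    # Prints the package ID to pass to sdkmanager and avdmanager on this host.
    print(system_image_package())
```

The returned string can be passed to both `sdkmanager` and the `--package` argument of `avdmanager create avd`, which keeps the installed image and the AVD definition consistent.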
| 157 | +### 3. Launch Emulator |
| 158 | + |
| 159 | +Start the emulator in a separate terminal or as a background process. The first two commands below use `sg kvm -c` to run the emulator under the `kvm` group so that it has permission to access `/dev/kvm`. |
| 160 | + |
| 161 | +* **Standard Launch (with GUI):** |
| 162 | + ```bash |
| 163 | + sg kvm -c "emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554" |
| 164 | + ``` |
| 165 | + |
| 166 | +* **Headless Launch (Server/Docker):** |
| 167 | + ```bash |
| 168 | + sg kvm -c "emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554 -no-window -no-audio" |
| 169 | + ``` |
| 170 | + |
| 171 | +* **Software Emulation (No KVM):** |
| 172 | + If hardware acceleration is unavailable, add `-accel off`. **Warning: Performance will be very low.** |
| 173 | + ```bash |
| 174 | + emulator -avd AndroidWorldAvd -no-snapshot -grpc 8554 -no-window -no-audio -accel off |
| 175 | + ``` |
| 176 | + |
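The emulator needs time to boot before the environment server can use it. One common way to wait is to poll the `sys.boot_completed` system property through `adb`, as in the sketch below. The serial `emulator-5554`, the timeout values, and the function names are assumptions for illustration; this is not part of the OpenTinker scripts.

```python
import subprocess
import time
from typing import Callable


def adb_getprop(serial: str, prop: str) -> str:
    """Read an Android system property through `adb shell getprop`."""
    try:
        result = subprocess.run(
            ["adb", "-s", serial, "shell", "getprop", prop],
            capture_output=True, text=True, timeout=10, check=False,
        )
    except subprocess.TimeoutExpired:
        return ""
    return result.stdout.strip()


def wait_for_boot(
    serial: str = "emulator-5554",  # assumed serial of the first emulator
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    getprop: Callable[[str, str], str] = adb_getprop,
) -> bool:
    """Poll until sys.boot_completed is "1"; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if getprop(serial, "sys.boot_completed") == "1":
            return True
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    print("booted" if wait_for_boot() else "timed out")
```

The `getprop` parameter is injectable so the polling logic can be exercised without a running emulator.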
| 177 | +## Quick Start with `run_android.sh` |
| 178 | + |
| 179 | +For multi-emulator parallel training, we provide an all-in-one launcher script, [`opentinker/scripts/run_android.sh`](../opentinker/scripts/run_android.sh), that automates AVD creation and emulator startup and launches the scheduler, the environment server, and the training client. |
| 180 | + |
| 181 | +### Usage |
| 182 | + |
| 183 | +Run each step in a **separate terminal**: |
| 184 | + |
| 185 | +```bash |
| 186 | +# Step 0 (one-time): Create N AVDs for parallel training |
| 187 | +bash opentinker/scripts/run_android.sh setup-avds |
| 188 | + |
| 189 | +# Step 1: Start the scheduler |
| 190 | +bash opentinker/scripts/run_android.sh scheduler |
| 191 | + |
| 192 | +# Step 2: Start N Android emulators in parallel |
| 193 | +bash opentinker/scripts/run_android.sh simulator |
| 194 | + |
| 195 | +# Step 3: Start the sharded environment server (after emulators fully boot) |
| 196 | +bash opentinker/scripts/run_android.sh env |
| 197 | + |
| 198 | +# Step 4: Launch RL training |
| 199 | +bash opentinker/scripts/run_android.sh client |
| 200 | +``` |
| 201 | + |
| 202 | +### Environment Variables |
| 203 | + |
| 204 | +All settings are configurable via environment variables: |
| 205 | + |
| 206 | +| Variable | Default | Description | |
| 207 | +| :------- | :------ | :---------- | |
| 208 | +| `NUM_EMULATORS` | `4` | Number of parallel emulators | |
| 209 | +| `NUM_GPUS` | `4` | Number of GPUs for model parallelism | |
| 210 | +| `GPUS` | `[0,1,2,3]` | GPU device list | |
| 211 | +| `MODEL_PATH` | `Qwen/Qwen2.5-3B-Instruct` | Model path or HuggingFace ID | |
| 212 | +| `AVD_NAME` | `AndroidWorldAvd` | AVD name prefix (creates `{AVD_NAME}_0`, `{AVD_NAME}_1`, ...) | |
| 213 | +| `EMULATOR_HEADLESS` | `1` | Set `0` to show emulator GUI | |
| 214 | +| `EMULATOR_NO_KVM` | `0` | Set `1` for software emulation (slow) | |
| 215 | +| `SCHEDULER_PORT` | `9780` | Scheduler listen port | |
| 216 | +| `ENV_PORT` | `9092` | Environment server base port | |
| 217 | + |
| 218 | +**Example** — scale to 8 emulators on 8 GPUs: |
| 219 | + |
| 220 | +```bash |
| 221 | +NUM_EMULATORS=8 NUM_GPUS=8 GPUS="[0,1,2,3,4,5,6,7]" bash opentinker/scripts/run_android.sh setup-avds |
| 222 | +# Then run scheduler / simulator / env / client with the same env vars |
| 223 | +``` |
| 224 | + |
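To make the override behaviour concrete, the sketch below resolves the variables from the table against their documented defaults and derives the AVD names from the `{AVD_NAME}_<index>` pattern. It is an illustration only, not the logic of `run_android.sh`: `resolve_settings` is a hypothetical helper, and parsing `GPUS` as a JSON-style list is an assumption.

```python
import json
import os
from typing import Mapping

# Defaults copied from the table above.
DEFAULTS = {
    "NUM_EMULATORS": "4",
    "NUM_GPUS": "4",
    "GPUS": "[0,1,2,3]",
    "MODEL_PATH": "Qwen/Qwen2.5-3B-Instruct",
    "AVD_NAME": "AndroidWorldAvd",
    "EMULATOR_HEADLESS": "1",
    "EMULATOR_NO_KVM": "0",
    "SCHEDULER_PORT": "9780",
    "ENV_PORT": "9092",
}


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Merge environment overrides with the documented defaults."""
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    num_emulators = int(raw["NUM_EMULATORS"])
    avd_prefix = raw["AVD_NAME"]
    return {
        "num_emulators": num_emulators,
        "num_gpus": int(raw["NUM_GPUS"]),
        # Assumption: GPUS is a JSON-style list such as "[0,1,2,3]".
        "gpus": json.loads(raw["GPUS"]),
        "model_path": raw["MODEL_PATH"],
        "avd_names": [f"{avd_prefix}_{i}" for i in range(num_emulators)],
        "headless": raw["EMULATOR_HEADLESS"] != "0",
        "software_emulation": raw["EMULATOR_NO_KVM"] == "1",
        "scheduler_port": int(raw["SCHEDULER_PORT"]),
        "env_port": int(raw["ENV_PORT"]),
    }


if __name__ == "__main__":
    print(resolve_settings()["avd_names"])
```

Because every sub-command reads the same variables, exporting them once in the shell avoids mismatches between the number of AVDs created and the number of emulators started.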
| 225 | +--- |
| 226 | + |
| 227 | +## Troubleshooting |
| 228 | + |
| 229 | +* **"KVM is not found"**: Ensure virtualization is enabled in your BIOS or hypervisor. On Linux, check that your user has permission to access `/dev/kvm`. If you are running in a container, start it with `--device /dev/kvm`. |
| 230 | +* **Emulator crashes immediately**: Check the emulator logs. Running an x86_64 image on an ARM host, or vice versa, causes the emulator to fail. Use the system image that matches your host architecture. |
| 231 | +* **"ADB command not found"**: Ensure `$ANDROID_HOME/platform-tools` is in your `$PATH`. |
| 232 | +* **"Process system isn't responding"**: Common in software emulation (`-accel off`). Wait for the system to stabilize or dismiss the dialog. |
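For the KVM case, a quick check can tell a missing device apart from a permissions problem. The sketch below is a hypothetical helper (`kvm_status` is not part of the project) and only inspects the device node. It does not verify that the emulator itself can use KVM.

```python
import os


def kvm_status(device: str = "/dev/kvm") -> str:
    """Classify KVM availability as 'ok', 'missing', or 'no-permission'."""
    if not os.path.exists(device):
        # Virtualization disabled, or the device was not passed into the container.
        return "missing"
    if not os.access(device, os.R_OK | os.W_OK):
        # The device exists but the current user (or group) cannot open it.
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print(kvm_status())
```

A result of `missing` points to the BIOS, hypervisor, or container setup, while `no-permission` points to group membership for `/dev/kvm`.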