| 1 | +# OpenHands Harness |
| 2 | + |
| 3 | +Run CodeScaleBench tasks using the OpenHands agent instead of Claude Code. |
| 4 | + |
| 5 | +## Prerequisites |
| 6 | + |
| 7 | +- Harbor CLI installed (`uv tool install harbor`) |
| 8 | +- Docker running locally, or `HARBOR_ENV=daytona` set for cloud execution
| 9 | +- `.env.local` at project root with credentials (see Auth Setup below) |
| 10 | + |
| 11 | +## Model Configuration |
| 12 | + |
| 13 | +The `MODEL` env var accepts any LiteLLM-format string (`provider/model-name`): |
| 14 | + |
| 15 | +| Model | MODEL value | Short name | |
| 16 | +|-------|------------|------------| |
| 17 | +| Opus 4.6 (default) | `anthropic/claude-opus-4-6` | `opus46` | |
| 18 | +| Sonnet 4.6 | `anthropic/claude-sonnet-4-6` | `sonnet46` | |
| 19 | +| Sonnet 4.5 | `anthropic/claude-sonnet-4-5-20250929` | `sonnet45` |
| 20 | +| Haiku 4.5 | `anthropic/claude-haiku-4-5-20251001` | `haiku45` | |
| 21 | +| GPT-4o | `openai/gpt-4o` | `gpt4o` | |
| 22 | +| Codex | `openai/gpt-5.3-codex` | `gpt53codex` | |
| 23 | + |
| 24 | +The short name determines the run directory name (e.g. `runs/staging/openhands_sonnet46_20260306_120000/`). |
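To make the naming rule concrete, here is a minimal sketch of how a `MODEL` value could map to a short name and a run directory. The functions `short_name` and `run_dir` are hypothetical helpers written for illustration, not names taken from the launcher; the mapping itself is copied from the table above, and the timestamp format is inferred from the example path.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not the launcher's actual code).

# Map a LiteLLM model string to the short name listed in the table.
short_name() {
  case "$1" in
    anthropic/claude-opus-4-6)    echo "opus46" ;;
    anthropic/claude-sonnet-4-6)  echo "sonnet46" ;;
    anthropic/claude-sonnet-4-5*) echo "sonnet45" ;;
    anthropic/claude-haiku-4-5*)  echo "haiku45" ;;
    openai/gpt-4o)                echo "gpt4o" ;;
    openai/gpt-5.3-codex)         echo "gpt53codex" ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Build the run directory path from a model string and a timestamp.
# $1 = MODEL value, $2 = timestamp in YYYYmmdd_HHMMSS form (assumed format).
run_dir() {
  printf 'runs/staging/openhands_%s_%s\n' "$(short_name "$1")" "$2"
}
```

For example, `run_dir "${MODEL:-anthropic/claude-opus-4-6}" "$(date +%Y%m%d_%H%M%S)"` would produce a path shaped like the one above, with Opus 4.6 as the default when `MODEL` is unset.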
| 25 | + |
| 26 | +## Auth Setup |
| 27 | + |
| 28 | +### Anthropic Models (OAuth Subscription) |
| 29 | + |
| 30 | +The project uses Claude Max subscription tokens, not API keys. The agent reads the OAuth access token from `~/.claude/.credentials.json` and injects it into `ANTHROPIC_API_KEY` so Harbor's resolver can find it. |
| 31 | + |
| 32 | +Ensure tokens are fresh before launching: |
| 33 | +```bash |
| 34 | +source configs/_common.sh |
| 35 | +load_credentials |
| 36 | +ensure_fresh_token_all |
| 37 | +``` |
| 38 | + |
| 39 | +If `ANTHROPIC_API_KEY` is explicitly set in `.env.local`, it takes precedence over OAuth. |
| 40 | + |
| 41 | +### OpenAI Models |
| 42 | + |
| 43 | +Set `OPENAI_API_KEY` in `.env.local`. For Codex models, you can also use `CODEX_API_KEY`. |
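The credential precedence described in this section can be sketched roughly as follows. `resolve_api_key` is a hypothetical function, and the OAuth token is passed in as an argument rather than read from `~/.claude/.credentials.json`, since the file's layout is not described here. The order in which `OPENAI_API_KEY` and `CODEX_API_KEY` are tried is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of credential precedence (not the agent's actual code).
# $1 = MODEL value, $2 = OAuth access token (optional, Anthropic only).
resolve_api_key() {
  local model="$1" oauth_token="${2:-}"
  case "$model" in
    anthropic/*)
      # An explicit ANTHROPIC_API_KEY wins over the OAuth token.
      if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
        printf '%s\n' "$ANTHROPIC_API_KEY"
      else
        printf '%s\n' "$oauth_token"
      fi
      ;;
    openai/*)
      # Assumed order: OPENAI_API_KEY first, then CODEX_API_KEY.
      printf '%s\n' "${OPENAI_API_KEY:-${CODEX_API_KEY:-}}"
      ;;
    *)
      return 1
      ;;
  esac
}
```

An empty result means no credential was found for that provider, which is a useful condition to check before launching a long run.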
| 44 | + |
| 45 | +## Example Commands |
| 46 | + |
| 47 | +```bash |
| 48 | +# Full 2-config run (baseline + MCP) with Sonnet 4.6 |
| 49 | +MODEL=anthropic/claude-sonnet-4-6 ./configs/openhands_2config.sh |
| 50 | + |
| 51 | +# Baseline-only with Opus 4.6 (default model) |
| 52 | +./configs/openhands_2config.sh --baseline-only |
| 53 | + |
| 54 | +# Single task |
| 55 | +./configs/openhands_2config.sh --benchmark csb_sdlc_fix --task my-task-001 |
| 56 | + |
| 57 | +# Override parallelism |
| 58 | +./configs/openhands_2config.sh --parallel 4 |
| 59 | + |
| 60 | +# GPT-4o run |
| 61 | +MODEL=openai/gpt-4o ./configs/openhands_2config.sh --baseline-only |
| 62 | +``` |
| 63 | + |
| 64 | +## Run Directory Structure |
| 65 | + |
| 66 | +``` |
| 67 | +runs/staging/openhands_sonnet46_20260306_120000/ |
| 68 | + baseline-local-direct/ |
| 69 | + task-name__abcd1234/ |
| 70 | + result.json |
| 71 | + task-name.log |
| 72 | + mcp-remote-direct/ |
| 73 | + task-name__abcd1234/ |
| 74 | + result.json |
| 75 | + task-name.log |
| 76 | +``` |
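Given this layout, a quick completeness check is to count the `result.json` files under each config directory. The helper below is a hypothetical sketch; it assumes every finished task writes exactly one `result.json` directly inside its task directory, as the tree above suggests.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count result.json files per config in a run directory.
# $1 = run directory, e.g. runs/staging/openhands_sonnet46_20260306_120000
summarize_run() {
  local run_dir="$1" config count
  for config in baseline-local-direct mcp-remote-direct; do
    # Look only at <config>/<task-dir>/result.json; a missing config counts as 0.
    count=$(find "$run_dir/$config" -mindepth 2 -maxdepth 2 -name result.json 2>/dev/null | wc -l | tr -d ' ')
    printf '%s %s\n' "$config" "$count"
  done
}
```

Comparing the two counts shows at a glance whether the baseline and MCP configs completed the same number of tasks.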
| 77 | + |
| 78 | +## Architecture |
| 79 | + |
| 80 | +- OpenHands runs **inside the Docker container** (installed by Harbor's template), not on the host |
| 81 | +- `agents/harnesses/openhands/agent.py` extends Harbor's built-in `OpenHands` agent + `BaselineHarnessMixin` |
| 82 | +- `BaselineHarnessMixin` (`agents/harnesses/base.py`) handles instruction preparation, MCP configuration, and container env propagation |
| 83 | +- The 2-config launcher (`configs/openhands_2config.sh`) runs the baseline config (no MCP) first, then the MCP-Full config (Sourcegraph)
| 84 | + |
| 85 | +## Known Limitations |
| 86 | + |
| 87 | +- Codex models require the `openai/` prefix for LiteLLM; the agent adds this automatically |
| 88 | +- OAuth tokens expire after ~8 hours; long runs should call `ensure_fresh_token_all` between batches |
| 88 | +- The OpenHands agent does not support `CLAUDE_CODE_OAUTH_TOKEN`; it uses `LLM_API_KEY` for all providers
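One way to work around the token expiry noted above is to split a long run into batches and refresh before each one. The wrapper below is a hypothetical sketch: `run_in_batches` is not part of the repository, and it takes the refresh and launch commands as arguments so it makes no assumptions about how they behave internally.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: refresh credentials before every batch.
# $1 = refresh command, $2 = launch command, remaining args = batch names.
run_in_batches() {
  local refresh_fn="$1" run_fn="$2" benchmark
  shift 2
  for benchmark in "$@"; do
    "$refresh_fn" || return 1   # stop early if the refresh fails
    "$run_fn" "$benchmark"
  done
}

# Possible usage, assuming configs/_common.sh has been sourced:
#   launch() { ./configs/openhands_2config.sh --benchmark "$1"; }
#   run_in_batches ensure_fresh_token_all launch csb_sdlc_fix
```

Keeping each batch well under the token lifetime leaves headroom for tasks that run longer than expected.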