Add Austin profiling workflow for SWE-bench runs #2191

Draft
csmith49 wants to merge 27 commits into main from feature/austin-profiling-workflow

csmith49 commented Feb 23, 2026

Summary

This PR adds a new GitHub Actions workflow for profiling OpenHands agent runs using the Austin profiler to identify performance bottlenecks on real workloads.

What's Included

1. Profile Runner Workflow (.github/workflows/profile-runner.yml)

A workflow that:

  • Profiles actual SWE-bench instances via the benchmarks repo (not just integration tests)
  • Generates flame graphs (SVG) and Speedscope JSON for analysis
  • Posts summary reports to PRs with artifact download links

Triggers:

  • Add profile-test label to a PR
  • Manual dispatch via Actions UI

Configurable Options:

| Option | Description | Default |
|--------|-------------|---------|
| `profile_target` | swebench or example | swebench |
| `instance_ids` | SWE-bench instance IDs | django__django-11039 |
| `model_id` | Model from resolve_model_config.py | claude-haiku-4-5 |
| `max_iterations` | Agent iteration limit | 30 |
| `sampling_interval` | Austin sampling interval (µs) | 100 |

2. Flame Graph Generator (scripts/generate_flamegraph.py)

A standalone script that converts Austin profiler output to:

  • Collapsed stack format (for flamegraph.pl SVG generation)
  • Speedscope JSON (for interactive web-based analysis at https://speedscope.app)
  • Markdown summary with top functions and call stacks
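
As a rough illustration of the collapsed stack format and the "top functions" step, the sketch below aggregates collapsed-stack lines (`frame;frame;... count`) into per-function sample counts. It is not the code in scripts/generate_flamegraph.py: the helper names and the sample data are hypothetical, and real Austin output carries process, thread, file, and line details in each frame that this sketch ignores.

```python
from collections import Counter


def parse_collapsed(lines):
    """Parse 'frame_a;frame_b;frame_c 12' lines into (frames, count) pairs."""
    stacks = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and metadata comments
        stack, _, count = line.rpartition(" ")
        if not stack or not count.isdigit():
            continue  # skip anything that is not '<stack> <integer>'
        stacks.append((stack.split(";"), int(count)))
    return stacks


def top_functions(stacks, limit=10):
    """Rank leaf frames (the function running when sampled) by sample count."""
    totals = Counter()
    for frames, count in stacks:
        totals[frames[-1]] += count
    return totals.most_common(limit)


# Hypothetical sample data, not taken from a real run.
sample = [
    "main;run;_call_llm 30",
    "main;run;parse_response 15",
    "main;run;_call_llm 5",
]
stacks = parse_collapsed(sample)
ranked = top_functions(stacks)
```

Counting only the leaf frame gives "self" time per function; summing over every frame in a stack would give inclusive time instead.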

3. Documentation (AGENTS.md)

Added <PROFILING_WORKFLOW> section with usage instructions.

How It Works

┌─────────────────────────────────────────────────────────────┐
│  1. Checkout SDK repo + benchmarks repo                     │
│  2. Link SDK into benchmarks as submodule                   │
│  3. Run SWE-bench inference wrapped with Austin profiler    │
│  4. Generate flame graphs from Austin output                │
│  5. Post results to PR                                      │
└─────────────────────────────────────────────────────────────┘

Example PR Comment Output

## 🔬 Profiling Results

### Overview
- **Total Samples:** 15,234
- **Unique Call Stacks:** 892

### Top Functions by Sample Count
| Function | Samples | % |
|----------|---------|---|
| `_call_llm` | 3,421 | 22.5% |
| `parse_response` | 2,100 | 13.8% |
...

### Generated Artifacts
**Flame Graphs (SVG):** `swebench.svg`
**Speedscope Profiles:** `swebench.speedscope.json`

---
📦 **Artifacts:** [Download from workflow run](link)
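
A report in the shape shown above could be rendered along these lines. This is a minimal sketch, not the PR's implementation: `render_summary` is a hypothetical helper, and the numbers passed in are the illustrative ones from the example comment.

```python
def render_summary(total_samples, unique_stacks, top):
    """Build a Markdown report from (function_name, sample_count) pairs."""
    lines = [
        "## Profiling Results",
        "",
        "### Overview",
        f"- **Total Samples:** {total_samples:,}",
        f"- **Unique Call Stacks:** {unique_stacks:,}",
        "",
        "### Top Functions by Sample Count",
        "| Function | Samples | % |",
        "|----------|---------|---|",
    ]
    for name, count in top:
        # Guard against an empty profile to avoid dividing by zero.
        share = 100.0 * count / total_samples if total_samples else 0.0
        lines.append(f"| `{name}` | {count:,} | {share:.1f}% |")
    return "\n".join(lines)


report = render_summary(
    15234, 892, [("_call_llm", 3421), ("parse_response", 2100)]
)
```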

Testing

To test this workflow:

  1. Merge this PR
  2. Trigger a run in one of two ways:
     • Add the profile-test label to a PR, or
     • Go to Actions → "Profile OpenHands Run" → Run workflow

Notes

  • Requires ALLHANDS_BOT_GITHUB_PAT secret for benchmarks repo access
  • The SWE-bench profiling command may need adjustment based on actual benchmarks repo structure
  • Uses --children flag to capture all child process activity



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---------|---------------|------------|-------------|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:85b6ead-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-85b6ead-python \
  ghcr.io/openhands/agent-server:85b6ead-python

All tags pushed for this build

ghcr.io/openhands/agent-server:85b6ead-golang-amd64
ghcr.io/openhands/agent-server:85b6ead-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:85b6ead-golang-arm64
ghcr.io/openhands/agent-server:85b6ead-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:85b6ead-java-amd64
ghcr.io/openhands/agent-server:85b6ead-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:85b6ead-java-arm64
ghcr.io/openhands/agent-server:85b6ead-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:85b6ead-python-amd64
ghcr.io/openhands/agent-server:85b6ead-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:85b6ead-python-arm64
ghcr.io/openhands/agent-server:85b6ead-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:85b6ead-golang
ghcr.io/openhands/agent-server:85b6ead-java
ghcr.io/openhands/agent-server:85b6ead-python

About Multi-Architecture Support

  • Each variant tag (e.g., 85b6ead-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 85b6ead-python-amd64) are also available if needed

This PR adds a new GitHub Actions workflow for profiling OpenHands agent
runs using the Austin profiler. The workflow:

- Profiles actual SWE-bench instances (not just integration tests)
- Generates flame graphs (SVG) and Speedscope JSON for analysis
- Posts summary reports to PRs with artifact links

New files:
- .github/workflows/profile-runner.yml: Main workflow
- scripts/generate_flamegraph.py: Austin output to flame graph converter

Triggers:
- Add 'profile-test' label to a PR
- Manual dispatch with configurable options

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot commented Feb 23, 2026

API breakage checks (Griffe)

Result: Passed

Action log

csmith49 and others added 26 commits February 23, 2026 13:21
This allows the workflow to run when pushing to the feature branch,
enabling testing before merging to main.

Co-authored-by: openhands <openhands@all-hands.dev>
The previous default 'claude-haiku-4-5' doesn't exist in the model config.

Co-authored-by: openhands <openhands@all-hands.dev>
The benchmarks repo expects the SDK at vendor/software-agent-sdk/ based on
its uv workspace configuration.

Also use swebench-infer entry point instead of python -m.

Co-authored-by: openhands <openhands@all-hands.dev>
Austin needs to profile the Python interpreter directly, not the uv wrapper.
Get the Python path with 'uv run which python' and call it directly.

Co-authored-by: openhands <openhands@all-hands.dev>
swebench-infer requires:
1. llm_config_path as a positional argument (JSON file)
2. --select for instance IDs (text file, one per line)

Co-authored-by: openhands <openhands@all-hands.dev>
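
The two inputs that the commit above describes for swebench-infer could be prepared roughly as follows. This is a sketch under assumptions: the file names are hypothetical, and the contents of the LLM config JSON are not shown anywhere in this PR, so the dictionary below is only a placeholder.

```python
import json
import tempfile
from pathlib import Path


def prepare_inputs(workdir, llm_config, instance_ids):
    """Write the two files swebench-infer reads and return its argv."""
    workdir = Path(workdir)
    config_path = workdir / "llm_config.json"  # hypothetical file name
    select_path = workdir / "instances.txt"    # hypothetical file name
    config_path.write_text(json.dumps(llm_config, indent=2))
    # One instance ID per line, as the --select option expects.
    select_path.write_text("\n".join(instance_ids) + "\n")
    return ["swebench-infer", str(config_path), "--select", str(select_path)]


with tempfile.TemporaryDirectory() as tmp:
    argv = prepare_inputs(
        tmp,
        {"model": "example-model"},  # placeholder; real config fields are unknown
        ["django__django-11039"],
    )
    select_lines = Path(argv[3]).read_text().splitlines()
```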
Co-authored-by: openhands <openhands@all-hands.dev>
The benchmarks version.py runs 'git submodule status' which fails
with symlinks. Clone the SDK properly instead.

Co-authored-by: openhands <openhands@all-hands.dev>
Instead of trying to integrate with the complex benchmarks repo (which
has git submodule version.py requirements), profile a realistic SDK
conversation directly. This still exercises the core code paths:
- LLM calls
- Tool execution (file operations)
- Conversation loop

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
GitHub Actions doesn't handle heredocs well in YAML. Use a separate
Python script file instead.

New: scripts/profile_conversation.py - SWE-bench style profiling script

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
- Install austin-dist and austin-python from PyPI for latest Austin 4.0+
- Austin 4.0+ uses MOJO binary format, need mojo2austin for conversion
- Fix profile_conversation.py to use correct SDK API:
  - Use Tool(name=...) instead of get_default_tools()
  - Agent doesn't take system_message parameter directly
  - Use conversation.run(max_iterations=...) instead of iteration
- Add debugging output to diagnose 'No such process' error
- Add simple test script to verify Austin works before real profiling

Co-authored-by: openhands <openhands@all-hands.dev>
- Install austin-dist with sudo pip so it's available to sudo commands
- Remove redundant austin-python install in profiling step
- Use mojo2austin directly instead of uv run

Co-authored-by: openhands <openhands@all-hands.dev>
… syntax

- Profile script was failing silently because LLM() didn't get credentials
- Add api_key and base_url from environment variables
- Fix mojo2austin command to provide output file argument

Co-authored-by: openhands <openhands@all-hands.dev>
- Add startup message to verify script runs
- Add exception handling to capture any errors
- Remove unused traceback import

Co-authored-by: openhands <openhands@all-hands.dev>
- The script was exiting before print statements in main()
- This means the crash happens during import of openhands modules
- Add debug output at module level to capture this

Co-authored-by: openhands <openhands@all-hands.dev>
- sudo strips env vars by default, so LLM_API_KEY etc were not available
- sudo -E preserves the environment for the Austin child process
- Add debug output for working directory and Python path

Co-authored-by: openhands <openhands@all-hands.dev>
- Pass env vars explicitly through sudo instead of relying on sudo -E
- Add PYTHONPATH to include venv site-packages
- Add test of script running without Austin to confirm import works
- Pass HOME variable to help with any path resolution issues

Co-authored-by: openhands <openhands@all-hands.dev>
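
Passing variables explicitly through sudo, as the commit above describes, is commonly done with `sudo env NAME=value command`. The sketch below only builds such a command line; `sudo_command` is a hypothetical helper, and the variable values are made up.

```python
import os


def sudo_command(command, names, environ=None):
    """Prefix `command` with `sudo env NAME=value ...` for the named variables."""
    environ = os.environ if environ is None else environ
    # Variables that are not set are skipped rather than passed as empty.
    assignments = [f"{name}={environ[name]}" for name in names if name in environ]
    return ["sudo", "env", *assignments, *command]


cmd = sudo_command(
    ["austin", "python", "scripts/profile_conversation.py"],
    ["LLM_API_KEY", "HOME", "MISSING_VAR"],
    environ={"LLM_API_KEY": "sk-example", "HOME": "/home/runner"},
)
```

One trade-off of this approach is that values placed on the command line can be visible in process listings, which matters for secrets such as API keys.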
Create a shell wrapper script that:
1. Exports all required environment variables (LLM_API_KEY, etc.)
2. Changes to the correct working directory
3. Executes the Python profile script

This avoids issues with sudo stripping environment variables
and with PYTHONPATH conflicts.

Also added wrapper script test to verify env vars work before
running Austin profiler.

Co-authored-by: openhands <openhands@all-hands.dev>
Austin can only profile Python scripts directly, not bash scripts.
Create a Python wrapper that:
1. Receives env vars as VAR=value command-line arguments
2. Sets os.environ from those arguments
3. Changes to the SDK directory
4. Imports and runs the profile script

This allows Austin to profile the Python process while getting
the necessary environment variables through the command line.

Co-authored-by: openhands <openhands@all-hands.dev>
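
The wrapper idea in the commit above (environment variables arriving as VAR=value arguments) might look like the following. It is a minimal sketch, not the wrapper from this PR: `apply_env_args` is a hypothetical name, and `LLM_BASE_URL` is an assumed variable name used only for illustration.

```python
import os


def apply_env_args(argv, environ=None):
    """Consume leading VAR=value arguments into environ; return the rest."""
    environ = os.environ if environ is None else environ
    rest = list(argv)
    while rest and "=" in rest[0] and not rest[0].startswith("-"):
        # Split on the first '=' only, so values may themselves contain '='.
        name, _, value = rest.pop(0).partition("=")
        environ[name] = value
    return rest


env = {}
remaining = apply_env_args(
    ["LLM_API_KEY=sk-example", "LLM_BASE_URL=https://example.invalid/v1?x=1", "--verbose"],
    environ=env,
)
```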
When running under sudo austin, the Python interpreter doesn't see
the virtual environment's site-packages. Add SITE_PACKAGES argument
that the wrapper uses to prepend to sys.path.

Also added debug output for sys.executable and sys.path to help
diagnose any remaining import issues.

Co-authored-by: openhands <openhands@all-hands.dev>
The venv Python () is a symlink that doesn't work correctly
under sudo + austin. Instead, resolve the actual Python binary path
using sys.executable and use that for Austin.

This ensures Austin runs the correct Python version (3.13) which has
the SDK packages installed, rather than falling back to the system
Python (3.12) which doesn't have the packages.

Co-authored-by: openhands <openhands@all-hands.dev>
Use readlink -f to follow all symlinks and get the actual Python
binary path. The venv python is typically a symlink, and Austin
under sudo needs the actual binary path.

Co-authored-by: openhands <openhands@all-hands.dev>
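
For reference, the Python counterpart of `readlink -f` is `os.path.realpath`. The sketch below demonstrates it on a temporary symlink; the file names are made up, and `real_interpreter` is a hypothetical helper rather than code from this PR.

```python
import os
import sys
import tempfile


def real_interpreter(path=None):
    """Follow every symlink, like `readlink -f`, to reach the actual binary."""
    return os.path.realpath(path or sys.executable)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "python3.13")  # stand-in for the real binary
    link = os.path.join(tmp, "python")        # stand-in for the venv symlink
    open(target, "w").close()
    os.symlink(target, link)
    resolved = real_interpreter(link)
    expected = os.path.realpath(target)

current = real_interpreter()  # resolves the interpreter running this sketch
```

Note that launching the resolved binary directly likely bypasses the virtual environment's own setup, so its site-packages may still need to be added to sys.path separately.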
Instead of using a Python wrapper script (which Austin still runs with
the wrong Python), add CLI argument parsing directly to the profile
script. This handles --site-packages and --sdk-dir arguments to set up
sys.path and working directory before any imports.

This approach:
1. Allows the profile script to configure its own environment
2. Works around Austin's Python version detection issues
3. Simplifies the workflow by eliminating the wrapper script

Co-authored-by: openhands <openhands@all-hands.dev>
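
A minimal sketch of the approach in the commit above, assuming the two options behave as described (prepend a directory to sys.path, change the working directory) and are handled before any SDK imports. `configure_environment` is a hypothetical name and the path is an example value.

```python
import argparse
import os
import sys


def configure_environment(argv):
    """Apply --site-packages and --sdk-dir before any heavy imports run."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--site-packages")
    parser.add_argument("--sdk-dir")
    # parse_known_args leaves unrelated options for later parsing.
    args, remaining = parser.parse_known_args(argv)
    if args.site_packages:
        sys.path.insert(0, args.site_packages)  # make venv packages importable
    if args.sdk_dir:
        os.chdir(args.sdk_dir)  # run from the SDK checkout
    return args, remaining


args, remaining = configure_environment(
    ["--site-packages", "/tmp/example-site", "--other"]
)
# SDK imports would follow here, once sys.path has been set up.
```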
When Austin launches Python via sudo, it consistently runs the wrong
Python version (system Python 3.12 instead of venv Python 3.13).

Solution: Use Austin's attach mode (--pid) instead:
1. Start the profile script in the background as normal user
2. Get its PID
3. Have Austin attach to the running process

This ensures the correct Python version runs with proper environment,
and Austin just attaches to profile it.

Co-authored-by: openhands <openhands@all-hands.dev>
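
The three attach-mode steps in the commit above could be sketched as follows. Only `--pid` and `--children` are named in this PR; the `--interval` and `--output` flags and the output file name are assumptions that should be checked against `austin --help`. The sketch builds the command but does not invoke Austin, so it runs without the profiler installed.

```python
import subprocess
import sys


def austin_attach_command(pid, output_path, interval_us=100):
    """Build an Austin attach-mode command line for an already running process."""
    return [
        "sudo", "austin",
        f"--pid={pid}",
        "--children",                  # flag named in the PR notes
        f"--interval={interval_us}",   # assumed flag; check `austin --help`
        f"--output={output_path}",     # assumed flag; check `austin --help`
    ]


# 1. Start the target as the normal user so it keeps its own interpreter and env.
target = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
# 2. Get its PID and 3. build the command Austin would use to attach to it.
command = austin_attach_command(target.pid, "profile.austin")
# subprocess.run(command) would attach here; omitted so the sketch stays runnable.
target.wait()
```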