Add Austin profiling workflow for SWE-bench runs by csmith49 · Pull Request #2191 · OpenHands/software-agent-sdk

csmith49 · 2026-02-23T20:18:20Z

Summary

This PR adds a new GitHub Actions workflow for profiling OpenHands agent runs using the Austin profiler to identify performance bottlenecks on real workloads.

What's Included

1. Profile Runner Workflow (`.github/workflows/profile-runner.yml`)

A workflow that:

- Profiles actual SWE-bench instances via the benchmarks repo (not just integration tests)
- Generates flame graphs (SVG) and Speedscope JSON for analysis
- Posts summary reports to PRs with artifact download links

Triggers:

- Add the `profile-test` label to a PR
- Manual dispatch via the Actions UI

Configurable Options:

| Option | Description | Default |
|--------|-------------|---------|
| `profile_target` | `swebench` or `example` | `swebench` |
| `instance_ids` | SWE-bench instance IDs | `django__django-11039` |
| `model_id` | Model from resolve_model_config.py | `claude-haiku-4-5` |
| `max_iterations` | Agent iteration limit | `30` |
| `sampling_interval` | Austin sampling interval (µs) | `100` |

2. Flame Graph Generator (`scripts/generate_flamegraph.py`)

A standalone script that converts Austin profiler output to:

- Collapsed stack format (for flamegraph.pl SVG generation)
- Speedscope JSON (for interactive web-based analysis at https://speedscope.app)
- Markdown summary with top functions and call stacks
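To illustrate the conversion described above, here is a minimal sketch that turns collapsed-stack lines into a Speedscope "sampled" profile. It is not the code in `scripts/generate_flamegraph.py`, which is not shown on this page. It assumes each input line has the form `frame_a;frame_b;frame_c <count>` and that lines starting with `#` are metadata. The function names `parse_collapsed` and `to_speedscope` are hypothetical.

```python
import json
from collections import Counter

SPEEDSCOPE_SCHEMA = "https://www.speedscope.app/file-format-schema.json"


def parse_collapsed(lines):
    """Parse 'a;b;c 12' lines into a Counter keyed by stack tuple (assumed layout)."""
    stacks = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and assumed metadata lines
        stack_text, _, count_text = line.rpartition(" ")
        if not stack_text or not count_text.isdigit():
            continue  # skip lines that do not match the assumed layout
        stacks[tuple(stack_text.split(";"))] += int(count_text)
    return stacks


def to_speedscope(stacks, name="profile"):
    """Build a Speedscope 'sampled' profile dict from aggregated stacks."""
    frames, index = [], {}
    samples, weights = [], []
    for stack, count in stacks.items():
        sample = []
        for frame in stack:
            if frame not in index:
                index[frame] = len(frames)
                frames.append({"name": frame})
            sample.append(index[frame])  # frames are listed outermost first
        samples.append(sample)
        weights.append(count)
    return {
        "$schema": SPEEDSCOPE_SCHEMA,
        "shared": {"frames": frames},
        "profiles": [{
            "type": "sampled",
            "name": name,
            "unit": "none",
            "startValue": 0,
            "endValue": sum(weights),
            "samples": samples,
            "weights": weights,
        }],
    }
```

The resulting dict can be written with `json.dump` and opened at https://speedscope.app. The `unit` is set to `none` here because the sketch counts samples rather than elapsed time.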

3. Documentation (`AGENTS.md`)

Added <PROFILING_WORKFLOW> section with usage instructions.

How It Works

┌─────────────────────────────────────────────────────────────┐
│  1. Checkout SDK repo + benchmarks repo                     │
│  2. Link SDK into benchmarks as submodule                   │
│  3. Run SWE-bench inference wrapped with Austin profiler    │
│  4. Generate flame graphs from Austin output                │
│  5. Post results to PR                                      │
└─────────────────────────────────────────────────────────────┘

Example PR Comment Output

## 🔬 Profiling Results

### Overview
- **Total Samples:** 15,234
- **Unique Call Stacks:** 892

### Top Functions by Sample Count
| Function | Samples | % |
|----------|---------|---|
| `_call_llm` | 3,421 | 22.5% |
| `parse_response` | 2,100 | 13.8% |
...

### Generated Artifacts
**Flame Graphs (SVG):** `swebench.svg`
**Speedscope Profiles:** `swebench.speedscope.json`

---
📦 **Artifacts:** [Download from workflow run](link)
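A summary like the example above could be produced along these lines. This is a sketch only: it assumes samples are attributed to the innermost frame of each stack, which the PR description does not confirm, and `leaf_counts` and `render_summary` are hypothetical names.

```python
from collections import Counter


def leaf_counts(stacks):
    """Attribute each stack's samples to its innermost (last) frame."""
    counts = Counter()
    for stack, samples in stacks.items():
        if stack:
            counts[stack[-1]] += samples
    return counts


def render_summary(stacks, limit=10):
    """Render an overview plus a top-functions table as Markdown."""
    total = sum(stacks.values())
    lines = [
        "### Overview",
        f"- **Total Samples:** {total:,}",
        f"- **Unique Call Stacks:** {len(stacks):,}",
        "",
        "### Top Functions by Sample Count",
        "| Function | Samples | % |",
        "|----------|---------|---|",
    ]
    for name, samples in leaf_counts(stacks).most_common(limit):
        share = 100 * samples / total if total else 0.0
        lines.append(f"| `{name}` | {samples:,} | {share:.1f}% |")
    return "\n".join(lines)
```

Here `stacks` is a mapping from a tuple of frame names (outermost first) to a sample count.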

Testing

To test this workflow:

1. Merge this PR
2. Add the `profile-test` label to trigger profiling, OR
3. Go to Actions → "Profile OpenHands Run" → Run workflow

Notes

- Requires the `ALLHANDS_BOT_GITHUB_PAT` secret for benchmarks repo access
- The SWE-bench profiling command may need adjustment based on the actual benchmarks repo structure
- Uses the `--children` flag to capture all child process activity
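As a rough illustration of how the profiler invocation might be assembled, the sketch below builds an Austin command line from the options listed earlier. The helper `build_austin_command` is hypothetical. `--children` comes from the notes above; the `--interval` and `--output` spellings are my understanding of Austin's CLI and should be checked against `austin --help` for the installed version.

```python
def build_austin_command(target_argv, output_path, interval_us=100, children=True):
    """Return an argv list that runs `target_argv` under Austin (assumed flags)."""
    cmd = [
        "austin",
        f"--interval={interval_us}",  # sampling interval in microseconds
        f"--output={output_path}",    # where Austin writes its samples
    ]
    if children:
        cmd.append("--children")      # also sample child processes
    cmd.extend(target_argv)           # the Python command to profile
    return cmd
```

The list can be passed to `subprocess.run` unchanged. Keeping it as a list rather than a shell string avoids quoting problems with instance IDs or paths.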

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---------|---------------|------------|-------------|
| java | amd64, arm64 | `eclipse-temurin:17-jdk` | Link |
| python | amd64, arm64 | `nikolaik/python-nodejs:python3.12-nodejs22` | Link |
| golang | amd64, arm64 | `golang:1.21-bookworm` | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:85b6ead-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-85b6ead-python \
  ghcr.io/openhands/agent-server:85b6ead-python

All tags pushed for this build

ghcr.io/openhands/agent-server:85b6ead-golang-amd64
ghcr.io/openhands/agent-server:85b6ead-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:85b6ead-golang-arm64
ghcr.io/openhands/agent-server:85b6ead-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:85b6ead-java-amd64
ghcr.io/openhands/agent-server:85b6ead-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:85b6ead-java-arm64
ghcr.io/openhands/agent-server:85b6ead-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:85b6ead-python-amd64
ghcr.io/openhands/agent-server:85b6ead-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:85b6ead-python-arm64
ghcr.io/openhands/agent-server:85b6ead-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:85b6ead-golang
ghcr.io/openhands/agent-server:85b6ead-java
ghcr.io/openhands/agent-server:85b6ead-python

About Multi-Architecture Support

- Each variant tag (e.g., 85b6ead-python) is a multi-arch manifest supporting both amd64 and arm64
- Docker automatically pulls the correct architecture for your platform
- Individual architecture tags (e.g., 85b6ead-python-amd64) are also available if needed

This PR adds a new GitHub Actions workflow for profiling OpenHands agent runs using the Austin profiler.

The workflow:
- Profiles actual SWE-bench instances (not just integration tests)
- Generates flame graphs (SVG) and Speedscope JSON for analysis
- Posts summary reports to PRs with artifact links

New files:
- .github/workflows/profile-runner.yml: Main workflow
- scripts/generate_flamegraph.py: Austin output to flame graph converter

Triggers:
- Add 'profile-test' label to a PR
- Manual dispatch with configurable options

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-02-23T20:18:55Z

API breakage checks (Griffe)

Result: Passed

Action log

This allows the workflow to run when pushing to the feature branch, enabling testing before merging to main. Co-authored-by: openhands <openhands@all-hands.dev>

The previous default 'claude-haiku-4-5' doesn't exist in the model config. Co-authored-by: openhands <openhands@all-hands.dev>

The benchmarks repo expects the SDK at vendor/software-agent-sdk/ based on its uv workspace configuration. Also use swebench-infer entry point instead of python -m. Co-authored-by: openhands <openhands@all-hands.dev>

Austin needs to profile the Python interpreter directly, not the uv wrapper. Get the Python path with 'uv run which python' and call it directly. Co-authored-by: openhands <openhands@all-hands.dev>

swebench-infer requires:
1. llm_config_path as a positional argument (JSON file)
2. --select for instance IDs (text file, one per line)

Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

The benchmarks version.py runs 'git submodule status' which fails with symlinks. Clone the SDK properly instead. Co-authored-by: openhands <openhands@all-hands.dev>

Instead of trying to integrate with the complex benchmarks repo (which has git submodule version.py requirements), profile a realistic SDK conversation directly. This still exercises the core code paths:
- LLM calls
- Tool execution (file operations)
- Conversation loop

Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

GitHub Actions doesn't handle heredocs well in YAML. Use a separate Python script file instead. New: scripts/profile_conversation.py - SWE-bench style profiling script Co-authored-by: openhands <openhands@all-hands.dev>

Co-authored-by: openhands <openhands@all-hands.dev>

- Install austin-dist and austin-python from PyPI for latest Austin 4.0+
- Austin 4.0+ uses MOJO binary format, need mojo2austin for conversion
- Fix profile_conversation.py to use correct SDK API:
  - Use Tool(name=...) instead of get_default_tools()
  - Agent doesn't take system_message parameter directly
  - Use conversation.run(max_iterations=...) instead of iteration
- Add debugging output to diagnose 'No such process' error
- Add simple test script to verify Austin works before real profiling

Co-authored-by: openhands <openhands@all-hands.dev>

- Install austin-dist with sudo pip so it's available to sudo commands
- Remove redundant austin-python install in profiling step
- Use mojo2austin directly instead of uv run

Co-authored-by: openhands <openhands@all-hands.dev>

… syntax

- Profile script was failing silently because LLM() didn't get credentials
- Add api_key and base_url from environment variables
- Fix mojo2austin command to provide output file argument

Co-authored-by: openhands <openhands@all-hands.dev>

- Add startup message to verify script runs
- Add exception handling to capture any errors
- Remove unused traceback import

Co-authored-by: openhands <openhands@all-hands.dev>

- The script was exiting before print statements in main()
- This means the crash happens during import of openhands modules
- Add debug output at module level to capture this

Co-authored-by: openhands <openhands@all-hands.dev>

- sudo strips env vars by default, so LLM_API_KEY etc were not available
- sudo -E preserves the environment for the Austin child process
- Add debug output for working directory and Python path

Co-authored-by: openhands <openhands@all-hands.dev>

- Pass env vars explicitly through sudo instead of relying on sudo -E
- Add PYTHONPATH to include venv site-packages
- Add test of script running without Austin to confirm import works
- Pass HOME variable to help with any path resolution issues

Co-authored-by: openhands <openhands@all-hands.dev>

Create a shell wrapper script that:
1. Exports all required environment variables (LLM_API_KEY, etc.)
2. Changes to the correct working directory
3. Executes the Python profile script

This avoids issues with sudo stripping environment variables and with PYTHONPATH conflicts. Also added wrapper script test to verify env vars work before running Austin profiler.

Co-authored-by: openhands <openhands@all-hands.dev>

Austin can only profile Python scripts directly, not bash scripts. Create a Python wrapper that:
1. Receives env vars as VAR=value command-line arguments
2. Sets os.environ from those arguments
3. Changes to the SDK directory
4. Imports and runs the profile script

This allows Austin to profile the Python process while getting the necessary environment variables through the command line.

Co-authored-by: openhands <openhands@all-hands.dev>
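The argument handling described in this commit message might look roughly like the sketch below. The wrapper itself is not shown on this page, so `split_env_args` and `apply_env` are hypothetical names, and the rule for what counts as a `VAR=value` argument is an assumption.

```python
import os


def split_env_args(argv):
    """Separate VAR=value arguments from everything else (assumed convention)."""
    env, rest = {}, []
    for arg in argv:
        name, sep, value = arg.partition("=")
        # Only treat the argument as an assignment if the left side looks
        # like a variable name; options such as --x=1 are passed through.
        if sep and name.isidentifier():
            env[name] = value
        else:
            rest.append(arg)
    return env, rest


def apply_env(env, environ=os.environ):
    """Copy the parsed assignments into the process environment."""
    environ.update(env)
```

A wrapper would call `split_env_args(sys.argv[1:])` and `apply_env(...)` before importing the profile script. One trade-off worth noting is that values passed as command-line arguments are typically visible in process listings, which matters for secrets such as API keys.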

When running under sudo austin, the Python interpreter doesn't see the virtual environment's site-packages. Add SITE_PACKAGES argument that the wrapper uses to prepend to sys.path. Also added debug output for sys.executable and sys.path to help diagnose any remaining import issues. Co-authored-by: openhands <openhands@all-hands.dev>

The venv Python () is a symlink that doesn't work correctly under sudo + austin. Instead, resolve the actual Python binary path using sys.executable and use that for Austin. This ensures Austin runs the correct Python version (3.13) which has the SDK packages installed, rather than falling back to the system Python (3.12) which doesn't have the packages. Co-authored-by: openhands <openhands@all-hands.dev>

Use readlink -f to follow all symlinks and get the actual Python binary path. The venv python is typically a symlink, and Austin under sudo needs the actual binary path. Co-authored-by: openhands <openhands@all-hands.dev>
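For comparison, the same symlink resolution can be done from Python. This sketch is an equivalent of `readlink -f`, not the workflow's actual step, and `resolve_interpreter` is a hypothetical helper name.

```python
import os
import sys


def resolve_interpreter(path=sys.executable):
    """Follow every symlink in `path` and return the real binary location."""
    return os.path.realpath(path)
```

Launching the resolved binary directly may mean the interpreter no longer recognizes the virtual environment, since that detection depends on where the executable appears to live. That would be consistent with the site-packages workaround described in the earlier commit message.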

Instead of using a Python wrapper script (which Austin still runs with the wrong Python), add CLI argument parsing directly to the profile script. This handles --site-packages and --sdk-dir arguments to set up sys.path and working directory before any imports.

This approach:
1. Allows the profile script to configure its own environment
2. Works around Austin's Python version detection issues
3. Simplifies the workflow by eliminating the wrapper script

Co-authored-by: openhands <openhands@all-hands.dev>
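A minimal sketch of this kind of early bootstrap follows. The `--site-packages` and `--sdk-dir` flags come from the commit message; the function name `bootstrap` and the choice of `parse_known_args` are assumptions, since `scripts/profile_conversation.py` is not shown here.

```python
import argparse
import os
import sys


def bootstrap(argv):
    """Apply path and working-directory options before SDK imports happen."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--site-packages")
    parser.add_argument("--sdk-dir")
    # parse_known_args leaves unrelated arguments for the script's own parser.
    args, remaining = parser.parse_known_args(argv)
    if args.site_packages and args.site_packages not in sys.path:
        sys.path.insert(0, args.site_packages)  # make venv packages importable
    if args.sdk_dir:
        os.chdir(args.sdk_dir)
    return args, remaining
```

The script would call `bootstrap(sys.argv[1:])` at module level, above any `openhands` imports, so that the path change takes effect before those imports are resolved.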

When Austin launches Python via sudo, it consistently runs the wrong Python version (system Python 3.12 instead of venv Python 3.13).

Solution: Use Austin's attach mode (--pid) instead:
1. Start the profile script in the background as normal user
2. Get its PID
3. Have Austin attach to the running process

This ensures the correct Python version runs with proper environment, and Austin just attaches to profile it.

Co-authored-by: openhands <openhands@all-hands.dev>
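The three steps above can be sketched as follows. Only `--pid` is taken from the commit message. The helper names are hypothetical, and the `--interval` and `--output` spellings are my understanding of Austin's CLI and should be verified against `austin --help`.

```python
import subprocess
import sys


def start_target(argv, env=None):
    """Step 1: start the script in the background as the current user."""
    return subprocess.Popen(argv, env=env)


def build_attach_command(pid, output_path, interval_us=100):
    """Steps 2 and 3: build a command that attaches Austin to a running PID."""
    return [
        "sudo", "austin",
        f"--interval={interval_us}",  # assumed flag spelling
        f"--output={output_path}",    # assumed flag spelling
        f"--pid={pid}",               # attach mode, as named in the commit
    ]
```

A caller would run `proc = start_target([sys.executable, "scripts/profile_conversation.py"])`, pass `proc.pid` to `build_attach_command`, run that command with `subprocess.run`, and then call `proc.wait()`. Because Austin attaches after the process has started, the earliest part of startup may not be sampled.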

csmith49 added the profile-test label Feb 23, 2026

csmith49 and others added 26 commits February 23, 2026 13:21

Merge branch 'main' into feature/austin-profiling-workflow

4661d6e

Add push trigger for feature branch testing

54be6a9

Fix default model ID to use claude-sonnet-4-6

c58fe6d

Fix benchmarks SDK path - use vendor/software-agent-sdk

6604664

Fix Austin profiling by using direct Python path

504de80

Fix swebench-infer arguments - use proper LLM config file

75c20b8

Add debug output to diagnose swebench-infer failures

38d968c

Clone SDK as proper git repo to satisfy version.py check

ff3f9d5

Fix heredoc indentation in workflow

341f6d9

Fix workflow YAML - use separate profile script file

fec2691

Debug Austin profiling - add test run and try --pipe

5a31c92

Add debug output and error handling to profile script

606b078

Add debug output before imports to diagnose crash

b4b2307

Fix: Use readlink -f to get absolute Python binary path

9115e4a

csmith49 wants to merge 27 commits into main from feature/austin-profiling-workflow
