Add Austin profiling workflow for SWE-bench runs#2191
Draft
This PR adds a new GitHub Actions workflow for profiling OpenHands agent runs using the Austin profiler.

The workflow:
- Profiles actual SWE-bench instances (not just integration tests)
- Generates flame graphs (SVG) and Speedscope JSON for analysis
- Posts summary reports to PRs with artifact links

New files:
- .github/workflows/profile-runner.yml: Main workflow
- scripts/generate_flamegraph.py: Austin output to flame graph converter

Triggers:
- Add 'profile-test' label to a PR
- Manual dispatch with configurable options

Co-authored-by: openhands <openhands@all-hands.dev>
API breakage checks (Griffe)
Result: Passed
This allows the workflow to run when pushing to the feature branch, enabling testing before merging to main. Co-authored-by: openhands <openhands@all-hands.dev>
The previous default 'claude-haiku-4-5' doesn't exist in the model config. Co-authored-by: openhands <openhands@all-hands.dev>
The benchmarks repo expects the SDK at vendor/software-agent-sdk/ based on its uv workspace configuration. Also use swebench-infer entry point instead of python -m. Co-authored-by: openhands <openhands@all-hands.dev>
Austin needs to profile the Python interpreter directly, not the uv wrapper. Get the Python path with 'uv run which python' and call it directly. Co-authored-by: openhands <openhands@all-hands.dev>
swebench-infer requires:
1. llm_config_path as a positional argument (JSON file)
2. --select for instance IDs (text file, one per line)

Co-authored-by: openhands <openhands@all-hands.dev>
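A minimal sketch of preparing the two inputs this commit describes. The config schema is an assumption: the `model` key and the file names are hypothetical and not taken from the benchmarks repo; only the positional config path and the `--select` option come from the commit message.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def write_infer_inputs(workdir: Path, model_id: str, instance_ids: List[str]) -> Tuple[Path, Path]:
    """Write the LLM config JSON and the instance selection file."""
    # Hypothetical schema: the "model" key is an assumption for illustration.
    llm_config_path = workdir / "llm_config.json"
    llm_config_path.write_text(json.dumps({"model": model_id}, indent=2))

    # One instance ID per line, as --select expects.
    select_path = workdir / "instances.txt"
    select_path.write_text("\n".join(instance_ids) + "\n")
    return llm_config_path, select_path


with tempfile.TemporaryDirectory() as tmp:
    config_path, select_path = write_infer_inputs(
        Path(tmp), "example-model", ["django__django-11039"]
    )
    # Invocation shape: swebench-infer <llm_config_path> --select <file>
    command = ["swebench-infer", str(config_path), "--select", str(select_path)]
    selected = select_path.read_text().splitlines()
    loaded_config = json.loads(config_path.read_text())
```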
Co-authored-by: openhands <openhands@all-hands.dev>
The benchmarks version.py runs 'git submodule status' which fails with symlinks. Clone the SDK properly instead. Co-authored-by: openhands <openhands@all-hands.dev>
Instead of trying to integrate with the complex benchmarks repo (which has git submodule version.py requirements), profile a realistic SDK conversation directly. This still exercises the core code paths:
- LLM calls
- Tool execution (file operations)
- Conversation loop

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
GitHub Actions doesn't handle heredocs well in YAML. Use a separate Python script file instead. New: scripts/profile_conversation.py - SWE-bench style profiling script Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
- Install austin-dist and austin-python from PyPI for latest Austin 4.0+
- Austin 4.0+ uses MOJO binary format, need mojo2austin for conversion
- Fix profile_conversation.py to use correct SDK API:
  - Use Tool(name=...) instead of get_default_tools()
  - Agent doesn't take system_message parameter directly
  - Use conversation.run(max_iterations=...) instead of iteration
- Add debugging output to diagnose 'No such process' error
- Add simple test script to verify Austin works before real profiling

Co-authored-by: openhands <openhands@all-hands.dev>
- Install austin-dist with sudo pip so it's available to sudo commands
- Remove redundant austin-python install in profiling step
- Use mojo2austin directly instead of uv run

Co-authored-by: openhands <openhands@all-hands.dev>
… syntax
- Profile script was failing silently because LLM() didn't get credentials
- Add api_key and base_url from environment variables
- Fix mojo2austin command to provide output file argument

Co-authored-by: openhands <openhands@all-hands.dev>
- Add startup message to verify script runs
- Add exception handling to capture any errors
- Remove unused traceback import

Co-authored-by: openhands <openhands@all-hands.dev>
- The script was exiting before print statements in main()
- This means the crash happens during import of openhands modules
- Add debug output at module level to capture this

Co-authored-by: openhands <openhands@all-hands.dev>
- sudo strips env vars by default, so LLM_API_KEY etc. were not available
- sudo -E preserves the environment for the Austin child process
- Add debug output for working directory and Python path

Co-authored-by: openhands <openhands@all-hands.dev>
- Pass env vars explicitly through sudo instead of relying on sudo -E
- Add PYTHONPATH to include venv site-packages
- Add test of script running without Austin to confirm import works
- Pass HOME variable to help with any path resolution issues

Co-authored-by: openhands <openhands@all-hands.dev>
Create a shell wrapper script that:
1. Exports all required environment variables (LLM_API_KEY, etc.)
2. Changes to the correct working directory
3. Executes the Python profile script

This avoids issues with sudo stripping environment variables and with PYTHONPATH conflicts. Also added a wrapper script test to verify env vars work before running the Austin profiler.

Co-authored-by: openhands <openhands@all-hands.dev>
Austin can only profile Python scripts directly, not bash scripts. Create a Python wrapper that:
1. Receives env vars as VAR=value command-line arguments
2. Sets os.environ from those arguments
3. Changes to the SDK directory
4. Imports and runs the profile script

This allows Austin to profile the Python process while getting the necessary environment variables through the command line.

Co-authored-by: openhands <openhands@all-hands.dev>
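Steps 1 and 2 of the wrapper could look roughly like the sketch below. The function name `apply_env_args` and the example values are hypothetical; the wrapper script in this PR may parse its arguments differently.

```python
import os
from typing import List, MutableMapping, Optional


def apply_env_args(
    argv: List[str], environ: Optional[MutableMapping[str, str]] = None
) -> List[str]:
    """Copy leading VAR=value arguments into environ and return the rest."""
    target = os.environ if environ is None else environ
    remaining: List[str] = []
    for index, arg in enumerate(argv):
        name, sep, value = arg.partition("=")
        if sep and name.isidentifier():
            target[name] = value
        else:
            # First argument that is not VAR=value: stop and pass the rest on.
            remaining = argv[index:]
            break
    return remaining


# Example with a plain dict standing in for os.environ.
env: dict = {}
rest = apply_env_args(
    ["LLM_API_KEY=secret", "LLM_BASE_URL=https://example.invalid", "--max-iterations", "30"],
    env,
)
```

One trade-off of this approach: values passed as command-line arguments are visible in process listings, so it fits short-lived CI runners better than shared machines.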
When running under sudo austin, the Python interpreter doesn't see the virtual environment's site-packages. Add SITE_PACKAGES argument that the wrapper uses to prepend to sys.path. Also added debug output for sys.executable and sys.path to help diagnose any remaining import issues. Co-authored-by: openhands <openhands@all-hands.dev>
The venv Python () is a symlink that doesn't work correctly under sudo + austin. Instead, resolve the actual Python binary path using sys.executable and use that for Austin. This ensures Austin runs the correct Python version (3.13) which has the SDK packages installed, rather than falling back to the system Python (3.12) which doesn't have the packages. Co-authored-by: openhands <openhands@all-hands.dev>
Use readlink -f to follow all symlinks and get the actual Python binary path. The venv python is typically a symlink, and Austin under sudo needs the actual binary path. Co-authored-by: openhands <openhands@all-hands.dev>
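For comparison, the same resolution can be done from Python with `os.path.realpath`, which follows every symlink in a path much as `readlink -f` does. This is an illustrative sketch, not the workflow's actual step; the temporary symlink chain below only stands in for a venv's `bin/python`.

```python
import os
import sys
import tempfile


def resolve_interpreter(path: str) -> str:
    """Follow all symlinks in path and return the canonical location."""
    return os.path.realpath(path)


# Two-level symlink chain standing in for venv/bin/python -> python3 -> python3.13.
with tempfile.TemporaryDirectory() as tmp:
    real_binary = os.path.join(tmp, "python3.13")
    open(real_binary, "w").close()
    first_link = os.path.join(tmp, "python3")
    os.symlink(real_binary, first_link)
    second_link = os.path.join(tmp, "python")
    os.symlink(first_link, second_link)

    resolved = resolve_interpreter(second_link)
    expected = os.path.realpath(real_binary)

# The real path of the interpreter running this script.
current_interpreter = resolve_interpreter(sys.executable)
```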
Instead of using a Python wrapper script (which Austin still runs with the wrong Python), add CLI argument parsing directly to the profile script. This handles --site-packages and --sdk-dir arguments to set up sys.path and working directory before any imports.

This approach:
1. Allows the profile script to configure its own environment
2. Works around Austin's Python version detection issues
3. Simplifies the workflow by eliminating the wrapper script

Co-authored-by: openhands <openhands@all-hands.dev>
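A sketch of what such early argument handling might look like. Only the `--site-packages` and `--sdk-dir` flag names come from the commit message; the function name and structure are assumptions.

```python
import argparse
import os
import sys
from typing import List


def configure_environment(argv: List[str]) -> List[str]:
    """Apply --site-packages and --sdk-dir, returning the unparsed arguments."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--site-packages")
    parser.add_argument("--sdk-dir")
    # parse_known_args leaves unrelated options for the main parser.
    args, remaining = parser.parse_known_args(argv)

    if args.site_packages:
        # Prepend so the venv's packages win over the system interpreter's.
        sys.path.insert(0, args.site_packages)
    if args.sdk_dir:
        os.chdir(args.sdk_dir)
    return remaining


if __name__ == "__main__":
    leftover_args = configure_environment(sys.argv[1:])
    # Imports that depend on the adjusted sys.path would go after this point.
```

The key design point is ordering: the path and working directory are set before any package imports run, since an import executed earlier would be resolved against the unmodified `sys.path`.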
When Austin launches Python via sudo, it consistently runs the wrong Python version (system Python 3.12 instead of venv Python 3.13).

Solution: Use Austin's attach mode (--pid) instead:
1. Start the profile script in the background as normal user
2. Get its PID
3. Have Austin attach to the running process

This ensures the correct Python version runs with proper environment, and Austin just attaches to profile it.

Co-authored-by: openhands <openhands@all-hands.dev>
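The three steps can be sketched in Python as follows. This is an illustration under assumptions: the short flags `-i`, `-o`, and `-p` are taken to be Austin's interval, output, and attach options, `sudo` is assumed to be needed for attaching, and the output file name is hypothetical. The attach command is only constructed here, not executed.

```python
import subprocess
import sys
from typing import List


def build_attach_command(pid: int, interval_us: int, output_path: str) -> List[str]:
    """Build an Austin attach invocation (flag names are assumptions)."""
    return [
        "sudo", "austin",
        "-i", str(interval_us),   # sampling interval in microseconds
        "-o", output_path,        # where Austin writes its samples
        "-p", str(pid),           # attach to an already-running process
    ]


# Step 1: start the target as the normal user (a short sleep stands in for
# the real profile script).
target = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])

# Step 2: its PID is available immediately.
command = build_attach_command(target.pid, 100, "profile.austin")

# Step 3 would be subprocess.run(command, check=True); omitted so the sketch
# runs without Austin installed.
target.wait()
```

Because the target is launched without sudo, it keeps the caller's interpreter and environment; only the profiler runs with elevated privileges.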
Summary
This PR adds a new GitHub Actions workflow for profiling OpenHands agent runs using the Austin profiler to identify performance bottlenecks on real workloads.

What's Included

1. Profile Runner Workflow (`.github/workflows/profile-runner.yml`)

A workflow that:
- Profiles actual SWE-bench instances (not just integration tests)
- Generates flame graphs (SVG) and Speedscope JSON for analysis
- Posts summary reports to PRs with artifact links

Triggers:
- Add the `profile-test` label to a PR
- Manual dispatch with configurable options

Configurable Options:

| Option | Default |
|---|---|
| `profile_target` (`swebench` or `example`) | `swebench` |
| `instance_ids` | `django__django-11039` |
| `model_id` | `claude-haiku-4-5` |
| `max_iterations` | `30` |
| `sampling_interval` | `100` |

2. Flame Graph Generator (`scripts/generate_flamegraph.py`)

A standalone script that converts Austin profiler output to:
- Flame graphs (SVG)
- Speedscope JSON

3. Documentation (`AGENTS.md`)

Added `<PROFILING_WORKFLOW>` section with usage instructions.

How It Works

Example PR Comment Output

Testing
To test this workflow:
- Add the `profile-test` label to trigger profiling, OR
- Run the workflow via manual dispatch

Notes
- Requires the `ALLHANDS_BOT_GITHUB_PAT` secret for benchmarks repo access
- Uses the `--children` flag to capture all child process activity
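To make the flame graph conversion step more concrete, here is a minimal sketch of aggregating collapsed-stack samples. It assumes each sample line has the form `frame;frame;frame <metric>` with `#`-prefixed metadata lines; this is not the code in `scripts/generate_flamegraph.py`, and the frame names in the example are invented.

```python
from collections import Counter
from typing import Iterable


def fold_samples(lines: Iterable[str]) -> Counter:
    """Sum the metric of identical stacks from collapsed-stack lines."""
    stacks: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and metadata lines
        stack, _, metric = line.rpartition(" ")
        if not stack or not metric.isdigit():
            continue  # skip malformed samples
        stacks[stack] += int(metric)
    return stacks


def self_time_by_frame(stacks: Counter) -> Counter:
    """Attribute each stack's metric to its innermost (leaf) frame."""
    totals: Counter = Counter()
    for stack, value in stacks.items():
        totals[stack.split(";")[-1]] += value
    return totals


samples = [
    "# mode: wall",
    "main;run;llm_call 300",
    "main;run;tool_exec 100",
    "main;run;llm_call 200",
]
folded = fold_samples(samples)
hotspots = self_time_by_frame(folded).most_common()
```

A flame graph renderer would take `folded` as input, drawing each stack with a width proportional to its summed metric; `hotspots` is the kind of ranked list a PR summary comment could show.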
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)

    # Each variant is a multi-arch manifest supporting both amd64 and arm64
    docker pull ghcr.io/openhands/agent-server:85b6ead-python

Run

All tags pushed for this build

About Multi-Architecture Support
- Each variant tag (e.g., 85b6ead-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 85b6ead-python-amd64) are also available if needed