Update default LLM model to gpt-5.5 #3257
Python API breakage checks: ✅ PASSED. Behavioral default changes detected.
|
REST API breakage checks (OpenAPI): ✅ PASSED |
|
@OpenHands /iterate |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
✅ QA Report: PASS
All PR objectives verified: default model successfully changed to gpt-5.5, constructor honors the field default, and test coverage added.
Does this PR achieve its stated goal?
Yes. The PR accomplishes all three stated objectives:
- ✓ Changed the SDK `LLM.model` field default from `claude-sonnet-4-20250514` to `gpt-5.5`
- ✓ Made `LLM()` honor the field default when the model argument is omitted (via `_coerce_inputs`)
- ✓ Added test coverage for the default model constructor path
I verified this by creating LLM instances without specifying a model and confirming they use gpt-5.5, and by running both manual tests and the included test suite. The API breakage check correctly allows this intentional policy change.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed with uv, SDK imports successfully |
| CI Status | ✅ All checks passing (pre-commit, tests, API checks, builds) |
| Functional Verification | ✅ Default model behavior verified end-to-end |
Functional Verification
Test 1: Verify default model is gpt-5.5 when no model specified
Step 1 — Establish baseline (main branch):
Checked the default on main branch:
git show origin/main:openhands-sdk/openhands/sdk/llm/llm.py | grep -A 2 'model: str = Field'
Output:
model: str = Field(
default="claude-sonnet-4-20250514",
description="Model name.",
This confirms the old default was claude-sonnet-4-20250514.
Step 2 — Apply PR changes:
Already on PR branch codex/default-llm-gpt-5-5 (commit 9772df5a).
Step 3 — Verify new default is used:
Ran test script creating LLM(usage_id="test-default") without model argument:
from openhands.sdk import LLM
llm_default = LLM(usage_id="test-default")
print(llm_default.model)
Output:
✓ LLM created successfully
Model: gpt-5.5
✓ PASS: Default model is 'gpt-5.5' as expected
This confirms LLM() now defaults to gpt-5.5 when no model is specified.
Test 2: Verify explicit model still works
Verification:
Ran test creating LLM(model="gpt-4o-mini", usage_id="test-explicit"):
llm_explicit = LLM(model="gpt-4o-mini", usage_id="test-explicit")
print(llm_explicit.model)
Output:
✓ LLM created successfully
Model: gpt-4o-mini
✓ PASS: Explicit model 'gpt-4o-mini' is used as expected
This confirms explicit model specifications still override the default correctly.
Test 3: Verify Field default value
Verification:
Checked the Pydantic field default:
field_default = LLM.model_fields["model"].get_default()
print(field_default)
Output:
Field default: gpt-5.5
✓ PASS: Field default is 'gpt-5.5' as expected
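Tests 1 to 3 check three related things: the value an instance gets when the model is omitted, the value it gets when the model is passed explicitly, and the default declared on the field itself. The sketch below reproduces that distinction with the standard library only. `LLMSketch` and `declared_default` are hypothetical stand-ins written for illustration. They are not the SDK's `LLM` class, which is a Pydantic model.

```python
from dataclasses import dataclass, fields


# Hypothetical stand-in for the SDK's LLM class (assumption: only two fields).
@dataclass
class LLMSketch:
    usage_id: str
    model: str = "gpt-5.5"  # declared default, mirroring the PR's new value


def declared_default(cls, name):
    """Return the default declared on a dataclass field without instantiating."""
    for f in fields(cls):
        if f.name == name:
            return f.default
    raise KeyError(name)


# Omitting the argument falls back to the declared default.
omitted = LLMSketch(usage_id="test-default")

# Passing the argument overrides the declared default.
explicit = LLMSketch(usage_id="test-explicit", model="gpt-4o-mini")

print(omitted.model, explicit.model, declared_default(LLMSketch, "model"))
```

Checking the declared default separately from an instance's value is useful because a constructor hook or validator could change what an instance ends up with even when the field declaration is correct.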
Test 4: Run included test
Verification:
Ran the new test added in the PR:
uv run pytest tests/sdk/config/test_llm_config.py::test_llm_config_defaults -v
Output:
tests/sdk/config/test_llm_config.py::test_llm_config_defaults PASSED [100%]
1 passed, 5 warnings in 0.35s
Test passes, confirming the default behavior is correct.
Test 5: Verify API breakage check allows this change
Verification:
Ran the API breakage check tests:
uv run pytest tests/cross/test_check_sdk_api_breakage.py::test_allowed_field_default_change_llm_model \
  tests/cross/test_check_sdk_api_breakage.py::test_allowed_field_default_change_rejects_other_fields -v
Output:
test_allowed_field_default_change_llm_model PASSED [ 50%]
test_allowed_field_default_change_rejects_other_fields PASSED [100%]
2 passed, 5 warnings in 0.18s
This confirms the API breakage checker correctly allows LLM.model default changes while still protecting other fields.
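The behavior verified here, allowing a default change on one named field while still flagging others, can be illustrated with a small allowlist check. This is a minimal sketch under stated assumptions, not the repository's breakage checker: `ALLOWED_DEFAULT_CHANGES`, `find_default_breaks`, and the `num_retries` field with its values are all hypothetical.

```python
# Hypothetical allowlist of (class, field) pairs whose default may change freely.
ALLOWED_DEFAULT_CHANGES = {("LLM", "model")}


def find_default_breaks(old, new, allowed=ALLOWED_DEFAULT_CHANGES):
    """Return sorted (class, field) pairs whose default changed and is not allowlisted.

    `old` and `new` map (class, field) -> declared default.
    """
    breaks = []
    for key, old_default in old.items():
        if key in new and new[key] != old_default and key not in allowed:
            breaks.append(key)
    return sorted(breaks)


# The model values come from this PR; the num_retries entries are made up.
old_defaults = {
    ("LLM", "model"): "claude-sonnet-4-20250514",
    ("LLM", "num_retries"): 5,
}
new_defaults = {
    ("LLM", "model"): "gpt-5.5",
    ("LLM", "num_retries"): 3,
}

print(find_default_breaks(old_defaults, new_defaults))
```

Keying the allowlist on the (class, field) pair keeps the exemption narrow, so a changed default on any other field is still reported.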
Issues Found
None.
all-hands-bot
left a comment
The implementation is solid (breakage checker changes are well-designed, validator logic is reasonable, tests are adequate), but this needs human review after benchmark runs.
|
@OpenHands /iterate we already have benchmarks in OpenHands/openhands-index-results |
|
I'm on it! neubig can track my progress at all-hands.dev |
|
Thanks for the pointer. I found the existing standard OpenHands GPT-5.5 benchmark entry in Relevant scores from
So I don't think this PR needs a duplicate eval run just to satisfy the benchmark-risk note; reviewers can use the existing OpenHands Index artifacts for GPT-5.5 validation. This comment was created by an AI agent (OpenHands) on behalf of the requester. |
|
Since my last summary, there were no additional code changes or repository actions. Final status:
|
|
OpenHands encountered an error: Request timeout after 30 seconds to https://jxdhpozhpwcyjfyu.prod-runtime.all-hands.dev/api/conversations/d1155223-3eda-4ab6-b386-ad9d69b2f76f/ask_agent See the conversation for more information. |
xingyaoww
left a comment
LGTM.
nit: you might also want to change the default model across the different examples.
|
@OpenHands check all other places where we use claude sonnet as the default and switch them to gpt-5.5 |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed in ecd0414: updated the remaining user-facing/default Claude Sonnet references to GPT-5.5 across SDK examples/docs, workflows, UI defaults, and fixture-generation defaults. Left only historical/API-breakage baseline strings and recorded fixture log model names unchanged.
This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
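A sweep like this can be spot-checked by searching the tree for the old model id while skipping directories that are meant to keep it. The sketch below is one hedged way to do that and is not the procedure the agent used. The function name `find_references`, the skipped directory name `fixtures`, and the list of file suffixes are assumptions chosen for illustration.

```python
from pathlib import Path

OLD_MODEL = "claude-sonnet-4-20250514"


def find_references(root, needle=OLD_MODEL, skip_dirs=("fixtures",),
                    suffixes=(".py", ".md", ".yml", ".yaml")):
    """Return (relative_path, line_number) pairs for lines containing `needle`.

    Files under any directory named in `skip_dirs` are ignored, so recorded
    fixtures and similar historical data can keep the old string.
    """
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root)
        if any(part in skip_dirs for part in rel.parts[:-1]):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((rel.as_posix(), lineno))
    return hits
```

Calling `find_references(".")` from the repository root lists the remaining occurrences, and an empty result outside the skipped directories suggests the sweep is complete.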
|
Since the last summary, there were no additional code changes or repository actions. Final status:
|
|
@OpenHands /iterate |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Iterated on the PR and pushed Validation completed:
This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
Since my last summary, there were no additional repository changes or GitHub actions. Final status:
Latest pushed SHA: |
|
@OpenHands the changes to field validation are beyond scope of this PR, revert them |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed in 15cd051: reverted the out-of-scope LLM field validation change and removed the omitted-model constructor assertion, while keeping the GPT-5.5 field default update intact. Validation passed with |
|
OpenHands encountered an error: Request timeout after 30 seconds to https://vghucksvkftugwql.prod-runtime.all-hands.dev/api/conversations/a9fd430a-8de1-4a11-8337-84a6879296b9/ask_agent See the conversation for more information. |
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands in CI the API breakage check fails. Understand the failure, and the purpose of the checks; then take a step back and tell me : WDYT, is our workflow too strict or is this a breakage worth enforcement of deprecation period? Respond directly on github and make suggestions |
|
@enyst your session has expired. Please login again at OpenHands Cloud and try again. |
|
@OpenHands in CI the API breakage check fails. Understand the failure, and the purpose of the checks; then take a step back and tell me : WDYT, is our workflow too strict or is this a breakage worth enforcement of deprecation period? Respond directly on github and make suggestions |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
@enyst I dug into the failing check. What is failing:
What the check is trying to protect:
My take:
Suggestions:
So short version: worth flagging, probably not worth a deprecation runway. This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
A few lightweight ways we could handle this without forcing a long deprecation runway:
If I had to pick one, I’d start with:
That seems like the right balance: visible, automatable, but not pretending that a default switch needs a 5-minor deprecation runway. This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
enyst
left a comment
This is crazy fun, thank you for the PR!
Finally, a non-Anthropic LLM takes the default spot! 🥳
Well, it's OpenAI, but nobody is perfect 😂
Keep the Anthropic compliance default while updating the OpenAI entry to gpt-5.5 and align the compliance runner metadata with the new model id. Co-authored-by: openhands <openhands@all-hands.dev>
Summary
- Change the SDK `LLM.model` field default from `claude-sonnet-4-20250514` to `gpt-5.5`
- Make `LLM()` honor the field default when the model argument is omitted
Tests
- uv run pytest tests/sdk/config/test_llm_config.py::test_llm_config_defaults
- uv run ruff format openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/config/test_llm_config.py
- uv run ruff check openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/config/test_llm_config.py
- uv run pyright openhands-sdk/openhands/sdk/llm/llm.py tests/sdk/config/test_llm_config.py
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:da8b063-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., da8b063-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., da8b063-python-amd64) are also available if needed