fix: strip reasoning items from Responses API input when store=False #2178
mr-karan wants to merge 4 commits into OpenHands:main from a fork
Conversation
When store=False (the default for agent calls), OpenAI doesn't persist conversation items. The SDK serializes reasoning item IDs from previous turns back into the input, but OpenAI can't resolve them, causing:

> Item with id 'rs_...' not found. Items are not persisted when `store` is set to false.

This is reproducible with any multi-turn conversation using reasoning models (gpt-5.x, codex, o3, o4) via the Responses API.

Fix: filter out `type: "reasoning"` items from the input list in LLM.responses() when the effective store value is False. The model reasons fresh each turn from the full conversation context (user messages, tool calls, tool results), so omitting echoed reasoning items from prior turns doesn't affect output quality.
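A minimal sketch of that filter, assuming plain-dict input items (`strip_reasoning_items` is a hypothetical helper name; the actual change lives inside `LLM.responses()`):

```python
def strip_reasoning_items(input_items, store):
    """Drop echoed reasoning items when OpenAI cannot resolve them.

    With store=False nothing is persisted server-side, so replayed
    reasoning item IDs ('rs_...') from earlier turns trigger a 404.
    """
    if store:
        return input_items
    return [item for item in input_items if item.get("type") != "reasoning"]
```

Everything else (user messages, tool calls, tool results) passes through unchanged, which is why the model can still reason fresh each turn.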
The OpenHands SDK replays reasoning item IDs on subsequent turns, but with store=False (the default) OpenAI can't resolve them, breaking all multi-turn conversations with gpt-5.x/codex/o3/o4 models. Fix: strip reasoning items at parse time (from_llm_responses_output) so they're never stored on the Message and never replayed. The model reasons fresh each turn from the full conversation context.

- Upgrade openhands-sdk/tools from v1.8.1 to v1.11.5
- Patch Message.from_llm_responses_output to discard reasoning items
- Remove fragile to_responses_dict and dual-import patches
- Add 3 regression tests for the reasoning strip behaviour
- Fix conftest.py stubs shadowing real SDK in test environment
- Upstream fix: OpenHands/software-agent-sdk#2178

Bump version to 0.2.12.
enyst left a comment
Thank you for the contribution! According to OpenAI docs, I think the behavior described in the PR description is not accurate: the reasoning items are available and can be part of a conversation with store=False.
But I've seen this before 🤔
Could you tell us how you ran the SDK?
Hi @enyst, thanks for the review! I hit this while running hodor (a PR review agent built on top of the OpenHands SDK). The agent loop naturally produces multi-turn conversations — prompt → bash tool calls → observations → next turn. With any reasoning model, the second turn fails with the 404 above. I think the docs may be describing the behavior when `store=True`.

Here's a minimal repro script (no SDK dependency, just the raw OpenAI client) that demonstrates the issue:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["openai>=1.11.0"]
# ///
"""
Usage:
    export OPENAI_API_KEY=sk-...
    uv run repro_reasoning_store_false.py
"""
from openai import OpenAI

client = OpenAI()
model = "o4-mini"  # any reasoning model

# Turn 1: succeeds
resp = client.responses.create(
    model=model,
    input="What is 2+2? Be brief.",
    store=False,
)
prior_items = [item.to_dict() for item in resp.output]

# Turn 2: fails — reasoning item IDs can't be resolved
resp2 = client.responses.create(
    model=model,
    input=prior_items + [{"role": "user", "content": "Now what is 3+3?"}],
    store=False,
)
```

Output: Turn 1 succeeds, turn 2 fails with a 404; stripping the `reasoning` items from `prior_items` makes turn 2 succeed.
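The fix applied to this repro is just a list filter over `prior_items` before the second call; the data-level part can be shown without an API key (the item shapes below are illustrative, mimicking `resp.output` after `to_dict()`):

```python
# Illustrative items shaped like resp.output after to_dict(): a reasoning
# item (whose 'rs_...' id turn 2 cannot resolve) plus the assistant message.
prior_items = [
    {"type": "reasoning", "id": "rs_abc123", "summary": []},
    {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "output_text", "text": "2 + 2 = 4"}],
    },
]

# Strip the reasoning item, then append the next user turn as before.
turn2_input = [item for item in prior_items if item.get("type") != "reasoning"]
turn2_input.append({"role": "user", "content": "Now what is 3+3?"})

assert all(item.get("type") != "reasoning" for item in turn2_input)
```

With `turn2_input` built this way, the second `client.responses.create(...)` call no longer references unresolvable item IDs.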
@OpenHands Read all comments in this PR. Test the script, using the OpenAI key, and try multiple models. Read https://developers.openai.com/api/reference/resources/responses/methods/create and investigate the docs. Note that we only care about store=False. Commit the files / test results in the .pr directory, it's a directory just for testing. Do not modify the PR otherwise. Note that it's from a fork so make sure you set the remote correctly.
I'm on it! enyst can track my progress at all-hands.dev
Co-authored-by: openhands <openhands@all-hands.dev>
Final summary

- New changes made
- Verification / instructions checklist
- Conciseness check
- Pushed update
Co-authored-by: openhands <openhands@all-hands.dev>
(OpenHands-GPT-5.2 speaking) I tested the repro script with the OpenAI key across multiple models.

1) What worked with encrypted reasoning included

I updated the repro script to optionally request encrypted reasoning: `uv run python .pr/repro_responses_reasoning_store_false.py --include-encrypted`

That sets `include=["reasoning.encrypted_content"]` on both calls. Results (Responses API, `store=False`):
The corresponding output artifact is:
Baseline (no include) still reproduces the 404 as before:
I also updated the report:
Pushed in commit:

2) Does the SDK set this already?

Yes — in the current repo state, the SDK already has support for this and it’s enabled by default:
So the SDK should already be requesting encrypted reasoning on stateless Responses calls, unless someone explicitly disables it. If you want, I can also add one more targeted experiment in the .pr directory.
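For reference, a sketch of the request parameters for a stateless call that still round-trips reasoning; only the payload is built here, no API call is made (`build_stateless_request` is a hypothetical helper, not SDK code):

```python
def build_stateless_request(model, input_items):
    # store=False: nothing is persisted server-side, so the reasoning must
    # travel with the request. The include option asks OpenAI to return an
    # encrypted copy of the reasoning that can be replayed verbatim on the
    # next turn even though no items are stored.
    return {
        "model": model,
        "input": input_items,
        "store": False,
        "include": ["reasoning.encrypted_content"],
    }
```

Passing this payload to `client.responses.create(**req)` is what the `--include-encrypted` flag above exercises.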
HUMAN: @mr-karan I think we see what's happening: you are entirely correct, however the SDK should have asked for encrypted reasoning. I think we have two alternatives:
Co-authored-by: openhands <openhands@all-hands.dev>
(OpenHands-GPT-5.2 speaking)

1) “Real use” in the SDK (Responses API via LiteLLM)

I added an SDK-level repro that exercises the actual OpenHands call path:
Artifacts:
What I observed

For reasoning models (gpt-5.x/codex/o3/o4):
Does the SDK set it?

Yes:
(Details + file pointers are also written up in the .pr writeup.)

2) hodor: why did he see the error even though it’s built on the SDK?

I cloned the hodor repo. Key finding: hodor intentionally disables encrypted reasoning:
It also contains a monkeypatch to drop reasoning items:
Where I recorded everything / pushed

I committed and pushed only files under the .pr directory.
Main writeup:
If you want to adjust the PR’s fix based on this: one nuance is that unconditionally stripping reasoning items when `store=False` would also discard encrypted reasoning content that the API can resolve.
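That nuance could be handled by stripping conditionally; a sketch, assuming plain-dict items (`should_strip_reasoning` is a hypothetical helper, not the PR's current code):

```python
def should_strip_reasoning(item, store):
    # Keep reasoning items that carry an encrypted payload: with
    # include=["reasoning.encrypted_content"] the API can resolve those
    # even when store=False. Only bare, ID-only echoes are unsafe.
    return (
        not store
        and item.get("type") == "reasoning"
        and not item.get("encrypted_content")
    )
```

Under this predicate, the filter only removes reasoning items that would actually trigger the 404, leaving encrypted reasoning free to round-trip.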
HUMAN: (we kinda have to say it around here 😅 )
I agree it's a bug that the SDK fails if you disable encrypted reasoning.
Or here:
Summary
When `store=False` (the default in `agent/utils.py:221`), OpenAI doesn't persist conversation items. The SDK serializes reasoning item IDs from previous turns back into the input via `Message.to_responses_dict()`, but OpenAI can't resolve them:

> Item with id 'rs_...' not found. Items are not persisted when `store` is set to false.

This is reproducible with any multi-turn conversation using reasoning models (`gpt-5.x`, `codex`, `o3`, `o4`) via the Responses API.

Fix: Filter out `type: "reasoning"` items from the input list in `LLM.responses()` when the effective `store` value is `False`. The model reasons fresh each turn from the full conversation context (user messages, tool calls, tool results), so omitting echoed reasoning items from prior turns doesn't affect output quality.

Reproduction

Verified with `gpt-5.1-codex-mini`.

Related

#1754 (the `store=True` case — "reasoning item without its required following item"). These are complementary fixes: this PR handles `store=False`; fix: keep message id/status for Responses API round-trip #1754 handles the `store=True` serialization format.

Test plan

- `test_reasoning_items_stripped_from_input_when_store_false`
- Verified with `gpt-5.1-codex-mini` and `gpt-5.2` (multi-turn agent conversations complete successfully)