fix(timeline,tools): GPT-5 timeline robustness — reasoning-item drop + tool-batch isolation by ngoclam9415 · Pull Request #11 · aitomatic/dana-runtime

ngoclam9415 · 2026-05-18T14:56:43Z

Two independent robustness fixes for GPT-5/o3/o4 timelines. Both surfaced while verifying a real session timeline (atlas-q1).

Commit 1 — persist reasoning items on empty-summary turns

Problem

GPT-5/o3/o4 turns return a reasoning item (rs_… + encrypted_content) with an empty summary on low-summary turns — typically single-call tool continuations. _record_think_results gated AGENT_THOUGHTS creation on summary text:

if reasoning and len(reasoning) > 0:   # keys on summary text

When the summary was empty, no entry was created — and reasoning_items (which rides in that entry's metadata) was silently dropped along with it. Observed: 7 of ~14 model turns in atlas-q1 were bare tool_call entries with no preceding agent_thoughts.

Impact: on resume, those turns replay with no reasoning state. Cross-turn reasoning replay (PR #10) silently does not apply to low-summary turns.

Fix

Gate now keys on reasoning_items presence, not summary text. An entry is emitted with empty content when only the item is present; metadata still carries the item for replay. Applied to both the tool-call and direct-answer branches.
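A minimal sketch of the gating change, not the actual _record_think_results code. The function name build_thoughts_entry and the entry's dict shape are hypothetical, chosen only to illustrate keying on item presence rather than summary text:

```python
from typing import Any, Dict, List, Optional


def build_thoughts_entry(
    summary: Optional[str],
    reasoning_items: Optional[List[Dict[str, Any]]],
) -> Optional[Dict[str, Any]]:
    """Return an agent_thoughts-style entry, or None when there is nothing to persist.

    Hypothetical helper: the real entry type and field names may differ.
    """
    # Gate on the presence of reasoning items, not on summary text.
    if not reasoning_items:
        return None
    return {
        "type": "agent_thoughts",
        # Content may be empty on low-summary turns.
        "content": summary or "",
        # The items ride in metadata so they can be replayed on resume.
        "metadata": {"reasoning_items": list(reasoning_items)},
    }
```

With this shape, a turn that carries an item but no summary still yields an entry, while a turn with no items yields none (the negative case).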

Commit 2 — isolate tool-call failures within a batch

Problem

A dispatch-phase exception (registry getattr, name parsing, object lookup) escaped before the inner try in _execute_single_call / _execute_single_call_async. In the async path it propagated out of asyncio.gather, discarding the entire batch's results — including calls that succeeded.

The TOOL_CALL entry already recorded N tool_call_ids, so the next OpenAI turn 400s on the unanswered tool calls.
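The failure mode can be reproduced in isolation. The sketch below uses hypothetical stand-in coroutines (ok_tool, bad_dispatch), not the runtime's dispatchers, to show that a plain asyncio.gather re-raises the first exception, so the caller never receives the sibling results:

```python
import asyncio
from typing import Any, Dict, List, Optional, Tuple


async def ok_tool(call_id: str) -> Dict[str, Any]:
    """Stand-in for a tool call that succeeds."""
    await asyncio.sleep(0)
    return {"tool_call_id": call_id, "output": "ok"}


async def bad_dispatch(call_id: str) -> Dict[str, Any]:
    """Stand-in for a call that fails during dispatch (e.g. a registry lookup)."""
    raise AttributeError(f"no tool registered for {call_id}")


async def run_batch_unguarded() -> List[Dict[str, Any]]:
    # Default return_exceptions=False: the first exception propagates to the caller.
    return await asyncio.gather(
        ok_tool("call_1"), bad_dispatch("call_2"), ok_tool("call_3")
    )


def demo() -> Tuple[Optional[List[Dict[str, Any]]], Optional[str]]:
    """Run the batch and report either the results or the escaped error."""
    try:
        return asyncio.run(run_batch_unguarded()), None
    except AttributeError as exc:
        # The whole batch is lost here, including calls that would have succeeded.
        return None, str(exc)
```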

Fix

Each single-call dispatcher is wrapped in one outer guard covering dispatch and execution; both are now non-raising. The redundant inner try blocks are removed.
execute_tools_async now calls asyncio.gather(return_exceptions=True) and converts any escaped exception into an error result (defense-in-depth).

Every tool_call_id in a batch now always gets a result.
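The two layers can be sketched as follows. This is an illustration under assumptions, not the PR's code: the registry is modelled as a plain dict of async callables, calls are dicts with id, name, and arguments keys, and the result shape and the error_result helper are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

ToolFn = Callable[[Dict[str, Any]], Awaitable[Any]]


def error_result(call_id: str, exc: BaseException) -> Dict[str, Any]:
    """Hypothetical error-result shape for a failed call."""
    return {
        "tool_call_id": call_id,
        "status": "error",
        "output": f"{type(exc).__name__}: {exc}",
    }


async def execute_single_call_async(
    registry: Dict[str, ToolFn], call: Dict[str, Any]
) -> Dict[str, Any]:
    call_id = call.get("id", "unknown")
    try:
        # One outer guard covers both dispatch (lookup) and execution.
        tool = registry[call["name"]]
        output = await tool(call.get("arguments", {}))
        return {"tool_call_id": call_id, "status": "ok", "output": output}
    except Exception as exc:
        return error_result(call_id, exc)


async def execute_tools_async(
    registry: Dict[str, ToolFn], calls: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    raw = await asyncio.gather(
        *(execute_single_call_async(registry, c) for c in calls),
        return_exceptions=True,  # second layer: nothing escapes the batch
    )
    results: List[Dict[str, Any]] = []
    for call, item in zip(calls, raw):
        if isinstance(item, BaseException):
            # Defense-in-depth: convert anything that still escaped.
            item = error_result(call.get("id", "unknown"), item)
        results.append(item)
    return results
```

Because asyncio.gather returns results in the order of its inputs, pairing them with the calls via zip keeps each result attached to its own tool_call_id.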

Tests

TestEmptySummaryReasoningPersistence — tool-call / direct-answer branches + no-items negative case.
Batch-isolation tests — async + sync paths: failing call isolated, siblings succeed, tool_call_ids preserved.
Followed RED→GREEN for both.
Results: test_thinking_metadata_persistence.py 14/14 passed; test_tool_executor_parallel.py 9/9 passed; tests/unit/core/ + tests/regression/ 202 passed / 18 skipped.
2 pre-existing failures in TestEndpointHashAndFingerprint (test pollution, fail on clean tree too) — out of scope.

Notes / not in scope

Mixed async/sync tool batches still degrade (sync tool blocks the event loop). That's a separate design item — needs tool concurrency classification — deliberately not bundled here.
Existing broken timelines (e.g. atlas-q1) stay unrecoverable; commit 1 fixes capture going forward only.

GPT-5/o3/o4 turns return a reasoning item (rs_… + encrypted_content) with an empty summary on low-summary turns, typically single-call tool continuations. The AGENT_THOUGHTS gate in _record_think_results keyed on summary text, so these turns produced no timeline entry and the encrypted reasoning item was silently dropped. On resume the affected turns replay with no reasoning state, breaking cross-turn reasoning continuity for GPT-5/o3/o4.

Gate now keys on reasoning_items presence, not summary text. Entry is emitted with empty content when only the item is present; metadata still carries the item for replay.

Add TestEmptySummaryReasoningPersistence covering tool-call and direct-answer branches plus the no-items negative case.

A dispatch-phase exception (registry getattr, name parsing, object lookup) escaped before the inner try block in _execute_single_call and _execute_single_call_async. In the async path it propagated out of asyncio.gather, discarding the entire batch's results, including calls that succeeded. The TOOL_CALL entry still recorded N tool_call_ids, so the next OpenAI turn 400s on the unanswered tool calls.

- Wrap each single-call dispatcher in one outer guard covering dispatch and execution; both are now non-raising. Removes the redundant inner try blocks.
- execute_tools_async: asyncio.gather(return_exceptions=True) and convert any escaped exception to an error result (defense-in-depth).

Every tool_call_id in a batch now always gets a result.

Add isolation tests covering async and sync paths: a failing call yields an isolated error result, siblings succeed, tool_call_ids preserved.

ngoclam9415 added 2 commits May 18, 2026 21:56

ngoclam9415 changed the title ~~fix(timeline): persist reasoning items on empty-summary turns~~ fix(timeline,tools): GPT-5 timeline robustness — reasoning-item drop + tool-batch isolation May 18, 2026

ngoclam9415 merged commit 1e31c41 into develop May 19, 2026
1 check failed

TheVinhLuong102 deleted the fix/reasoning-item-empty-summary-drop branch May 21, 2026 22:01
