Sessions with very large message histories (212+ messages) enter a repeated stuck cycle: SendAsync hangs (no timeout due to SDK CancellationToken.None workaround) → watchdog fires after 120s → adds 'stuck' system message to history → user retries → history grows → next failure more likely.

Three-part fix:
1. SendAsync timeout wrapper (60s) using Task.WhenAny + Task.Delay, routes to existing reconnect/error path on timeout
2. ConsecutiveStuckCount on AgentSessionInfo, incremented by watchdog Case C, reset on successful CompleteResponse
3. Feedback loop breaker: after 3+ consecutive stucks, skip adding system messages to history and clear queued auto-retries

Also fixes pre-existing test issue: CompleteResponse_Source_ClearsSendingFlag had a 5000-char search window that was too small (offset was 5068 on main). Increased to 6000.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ContinueWith(OnlyOnFaulted) observation to the sendTask and retryTask after SendAsync timeout, matching the established pattern at WsBridgeClient.cs:162. This prevents UnobservedTaskException when the abandoned tasks eventually fault after the session is disposed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
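For readers unfamiliar with the pattern, here is a minimal sketch of fault observation on an abandoned task. It is not the code at WsBridgeClient.cs:162, which is not shown here; the `TaskObservation` class, the `ObserveFault` name, and the `log` callback are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; the names are illustrative, not from the PR.
public static class TaskObservation
{
    // Attaches a continuation that runs only if the task faults.
    // Reading task.Exception inside it marks the exception as observed,
    // so it cannot later be raised as an UnobservedTaskException.
    public static void ObserveFault(Task task, Action<Exception> log)
    {
        _ = task.ContinueWith(
            t =>
            {
                var error = t.Exception; // this read is what observes the fault
                if (error != null)
                {
                    log(error.GetBaseException());
                }
            },
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }
}
```

The continuation task itself is discarded on purpose. When the antecedent does not fault, that continuation ends up cancelled rather than faulted, so it produces no unobserved exception of its own.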
c7f21d8 to 3d026a2
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…ry (PureWeen#354)

## Problem

Sessions with very large message histories (212+ messages, e.g. session-20260309-113058) enter a repeated stuck cycle:

1. `SendAsync` is called with `CancellationToken.None` (SDK bug workaround), so there is no transport timeout
2. Server takes long to process large context or transport hangs
3. Watchdog fires after 120s → adds *Session appears stuck* system message to history
4. User retries → same result, but history is now +2 messages larger
5. **Positive feedback loop**: each stuck cycle grows history, increasing probability of next failure

## Fix (3 parts)

### 1. SendAsync timeout wrapper (60s)

Wraps both primary and retry `SendAsync` calls with `Task.WhenAny` + `Task.Delay(60_000)`. On timeout, throws `TimeoutException` which routes to the existing reconnect/error path. The SDK `CancellationToken.None` workaround is preserved.

### 2. ConsecutiveStuckCount tracking

Adds `ConsecutiveStuckCount` property to `AgentSessionInfo`. Incremented by watchdog Case C (timeout kill). Reset to 0 on successful `CompleteResponse`.

### 3. Feedback loop breaker

When `ConsecutiveStuckCount >= 3`:

- Skips adding system error messages to history (prevents unbounded growth)
- Clears the message queue (prevents automatic re-dispatch)
- Shows a clear warning suggesting the user start a new session

## Pre-existing test fix

`CompleteResponse_Source_ClearsSendingFlag` had a 5000-char search window that was too small on main (SendingFlag at offset 5068). Increased to 6000.

## Tests

18 new tests in `ConsecutiveStuckSessionTests.cs` covering:

- ConsecutiveStuckCount defaults, incrementing, and reset
- CompleteResponse resetting the counter
- Watchdog increment, history skip, and queue clear logic
- SendAsync timeout constant and patterns
- Error message quality

All 2437 tests pass. Windows build succeeds.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
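Sessions with very large message histories (212+ messages, e.g. session-20260309-113058) enter a repeated stuck cycle:

1. `SendAsync` is called with `CancellationToken.None` (SDK bug workaround), so there is no transport timeout
2. Server takes long to process large context or transport hangs
3. Watchdog fires after 120s → adds *Session appears stuck* system message to history
4. User retries → same result, but history is now +2 messages larger
5. **Positive feedback loop**: each stuck cycle grows history, increasing probability of next failure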
Fix (3 parts)
1. SendAsync timeout wrapper (60s)
Wraps both primary and retry `SendAsync` calls with `Task.WhenAny` + `Task.Delay(60_000)`. On timeout, throws `TimeoutException` which routes to the existing reconnect/error path. The SDK `CancellationToken.None` workaround is preserved.
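A minimal sketch of the `Task.WhenAny` + `Task.Delay` race described above, not the actual code from this PR. The `SendTimeout` class, the `WithTimeoutAsync` name, and the `SendAsyncTimeoutMs` constant name are assumptions for illustration; only the 60s value and the `TimeoutException` come from the description.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; the names are illustrative, not from the PR.
public static class SendTimeout
{
    // 60s as stated in the PR description; the constant name is assumed.
    public const int SendAsyncTimeoutMs = 60_000;

    // Races an already-started send against a timer. The send is never
    // cancelled (it may have been started with CancellationToken.None);
    // on timeout it is simply abandoned and a TimeoutException is thrown.
    public static async Task<T> WithTimeoutAsync<T>(Task<T> sendTask, int timeoutMs = SendAsyncTimeoutMs)
    {
        using var timerCts = new CancellationTokenSource();
        try
        {
            Task timer = Task.Delay(timeoutMs, timerCts.Token);
            Task winner = await Task.WhenAny(sendTask, timer).ConfigureAwait(false);
            if (winner != sendTask)
            {
                throw new TimeoutException($"SendAsync did not complete within {timeoutMs} ms.");
            }
            // The send finished first: await it to surface its result or exception.
            return await sendTask.ConfigureAwait(false);
        }
        finally
        {
            timerCts.Cancel(); // stop the timer so it does not linger
        }
    }
}
```

A caller would wrap the existing call, for example `await SendTimeout.WithTimeoutAsync(session.SendAsync(message, CancellationToken.None))`, where `session` and `message` stand in for whatever the real call site uses. Because the abandoned task keeps running, it may still fault later, which is why it needs separate fault observation.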
2. ConsecutiveStuckCount tracking
Adds `ConsecutiveStuckCount` property to `AgentSessionInfo`. Incremented by watchdog Case C (timeout kill). Reset to 0 on successful `CompleteResponse`.
3. Feedback loop breaker
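When `ConsecutiveStuckCount >= 3`:

- Skips adding system error messages to history (prevents unbounded growth)
- Clears the message queue (prevents automatic re-dispatch)
- Shows a clear warning suggesting the user start a new session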
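To make the interaction between the counter and the breaker concrete, here is a rough sketch under simplifying assumptions. The `AgentSessionInfo` below is a stripped-down stand-in with hypothetical `History` and `MessageQueue` members; `StuckWatchdog`, `OnStuck`, `OnCompleteResponse`, and the message strings are likewise invented for illustration. Only `ConsecutiveStuckCount` and the threshold of 3 come from the PR.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the real type; only ConsecutiveStuckCount is from the PR.
public sealed class AgentSessionInfo
{
    public int ConsecutiveStuckCount { get; set; }
    public List<string> History { get; } = new List<string>();        // hypothetical
    public Queue<string> MessageQueue { get; } = new Queue<string>(); // hypothetical
}

public static class StuckWatchdog
{
    public const int BreakerThreshold = 3; // value from the PR; name assumed

    // Called when the watchdog kills a stuck send (Case C in the PR).
    // Returns the text that would be shown to the user.
    public static string OnStuck(AgentSessionInfo session)
    {
        session.ConsecutiveStuckCount++;

        if (session.ConsecutiveStuckCount >= BreakerThreshold)
        {
            // Breaker: leave history untouched and drop queued retries,
            // so the stuck cycle stops feeding itself.
            session.MessageQueue.Clear();
            return "This session keeps getting stuck. Consider starting a new session.";
        }

        const string notice = "Session appears stuck.";
        session.History.Add(notice); // normal path: record the system message
        return notice;
    }

    // Called on a successful CompleteResponse.
    public static void OnCompleteResponse(AgentSessionInfo session)
    {
        session.ConsecutiveStuckCount = 0;
    }
}
```

This sketch ignores thread safety; if the watchdog and the response handler run on different threads, the counter updates would need `Interlocked` or a lock.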
Pre-existing test fix
`CompleteResponse_Source_ClearsSendingFlag` had a 5000-char search window that was too small on main (SendingFlag at offset 5068). Increased to 6000.
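The failure mode is easy to reproduce in isolation. The sketch below assumes the test searches a fixed-size window of source text starting at an anchor string; the real test is not shown here, so `SourceWindow`, `ContainsWithin`, and the way the offset is measured are all hypothetical.

```csharp
using System;

// Hypothetical helper; the real test's search logic is not shown in this PR page.
public static class SourceWindow
{
    // Returns true if needle occurs entirely within the first `window`
    // characters of source, counted from the start of anchor.
    public static bool ContainsWithin(string source, string anchor, string needle, int window)
    {
        int start = source.IndexOf(anchor, StringComparison.Ordinal);
        if (start < 0)
        {
            return false; // anchor not found at all
        }
        int length = Math.Min(window, source.Length - start);
        string slice = source.Substring(start, length);
        return slice.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }
}
```

With a needle that begins 5068 characters after the anchor, a 5000-character window misses it while a 6000-character window finds it. A fixed window will break again if the method keeps growing, so searching up to the end of the method would be a more durable choice where that boundary can be located.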
Tests
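18 new tests in `ConsecutiveStuckSessionTests.cs` covering:

- ConsecutiveStuckCount defaults, incrementing, and reset
- CompleteResponse resetting the counter
- Watchdog increment, history skip, and queue clear logic
- SendAsync timeout constant and patterns
- Error message quality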
All 2437 tests pass. Windows build succeeds.