
fix: break positive feedback loop for sessions stuck with large history#354

Merged
PureWeen merged 2 commits into main from
fix/session-session-20260309-113058-can-you-20260311-1914
Mar 13, 2026

@PureWeen
Owner

Problem

Sessions with very large message histories (212+ messages, e.g. session-20260309-113058) enter a repeated stuck cycle:

  1. `SendAsync` is called with `CancellationToken.None` (SDK bug workaround), so there is no transport timeout
  2. The server takes a long time to process the large context, or the transport hangs
  3. Watchdog fires after 120s → adds a "Session appears stuck" system message to history
  4. User retries → same result, but history is now 2 messages larger
  5. Positive feedback loop: each stuck cycle grows the history, increasing the probability of the next failure

Fix (3 parts)

1. SendAsync timeout wrapper (60s)

Wraps both the primary and retry `SendAsync` calls with `Task.WhenAny` + `Task.Delay(60_000)`. On timeout, throws `TimeoutException`, which routes to the existing reconnect/error path. The SDK `CancellationToken.None` workaround is preserved.
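
A minimal sketch of the timeout pattern described above, not the PR's actual code. The helper name `WithTimeoutAsync`, the class `SendTimeout`, and the constant name `SendTimeoutMs` are assumptions for illustration; only `Task.WhenAny`, `Task.Delay(60_000)`, and `TimeoutException` come from the description.

```csharp
using System;
using System.Threading.Tasks;

public static class SendTimeout
{
    // Hypothetical constant name; the PR only states the 60s value.
    public const int SendTimeoutMs = 60_000;

    // Races an already-started send task against a delay. The send task is
    // not cancelled (the CancellationToken.None workaround stays in place);
    // the caller simply stops waiting for it.
    public static async Task WithTimeoutAsync(Task sendTask, int timeoutMs = SendTimeoutMs)
    {
        Task winner = await Task.WhenAny(sendTask, Task.Delay(timeoutMs)).ConfigureAwait(false);
        if (winner != sendTask)
        {
            throw new TimeoutException($"SendAsync did not complete within {timeoutMs} ms.");
        }

        // The send finished first: await it so any exception it holds propagates.
        await sendTask.ConfigureAwait(false);
    }
}
```

Because the send task keeps running after the timeout, the caller is responsible for observing its eventual fault, which is what the second commit addresses.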

2. ConsecutiveStuckCount tracking

Adds a `ConsecutiveStuckCount` property to `AgentSessionInfo`. Incremented by watchdog Case C (timeout kill). Reset to 0 on successful `CompleteResponse`.
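
The counter's lifecycle might look roughly like the sketch below. `AgentSessionInfoSketch` stands in for the real `AgentSessionInfo`, and the two handler methods are hypothetical names for the watchdog Case C path and the successful `CompleteResponse` path.

```csharp
public sealed class AgentSessionInfoSketch
{
    // Number of watchdog timeout kills since the last successful response.
    public int ConsecutiveStuckCount { get; set; }

    // Hypothetical hook: called when the watchdog kills a stuck send (Case C).
    public void OnWatchdogTimeoutKill()
    {
        ConsecutiveStuckCount++;
    }

    // Hypothetical hook: called when CompleteResponse finishes successfully.
    public void OnCompleteResponse()
    {
        ConsecutiveStuckCount = 0;
    }
}
```

Resetting on success means the counter measures an unbroken run of failures, so a single good response gives the session a clean slate.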

3. Feedback loop breaker

When `ConsecutiveStuckCount >= 3`:

  • Skips adding system error messages to history (prevents unbounded growth)
  • Clears the message queue (prevents automatic re-dispatch)
  • Shows a clear warning suggesting the user start a new session
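
The three behaviours above can be sketched as follows. This is an illustration under stated assumptions: the class, the `StuckThreshold` constant name, the collection types, and the message texts are all hypothetical; only the threshold of 3, the history skip, the queue clear, and the new-session warning come from the description.

```csharp
using System.Collections.Generic;

public sealed class StuckSessionSketch
{
    // Hypothetical constant name; the PR states the threshold value of 3.
    public const int StuckThreshold = 3;

    public int ConsecutiveStuckCount { get; private set; }
    public List<string> History { get; } = new List<string>();
    public Queue<string> MessageQueue { get; } = new Queue<string>();

    // Called on a watchdog timeout kill. Returns the notice to show the user.
    public string OnWatchdogTimeout()
    {
        ConsecutiveStuckCount++;

        if (ConsecutiveStuckCount >= StuckThreshold)
        {
            // Breaker tripped: leave history untouched and drop queued
            // messages so nothing is re-dispatched automatically.
            MessageQueue.Clear();
            return "This session keeps getting stuck. Consider starting a new session.";
        }

        // Below the threshold: record the usual system message in history.
        const string notice = "Session appears stuck";
        History.Add(notice);
        return notice;
    }
}
```

With this shape, the first two timeouts each add one system message, and the third and later timeouts add none, so repeated retries no longer grow the history through system messages.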

Pre-existing test fix

`CompleteResponse_Source_ClearsSendingFlag` used a 5000-character search window, which was too small on main (SendingFlag at offset 5068). The window is now 6000 characters.
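
To see why the window size matters, here is a hedged sketch of a windowed source-text check. It assumes the test searches a fixed number of characters after an anchor in the source file; the helper `ContainsWithin` and its parameters are hypothetical and not taken from the repository.

```csharp
using System;

public static class SourceWindow
{
    // Returns true if `needle` appears entirely within the first
    // `windowChars` characters of `source`, counted from where `anchor` starts.
    public static bool ContainsWithin(string source, string anchor, string needle, int windowChars)
    {
        int start = source.IndexOf(anchor, StringComparison.Ordinal);
        if (start < 0)
        {
            return false;
        }

        int length = Math.Min(windowChars, source.Length - start);
        return source.Substring(start, length).Contains(needle);
    }
}
```

If the text being searched for begins 5068 characters after the anchor, a 5000-character window can never contain it, while a 6000-character window does.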

Tests

18 new tests in `ConsecutiveStuckSessionTests.cs` covering:

  • ConsecutiveStuckCount defaults, incrementing, and reset
  • CompleteResponse resetting the counter
  • Watchdog increment, history skip, and queue clear logic
  • SendAsync timeout constant and patterns
  • Error message quality

All 2437 tests pass. Windows build succeeds.

PureWeen and others added 2 commits March 13, 2026 13:00
Sessions with very large message histories (212+ messages) enter a repeated
stuck cycle: SendAsync hangs (no timeout due to SDK CancellationToken.None
workaround) → watchdog fires after 120s → adds 'stuck' system message to
history → user retries → history grows → next failure more likely.

Three-part fix:
1. SendAsync timeout wrapper (60s) using Task.WhenAny + Task.Delay, routes
   to existing reconnect/error path on timeout
2. ConsecutiveStuckCount on AgentSessionInfo, incremented by watchdog Case C,
   reset on successful CompleteResponse
3. Feedback loop breaker: after 3+ consecutive stucks, skip adding system
   messages to history and clear queued auto-retries

Also fixes pre-existing test issue: CompleteResponse_Source_ClearsSendingFlag
had a 5000-char search window that was too small (offset was 5068 on main).
Increased to 6000.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ContinueWith(OnlyOnFaulted) observation to the sendTask and retryTask
after SendAsync timeout, matching the established pattern at WsBridgeClient.cs:162.
This prevents UnobservedTaskException when the abandoned tasks eventually fault
after the session is disposed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
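
The fault observation described in the second commit can be sketched like this. The helper name `ObserveFault` is an assumption; the code at WsBridgeClient.cs:162 is not shown in this PR, so this only illustrates the general `ContinueWith(OnlyOnFaulted)` technique.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class TaskObservation
{
    // Attaches a continuation that runs only if the abandoned task faults.
    // Reading t.Exception marks the exception as observed, so it is not
    // raised later through TaskScheduler.UnobservedTaskException.
    public static Task ObserveFault(Task abandoned)
    {
        return abandoned.ContinueWith(
            t => { _ = t.Exception; },
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }
}
```

After a `SendAsync` timeout, a call such as `TaskObservation.ObserveFault(sendTask)` would let the caller walk away from the task without leaving its eventual exception unobserved.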
@PureWeen PureWeen force-pushed the fix/session-session-20260309-113058-can-you-20260311-1914 branch from c7f21d8 to 3d026a2 Compare March 13, 2026 18:03
@PureWeen PureWeen merged commit 52e90ff into main Mar 13, 2026
@PureWeen PureWeen deleted the fix/session-session-20260309-113058-can-you-20260311-1914 branch March 13, 2026 18:13
arisng pushed a commit to arisng/PolyPilot that referenced this pull request Apr 4, 2026
…ry (PureWeen#354)