fix: retry transient 'thinking blocks cannot be modified' 400s from Bedrock#468

Open
seanturner83 wants to merge 2 commits into usestrix:main from seanturner83:fix/transient-thinking-retry

Conversation

@seanturner83
Contributor

Summary

Bedrock occasionally returns HTTP 400 claiming the assistant's thinking blocks have been modified, even when the payload is structurally valid. The identical payload replayed immediately succeeds. This is a transient server-side condition, not a client bug.

Observed on bedrock/us.anthropic.claude-sonnet-4-6 with adaptive thinking (reasoning_effort=high maps to {type: "adaptive"} on claude-4-6). Error text:

messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were
in the original response.
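For context, a minimal sketch of the call shape involved (the reasoning_effort parameter follows LiteLLM conventions; the exact plumbing inside Strix is an assumption here, not part of this diff):

```python
import litellm

# Illustrative only: the configuration under which the 400 was observed.
# Per the description above, reasoning_effort="high" is translated to
# thinking={"type": "adaptive"} for claude-4-6 on Bedrock.
response = litellm.completion(
    model="bedrock/us.anthropic.claude-sonnet-4-6",
    messages=[{"role": "user", "content": "..."}],
    reasoning_effort="high",
)
```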

Fix

  • Add _is_transient_thinking_error() that matches on status_code == 400 and the characteristic message ("thinking" + "cannot be modified"). Deliberately strict to avoid hijacking unrelated 400s; a sketch of its shape follows this list.
  • On detection, retry with exponential backoff: 5s, 10s, 20s, 40s, 80s, 160s (~5 min total budget) before falling through to generic retry.
  • Self-contained detector — works independently of other 400 handlers.
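A minimal sketch of the detector's shape (it assumes the exception exposes a status_code attribute and a string-convertible message; the actual code in strix/llm/llm.py may differ):

```python
def _is_transient_thinking_error(exc: Exception) -> bool:
    """Match only Bedrock's transient 'thinking blocks cannot be modified'
    400s; unrelated 400s fall through to their existing handling.
    Attribute access here is illustrative."""
    if getattr(exc, "status_code", None) != 400:
        return False
    message = str(exc).lower()
    return "thinking" in message and "cannot be modified" in message
```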

Evidence

  • Replayed 33 captured payloads from around an observed failure: all succeeded on replay, confirming transience.
  • In a multi-repo variance study (21+ scans × 7 repos of Sonnet 4.6 on real production code), the 3-retry budget was exhausted in one case (~30s). A 6-retry budget succeeded on a subsequent attempt with the same repo, same model, same config.

Test plan

  • Backoff schedule unit-check: 5s, 10s, 20s, 40s, 80s, 160s; total 315s (see the sketch after this list)
  • Detector smoke test: correctly matches a BedrockException with the thinking-blocks message, rejects oversize-payload 400s and generic 400s
  • No change to behaviour when the detector returns False (fall-through to existing retry path)
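A minimal version of that unit-check (the backoff_schedule helper is hypothetical, named here only to make the assertion self-contained):

```python
def backoff_schedule(retries: int = 6, base_seconds: int = 5) -> list[int]:
    # Hypothetical helper mirroring the documented doubling schedule.
    return [base_seconds * 2**i for i in range(retries)]


def test_backoff_schedule() -> None:
    schedule = backoff_schedule()
    assert schedule == [5, 10, 20, 40, 80, 160]
    assert sum(schedule) == 315  # ~5 minute total budget
```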

fix: retry transient 'thinking blocks cannot be modified' 400s from Bedrock

Bedrock occasionally returns HTTP 400 claiming the assistant's thinking
blocks have been modified, even when the payload is structurally valid.
The same payload replayed immediately succeeds, indicating a transient
server-side condition rather than a client bug.

Observed on bedrock/us.anthropic.claude-sonnet-4-6 with adaptive thinking
(reasoning_effort=high maps to {type: "adaptive"} on claude-4-6). The
error message is:

  messages.N.content.M: `thinking` or `redacted_thinking` blocks in the
  latest assistant message cannot be modified. These blocks must remain
  as they were in the original response.

Reproduced during a large-scale multi-scan study (800+ Strix runs). Most
failures clear within seconds but we observed one case where the error
persisted across ~2 minutes of backoff, so the retry budget allows up to
~5 minutes total.

Fix:
- Add _is_transient_thinking_error() helper that matches on the status
  code (400) AND the characteristic message ("thinking" + "cannot be
  modified"), to avoid treating unrelated 400s as transient.
- On detection, retry with exponential backoff (5s, 10s, 20s, 40s, 80s,
  160s — total ~5 min) before falling through to the generic retry path.
- Self-contained: the detector checks the exception directly, so it works
  with or without other 400 handlers in place.

Testing:
- Replayed 33 captured payloads that appeared adjacent to a failure: all
  succeeded, confirming the condition is transient and not a client-side
  malformation.
- In a multi-repo scan study, 3-retry budget was insufficient in one case
  (exhausted retries over ~30s). The 6-retry budget implemented here
  succeeded on the same repo on a subsequent attempt.
@greptile-apps
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR adds a targeted retry path for Bedrock's transient "thinking blocks cannot be modified" 400 errors, using a dedicated _is_transient_thinking_error detector and exponential back-off (5–160 s, up to 6 retries).

  • Retry budget is not independent of max_retries: the continue in the transient handler advances the outer for attempt loop, so each transient retry consumes a generic-retry slot. With the default max_retries=5 the loop runs 6 total iterations, the 6th back-off (160 s) never fires, and the documented "fall through to generic retry" path is never reached. A separate inner loop is needed to keep the two retry budgets independent.

Confidence Score: 3/5

The fix improves resilience against the observed transient error but does not behave as documented with default configuration — the full 6-retry budget and generic-retry fallthrough are unreachable.

A P1 logic defect: the transient retry counter shares the outer attempt slots, so max_retries=5 caps effective transient retries at 5 (not 6) and prevents generic fallthrough. The change is otherwise self-contained and additive.

strix/llm/llm.py — generate method retry loop (lines 176–181)

Important Files Changed

Filename: strix/llm/llm.py
Overview: Adds transient-thinking-error detector and retry handler, but the retry loop shares the outer attempt counter, limiting effective retries to min(max_retries, 6) = 5 with defaults and making the "fall through to generic retry" path unreachable.
Inline review comment on strix/llm/llm.py, lines 176-181:

**Transient retries share the outer `attempt` budget**

The `continue` on line 181 advances the `for attempt in range(max_retries + 1)` counter, so each transient thinking retry consumes one generic-retry slot. With the default `max_retries=5` the loop runs for attempts 0–5 (6 iterations total). On attempt 5, `attempt >= max_retries` is `True`, so the `_raise_error` branch fires before the 6th sleep — the documented 160 s back-off never runs and the "fall through to generic retry" path is unreachable with default config.

The fix as described in the PR (independent 6-retry budget that falls through to generic retry) requires the transient counter to drive its own inner loop without consuming outer `attempt` slots, e.g.:

```python
if self._is_transient_thinking_error(e):
    if transient_thinking_retries < 6:
        transient_thinking_retries += 1
        await asyncio.sleep(min(240, 5 * (2 ** (transient_thinking_retries - 1))))
        continue  # retry WITHOUT advancing attempt
    # exhausted transient budget; fall through to generic retry below
```

But `continue` in a `for` loop always advances the iterator — a separate `while True` inner retry loop is needed to keep `attempt` from incrementing.


Comment thread on strix/llm/llm.py (marked outdated):
Greptile caught a real logic bug in the initial patch: the `continue`
inside the transient-thinking branch advanced the outer
`for attempt in range(max_retries + 1)` counter, so each transient retry
consumed a generic-retry slot. With the default `max_retries=5`:

- Loop runs 6 iterations (attempts 0–5)
- Each transient retry burned one iteration
- On attempt 5, `attempt >= max_retries` fired `_raise_error` before the
  6th sleep (160s) ever ran
- The documented "fall through to generic retry" path was unreachable

Fix: inner `while` loop that does its own `_stream()` retry without
advancing the outer `attempt` counter. The transient budget of 6 (5 / 10
/ 20 / 40 / 80 / 160 s) is now independent of `max_retries`, and once
the inner budget is exhausted the most recent exception falls through
to the generic retry path as the PR originally intended.

Also extracts the literal `6` into `max_transient_thinking_retries` for
readability — still unconfigurable since the envelope budget is based on
observed Bedrock transient durations, not something we want users tuning
blindly.
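A sketch of the corrected structure, for reviewers skimming the thread (the enclosing method, exception type, and generic back-off are simplified stand-ins; only _stream, _is_transient_thinking_error, and max_transient_thinking_retries are names from this PR):

```python
import asyncio

async def _generate_with_retries(self, payload, max_retries: int = 5):
    max_transient_thinking_retries = 6  # 5/10/20/40/80/160s, ~315s total
    transient_retries = 0  # independent of the generic `attempt` budget
    for attempt in range(max_retries + 1):
        while True:  # inner loop: transient retries never advance `attempt`
            try:
                return await self._stream(payload)
            except Exception as e:  # simplified; the real handler is narrower
                if (
                    self._is_transient_thinking_error(e)
                    and transient_retries < max_transient_thinking_retries
                ):
                    transient_retries += 1
                    await asyncio.sleep(5 * 2 ** (transient_retries - 1))
                    continue  # retry in place; outer counter untouched
                if attempt >= max_retries:
                    raise  # both budgets exhausted
                break  # fall through to the generic retry path
        await asyncio.sleep(2**attempt)  # stand-in for the generic back-off
```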
seanturner83 added a commit to seanturner83/strix that referenced this pull request Apr 29, 2026
Pulls usestrix#467 (feat: add resume
session feature) ahead of upstream merge. Enables unattended scans
to survive crashes or forced termination (e.g. AWS STS session
expiry mid-scan) by appending every LLM message to
strix_runs/<run>/conversation.jsonl and replaying on --resume.

New CLI:
  strix --list-sessions         # tabulate past scans
  strix --continue (or -c)      # resume most recent
  strix --resume <run_name>     # resume by name
  strix --resume                # open interactive picker

Motivates:
  https://github.com/seedcx/strix-scan-workflow/actions/runs/25116247499
  — trade-api flag-aware scan exited at 71min with
    APIConnectionError ("security token included in the request is
    expired"). Session had produced 10 findings; all lost once the
    report upload also hit ExpiredToken. With this commit merged,
    the CI composite action can wrap strix in a re-assume + --continue
    loop so each sub-session fits inside the 60min STS budget and
    total scan duration is bounded only by the overall workflow
    timeout.

Upstream integration:
  - Once usestrix#467 merges, drop from the seedcx-build recipe
  - Until then, include alongside usestrix#454 (memory compression), usestrix#460
    (truncate retry), usestrix#468 (thinking retry), this one, and the
    adaptive-thinking commit on this branch

Clean 3-way merge, no conflicts. Smoke tests:
  - uv sync succeeds
  - strix --help surfaces --resume / --continue / --list-sessions
  - imports (strix.sessions.resume, strix.telemetry.conversation_log) OK

Follow-up (separate): strix-scan-workflow README recipe + composite
action consumer changes to invoke resume on re-dispatch.