fix: retry transient 'thinking blocks cannot be modified' 400s from Bedrock#468

Open
seanturner83 wants to merge 2 commits into usestrix:main from seanturner83:fix/transient-thinking-retry

Conversation

@seanturner83
Contributor

Summary

Bedrock occasionally returns HTTP 400 claiming the assistant's thinking blocks have been modified, even when the payload is structurally valid. The identical payload replayed immediately succeeds. This is a transient server-side condition, not a client bug.

Observed on bedrock/us.anthropic.claude-sonnet-4-6 with adaptive thinking (reasoning_effort=high maps to {type: "adaptive"} on claude-4-6). Error text:

messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were
in the original response.
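For context, a minimal sketch of the call shape involved (the reasoning_effort parameter follows LiteLLM conventions; the exact plumbing inside Strix is an assumption here, not part of this diff):

```python
import litellm

# Illustrative only: the configuration under which the 400 was observed.
# Per the description above, reasoning_effort="high" is translated to
# thinking={"type": "adaptive"} for claude-4-6 on Bedrock.
response = litellm.completion(
    model="bedrock/us.anthropic.claude-sonnet-4-6",
    messages=[{"role": "user", "content": "..."}],
    reasoning_effort="high",
)
```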

Fix

  • Add _is_transient_thinking_error() that matches on status_code == 400 and the characteristic message ("thinking" + "cannot be modified"). Deliberately strict to avoid hijacking unrelated 400s; a sketch of its shape follows this list.
  • On detection, retry with exponential backoff: 5s, 10s, 20s, 40s, 80s, 160s (~5 min total budget) before falling through to generic retry.
  • Self-contained detector — works independently of other 400 handlers.
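A minimal sketch of the detector's shape (it assumes the exception exposes a status_code attribute and a string-convertible message; the actual code in strix/llm/llm.py may differ):

```python
def _is_transient_thinking_error(exc: Exception) -> bool:
    """Match only Bedrock's transient 'thinking blocks cannot be modified'
    400s; unrelated 400s fall through to their existing handling.
    Attribute access here is illustrative."""
    if getattr(exc, "status_code", None) != 400:
        return False
    message = str(exc).lower()
    return "thinking" in message and "cannot be modified" in message
```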

Evidence

  • Replayed 33 captured payloads from around an observed failure: all succeeded on replay, confirming transience.
  • In a multi-repo variance study (21+ scans × 7 repos of Sonnet 4.6 on real production code), the 3-retry budget was exhausted in one case (~30s). A 6-retry budget succeeded on a subsequent attempt with the same repo, same model, same config.

Test plan

  • Backoff schedule unit-check: 5s, 10s, 20s, 40s, 80s, 160s; total 315s (see the sketch after this list)
  • Detector smoke test: correctly matches a BedrockException with the thinking-blocks message, rejects oversize-payload 400s and generic 400s
  • No change to behaviour when the detector returns False (fall-through to existing retry path)
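A minimal version of that unit-check (the backoff_schedule helper is hypothetical, named here only to make the assertion self-contained):

```python
def backoff_schedule(retries: int = 6, base_seconds: int = 5) -> list[int]:
    # Hypothetical helper mirroring the documented doubling schedule.
    return [base_seconds * 2**i for i in range(retries)]


def test_backoff_schedule() -> None:
    schedule = backoff_schedule()
    assert schedule == [5, 10, 20, 40, 80, 160]
    assert sum(schedule) == 315  # ~5 minute total budget
```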

fix: retry transient 'thinking blocks cannot be modified' 400s from Bedrock

Bedrock occasionally returns HTTP 400 claiming the assistant's thinking
blocks have been modified, even when the payload is structurally valid.
The same payload replayed immediately succeeds, indicating a transient
server-side condition rather than a client bug.

Observed on bedrock/us.anthropic.claude-sonnet-4-6 with adaptive thinking
(reasoning_effort=high maps to {type: "adaptive"} on claude-4-6). The
error message is:

  messages.N.content.M: `thinking` or `redacted_thinking` blocks in the
  latest assistant message cannot be modified. These blocks must remain
  as they were in the original response.

Reproduced during a large-scale multi-scan study (800+ Strix runs). Most
failures clear within seconds but we observed one case where the error
persisted across ~2 minutes of backoff, so the retry budget allows up to
~5 minutes total.

Fix:
- Add _is_transient_thinking_error() helper that matches on the status
  code (400) AND the characteristic message ("thinking" + "cannot be
  modified"), to avoid treating unrelated 400s as transient.
- On detection, retry with exponential backoff (5s, 10s, 20s, 40s, 80s,
  160s — total ~5 min) before falling through to the generic retry path.
- Self-contained: the detector checks the exception directly, so it works
  with or without other 400 handlers in place.

Testing:
- Replayed 33 captured payloads that appeared adjacent to a failure: all
  succeeded, confirming the condition is transient and not a client-side
  malformation.
- In a multi-repo scan study, 3-retry budget was insufficient in one case
  (exhausted retries over ~30s). The 6-retry budget implemented here
  succeeded on the same repo on a subsequent attempt.
@greptile-apps
Contributor

greptile-apps Bot commented Apr 27, 2026

Greptile Summary

This PR adds a targeted retry path for Bedrock's transient "thinking blocks cannot be modified" 400 errors, using a dedicated _is_transient_thinking_error detector and exponential back-off (5–160 s, up to 6 retries).

  • Retry budget is not independent of max_retries: the continue in the transient handler advances the outer for attempt loop, so each transient retry consumes a generic-retry slot. With the default max_retries=5 the loop runs 6 total iterations, the 6th back-off (160 s) never fires, and the documented "fall through to generic retry" path is never reached. A separate inner loop is needed to keep the two retry budgets independent.

Confidence Score: 3/5

The fix improves resilience against the observed transient error but does not behave as documented with default configuration — the full 6-retry budget and generic-retry fallthrough are unreachable.

A P1 logic defect: the transient retry counter shares the outer attempt slots, so max_retries=5 caps effective transient retries at 5 (not 6) and prevents generic fallthrough. The change is otherwise self-contained and additive.

strix/llm/llm.py — generate method retry loop (lines 176–181)

Important Files Changed

Filename: strix/llm/llm.py
Overview: Adds transient-thinking-error detector and retry handler, but the retry loop shares the outer attempt counter, limiting effective retries to min(max_retries, 6) = 5 with defaults and making the "fall through to generic retry" path unreachable.
Inline review comment on strix/llm/llm.py, lines 176-181:

**Transient retries share the outer `attempt` budget**

The `continue` on line 181 advances the `for attempt in range(max_retries + 1)` counter, so each transient thinking retry consumes one generic-retry slot. With the default `max_retries=5` the loop runs for attempts 0–5 (6 iterations total). On attempt 5, `attempt >= max_retries` is `True`, so the `_raise_error` branch fires before the 6th sleep — the documented 160 s back-off never runs and the "fall through to generic retry" path is unreachable with default config.

The fix as described in the PR (independent 6-retry budget that falls through to generic retry) requires the transient counter to drive its own inner loop without consuming outer `attempt` slots, e.g.:

```python
if self._is_transient_thinking_error(e):
    if transient_thinking_retries < 6:
        transient_thinking_retries += 1
        await asyncio.sleep(min(240, 5 * (2 ** (transient_thinking_retries - 1))))
        continue  # retry WITHOUT advancing attempt
    # exhausted transient budget; fall through to generic retry below
```

But `continue` in a `for` loop always advances the iterator — a separate `while True` inner retry loop is needed to keep `attempt` from incrementing.


Comment thread on strix/llm/llm.py (marked outdated):
Greptile caught a real logic bug in the initial patch: the `continue`
inside the transient-thinking branch advanced the outer
`for attempt in range(max_retries + 1)` counter, so each transient retry
consumed a generic-retry slot. With the default `max_retries=5`:

- Loop runs 6 iterations (attempts 0–5)
- Each transient retry burned one iteration
- On attempt 5, `attempt >= max_retries` fired `_raise_error` before the
  6th sleep (160s) ever ran
- The documented "fall through to generic retry" path was unreachable

Fix: inner `while` loop that does its own `_stream()` retry without
advancing the outer `attempt` counter. The transient budget of 6 (5 / 10
/ 20 / 40 / 80 / 160 s) is now independent of `max_retries`, and once
the inner budget is exhausted the most recent exception falls through
to the generic retry path as the PR originally intended.

Also extracts the literal `6` into `max_transient_thinking_retries` for
readability — still unconfigurable since the envelope budget is based on
observed Bedrock transient durations, not something we want users tuning
blindly.
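A sketch of the corrected structure, for reviewers skimming the thread (the enclosing method, exception type, and generic back-off are simplified stand-ins; only _stream, _is_transient_thinking_error, and max_transient_thinking_retries are names from this PR):

```python
import asyncio

async def _generate_with_retries(self, payload, max_retries: int = 5):
    max_transient_thinking_retries = 6  # 5/10/20/40/80/160s, ~315s total
    transient_retries = 0  # independent of the generic `attempt` budget
    for attempt in range(max_retries + 1):
        while True:  # inner loop: transient retries never advance `attempt`
            try:
                return await self._stream(payload)
            except Exception as e:  # simplified; the real handler is narrower
                if (
                    self._is_transient_thinking_error(e)
                    and transient_retries < max_transient_thinking_retries
                ):
                    transient_retries += 1
                    await asyncio.sleep(5 * 2 ** (transient_retries - 1))
                    continue  # retry in place; outer counter untouched
                if attempt >= max_retries:
                    raise  # both budgets exhausted
                break  # fall through to the generic retry path
        await asyncio.sleep(2**attempt)  # stand-in for the generic back-off
```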
seanturner83 added a commit to seanturner83/strix that referenced this pull request Apr 29, 2026
Pulls usestrix#467 (feat: add resume
session feature) ahead of upstream merge. Enables unattended scans
to survive crashes or forced termination (e.g. AWS STS session
expiry mid-scan) by appending every LLM message to
strix_runs/<run>/conversation.jsonl and replaying on --resume.

New CLI:
  strix --list-sessions         # tabulate past scans
  strix --continue (or -c)      # resume most recent
  strix --resume <run_name>     # resume by name
  strix --resume                # open interactive picker

Motivates:
  https://github.com/seedcx/strix-scan-workflow/actions/runs/25116247499
  — trade-api flag-aware scan exited at 71min with
    APIConnectionError ("security token included in the request is
    expired"). Session had produced 10 findings; all lost once the
    report upload also hit ExpiredToken. With this commit merged,
    the CI composite action can wrap strix in a re-assume + --continue
    loop so each sub-session fits inside the 60min STS budget and
    total scan duration is bounded only by the overall workflow
    timeout.

Upstream integration:
  - Once usestrix#467 merges, drop from the seedcx-build recipe
  - Until then, include alongside usestrix#454 (memory compression), usestrix#460
    (truncate retry), usestrix#468 (thinking retry), this one, and the
    adaptive-thinking commit on this branch

Clean 3-way merge, no conflicts. Smoke tests:
  - uv sync succeeds
  - strix --help surfaces --resume / --continue / --list-sessions
  - imports (strix.sessions.resume, strix.telemetry.conversation_log) OK

Follow-up (separate): strix-scan-workflow README recipe + composite
action consumer changes to invoke resume on re-dispatch.