fix: retry transient 'thinking blocks cannot be modified' 400s from Bedrock #468
seanturner83 wants to merge 2 commits
Conversation
Bedrock occasionally returns HTTP 400 claiming the assistant's thinking
blocks have been modified, even when the payload is structurally valid.
The same payload replayed immediately succeeds, indicating a transient
server-side condition rather than a client bug.
Observed on bedrock/us.anthropic.claude-sonnet-4-6 with adaptive thinking
(reasoning_effort=high maps to {type: "adaptive"} on claude-4-6). The
error message is:
```
messages.N.content.M: `thinking` or `redacted_thinking` blocks in the
latest assistant message cannot be modified. These blocks must remain
as they were in the original response.
```
Reproduced during a large-scale multi-scan study (800+ Strix runs). Most
failures clear within seconds but we observed one case where the error
persisted across ~2 minutes of backoff, so the retry budget allows up to
~5 minutes total.
Fix:
- Add _is_transient_thinking_error() helper that matches on the status
code (400) AND the characteristic message ("thinking" + "cannot be
modified"), to avoid treating unrelated 400s as transient.
- On detection, retry with exponential backoff (5s, 10s, 20s, 40s, 80s,
160s — total ~5 min) before falling through to the generic retry path.
- Self-contained: the detector checks the exception directly, so it works
with or without other 400 handlers in place.
Testing:
- Replayed 33 captured payloads that appeared adjacent to a failure: all
succeeded, confirming the condition is transient and not a client-side
malformation.
- In a multi-repo scan study, 3-retry budget was insufficient in one case
(exhausted retries over ~30s). The 6-retry budget implemented here
succeeded on the same repo on a subsequent attempt.
Greptile Summary

This PR adds a targeted retry path for Bedrock's transient "thinking blocks cannot be modified" 400 errors.

Confidence Score: 3/5

The fix improves resilience against the observed transient error but does not behave as documented with the default configuration: the full 6-retry budget and the generic-retry fallthrough are unreachable. A P1 logic defect: the transient retry counter shares the outer attempt slots, so max_retries=5 caps effective transient retries at 5 (not 6) and prevents generic fallthrough. The change is otherwise self-contained and additive.

Important Files Changed: strix/llm/llm.py, generate method retry loop (lines 176–181)
Greptile review comment on strix/llm/llm.py, lines 176–181:
**Transient retries share the outer `attempt` budget**
The `continue` on line 181 advances the `for attempt in range(max_retries + 1)` counter, so each transient thinking retry consumes one generic-retry slot. With the default `max_retries=5` the loop runs for attempts 0–5 (6 iterations total). On attempt 5, `attempt >= max_retries` is `True`, so the `_raise_error` branch fires before the 6th sleep — the documented 160 s back-off never runs and the "fall through to generic retry" path is unreachable with default config.
The fix as described in the PR (independent 6-retry budget that falls through to generic retry) requires the transient counter to drive its own inner loop without consuming outer `attempt` slots, e.g.:
```python
if self._is_transient_thinking_error(e):
if transient_thinking_retries < 6:
transient_thinking_retries += 1
await asyncio.sleep(min(240, 5 * (2 ** (transient_thinking_retries - 1))))
continue # retry WITHOUT advancing attempt
# exhausted transient budget; fall through to generic retry below
```
But `continue` in a `for` loop always advances the iterator — a separate `while True` inner retry loop is needed to keep `attempt` from incrementing.
How can I resolve this? If you propose a fix, please make it concise.
Greptile caught a real logic bug in the initial patch: the `continue` inside the transient-thinking branch advanced the outer `for attempt in range(max_retries + 1)` counter, so each transient retry consumed a generic-retry slot. With the default `max_retries=5`:

- The loop runs 6 iterations (attempts 0–5)
- Each transient retry burned one iteration
- On attempt 5, `attempt >= max_retries` fired `_raise_error` before the 6th sleep (160s) ever ran
- The documented "fall through to generic retry" path was unreachable

Fix: an inner `while` loop that does its own `_stream()` retry without advancing the outer `attempt` counter. The transient budget of 6 (5 / 10 / 20 / 40 / 80 / 160 s) is now independent of `max_retries`, and once the inner budget is exhausted the most recent exception falls through to the generic retry path as the PR originally intended.

Also extracts the literal `6` into `max_transient_thinking_retries` for readability. It is still unconfigurable, since the budget envelope is based on observed Bedrock transient durations, not something we want users tuning blindly.
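The corrected control flow can be sketched as follows. This is a sketch under assumptions, not the actual patch: `self._stream`, `self._raise_error`, and `retry_delay` are stand-ins for whatever strix/llm/llm.py actually uses; only the inner-while structure and the backoff schedule follow the description above.

```python
import asyncio

# Transient-thinking retry budget, independent of max_retries:
# 5 + 10 + 20 + 40 + 80 + 160 s, ~315 s total.
max_transient_thinking_retries = 6

async def generate_with_retries(self, payload):
    transient_retries = 0  # shared across outer attempts, spent once
    for attempt in range(self.max_retries + 1):
        try:
            # Inner loop: retries here do NOT advance `attempt`.
            while True:
                try:
                    return await self._stream(payload)
                except Exception as exc:
                    if (self._is_transient_thinking_error(exc)
                            and transient_retries < max_transient_thinking_retries):
                        transient_retries += 1
                        # 5 * 2**(n-1) -> 5, 10, 20, 40, 80, 160 seconds
                        await asyncio.sleep(5 * 2 ** (transient_retries - 1))
                        continue
                    raise  # not transient, or budget exhausted: generic path
        except Exception as err:
            if attempt >= self.max_retries:
                self._raise_error(err)  # assumed to raise
            await asyncio.sleep(getattr(self, "retry_delay", 1))
```

Because `transient_retries` is initialized before the outer loop, the 6-sleep budget is spent at most once per call; after exhaustion, further transient 400s go straight to the generic retry path.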
Pulls usestrix#467 (feat: add resume session feature) ahead of the upstream merge. Enables unattended scans to survive crashes or forced termination (e.g. AWS STS session expiry mid-scan) by appending every LLM message to strix_runs/<run>/conversation.jsonl and replaying on --resume.

New CLI:

```
strix --list-sessions      # tabulate past scans
strix --continue (or -c)   # resume most recent
strix --resume <run_name>  # resume by name
strix --resume             # open interactive picker
```

Motivates: https://github.com/seedcx/strix-scan-workflow/actions/runs/25116247499. The trade-api flag-aware scan exited at 71 min with APIConnectionError ("security token included in the request is expired"). The session had produced 10 findings; all were lost once the report upload also hit ExpiredToken. With this commit merged, the CI composite action can wrap strix in a re-assume + --continue loop so each sub-session fits inside the 60 min STS budget and total scan duration is bounded only by the overall workflow timeout.

Upstream integration:

- Once usestrix#467 merges, drop from the seedcx-build recipe
- Until then, include alongside usestrix#454 (memory compression), usestrix#460 (truncate retry), usestrix#468 (thinking retry), this one, and the adaptive-thinking commit on this branch

Clean 3-way merge, no conflicts.

Smoke tests:

- uv sync succeeds
- strix --help surfaces --resume / --continue / --list-sessions
- imports (strix.sessions.resume, strix.telemetry.conversation_log) OK

Follow-up (separate): strix-scan-workflow README recipe + composite action consumer changes to invoke resume on re-dispatch.
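The append-and-replay mechanism described above could be sketched roughly like this. All names here are assumptions for illustration (only the strix_runs/<run>/conversation.jsonl path comes from the description); the record shape and the truncated-tail handling are design guesses, not the actual usestrix#467 implementation.

```python
import json
from pathlib import Path

def append_message(run_dir: Path, message: dict) -> None:
    """Append one LLM message as a JSON line.

    Each message is written and flushed independently, so a crash
    mid-scan loses at most the record being written.
    """
    log = run_dir / "conversation.jsonl"
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(message, ensure_ascii=False) + "\n")

def replay_messages(run_dir: Path) -> list[dict]:
    """Rebuild conversation state for --resume / --continue.

    A process killed mid-write can leave a truncated final line;
    replay stops at the first unparseable record and resumes from
    the last good one.
    """
    log = run_dir / "conversation.jsonl"
    if not log.exists():
        return []
    messages = []
    for line in log.read_text(encoding="utf-8").splitlines():
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            break  # truncated tail from a crash
    return messages
```

An append-only JSONL file suits this failure mode well: resuming never requires rewriting history, and a partially written last line is detectable and safely discardable.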