
fix: recover from BadRequestError caused by oversized payloads#460

Open
seanturner83 wants to merge 5 commits into usestrix:main from seanturner83:fix/bad-request-truncate-retry

Conversation

@seanturner83
Contributor

Title: fix: recover from BadRequestError caused by oversized payloads

Summary

  • Adds BadRequestError (HTTP 400) recovery to the LLM retry loop — first retries bare, then optionally truncates large <tool_result> XML blocks when STRIX_TRUNCATE_ON_OVERSIZE=true
  • Truncation is opt-in (default off) to avoid lossy recovery for users who don't want it
  • Observed on repos with 50KB+ config files (CIS benchmark YAML, large tfvars.json) that push Bedrock payload past its limit at ~6M tokens

Context

During large-scale scanning (700+ repos), certain repos with very large files caused persistent HTTP 400s that killed scans immediately. The two-stage approach handles both transient 400s (bare retry) and genuine payload oversize (truncation + retry).

How it works

  1. On HTTP 400, retry once after 2s (handles transient provider hiccups)
  2. If still failing and STRIX_TRUNCATE_ON_OVERSIZE=true, scan messages in reverse for <tool_result> blocks exceeding 2000 chars, truncate to 1000 chars with a "requires manual review" notice, retry
  3. If neither works, fall through to existing retry/error logic
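The three steps above can be sketched as follows. This is a minimal illustration of the two-stage recovery, not the PR's actual code: the helper names (`truncate_large_tool_results`, `complete_with_recovery`) and the `BadRequestError` stand-in are assumptions for the sketch.

```python
import asyncio
import os
import re

# Opt-in flag, default off (assumed spelling matches the PR description)
TRUNCATE_ON_OVERSIZE = os.getenv("STRIX_TRUNCATE_ON_OVERSIZE", "").lower() == "true"


class BadRequestError(Exception):
    """Stand-in for the provider's HTTP 400 error type."""


def truncate_large_tool_results(messages, threshold_chars=2000, keep_chars=1000):
    """Truncate <tool_result> blocks longer than threshold_chars, scanning
    messages in reverse. Returns True if anything was truncated."""
    changed = False
    pattern = re.compile(r"<tool_result>(.*?)</tool_result>", re.DOTALL)

    def shrink(match):
        nonlocal changed
        body = match.group(1)
        if len(body) > threshold_chars:
            changed = True
            return ("<tool_result>" + body[:keep_chars]
                    + "\n[truncated - requires manual review]</tool_result>")
        return match.group(0)

    for msg in reversed(messages):
        if isinstance(msg.get("content"), str):
            msg["content"] = pattern.sub(shrink, msg["content"])
    return changed


async def complete_with_recovery(call, messages):
    try:
        return await call(messages)
    except BadRequestError:
        await asyncio.sleep(2)  # stage 1: bare retry for transient 400s
        try:
            return await call(messages)
        except BadRequestError:
            if TRUNCATE_ON_OVERSIZE and truncate_large_tool_results(messages):
                return await call(messages)  # stage 2: truncate, then retry
            raise  # stage 3: fall through to existing retry/error logic
```

With the flag off, the sketch degenerates to a single bare retry, matching the "no behavior change without explicit opt-in" claim in the test plan.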

Test plan

  • Confirmed bare retry resolves transient 400s in production scanning
  • Confirmed truncation triggers on repos with large tool results hitting Bedrock payload limits
  • Verify truncation notice appears in scan output when triggered
  • Verify flag defaults to off — no behavior change without explicit opt-in

…yloads

When tool results (e.g. large config files, CIS benchmark YAML) accumulate
in conversation history, the serialized payload can exceed the provider's
request size limit, causing a persistent HTTP 400. Previously this was not
retried, failing the scan immediately.

Now handles BadRequestError in two stages:
1. Bare retry after 2s (transient 400s from provider hiccups)
2. If STRIX_TRUNCATE_ON_OVERSIZE=true, truncates the largest tool_result
   XML blocks to 1000 chars with a "requires manual review" notice, then
   retries. This is opt-in to avoid lossy recovery for users who don't
   want it.

Observed in practice on repos with 50KB+ CIS defaults YAML and 94KB
tfvars.json files hitting Bedrock payload limits at ~6M tokens.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR adds a two-stage HTTP 400 recovery path to the LLM retry loop: a bare retry after 2 s (for transient provider hiccups) followed by optional <tool_result> XML truncation when STRIX_TRUNCATE_ON_OVERSIZE=true (for genuine payload-oversize errors against Bedrock). The previous silent-exit-on-last-attempt bug and the missing import re at module level are both correctly addressed.

Confidence Score: 4/5

Safe to merge with minor caveats; core logic is sound and previous blocking issues are fixed.

No P0/P1 findings in the new code. Two P2 items remain: the bare retry fires on all 400s (not just oversize payloads), adding latency for permanent config errors; and structured type=tool_result content blocks are not covered by the truncation path. Neither blocks the stated use-case of recovering from Bedrock payload limits on large YAML/tfvars files.

Focus areas: strix/llm/llm.py — specifically the _is_bad_request catch scope and the _truncate_large_tool_results content-type coverage.

Important Files Changed

Filename Overview
strix/llm/llm.py Adds HTTP 400 recovery with bare retry and optional truncation; import re correctly moved to module-level; previous silent-exit and last-attempt guard bugs fixed; a few P2 edge cases remain around 400 misclassification and truncation coverage.
This is a comment left during a code review.
Path: strix/llm/llm.py
Line: 177-183

Comment:
**Bare retry fires on all HTTP 400s, not just payload oversize**

The `_is_bad_request` check intercepts every HTTP 400 — including persistent provider errors like invalid model name, unsupported parameter combinations, or malformed request bodies that will never recover. For these, the bare retry adds a guaranteed 2-second stall and an extra wasted attempt before failing. Consider at least logging a warning, or inspecting the error body for "payload too large" / "context window" signals before incurring the retry.
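One concise resolution in the spirit of this comment is to inspect the error text before incurring the retry. The signal strings below are assumptions for illustration; real provider error bodies vary and should be checked against observed 400s.

```python
# Heuristic signals that a 400 is a size-limit error rather than a
# permanent configuration error (assumed phrases, not an exhaustive list)
OVERSIZE_SIGNALS = (
    "payload",
    "too large",
    "request size",
    "context window",
    "input is too long",
)


def looks_like_oversize_400(err: Exception) -> bool:
    """Only treat a 400 as retryable if its message hints at payload or
    context-size limits; other 400s (invalid model name, unsupported
    parameters) will never recover and should fail fast."""
    text = str(err).lower()
    return any(sig in text for sig in OVERSIZE_SIGNALS)
```

The bare-retry branch would then be guarded by `looks_like_oversize_400(err)`, or at minimum log a warning when it fires on a non-matching 400.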


---

This is a comment left during a code review.
Path: strix/llm/llm.py
Line: 380-389

Comment:
**Structured `tool_result` content blocks are not truncated**

The code searches for `<tool_result>` XML only inside `type == "text"` content blocks. Some providers / litellm versions represent tool results as structured list entries with `type == "tool_result"` (and a nested `content` key holding the large body). Those are silently skipped, so truncation may not trigger even when the payload contains large structured tool results, leaving the caller in the `_raise_error` path without any reduction in payload size.
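A sketch of covering both block shapes in one helper. The field names follow the shapes described in the comment (`type == "text"` with a `text` key; `type == "tool_result"` with a nested string `content` key); the text path is simplified here to a plain length check rather than the PR's `<tool_result>` XML search.

```python
NOTICE = "\n[truncated - requires manual review]"


def truncate_content_block(block: dict, threshold: int = 2000, keep: int = 1000) -> bool:
    """Truncate one structured content block in place; returns True if it
    was shortened. Covers both text blocks and structured tool_result
    blocks so neither representation is silently skipped."""
    if block.get("type") == "text" and len(block.get("text", "")) > threshold:
        block["text"] = block["text"][:keep] + NOTICE
        return True
    if (block.get("type") == "tool_result"
            and isinstance(block.get("content"), str)
            and len(block["content"]) > threshold):
        block["content"] = block["content"][:keep] + NOTICE
        return True
    return False
```

Providers that nest tool_result content as a list of sub-blocks would need one more recursion step, but the dispatch on `type` is the essential fix.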



@bearsyankees
Collaborator

@greptile

…loads

Remove one-shot bad_request_truncated guard so truncation retries on each
400 until nothing remains to truncate. Also scan all messages per pass
instead of stopping at the first hit.

Addresses review feedback from Greptile on usestrix#460.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
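The commit above removes the one-shot guard so truncation repeats on each 400 until a whole-history pass shrinks nothing. A minimal sketch of that loop, with illustrative names (`truncate_pass`, `retry_with_progressive_truncation`) and a stand-in error type:

```python
import asyncio


class BadRequestError(Exception):
    """Stand-in for the provider's HTTP 400 error."""


def truncate_pass(messages, threshold=2000, keep=1000):
    """One pass over ALL messages (not stopping at the first hit);
    returns True if any message was shortened."""
    changed = False
    for msg in messages:
        body = msg.get("content", "")
        if isinstance(body, str) and len(body) > threshold:
            msg["content"] = body[:keep] + "\n[truncated - requires manual review]"
            changed = True
    return changed


async def retry_with_progressive_truncation(call, messages, max_attempts=5):
    """Retry on each 400, truncating per pass, until a pass finds nothing
    left to shrink — no one-shot guard."""
    for _ in range(max_attempts):
        try:
            return await call(messages)
        except BadRequestError:
            if not truncate_pass(messages):
                raise  # nothing left to truncate; surface the 400
    raise RuntimeError("exhausted retries")
```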
@bearsyankees
Collaborator

@greptile

…runcation retry

- Rename max_chars → threshold_chars and add separate truncate_to_chars
  parameter so the threshold and truncation target can differ by design,
  replacing the hardcoded 1000. Callers can now shrink blocks aggressively
  without re-processing blocks that are already acceptable.
- Add 2s sleep before the truncation-path `continue` to match the bare-retry
  pacing. If Bedrock is throttling after the original 400, immediately hitting
  it again risks a second rejection before the truncated payload is evaluated.
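The threshold/target split described in this commit can be illustrated with a small helper (names follow the commit message; the body is a sketch, not the PR's code):

```python
def truncate_tool_result(body: str, threshold_chars: int = 2000,
                         truncate_to_chars: int = 1000) -> str:
    """Only blocks longer than threshold_chars qualify for truncation;
    qualifying blocks are cut to truncate_to_chars. Keeping the two
    independent lets a pass shrink aggressively (small truncate_to_chars)
    without re-processing blocks already under the threshold."""
    if len(body) <= threshold_chars:
        return body  # already acceptable; never touched again
    return body[:truncate_to_chars] + "\n[truncated - requires manual review]"
```

With a single `max_chars`, a second pass would re-truncate blocks that the first pass already reduced; the separate threshold makes repeated passes idempotent on acceptable blocks.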
@seanturner83
Contributor Author

Addressed both P2 points in 6e25294:

  • Split max_chars into threshold_chars (qualifier for truncation) and truncate_to_chars (retained prefix size), with a docstring explaining why they're independent. The hardcoded 1000 is gone.
  • Added await asyncio.sleep(2) before the truncation path continue to match the bare-retry pacing, with a comment explaining the intent.

Ready for another Greptile pass whenever you want to re-trigger.

@bearsyankees
Collaborator

@greptileai

@bearsyankees
Collaborator

@greptile

seanturner83 added a commit to seanturner83/strix that referenced this pull request Apr 29, 2026
Pulls usestrix#467 (feat: add resume
session feature) ahead of upstream merge. Enables unattended scans
to survive crashes or forced termination (e.g. AWS STS session
expiry mid-scan) by appending every LLM message to
strix_runs/<run>/conversation.jsonl and replaying on --resume.

New CLI:
  strix --list-sessions         # tabulate past scans
  strix --continue (or -c)      # resume most recent
  strix --resume <run_name>     # resume by name
  strix --resume                # open interactive picker

Motivates:
  https://github.com/seedcx/strix-scan-workflow/actions/runs/25116247499
  — trade-api flag-aware scan exited at 71min with
    APIConnectionError ("security token included in the request is
    expired"). Session had produced 10 findings; all lost once the
    report upload also hit ExpiredToken. With this commit merged,
    the CI composite action can wrap strix in a re-assume + --continue
    loop so each sub-session fits inside the 60min STS budget and
    total scan duration is bounded only by the overall workflow
    timeout.

Upstream integration:
  - Once usestrix#467 merges, drop from the seedcx-build recipe
  - Until then, include alongside usestrix#454 (memory compression), usestrix#460
    (truncate retry), usestrix#468 (thinking retry), this one, and the
    adaptive-thinking commit on this branch

Clean 3-way merge, no conflicts. Smoke tests:
  - uv sync succeeds
  - strix --help surfaces --resume / --continue / --list-sessions
  - imports (strix.sessions.resume, strix.telemetry.conversation_log) OK

Follow-up (separate): strix-scan-workflow README recipe + composite
action consumer changes to invoke resume on re-dispatch.
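The append-and-replay mechanism described in the commit above can be sketched as follows. The path layout (`strix_runs/<run>/conversation.jsonl`) comes from the commit message; the function names are illustrative, not from usestrix#467.

```python
import json
from pathlib import Path


def append_message(run_dir: str, message: dict) -> None:
    """Append one LLM message as a JSON line. One append per message means
    a crash or forced termination loses at most the in-flight message."""
    log = Path(run_dir) / "conversation.jsonl"
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")


def replay_messages(run_dir: str) -> list[dict]:
    """Rebuild conversation history on --resume by replaying the log."""
    log = Path(run_dir) / "conversation.jsonl"
    if not log.exists():
        return []
    return [json.loads(line)
            for line in log.read_text(encoding="utf-8").splitlines()
            if line.strip()]
```

This is what lets a CI wrapper re-assume credentials and restart with --continue: each sub-session replays the log and picks up where the previous one died.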
