Skip to content

Per-field extraction cap to prevent truncation-based bypass #2706

@Fieldnote-Echo

Description

@Fieldnote-Echo

Problem

The defense-in-depth analyzers (PR #2472) apply a single 30,000-character budget across all extracted fields. Fields are processed in order: tool_name, then tool_call.name, then tool_call.arguments (exec segments); thought entries, then reasoning_content, then summary (text segments). An oversized early field starves later fields of scanning budget, making their content invisible to all downstream analyzers.

Source: openhands-sdk/openhands/sdk/security/defense_in_depth/utils.py, lines 75-141.

Threat model

The primary threat at the action boundary is indirect prompt injection — attacker-controlled content (documents, web pages, tool output) processed by the LLM, which then emits tool calls with side effects.

tool_name on ActionEvent has no length or content validation at any layer — it is a bare str field with no max_length constraint (see action.py:41, message.py:32). When the LLM emits a tool name not in the registry, the raw string flows through to the ActionEvent (agent.py:766) which is emitted via on_event and reaches the security analyzer before the error path runs. For non-function-calling models (via fn_call_converter.py:80), the extraction regex [^>]+ allows arbitrary-length strings with no cap. OpenAI's API constrains native function call names to 64 characters, but other providers and the non-function-calling path do not. Research confirms LLMs can be coaxed into calling tool names not in the registry (Answer.AI, Jan 2026).

Proof of concept

tool_name = "A" * 29990
tool_call.arguments = '{"command": "rm -rf /"}'

Total content is under 30,000 characters, but arguments receives at most 10 characters of budget. The rm -rf / payload is silently truncated. PatternSecurityAnalyzer returns LOW.

Exploitability: For models using native function calling with OpenAI (64-char name limit), the attack requires a different provider or proxy that does not enforce name length. For non-function-calling models (fn_call_converter path), the [^>]+ regex imposes no length constraint and the attack is directly exploitable via prompt injection. The ActionEvent is emitted and analyzed before the tool-not-found error fires.

Decision rationale

Per-field cap is the simplest fix that removes the structural weakness without changing the overall budget or any downstream behavior. The alternative — validating tool_name length upstream — would also be correct but is a separate concern (input validation vs. extraction resilience). The 50% cap ensures that even if an upstream validation gap is discovered later, the extraction pipeline does not have a structural weakness. The heuristic guarantees that if one pre-argument field is adversarially large, arguments still gets at least 15,000 characters. If both tool_name and tool_call.name simultaneously hit the cap, arguments would get 0 from the global budget — but this requires two adversarially large fields, which is harder to achieve than one.

Proposed fix

In utils.py, add _FIELD_CAP = _EXTRACT_HARD_CAP // 2 and enforce it in the _add helper in both _extract_exec_segments() and _extract_text_segments():

_FIELD_CAP = _EXTRACT_HARD_CAP // 2  # No single field > 50% of total budget

def _add(text: str, field_cap: int = _FIELD_CAP) -> None:
    nonlocal total
    remaining = min(_EXTRACT_HARD_CAP - total, field_cap)
    if remaining <= 0:
        return
    if len(text) > remaining:
        text = text[:remaining]
    segments.append(text)
    total += len(text)

Approximately 5 lines per extraction function. No changes to normalization or pattern matching.

Test plan

  • Oversized tool_name does not starve tool_call.arguments
  • Oversized tool_call.name does not starve tool_call.arguments
  • Oversized thought entries do not starve summary
  • Total cap still honored when all fields are large
  • Small fields extracted in full (no unnecessary truncation)
  • End-to-end: oversized tool_name + malicious arguments returns HIGH
  • Update test_payload_past_hard_cap xfail (field-starvation variant should now pass; genuine total-cap exceedance remains documented)

Context

Identified in adversarial security review (2026-04-03) of PR #2472. Priority 1 in the red-team fix list.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions