Skip to content
148 changes: 148 additions & 0 deletions sdk/guides/security.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -444,6 +444,154 @@ agent = Agent(llm=llm, tools=tools, security_analyzer=security_analyzer)
For more details on the base class implementation, see the [source code](https://github.com/OpenHands/software-agent-sdk/blob/main/openhands-sdk/openhands/sdk/security/analyzer.py).
</Tip>

### Defense-in-Depth Security Analyzer

#### The problem

Your agent is about to run a tool call. Is it safe?

The `LLMSecurityAnalyzer` asks the model itself — but the model can be
manipulated, and encoding tricks can hide dangerous commands from it.
You need a layer that does not depend on model judgment: something
deterministic, local, and fast.

#### What this gives you

Three composable analyzers that classify actions at the boundary —
before the tool runs, not after. No network calls, no model inference,
no extra dependencies. They return a `SecurityRisk` level; your
`ConfirmRisky` policy decides whether to prompt the user.

| Analyzer | What it catches | How it works |
|----------|----------------|--------------|
| `PatternSecurityAnalyzer` | Known threat signatures (rm -rf, eval, curl\|sh) | Regex patterns on two corpora: shell patterns scan executable fields only; injection patterns scan all fields |
| `PolicyRailSecurityAnalyzer` | Composed threats (fetch piped to exec, raw disk writes, catastrophic deletes) | Deterministic rules evaluated per-segment — both tokens must appear in the same field |
| `EnsembleSecurityAnalyzer` | Nothing on its own — it combines the others | Takes the highest concrete risk across all child analyzers |

#### Quick start

You must configure both the analyzer and the confirmation policy.
Setting an analyzer does not automatically change confirmation behavior.

```python icon="python" focus={7-18}
from openhands.sdk import Conversation
from openhands.sdk.security import (
PatternSecurityAnalyzer,
PolicyRailSecurityAnalyzer,
EnsembleSecurityAnalyzer,
ConfirmRisky,
SecurityRisk,
)

# Create the analyzer — rails catch composed threats,
# patterns catch individual signatures
security_analyzer = EnsembleSecurityAnalyzer(
analyzers=[
PolicyRailSecurityAnalyzer(),
PatternSecurityAnalyzer(),
]
)

# Tell the SDK when to ask the user — HIGH is the recommended baseline
confirmation_policy = ConfirmRisky(threshold=SecurityRisk.HIGH)

# Wire both into the conversation
# Assumes `agent` is already configured — see Quick Start guide
conversation = Conversation(agent=agent, workspace=".")
conversation.set_security_analyzer(security_analyzer)
conversation.set_confirmation_policy(confirmation_policy)
```

After this, every agent action passes through the analyzer before
execution. HIGH-risk actions trigger a confirmation prompt — the user
sees the risk level and can approve or reject before the tool runs.
MEDIUM and LOW are allowed. UNKNOWN is confirmed by default
(`confirm_unknown=True`).

For security-sensitive environments, lower the threshold to catch more:

```python
# Stricter posture — MEDIUM and above require confirmation
confirmation_policy = ConfirmRisky(threshold=SecurityRisk.MEDIUM)
```

<Warning>
`conversation.execute_tool()` bypasses the analyzer and confirmation
policy. These analyzers protect agent actions in the conversation
loop, not direct tool calls.
</Warning>

#### Adding the LLM analyzer for deeper coverage

The pattern analyzer catches known threats instantly. The LLM analyzer
can catch novel or ambiguous cases. Composing both gives you speed and
breadth:

```python
from openhands.sdk.security import LLMSecurityAnalyzer

security_analyzer = EnsembleSecurityAnalyzer(
analyzers=[
PolicyRailSecurityAnalyzer(),
PatternSecurityAnalyzer(),
LLMSecurityAnalyzer(),
]
)

confirmation_policy = ConfirmRisky(threshold=SecurityRisk.HIGH)
```

The ensemble takes the worst case across all analyzers. If the pattern
analyzer says HIGH and the LLM says LOW, the result is HIGH.

#### Why it works this way

**Two corpora, not one.** An agent that runs `ls /tmp` but thinks
"I should avoid rm -rf /" is not flagged — shell patterns only see
the `ls /tmp` that will actually execute. Injection patterns like
"ignore all previous instructions" scan everything, because they
target the model's instruction-following regardless of where they
appear.

**Max-severity, not averaging.** The analyzers scan the same input —
they are correlated, not independent. The highest concrete risk wins.
That is simpler and more auditable than probabilistic fusion.

**UNKNOWN means "I don't know," not "safe."** If all analyzers return
UNKNOWN, the ensemble preserves it. Under the default `ConfirmRisky`
policy, UNKNOWN triggers confirmation. Promoting UNKNOWN to HIGH
would make optional analyzers unusable.

**Confirm, don't block.** The analyzers return a risk level. The
confirmation policy decides what happens. The analyzer does not
prevent execution — it classifies risk for the policy layer to act on.
Pair with Docker isolation for stronger safety guarantees.

#### What this does not do

This is a deterministic action-boundary control. It is not:

- A complete prompt-injection solution
- A full shell parser or AST interpreter
- A sandbox replacement
- A guarantee against novel threats the patterns do not cover

It is additive to `LLMSecurityAnalyzer` and `GraySwanAnalyzer`, not a
replacement for either.

#### Known limitations

| Limitation | Why | What would fix it |
|---|---|---|
| No hard-deny at the analyzer boundary | SDK analyzers return `SecurityRisk`, not block/allow | Hook-based enforcement |
| `execute_tool()` bypasses checks | Direct tool execution skips the conversation loop | Hooks |
| No Cyrillic/homoglyph detection | NFKC maps compatibility forms, not cross-script confusables | Unicode TR39 confusable tables |
| Content past 30k chars is invisible | Hard cap prevents regex denial-of-service | Raise the cap (increases ReDoS exposure) |
| `thinking_blocks` not scanned | Scanning model reasoning risks false positives on deliberation | Separate injection-only CoT scan |

<Note>
Ready-to-run example: [examples/01_standalone_sdk/47_defense_in_depth_security.py](https://github.com/OpenHands/software-agent-sdk/blob/main/examples/01_standalone_sdk/47_defense_in_depth_security.py)
</Note>

---

Expand Down