---
title: Sampling Strategies
sidebar_order: 17
description: "Configure trace sampling to capture 100% of AI agent runs without sampling all traffic."
keywords:
- AI sampling
- tracesSampler
- agent tracing
- head-based sampling
- gen_ai spans
---

Sentry uses head-based sampling, which means the sampling decision happens once at the root span. All child spans, including `gen_ai.*` operations, inherit that decision. If the root span is dropped, every nested LLM call, tool execution, and agent handoff in the trace is lost with it.

A single agent run can produce many spans, and under head-based sampling you either capture the full span tree or lose it entirely. A 10% sample rate means losing visibility into 90% of agent failures.
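
The all-or-nothing effect can be sketched with a quick simulation (the run size, rate, and trial count are illustrative):

```python
import random

random.seed(0)
SAMPLE_RATE = 0.10   # head-based sample rate applied at the root span
SPANS_PER_RUN = 40   # hypothetical spans per agent run
RUNS = 10_000

captured_spans = 0
for _ in range(RUNS):
    # One decision at the root; every child span inherits it
    if random.random() < SAMPLE_RATE:
        captured_spans += SPANS_PER_RUN

total_spans = RUNS * SPANS_PER_RUN
print(f"captured {captured_spans / total_spans:.0%} of spans")
```

Because the decision is made once per trace, the captured fraction of spans tracks the root sample rate; there is no way to keep just the interesting child spans after the root is dropped.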

## Sample Standalone Agent Runs at 100%

When agent runs are the root span (cron jobs, queue consumers, CLI scripts), match on the `gen_ai.*` operation prefix:

### JavaScript

```javascript
import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    if (
      attributes?.["sentry.op"]?.startsWith("gen_ai.") ||
      attributes?.["gen_ai.system"]
    ) {
      return 1.0;
    }

    return inheritOrSampleWith(0.2);
  },
});
```

### Python

```python
import sentry_sdk


def traces_sampler(sampling_context):
    op = sampling_context.get("transaction_context", {}).get("op", "")

    if op.startswith("gen_ai."):
        return 1.0

    # Honor the parent decision in distributed traces
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)

    return 0.2


sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)
```
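
As a sanity check, a sampler like the one above can be exercised directly with dictionaries shaped like the SDK's `sampling_context` (the op names are examples):

```python
def traces_sampler(sampling_context):
    op = sampling_context.get("transaction_context", {}).get("op", "")
    if op.startswith("gen_ai."):
        return 1.0
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)
    return 0.2

# Agent root span: always kept
assert traces_sampler({"transaction_context": {"op": "gen_ai.invoke_agent"}}) == 1.0
# Child of a distributed trace inherits the parent decision
assert traces_sampler({"parent_sampled": True}) == 1.0
assert traces_sampler({"parent_sampled": False}) == 0.0
# Everything else falls back to 20%
assert traces_sampler({"transaction_context": {"op": "http.server"}}) == 0.2
```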

## Sample HTTP Routes That Serve AI Features

When agent runs are nested inside HTTP request handlers (the more common case), the root span is the HTTP transaction. You need to identify the routes that trigger AI work and sample those at 100%:

### JavaScript

```javascript
import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampler: ({ name, attributes, inheritOrSampleWith }) => {
    // Standalone gen_ai root spans
    if (
      attributes?.["sentry.op"]?.startsWith("gen_ai.") ||
      attributes?.["gen_ai.system"]
    ) {
      return 1.0;
    }

    // HTTP routes that serve AI features
    if (
      name?.includes("/api/chat") ||
      name?.includes("/api/agent") ||
      name?.includes("/api/generate")
    ) {
      return 1.0;
    }

    return inheritOrSampleWith(0.2);
  },
});
```

### Python

```python
import sentry_sdk


def traces_sampler(sampling_context):
    tx_context = sampling_context.get("transaction_context", {})
    op = tx_context.get("op", "")
    name = tx_context.get("name", "")

    # Standalone gen_ai root spans
    if op.startswith("gen_ai."):
        return 1.0

    # HTTP routes that serve AI features
    if op == "http.server" and any(
        p in name for p in ["/api/chat", "/api/agent", "/api/generate"]
    ):
        return 1.0

    # Honor parent decision in distributed traces
    parent = sampling_context.get("parent_sampled")
    if parent is not None:
        return float(parent)

    return 0.2


sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)
```

<Alert level="info">

Replace `/api/chat`, `/api/agent`, and `/api/generate` with the actual routes in your application that handle AI requests.

</Alert>
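
Note that substring matching (`name?.includes(...)` / `p in name`) can over-match: `/api/chat` also matches `/api/chatbots-admin`. A stricter, path-segment-aware check is one alternative, assuming the transaction name is the bare route path (route names here are placeholders):

```python
AI_ROUTE_PREFIXES = ("/api/chat", "/api/agent", "/api/generate")


def is_ai_route(name: str) -> bool:
    # Match the exact route or any sub-path under it, but not
    # unrelated routes that merely share a prefix string
    return any(
        name == prefix or name.startswith(prefix + "/")
        for prefix in AI_ROUTE_PREFIXES
    )


assert is_ai_route("/api/chat")
assert is_ai_route("/api/chat/stream")
assert not is_ai_route("/api/chatbots-admin")
```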

## Cost Comparison

LLM API calls cost significantly more per agent run than the trace events Sentry ingests for the same run. Dropping AI traces to save on observability doesn't make sense when the LLM calls behind them cost orders of magnitude more.
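
A rough back-of-the-envelope comparison makes the gap concrete (every figure below is an assumption for illustration, not a real price quote):

```python
# Hypothetical per-run usage and prices (illustrative, not real quotes)
input_tokens = 12_000
output_tokens = 2_000
price_per_1m_input = 3.00    # USD per 1M input tokens, assumed
price_per_1m_output = 15.00  # USD per 1M output tokens, assumed
trace_cost_per_run = 0.0005  # USD of observability spend per run, assumed

llm_cost = (
    input_tokens / 1_000_000 * price_per_1m_input
    + output_tokens / 1_000_000 * price_per_1m_output
)
ratio = llm_cost / trace_cost_per_run
print(f"LLM: ${llm_cost:.3f}/run, trace: ${trace_cost_per_run:.4f}/run ({ratio:.0f}x)")
```

Under these assumptions the trace costs well under 1% of the LLM calls it observes, which is why sampling AI traffic at 100% rarely moves the overall bill.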

## Supplement With Metrics and Logs

If 100% trace sampling isn't feasible at your scale, you can supplement lower trace rates with metrics and structured logs that are emitted on every LLM call, regardless of sampling.

### Metrics

Emit [custom metrics](/product/explore/metrics/) on every LLM call to track token usage, latency, and error rates independently of traces:

```javascript
import * as Sentry from "@sentry/node";

Sentry.metrics.distribution("gen_ai.token_usage", result.usage.totalTokens, {
  unit: "none",
  attributes: {
    model: "claude-sonnet-4-6",
    endpoint: "/api/chat",
  },
});

Sentry.metrics.distribution("gen_ai.latency", responseTimeMs, {
  unit: "millisecond",
  attributes: { model: "claude-sonnet-4-6" },
});
```

```python
import sentry_sdk

sentry_sdk.metrics.distribution(
    "gen_ai.token_usage",
    result.usage.total_tokens,
    attributes={
        "model": "claude-sonnet-4-6",
        "endpoint": "/api/chat",
    },
)

sentry_sdk.metrics.distribution(
    "gen_ai.latency",
    response_time_ms,
    unit="millisecond",
    attributes={"model": "claude-sonnet-4-6"},
)
```

### Structured Logs

Use [Sentry structured logging](/product/explore/logs/) to capture per-call details:

```javascript
Sentry.logger.info("LLM call completed", {
  model: "claude-sonnet-4-6",
  input_tokens: result.usage.promptTokens,
  output_tokens: result.usage.completionTokens,
  latency_ms: responseTimeMs,
  status: "success",
});
```

```python
sentry_sdk.logger.info(
    "LLM call completed",
    model="claude-sonnet-4-6",
    input_tokens=result.usage.prompt_tokens,
    output_tokens=result.usage.completion_tokens,
    latency_ms=response_time_ms,
    status="success",
)
```

## Next Steps

- [Getting Started](/ai/monitoring/agents/getting-started/)
- [Model Costs](/ai/monitoring/agents/costs/)
- [AI Agents Dashboard](/ai/monitoring/agents/dashboards/)