Make session replay data agent-actionable #907

@dcramer

Description

Summary

Make Session Replay data in sentry-cli highly actionable for coding agents by adding first-class replay segment fetching, local caching, normalized event extraction, and inspection commands.

The CLI should not try to be the agent or own an ask command as the primary interface. Instead, it should become the replay data plane: a reliable way for agents to fetch replay segments, cache them, inspect DOM/rrweb/custom events, search timelines, and pull evidence windows around user actions. Agents can then compose those tools, plus a bundled skill that explains the replay data model, to answer questions such as:

  • "When the user clicked X, what happened next?"
  • "Where did the user spend the most time?"
  • "What hangups caused the user to struggle?"
  • "Were there failed requests, console errors, dead clicks, rage clicks, or DOM changes around this action?"

Current State

The CLI currently has basic replay support:

  • sentry replay list queries replay metadata.
  • sentry replay view fetches replay detail, related issues/traces, and a very small activity preview from recording segments.
  • sentry explore --dataset replays exposes replay index fields.

Relevant CLI files:

  • src/commands/replay/list.ts
  • src/commands/replay/view.ts
  • src/lib/api/replays.ts
  • src/lib/formatters/replay.ts
  • src/lib/replay-search.ts
  • src/types/replay.ts

The main limitation is that replay segments are treated as display garnish. replay view currently extracts only a handful of activity events from raw segments, capped to a tiny preview. There is no replay-specific local cache, no normalized event stream, no segment index, and no way for agents to inspect all DOM/rrweb/custom events.

There is also a likely correctness gap: the Sentry recording-segments endpoint is paginated, while the CLI currently downloads it as a single request. The frontend fetches segment pages with per_page=100 until count_segments is exhausted. The CLI should mirror that behavior so long replays are fully available.
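A minimal sketch of that paginated download loop, assuming a hypothetical `fetchPage` API helper and response shape (the real endpoint and client live in src/lib/api/replays.ts):

```typescript
// Sketch: fetch every recording-segment page, mirroring the frontend's
// per_page=100 pagination until count_segments is exhausted.
interface SegmentPage {
  segments: unknown[];
}

// Hypothetical page-fetcher: cursor is the offset of the first segment.
type FetchPage = (cursor: number, perPage: number) => Promise<SegmentPage>;

async function fetchAllSegments(
  fetchPage: FetchPage,
  countSegments: number, // from replay detail, when available
  perPage = 100
): Promise<unknown[]> {
  const all: unknown[] = [];
  for (let cursor = 0; all.length < countSegments; cursor += perPage) {
    const page = await fetchPage(cursor, perPage);
    // Defensive: stop if the server returns fewer segments than advertised.
    if (page.segments.length === 0) break;
    all.push(...page.segments);
  }
  return all;
}
```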

Relevant Sentry/rrweb Model

Sentry replay data has several useful layers:

  1. Replay metadata from org/project replay endpoints: duration, urls, counts, errors, traces, clicks, user/browser/os/sdk/device, tags, etc.
  2. Recording segments from the project-scoped recording-segments endpoint. These are compressed/packed storage blobs returned as rrweb/custom JSON when downloaded.
  3. rrweb events: full snapshots, incremental DOM mutations, mouse interactions, inputs, scrolls, viewport changes, media interactions, console logs, and more.
  4. Sentry custom replay frames: breadcrumbs, performance spans, options, video, web vitals, network breadcrumbs, console breadcrumbs, mobile events, etc.
  5. Related data: error events, feedback, trace ids, logs, click selector endpoints, and existing Seer replay summary APIs.

Useful Sentry references:

  • getsentry/sentry/src/sentry/replays/endpoints/project_replay_recording_segment_index.py
  • getsentry/sentry/src/sentry/replays/endpoints/project_replay_recording_segment_details.py
  • getsentry/sentry/src/sentry/replays/usecases/reader.py
  • getsentry/sentry/src/sentry/replays/usecases/pack.py
  • getsentry/sentry/src/sentry/replays/post_process.py
  • getsentry/sentry/src/sentry/replays/usecases/ingest/event_parser.py
  • getsentry/sentry/static/app/utils/replays/hooks/useReplayData.tsx
  • getsentry/sentry/static/app/utils/replays/hydrateFrames.tsx
  • getsentry/sentry/static/app/utils/replays/replayReader.tsx
  • @sentry-internal/rrweb-types, especially EventType, IncrementalSource, MouseInteractions, mutation/input/scroll/viewport payloads.

Proposal

Add a replay evidence system to the CLI with three parts:

  1. A local replay bundle cache.
  2. A normalized replay event model.
  3. Agent-friendly inspection commands and generated/bundled skill docs.

1. Replay Bundle Cache

Add a replay-specific cache under something like:

~/.sentry/cache/replays/{identity}/{org}/{project}/{replayId}/

Suggested contents:

metadata.json
segments/{segmentId}.json.gz
index/events.jsonl
index/navigation.json
index/interactions.json
index/network.json
index/problems.json
index/dom-summary.json

The raw segment payloads should live on disk, not in SQLite. SQLite can track manifests and cache lookups if useful, but segment blobs can be large and should be stored as private files.

Security/privacy requirements:

  • Cache directory should be 0700; files should be 0600.
  • Provide a way to bypass or clear replay cache.
  • Clear replay cache on auth/logout flows if appropriate.
  • Treat replay data as sensitive; do not attempt to unmask data that rrweb/Sentry masked.
  • Be explicit in outputs when text/DOM data is unavailable because it was masked or not captured.

Caching behavior:

  • Finished replays can generally be treated as immutable, subject to retention/privacy constraints.
  • Live or recently active replays should be refreshable because segment count may grow.
  • Avoid duplicating huge segment payloads in the generic HTTP response cache once replay-specific caching exists.
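The cache layout and permission requirements above could be sketched as follows (helper name and exact layout are illustrative):

```typescript
import { promises as fs } from "fs";
import * as path from "path";
import * as os from "os";
import { gzipSync } from "zlib";

// Sketch: persist a raw segment blob under the replay bundle cache with
// owner-only permissions (directory 0700, file 0600), following the
// suggested {org}/{project}/{replayId}/segments/ layout.
async function writeSegment(
  cacheRoot: string,
  org: string,
  project: string,
  replayId: string,
  segmentId: number,
  payload: string
): Promise<string> {
  const dir = path.join(cacheRoot, org, project, replayId, "segments");
  await fs.mkdir(dir, { recursive: true, mode: 0o700 });
  const file = path.join(dir, `${segmentId}.json.gz`);
  await fs.writeFile(file, gzipSync(payload), { mode: 0o600 });
  return file;
}
```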

2. Normalized Replay Event Model

Introduce a normalized event schema that agents can rely on, regardless of whether the source was rrweb, a Sentry custom frame, a breadcrumb, a perf span, or related event data.

Example event:

{
  "replayId": "abc123",
  "segmentId": 12,
  "frameIndex": 184,
  "offsetMs": 83421,
  "timestamp": "2026-05-03T18:42:11.421Z",
  "kind": "click",
  "category": "interaction",
  "label": "button.checkout",
  "url": "/checkout",
  "selector": "button[data-test-id=checkout]",
  "nodeId": 982,
  "rawType": "IncrementalSnapshot",
  "rawSource": "MouseInteraction"
}

Initial event kinds should include:

  • navigation
  • click
  • tap
  • input
  • focus
  • blur
  • scroll
  • viewport
  • mutation
  • dom-snapshot
  • breadcrumb
  • network
  • console
  • error
  • span
  • web-vital
  • memory
  • video
  • mobile

Important implementation detail: centralize timestamp normalization. rrweb event timestamps are milliseconds, while breadcrumb/performance payload fields may use seconds. Sentry has existing frontend/backend logic for this; the CLI should port the relevant normalization and test it with fixtures.
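One possible shape for that centralized normalizer; the cutoff heuristic is an assumption (epoch seconds for recent dates are around 1.7e9 while epoch milliseconds are around 1.7e12, so any value below 1e11 is treated as seconds):

```typescript
// Sketch: normalize mixed seconds/milliseconds timestamps to epoch ms.
function toEpochMs(value: number): number {
  return value < 1e11 ? Math.round(value * 1000) : Math.round(value);
}

// Offset of an event from replay start, regardless of source units.
function offsetMs(eventTs: number, replayStartTs: number): number {
  return toEpochMs(eventTs) - toEpochMs(replayStartTs);
}
```

Fixtures should cover both unit families so a breadcrumb in float seconds and an rrweb frame in integer milliseconds land at the same offset.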

3. Agent-Friendly Commands

sentry replay fetch <replay>

Fetch replay metadata, all recording segment pages, and optionally related errors/traces/logs. Build/update the local replay bundle and indexes.

Useful flags:

--force              Refresh cached data
--no-cache           Fetch but do not persist
--segments <range>   Fetch all or a subset for debugging
--include <list>     metadata,segments,errors,traces,logs,clicks
--json               Emit manifest and cache paths

sentry replay events <replay>

Primary agent primitive. Emit normalized replay events.

Useful flags:

--kind click,network,console,error,mutation
--from 01:20
--to 01:45
--contains checkout
--selector button.checkout
--url /checkout
--limit 200
--json
--jsonl
--raw                Include raw frame payload pointer or payload snippet

This command should make DOM/rrweb activity inspectable without requiring agents to know the raw segment layout.
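The filtering behind those flags could look like this sketch (the event type and option names are illustrative, not the real CLI surface):

```typescript
interface NormalizedEvent {
  kind: string;
  offsetMs: number;
  label?: string;
  url?: string;
  selector?: string;
}

// Sketch: apply --kind/--from/--to/--contains style filters to a
// normalized event stream.
function filterEvents(
  events: NormalizedEvent[],
  opts: { kinds?: string[]; fromMs?: number; toMs?: number; contains?: string }
): NormalizedEvent[] {
  const needle = opts.contains?.toLowerCase();
  return events.filter((e) => {
    if (opts.kinds && !opts.kinds.includes(e.kind)) return false;
    if (opts.fromMs !== undefined && e.offsetMs < opts.fromMs) return false;
    if (opts.toMs !== undefined && e.offsetMs > opts.toMs) return false;
    if (needle) {
      const haystack = [e.label, e.url, e.selector]
        .filter(Boolean)
        .join(" ")
        .toLowerCase();
      if (!haystack.includes(needle)) return false;
    }
    return true;
  });
}
```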

sentry replay window <replay>

Return an evidence slice around a timestamp or event match.

Examples:

sentry replay window org/project/abc123 --at 01:23 --before 10s --after 30s
sentry replay window org/project/abc123 --contains checkout --before 5s --after 20s

Output should group nearby activity by category: interaction, navigation, network, console, errors, DOM mutations, spans, web vitals.

sentry replay search <replay> <query>

Fuzzy search over normalized event fields:

  • selectors
  • visible/unmasked text
  • URLs
  • network URLs
  • breadcrumb messages
  • console messages
  • error titles/messages
  • span descriptions

Return matching events with stable pointers: replay id, segment id, frame index, timestamp, offset.

sentry replay dom <replay>

Inspect DOM-related replay data.

Initial version can be event-based rather than full reconstruction:

sentry replay dom org/project/abc123 --at 01:23
sentry replay dom org/project/abc123 --from 01:20 --to 01:30 --kind mutation,input,scroll

Later versions can add best-effort DOM reconstruction from full snapshots + incremental mutations. A browser-backed rrweb player should be optional, not required for the core CLI flow.

sentry replay stats <replay>

Deterministic summary useful for orientation:

  • total duration
  • route/screen time
  • active vs idle time
  • most clicked selectors
  • slowest network calls
  • failed requests
  • console error count
  • rage/dead click count
  • largest DOM mutation bursts
  • poor web vital events

sentry replay struggles <replay>

Deterministic friction analysis. Rank likely struggle windows using signals such as:

  • dead clicks
  • rage clicks
  • repeated clicks on same element
  • clicks followed by no navigation/network/DOM change
  • failed fetch/xhr/resource requests
  • slow network requests
  • console errors
  • hydration errors
  • poor LCP/CLS
  • large mutation bursts
  • long idle periods immediately after interaction
  • repeated input/focus without success
  • back-and-forth navigation

Each finding should include evidence pointers and a recommended follow-up command.

Example output shape:

{
  "finding": "Repeated checkout clicks did not produce navigation",
  "severity": "medium",
  "window": {"fromOffsetMs": 81200, "toOffsetMs": 94600},
  "evidence": [
    {"kind": "click", "offsetMs": 83421, "segmentId": 12, "frameIndex": 184},
    {"kind": "click", "offsetMs": 87820, "segmentId": 12, "frameIndex": 211},
    {"kind": "network", "offsetMs": 88110, "status": 500, "url": "/api/checkout"}
  ],
  "nextCommand": "sentry replay window org/project/abc123 --at 01:23 --before 10s --after 30s"
}
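One of the simpler signals above, clicks followed by no navigation/network/DOM change, could be detected with a sketch like this (the 2-second reaction window is an assumed threshold, not a Sentry constant):

```typescript
interface Ev {
  kind: string;
  offsetMs: number;
  selector?: string;
}

// Sketch heuristic: a click is "dead" if no navigation, network, or
// mutation event follows it within windowMs.
function findDeadClicks(events: Ev[], windowMs = 2000): Ev[] {
  const reactions = new Set(["navigation", "network", "mutation"]);
  return events.filter(
    (e) =>
      e.kind === "click" &&
      !events.some(
        (r) =>
          reactions.has(r.kind) &&
          r.offsetMs > e.offsetMs &&
          r.offsetMs - e.offsetMs <= windowMs
      )
  );
}
```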

Bundled Skill / Agent Documentation

Add generated or maintained skill docs that teach agents how to use these replay commands.

The skill should explain:

  1. Start with sentry replay fetch.
  2. Use sentry replay stats for orientation.
  3. Use sentry replay struggles to find likely friction.
  4. Use sentry replay search to locate user actions.
  5. Use sentry replay window around relevant timestamps.
  6. Use sentry replay events --kind ... for detailed evidence.
  7. Use sentry replay dom for DOM/mutation/input/scroll inspection.
  8. Cite segmentId, frameIndex, and offsets in conclusions.
  9. Treat masked text and absent DOM data as uncertainty.
  10. Do not invent user-visible text or DOM state when it was not captured.

This keeps the reasoning layer outside the CLI while making the CLI highly usable by agents.

Implementation Plan

Phase 1: Correct segment fetching

  • Add paginated segment download support with per_page=100.
  • Use count_segments from replay detail when available.
  • Add tests for multi-page segment responses.
  • Preserve current replay view behavior, but ensure it uses complete segments or explicitly reports partial data.

Phase 2: Add replay cache and fetch command

  • Add replay bundle storage helpers.
  • Add sentry replay fetch.
  • Store replay metadata and raw segment pages/segments.
  • Add cache manifest/versioning.
  • Add privacy-safe permissions.
  • Decide how replay cache interacts with generic response cache.

Phase 3: Normalize events

  • Define typed rrweb/custom event constants and minimal schemas.
  • Flatten segment frames into sorted normalized events.
  • Port relevant hydration behavior from Sentry frontend where appropriate.
  • Add fixtures for rrweb snapshots, mutations, inputs, clicks, network breadcrumbs, console breadcrumbs, perf spans, and web vitals.

Phase 4: Add inspection commands

  • Add replay events.
  • Add replay window.
  • Add replay search.
  • Add replay stats.
  • Add replay struggles.
  • Keep JSON/JSONL output stable and evidence-first.

Phase 5: DOM inspection

  • Start with DOM event summaries: snapshots, mutations, inputs, scrolls, viewport changes, node ids/selectors where available.
  • Later add best-effort DOM reconstruction from rrweb snapshots and mutations.
  • Avoid making a browser/rrweb replayer mandatory for core CLI use.

Phase 6: Agent skill/docs

  • Add or update Sentry CLI skill documentation for replay investigation workflows.
  • Ensure generated command docs include examples for agent workflows.
  • Include sample workflows for common questions.

Open Questions

  • What should the default replay cache TTL be, especially for sensitive replay payloads?
  • Should replay cache be cleared automatically on logout, or should it be scoped only by identity fingerprint?
  • Should generic response-cache skip replay segment payloads once replay-specific cache exists?
  • How much related data should replay fetch include by default: errors, traces, logs, click selectors?
  • Should replay events --raw include payload snippets or only stable pointers by default?
  • Do we want JSONL as a first-class output mode for very large event streams?
  • How much DOM reconstruction should be implemented in the CLI before delegating to browser-backed tooling?

Acceptance Criteria

  • Agents can download and cache a full replay locally.
  • Long replays with multiple segment pages are handled correctly.
  • Agents can list normalized replay events by kind/time/search query.
  • Agents can pull a compact evidence window around an action or timestamp.
  • Agents can identify likely user struggle windows without needing an LLM.
  • Outputs include stable evidence pointers: replay id, segment id, frame index, timestamp/offset.
  • Masked or unavailable DOM/text data is represented honestly.
  • The CLI remains the data/inspection layer; agent reasoning is documented in bundled skill guidance.

Follow-up: Feedback From a Real Multi-Replay Analysis

We tested the current replay command set against a generalized product analytics question:

"How often do users open a route and leave before interacting with it?"

In this case the route happened to be Sentry dashboard detail pages, but the workflow is intentionally broader: identify sessions that visited a route pattern, inspect the event timeline for each session, and classify whether any interaction occurred before the next route or replay end.

The current commands made this possible, but the workflow required too much external orchestration:

  1. Use replay list to find replay IDs by URL/path.
  2. Extract IDs with shell tooling.
  3. Run replay event list once per replay, hundreds of times.
  4. Join replay metadata and event timelines externally.
  5. Classify route windows with custom jq logic.
  6. Separately compute session-level and user-level rates.

That is a good sign that the primitives are useful, but it is a poor experience for agents and humans trying to answer broad replay questions.

Generalized Improvements Suggested by This Exercise

Batch replay event inspection

replay event list should support inspecting multiple replays in one command, using stdin or a file of replay IDs. The CLI should handle bounded concurrency, retries, and partial failures internally.

Example shape:

sentry replay event list sentry/javascript \
  --ids-file replays.txt \
  --kind navigation,click,tap,input,focus,scroll \
  --jsonl

Each emitted row should include replayId, and the command should provide a final summary of successes, failures, and truncation.
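The bounded-concurrency, partial-failure behavior could be implemented along these lines (a sketch; the worker and result row shape are illustrative):

```typescript
interface RowResult<T> {
  replayId: string;
  ok: boolean;
  value?: T;
  error?: string;
}

// Sketch: fan a worker out over many replay IDs with bounded concurrency,
// collecting per-replay error rows instead of failing the whole batch.
async function fanOut<T>(
  ids: string[],
  worker: (id: string) => Promise<T>,
  concurrency = 5
): Promise<RowResult<T>[]> {
  const results: RowResult<T>[] = new Array(ids.length);
  let next = 0;
  async function run(): Promise<void> {
    while (next < ids.length) {
      const i = next++;
      const replayId = ids[i];
      try {
        results[i] = { replayId, ok: true, value: await worker(replayId) };
      } catch (err) {
        results[i] = { replayId, ok: false, error: String(err) };
      }
    }
  }
  const workers = Math.min(concurrency, ids.length);
  await Promise.all(Array.from({ length: workers }, run));
  return results;
}
```

A final summary row (successes, failures, truncation) can then be computed from the collected results.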

Route visit summaries

Add a generic route/session summary primitive that turns replay timelines into route windows. This should not be dashboard-specific; it should work for any route glob/path matcher.

Example shape:

sentry replay route list sentry/javascript \
  --path "/dashboard/*" \
  --period 24h \
  --json

Useful output per route visit:

  • replayId
  • user or stable user key when available
  • path
  • enteredAtOffsetMs
  • leftAtOffsetMs or replay end
  • durationMs
  • counts by normalized event kind: clicks, taps, inputs, scrolls, focuses, navigations
  • booleans such as hadInteraction, hadInput, hadScroll, leftWithoutInteraction
  • nextPath when the user navigated away

This would answer many questions of the form "users visited X, then did/did not do Y" without teaching the CLI a product-specific concept.
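Folding a normalized timeline into those route windows might look like this sketch (event and visit shapes are assumptions based on the fields listed above):

```typescript
interface NavEvent {
  kind: string;
  offsetMs: number;
  url?: string;
}

interface RouteVisit {
  path: string;
  enteredAtOffsetMs: number;
  leftAtOffsetMs: number;
  hadInteraction: boolean;
  nextPath: string | null;
}

// Sketch: turn navigation events into route windows, flagging whether any
// interaction-kind event occurred before the next route or replay end.
function routeVisits(events: NavEvent[], replayEndMs: number): RouteVisit[] {
  const interactive = new Set(["click", "tap", "input", "scroll", "focus"]);
  const navs = events.filter((e) => e.kind === "navigation" && e.url);
  return navs.map((nav, i) => {
    const next = navs[i + 1];
    const leftAt = next ? next.offsetMs : replayEndMs;
    const hadInteraction = events.some(
      (e) =>
        interactive.has(e.kind) &&
        e.offsetMs >= nav.offsetMs &&
        e.offsetMs < leftAt
    );
    return {
      path: nav.url!,
      enteredAtOffsetMs: nav.offsetMs,
      leftAtOffsetMs: leftAt,
      hadInteraction,
      nextPath: next?.url ?? null,
    };
  });
}
```

Predicates such as leftWithoutInteraction then fall out as simple filters over these rows.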

Replay-level event aggregation

Add a generic aggregation mode for normalized replay events. For example, group counts by replay, user, route, or route visit.

Example shape:

sentry replay event summary sentry/javascript \
  --query "url:*dashboard*" \
  --path "/dashboard/*" \
  --group-by replay \
  --json

This should produce compact machine-readable rows rather than requiring callers to fetch and process every event manually.

Session vs user accounting

The CLI should make it easy to report both session-level and user-level rates. In the real workflow, we had to refetch replay list output with user fields and join that with event classifications externally.

A generalized summary command should expose both:

  • total matching replay sessions
  • classifiable replay sessions
  • distinct known users
  • sessions/users matching a predicate
  • explicit handling for missing user identity

Predicate-friendly output, not product-specific commands

The CLI does not need a dashboard abandonment command. It needs general replay predicates that agents can compose, such as:

  • route matched pattern
  • no click/tap/input/scroll before next route
  • no interaction before replay end
  • first interaction occurred after N seconds
  • route duration less than N seconds
  • next route matched pattern

These can live as fields in route summaries or as filters over those summaries.

Agent-friendly fan-out and failure handling

The CLI should own the mechanics of high-cardinality replay analysis:

  • bounded concurrency
  • retry/backoff
  • per-replay error rows
  • partial-result summaries
  • stable JSONL for large outputs
  • explicit truncated and classifiable fields

Shelling out with xargs -P works, but it makes agents responsible for operational details that the CLI can handle more reliably.

Acceptance Criteria Additions

  • Agents can inspect events for many replay IDs with one command.
  • Agents can produce route-window summaries for arbitrary route/path patterns.
  • Agents can compute route-level interaction counts without custom timeline joining.
  • Outputs distinguish session-level and user-level counts when user identity is available.
  • Large replay analyses support JSONL, bounded concurrency, retries, and partial-failure reporting.
  • The feature remains generic: it supports arbitrary route patterns and event predicates rather than hard-coded dashboard/product analytics questions.
