
[Live Debugger] Add 10ms snapshot capture timeout circuit breaker#4512

Open: watson wants to merge 1 commit into main from watson/DEBUG-5300/add-max-time-circut-breaker

@watson watson commented Apr 21, 2026

Motivation

Snapshot capture in the Live Debugger serializes JavaScript values synchronously by walking object graphs recursively. For large or deeply nested objects, this can block the main thread for an unacceptable amount of time. We need a circuit breaker that aborts capture when it takes too long, preventing the debugger from degrading application performance.

Changes

Adds a 10ms cooperative timeout to the snapshot capture path in the Browser Debugger SDK.

capture.ts

  • Introduces a CaptureContext interface (deadline + timedOut flag) threaded through capture(), captureFields(), and all internal recursive helpers.
  • Both public functions now require a CaptureContext argument.
  • isTimedOut() checks the flag first (fast path), then compares performance.now() against the deadline. Checks sit at coarse logical boundaries: before each object property, array element, map entry, set item, and typed array element.
  • When timed out, captureValue returns { type: <real typeof>, notCapturedReason: 'timeout' }, consistent with how all other Datadog tracers handle notCapturedReason.
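
The bullets above can be illustrated with a minimal sketch. It is not the PR's actual code: the names CaptureContext, isTimedOut, captureValue, deadline, timedOut, and notCapturedReason come from the description, while the Captured shape, the typeName helper, and the injectable now clock (added here only so the behavior can be exercised deterministically) are assumptions.

```typescript
// Sketch only. Captured, typeName, and the `now` parameter are hypothetical.
interface CaptureContext {
  deadline: number // absolute timestamp in ms, on the performance.now() clock
  timedOut: boolean // sticky: once true, every later check returns immediately
}

interface Captured {
  type: string
  value?: string
  fields?: Record<string, Captured>
  notCapturedReason?: string
}

function isTimedOut(ctx: CaptureContext, now: () => number = () => performance.now()): boolean {
  if (ctx.timedOut) return true // fast path: no clock read after the first timeout
  if (now() >= ctx.deadline) {
    ctx.timedOut = true
    return true
  }
  return false
}

// Reports 'null' for null instead of the 'object' that a bare typeof would give.
function typeName(value: unknown): string {
  return value === null ? 'null' : typeof value
}

function captureValue(value: unknown, ctx: CaptureContext, now?: () => number): Captured {
  if (isTimedOut(ctx, now)) {
    return { type: typeName(value), notCapturedReason: 'timeout' }
  }
  if (typeof value !== 'object' || value === null) {
    return { type: typeName(value), value: String(value) }
  }
  const fields: Record<string, Captured> = {}
  for (const key of Object.keys(value)) {
    // Coarse checkpoint before each property (simplified: arrays, maps, sets,
    // and typed arrays are not handled separately in this sketch).
    if (isTimedOut(ctx, now)) {
      return { type: 'object', fields, notCapturedReason: 'timeout' }
    }
    fields[key] = captureValue((value as Record<string, unknown>)[key], ctx, now)
  }
  return { type: 'object', fields }
}
```

Keeping the flag sticky means that, after the first timeout, every remaining checkpoint costs a single boolean read rather than another performance.now() call.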

api.ts

  • Each hook (onEntry, onReturn, onThrow) computes a single deadline before the probe loop and shares it across all probes. This means if one snapshot probe exhausts the budget, subsequent snapshot probes exit immediately, while non-snapshot probes are unaffected.
  • When capture times out, the snapshot event is dropped entirely (not sent).
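
A rough sketch of the shared-deadline behavior described above, under stated assumptions: Probe, runHook, and the capture callback are hypothetical stand-ins for the real hook internals, and the now parameter exists only to make the example deterministic. CaptureContext and SNAPSHOT_TIMEOUT_MS are names that appear in the PR.

```typescript
// Sketch only. Probe, runHook, and the capture callback are hypothetical.
interface CaptureContext {
  deadline: number
  timedOut: boolean
}

interface Probe {
  id: string
  captureSnapshot: boolean
}

const SNAPSHOT_TIMEOUT_MS = 10

function runHook(
  probes: Probe[],
  capture: (probe: Probe, ctx: CaptureContext) => void,
  now: () => number = () => performance.now()
): string[] {
  // One deadline per hook invocation, computed before the probe loop.
  const ctx: CaptureContext = { deadline: now() + SNAPSHOT_TIMEOUT_MS, timedOut: false }
  const sent: string[] = []

  for (const probe of probes) {
    if (probe.captureSnapshot) {
      // Once the budget is spent, later snapshot probes skip capture entirely.
      if (!ctx.timedOut) capture(probe, ctx)
      // A timed-out snapshot is dropped, not sent.
      if (ctx.timedOut) continue
    }
    // Non-snapshot probes never consult the capture context.
    sent.push(probe.id)
  }
  return sent
}
```

With probes ordered log, snapshot, snapshot, log and a first capture that overruns the budget, this sketch returns only the two log probe ids and invokes capture once.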

Test instructions

Run the debugger unit tests:

yarn test:unit packages/debugger

Key test scenarios:

  • Entry/return/throw capture timeout drops the snapshot
  • Non-snapshot probes still fire even after a timeout
  • Shared deadline causes second snapshot probe to exit immediately
  • No active entry leaks when entry capture times out
  • capture() returns the real typeof value (including 'null' not 'object') on timeout

Checklist

  • Tested locally
  • Tested on staging
  • Added unit tests for this change.
  • Added e2e/integration tests for this change.
  • Updated documentation and/or relevant AGENTS.md file

Collaborator Author

watson commented Apr 21, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.


datadog-datadog-prod-us1 Bot commented Apr 21, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 67.39%
Overall Coverage: 76.62% (-0.09%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 77b798f


cit-pr-commenter-54b7da Bot commented Apr 21, 2026

Bundles Sizes Evolution

| 📦 Bundle Name | Base Size | Local Size | 𝚫 | 𝚫% | Status |
|---|---|---|---|---|---|
| Rum | 179.56 KiB | 179.56 KiB | 0 B | 0.00% | |
| Rum Profiler | 6.04 KiB | 6.04 KiB | 0 B | 0.00% | |
| Rum Recorder | 27.03 KiB | 27.03 KiB | 0 B | 0.00% | |
| Logs | 56.69 KiB | 56.69 KiB | 0 B | 0.00% | |
| Rum Slim | 135.41 KiB | 135.41 KiB | 0 B | 0.00% | |
| Worker | 23.63 KiB | 23.63 KiB | 0 B | 0.00% | |

🚀 CPU Performance

| Action Name | Base CPU Time (ms) | Local CPU Time (ms) | 𝚫% |
|---|---|---|---|
| RUM - add global context | 0.005 | 0.004 | -20.00% |
| RUM - add action | 0.014133333333333333 | 0.0126 | -10.85% |
| RUM - add error | 0.013566666666666666 | 0.0118 | -13.02% |
| RUM - add timing | 0.0028333333333333335 | 0.0025 | -11.76% |
| RUM - start view | 0.0131 | 0.0113 | -13.74% |
| RUM - start/stop session replay recording | 0.0007333333333333333 | 0.0006 | -18.18% |
| Logs - log message | 0.014933333333333333 | 0.0135 | -9.60% |

🧠 Memory Performance

| Action Name | Base Memory Consumption | Local Memory Consumption | 𝚫 |
|---|---|---|---|
| RUM - add global context | 31.23 KiB | 31.24 KiB | +10 B |
| RUM - add action | 54.86 KiB | 54.79 KiB | -73 B |
| RUM - add timing | 32.96 KiB | 32.41 KiB | -564 B |
| RUM - add error | 60.66 KiB | 61.05 KiB | +401 B |
| RUM - start/stop session replay recording | 32.15 KiB | 32.37 KiB | +227 B |
| RUM - start view | 483.79 KiB | 484.10 KiB | +317 B |
| Logs - log message | 48.88 KiB | 49.90 KiB | +1.02 KiB |


@watson watson changed the base branch from watson/DEBUG-5296/add-live-debugger to graphite-base/4512 April 21, 2026 07:35
@watson watson force-pushed the graphite-base/4512 branch from bf9a162 to b9da4f3 Compare April 21, 2026 07:39
@watson watson force-pushed the watson/DEBUG-5300/add-max-time-circut-breaker branch from 54767dd to 57a3752 Compare April 21, 2026 07:39
@watson watson changed the base branch from graphite-base/4512 to watson/DEBUG-5296/add-live-debugger April 21, 2026 07:39
@watson watson changed the base branch from watson/DEBUG-5296/add-live-debugger to graphite-base/4512 April 21, 2026 08:17
@watson watson force-pushed the watson/DEBUG-5300/add-max-time-circut-breaker branch from 57a3752 to 9967454 Compare April 21, 2026 08:19
@watson watson force-pushed the graphite-base/4512 branch from b9da4f3 to 0040ee8 Compare April 21, 2026 08:19
@watson watson changed the base branch from graphite-base/4512 to watson/DEBUG-5296/add-live-debugger April 21, 2026 08:19
@watson watson changed the base branch from watson/DEBUG-5296/add-live-debugger to graphite-base/4512 April 21, 2026 09:12
@watson watson force-pushed the watson/DEBUG-5300/add-max-time-circut-breaker branch from 9967454 to 6d55f19 Compare April 21, 2026 09:14
@watson watson force-pushed the graphite-base/4512 branch from 0040ee8 to 653aee9 Compare April 21, 2026 09:14
@watson watson changed the base branch from graphite-base/4512 to watson/DEBUG-5296/add-live-debugger April 21, 2026 09:14
@watson watson force-pushed the watson/DEBUG-5300/add-max-time-circut-breaker branch from 6d55f19 to 92a1c3c Compare April 21, 2026 09:25
@watson watson changed the base branch from watson/DEBUG-5296/add-live-debugger to graphite-base/4512 April 22, 2026 12:25
@watson watson force-pushed the graphite-base/4512 branch from 653aee9 to 7d1efe7 Compare April 22, 2026 21:03
@watson watson force-pushed the watson/DEBUG-5300/add-max-time-circut-breaker branch from 92a1c3c to 46622ab Compare April 22, 2026 21:03
@watson watson changed the base branch from graphite-base/4512 to watson/DEBUG-5296/add-live-debugger April 22, 2026 21:04
@watson watson changed the base branch from watson/DEBUG-5296/add-live-debugger to graphite-base/4512 April 23, 2026 10:42
@watson watson force-pushed the watson/DEBUG-5300/add-max-time-circut-breaker branch from 46622ab to 79c3335 Compare April 23, 2026 11:00
@watson watson changed the base branch from graphite-base/4512 to watson/DEBUG-5296/add-live-debugger April 23, 2026 11:00
@watson watson changed the base branch from watson/DEBUG-5296/add-live-debugger to graphite-base/4512 May 7, 2026 05:31
Thread a CaptureContext with a performance.now() deadline through the
capture walker so that snapshot serialization aborts early when the
budget is exceeded. A single deadline is shared across all probes in
a hook invocation; non-snapshot probes are unaffected.

When the timeout fires, the snapshot is dropped entirely and capture
stops at the next logical checkpoint (property, array element, map
entry, etc.). The placeholder returned for a value that was not
captured reports that value's real typeof, consistent with how all
other Datadog tracers handle notCapturedReason.
@watson watson force-pushed the graphite-base/4512 branch from 8130067 to e739e7e Compare May 12, 2026 11:00
@watson watson force-pushed the watson/DEBUG-5300/add-max-time-circut-breaker branch from 79c3335 to 77b798f Compare May 12, 2026 11:00
@watson watson changed the base branch from graphite-base/4512 to main May 12, 2026 11:00
@watson watson marked this pull request as ready for review May 12, 2026 11:07
@watson watson requested a review from a team as a code owner May 12, 2026 11:07
@watson watson requested a review from a team as a code owner May 12, 2026 11:07

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77b798f25e


*/
export function onEntry(probes: InitializedProbe[], self: any, args: Record<string, any>): void {
  const start = performance.now()
  const captureCtx: CaptureContext = { deadline: start + SNAPSHOT_TIMEOUT_MS, timedOut: false }

[P2] Start the snapshot deadline when capture begins

Because this deadline is created before iterating probes and before ENTRY condition/template evaluation, time spent in an earlier probe can consume the entire 10 ms budget before any snapshot capture runs. For example, if a non-snapshot probe on the same method has an expensive condition or template, the following snapshot probe will see performance.now() >= deadline on its first capture() call and be dropped even though its own capture did not exceed the timeout. The circuit breaker should measure snapshot capture work rather than unrelated probe evaluation.
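
One possible way to act on this suggestion, sketched under assumptions rather than taken from the PR: create the context lazily, so the clock starts on the first capture rather than at hook entry. createLazyContext and its now parameter are hypothetical names; CaptureContext and SNAPSHOT_TIMEOUT_MS mirror the quoted code.

```typescript
// Sketch only. createLazyContext and the `now` parameter are hypothetical.
interface CaptureContext {
  deadline: number
  timedOut: boolean
}

const SNAPSHOT_TIMEOUT_MS = 10

// Returns a getter that builds the context on first use. The deadline is
// therefore anchored to the first snapshot capture, not to hook entry.
function createLazyContext(now: () => number = () => performance.now()): () => CaptureContext {
  let ctx: CaptureContext | undefined
  return () => {
    if (ctx === undefined) {
      ctx = { deadline: now() + SNAPSHOT_TIMEOUT_MS, timedOut: false }
    }
    return ctx
  }
}
```

A hook would call the getter only immediately before a capture call. All snapshot probes in the invocation still share one budget, but condition or template evaluation that runs before the first capture no longer counts against it. Evaluation that happens between two captures would still consume the budget, so this addresses only part of the concern.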
