[Live Debugger] Add 10ms snapshot capture timeout circuit breaker#4512
[Live Debugger] Add 10ms snapshot capture timeout circuit breaker#4512watson wants to merge 1 commit into
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: 77b798f | Docs | Datadog PR Page | Give us feedback! |
Bundles Sizes Evolution
🚀 CPU Performance
🧠 Memory Performance
|
bf9a162 to
b9da4f3
Compare
54767dd to
57a3752
Compare
57a3752 to
9967454
Compare
b9da4f3 to
0040ee8
Compare
9967454 to
6d55f19
Compare
0040ee8 to
653aee9
Compare
6d55f19 to
92a1c3c
Compare
653aee9 to
7d1efe7
Compare
92a1c3c to
46622ab
Compare
46622ab to
79c3335
Compare
Thread a CaptureContext with a performance.now() deadline through the capture walker so that snapshot serialization aborts early when the budget is exceeded. A single deadline is shared across all probes in a hook invocation; non-snapshot probes are unaffected. When the timeout fires, the snapshot is dropped entirely and capture stops at the next logical checkpoint (property, array element, map entry, etc.). The timeout value uses the real typeof the value, consistent with how all other Datadog tracers handle notCapturedReason.
8130067 to
e739e7e
Compare
79c3335 to
77b798f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77b798f25e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| */ | ||
| export function onEntry(probes: InitializedProbe[], self: any, args: Record<string, any>): void { | ||
| const start = performance.now() | ||
| const captureCtx: CaptureContext = { deadline: start + SNAPSHOT_TIMEOUT_MS, timedOut: false } |
There was a problem hiding this comment.
Start the snapshot deadline when capture begins
Because this deadline is created before iterating probes and before ENTRY condition/template evaluation, time spent in an earlier probe can consume the entire 10 ms budget before any snapshot capture runs. For example, if a non-snapshot probe on the same method has an expensive condition or template, the following snapshot probe will see performance.now() >= deadline on its first capture() call and be dropped even though its own capture did not exceed the timeout. The circuit breaker should measure snapshot capture work rather than unrelated probe evaluation.
Useful? React with 👍 / 👎.

Motivation
Snapshot capture in the Live Debugger serializes JavaScript values synchronously by walking object graphs recursively. For large or deeply nested objects, this can block the main thread for an unacceptable amount of time. We need a circuit breaker that aborts capture when it takes too long, preventing the debugger from degrading application performance.
Changes
Adds a 10ms cooperative timeout to the snapshot capture path in the Browser Debugger SDK.
capture.tsCaptureContextinterface (deadline+timedOutflag) threaded throughcapture(),captureFields(), and all internal recursive helpers.CaptureContextargument.isTimedOut()checks the flag first (fast path), then callsperformance.now()against the deadline. Checks are placed at coarse logical boundaries: before each object property, array element, map entry, set item, and typed array element iteration.captureValuereturns{ type: <real typeof>, notCapturedReason: 'timeout' }, consistent with how all other Datadog tracers handlenotCapturedReason.api.tsonEntry,onReturn,onThrow) computes a single deadline before the probe loop and shares it across all probes. This means if one snapshot probe exhausts the budget, subsequent snapshot probes exit immediately, while non-snapshot probes are unaffected.Test instructions
Run the debugger unit tests:
Key test scenarios:
capture()returns the realtypeofvalue (including'null'not'object') on timeoutChecklist