-
Notifications
You must be signed in to change notification settings - Fork 576
feat(container-runtime): enhance DuplicateBatch telemetry diagnostics #27355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3141,11 +3141,26 @@ export class ContainerRuntime | |
| eventName: "DuplicateBatch", | ||
| details: { | ||
| batchId: batchStart.batchId, | ||
| batchIdExplicit: batchStart.batchId !== undefined, | ||
| clientId: batchStart.clientId, | ||
| batchStartCsn: batchStart.batchStartCsn, | ||
| size: inboundResult.length, | ||
| duplicateBatchSequenceNumber: result.otherSequenceNumber, | ||
| // Identifying info for the ORIGINAL occurrence of this batch, so we can | ||
| // disambiguate the duplicate's source (e.g. resubmit vs fresh submit, same | ||
| // vs different wire clientId). Undefined fields indicate the original was | ||
| // loaded from a summary snapshot rather than seen at runtime. | ||
| otherClientId: result.otherBatchInfo?.clientId, | ||
| otherBatchStartCsn: result.otherBatchInfo?.batchStartCsn, | ||
| otherBatchIdExplicit: result.otherBatchInfo?.batchIdExplicit, | ||
| otherFromSnapshot: result.otherBatchInfo === undefined, | ||
| ...extractSafePropertiesFromMessage(batchStart.keyMessage), | ||
| // For grouped batches, `keyMessage` is one of the sub-messages produced by | ||
| // `OpGroupingManager.ungroupOp`, which overwrites `clientSequenceNumber` | ||
| // with a synthetic counter (1, 2, 3, ...). Override with the real outer | ||
| // envelope's clientSequenceNumber so downstream telemetry doesn't get a | ||
| // misleading "fake csn" value. | ||
| messageClientSequenceNumber: batchStart.batchStartCsn, | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Deep Review: The rollout/opt-in path for the new diagnostic fields is undocumented. PR #27351 (daesunp, merged 2026-05-20T00:20:38Z — ~2h before this PR was created) reverted #27262 and shifted The PR description's success criterion — "the next event will tell us which it is" — implicitly assumes the tenant population emitting these events is opted in. The gated path is reachable via Suggested fix: add a one-paragraph note to the PR description stating which enablement gate will be set for the affected tenant population and roughly when — either the existing flags will be turned on for those tenants, or this PR is staged ahead of a separate re-enablement. |
||
| }, | ||
| }, | ||
| error, | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deep Review: This override silently changes the meaning of an already-emitted telemetry property.
...extractSafePropertiesFromMessage(batchStart.keyMessage)already emitsmessageClientSequenceNumber(the post-ungroupOpsynthetic counter); the subsequentmessageClientSequenceNumber: batchStart.batchStartCsnreplaces that value with the outer-envelope CSN.The PR description confirms the field is in production dashboards today: "Across 50 production events over 7 days, every
DuplicateBatchshowed ...MessageClientSequenceNumber == 1". Any KQL query, alert, or dashboard tile keyed offDuplicateBatch.MessageClientSequenceNumber == 1will stop matching once this ships.Before merging, either:
(a) notify the on-call/telemetry owner (cc kian-thompson — historical reviewer on telemetry shape, see #22454) that the field's semantics are changing, and call out the back-compat impact in the PR description; or
(b) emit the corrected value under a distinct field name (e.g.
outerBatchClientSequenceNumber) and leave the inherited spread value untouched.(a) is preferred — the existing value is known-misleading and is exactly what this PR is trying to stop surfacing as authoritative — but it requires the explicit communication step. Also worth extending the inline comment at the override site to flag that this overrides an inherited field rather than adding a new one.