Feat/webinar llm latencies 2#5002
Conversation
|
This pull request is automatically being deployed by Amplify Hosting (learn more). |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 838cc6f657
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const syncResponseReceivedAt = shouldCollectMetrics ? performance.now() : 0; | ||
|
|
||
| // Record pending metrics and complete them when the matching LLM broadcast arrives. | ||
| if (shouldCollectMetrics && syncResponse !== null) { |
There was a problem hiding this comment.
Record pending metrics when /sync returns no body
sendSyncRequestToLocus() explicitly returns null when Locus replies with an empty body and says the actual data will arrive via messages, but performSync() only stores pendingSyncMetrics when syncResponse !== null. In the empty-body path, there is no pending entry for tryCompletePendingSyncMetrics() to finish later, so client.locus.sync.complete latency telemetry is silently dropped for those syncs (the exact flow this feature is trying to measure).
Useful? React with 👍 / 👎.
| for (const dataSet of message.dataSets) { | ||
| const pending = this.pendingSyncMetrics.get(dataSet.name); | ||
|
|
||
| if (pending && dataSet.version >= pending.dataSetVersion) { |
There was a problem hiding this comment.
I think Locus team wanted us to match the received event with the sync request by using the tracking id, see this page:
https://confluence-eng-gpk2.cisco.com/conf/spaces/CAL/pages/801680811/5K+Webinar+Latencies+Tracking
I think it's important for them in cases if there is a sync storm
| */ | ||
| // @ts-ignore - Fix type | ||
| this.locusInfo = new LocusInfo(this.updateMeetingObject.bind(this), this.webex, this.id); | ||
| this.locusInfo.syncMetricsCallback = (metrics: { |
There was a problem hiding this comment.
I think it would be cleaner to pass the callback in the constructor. We are already passing 1 callback in the constructor, so we could have a callbacks argument object with multiple callbacks. Same for how the callback is passed between LocusInfo and HashTreeParser class.
By requiring the callback in the constructor we can make sure it's always there so we then don't have to check for it when calling it
| breakoutMoveId: eventInfo.breakoutMoveId, | ||
| breakoutSessionId: eventInfo?.currentSession?.sessionId, | ||
| breakoutGroupId: eventInfo?.currentSession?.groupId, | ||
| ...(eventInfo?.llmWebsocketUrl && {llmWebsocketUrl: eventInfo.llmWebsocketUrl}), |
There was a problem hiding this comment.
nitpick: personally I don't like this syntax, I think it's not very readable. Wouldn't it be better like this?
const {llmWebsocketUrl, llmLatency} = eventInfo;
const payload: any = {
llmLatency,
identifiers: {
breakoutMoveId: eventInfo.breakoutMoveId,
breakoutSessionId: eventInfo?.currentSession?.sessionId,
breakoutGroupId: eventInfo?.currentSession?.groupId,
llmWebsocketUrl,
},
};
We know that eventInfo must be defined, otherwise we would have returned in line 37
I see same syntax pattern in many other places in this PR, most of them could be made simpler by having the const with the same name as the field in the created object defined earlier.
|
|
||
| dataSet.timer = setTimeout(() => { | ||
| dataSet.timer = undefined; | ||
| dataSet.lastBackoffTime = delay - dataSet.idleMs; |
There was a problem hiding this comment.
I feel that it would be cleaner to have:
const randomBackOff = this.getWeightedBackoffTime(dataSet.backoff);
const delay = dataSet.idleMs + randomBackOff;
dataSet.lastBackoffTime = randomBackOff;
a few lines above instead of doing the subtraction here. Do you agree? Also, feels a bit cleaner to be setting dataSet.lastBackoffTime before the call to setTimeout instead of inside the setTimeout callback.
| throw error; | ||
| } | ||
|
|
||
| hashtreeResponseTime = shouldCollectMetrics |
There was a problem hiding this comment.
this is not exactly the right place to measure the /hashtree response time - this should be done inside getHashesFromLocus() immediately after getting the response. Right now getHashesFromLocus doesn't do much after getting the response, but that's not guaranteed to be always the case, someone might add some async or time consuming stuff to it and then the calculation here won't give the correct result.
Same for hashtreePrepStart that's used for hashtreePrepTime calculations - it should be done inside getHashesFromLocus() immediately before calling this.webexRequest
| }); | ||
| } | ||
|
|
||
| const syncResponseReceivedAt = shouldCollectMetrics ? performance.now() : 0; |
There was a problem hiding this comment.
having all that code that calculates latencies spread out across the code and mixed in with other logic feels quite messy. It would be better to have a more generic mechanism - a class that maintains the latency values and does the calculations and in the HashTreeParser we would just call that class to tell it about various milestones being reached like "hash tree prep start/end", "hash tree request/response" etc. In fact we already have a class for doing this sort of stuff - CallDiagnosticLatencies - see for example how shareDuration is calculated for client.share.stopped event. You can add more "internal" events to InternalEvent - they are internal to SDK as the name suggests.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b82b71b8e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| this.pendingSyncMetrics.set(dataSet.name, { | ||
| syncResponseReceivedAt, | ||
| totalStartTime: totalStart, | ||
| randomBackoffTime: Math.round(dataSet.lastBackoffTime || 0), |
There was a problem hiding this comment.
Capture syncAll backoff before reporting metrics
When LLM syncs are kicked off by syncAllDatasets({onlyLLM: true}) after connecting the LLM channel, the backoff delay is computed and slept in syncAllDatasets, but lastBackoffTime is never updated on that path; it is only set in the timer callback inside runSyncAlgorithm. As a result, these new client.locus.sync.complete metrics report randomBackoffTime as 0 or a stale value from an unrelated timer, even though the sync just waited for a randomized backoff.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f15504d61e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (event === 'client.breakout-session.join.response' && !llmLatency) { | ||
| return; |
There was a problem hiding this comment.
Preserve breakout join metrics without LLM latencies
In meetings where automatic LLM is disabled (the default in config.ts), Breakouts.updateBreakout() still calls onBreakoutJoinResponse without llmLatency, and no later updateLLMConnection() path emits the replacement metric because that path is gated on enableAutomaticLLM. This new guard therefore drops the existing client.breakout-session.join.response telemetry for normal breakout moves that do not collect LLM timing, instead of only adding latency fields when they are available.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3335dc8100
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| this.locusSyncLatencies.set(dataSetName, { | ||
| randomBackoffTime, | ||
| syncStart: value, | ||
| }); |
There was a problem hiding this comment.
Key sync latency records by meeting as well
When two Meeting instances sync the same dataset concurrently (for example, two webinars both syncing main), they share the single webex.internal.newMetrics.callDiagnosticLatencies tracker, but this record is keyed only by dataSetName. A second internal.client.locus.sync.start overwrites the first meeting's record, so the first parser's completion can report the other meeting's timings or lose its telemetry after the record is cleared, corrupting/dropping client.locus.sync.complete metrics. Include the meeting/locus identity in the key or use a per-parser tracker record.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 35ba5211a3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| key: 'internal.client.locus.sync.start', | ||
| options: { | ||
| dataSetName: dataSet.name, | ||
| randomBackoffTime: Math.round(dataSet.lastBackoffTime || 0), |
There was a problem hiding this comment.
Preserve watchdog backoff before reporting sync metrics
When a sync is initiated by the heartbeat watchdog, resetHeartbeatWatchdogs() computes a fresh backoffTime and waits heartbeatIntervalMs + backoffTime, but that path enqueues the sync without updating dataSet.lastBackoffTime. As a result, this metric records randomBackoffTime as 0 or a stale value from an earlier root-hash timer even though the watchdog just waited a randomized backoff, corrupting client.locus.sync.complete latency data for watchdog-triggered syncs.
Useful? React with 👍 / 👎.
This pull request addresses
https://jira-eng-gpk2.cisco.com/jira/browse/SPARK-774846
by making the following changes
added latency telemetry for webinar LLM connect and Locus sync flows, including timing collection, websocket/dataset context in payloads, callback wiring from parser to meeting.
Change Type
The following scenarios were tested
< ENUMERATE TESTS PERFORMED, WHETHER MANUAL OR AUTOMATED >
The GAI Coding Policy And Copyright Annotation Best Practices
I certified that
Make sure to have followed the contributing guidelines before submitting.