Skip to content

User Timestamp Parsing#1812

Draft
chenosaurus wants to merge 10 commits intomainfrom
dc/feature/user_timestamp
Draft

User Timestamp Parsing#1812
chenosaurus wants to merge 10 commits intomainfrom
dc/feature/user_timestamp

Conversation

@chenosaurus
Copy link

No description provided.

@changeset-bot
Copy link

changeset-bot bot commented Feb 18, 2026

⚠️ No Changeset found

Latest commit: 45820bd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

<span id="e2ee-${identity}" class="e2ee-on"></span>
</div>
</div>
<div id="user-ts-${identity}" class="user-ts-overlay">

Check warning

Code scanning / CodeQL

Unsafe HTML constructed from library input Medium

This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.
This HTML construction which depends on
library input
might later allow
cross-site scripting
.

Copilot Autofix

AI 6 days ago

At a high level, the problem is that renderParticipant constructs a chunk of HTML using innerHTML and directly interpolates participant.identity into the markup. To avoid DOM XSS, any untrusted data must not be concatenated into HTML strings that are assigned to innerHTML; instead, we should construct elements using document.createElement, set attributes via setAttribute/properties, and assign untrusted values only to non-HTML contexts (like textContent or value). This avoids the need to manually escape HTML.

The best fix without changing existing behavior is to refactor the HTML construction in renderParticipant so that it no longer uses a template literal with innerHTML. We can keep the same structure and IDs, but create each element programmatically. For the conditional volume control vs progress bar, we can still use DOM APIs to append the right controls. No changes are needed in LocalParticipant, RemoteParticipant, Participant, or Room because they only emit events / carry data; CodeQL’s taint path ends at the unsafe sink in demo.ts, so securing that sink addresses all variants.

Concretely:

  • In examples/demo/demo.ts, replace the block:

    div.className = 'participant';
    div.innerHTML = `
      <video id="video-${identity}"></video>
      ...
    `;
    container.appendChild(div);

    with explicit DOM construction:

    div.className = 'participant';
    
    const videoEl = document.createElement('video');
    videoEl.id = `video-${identity}`;
    div.appendChild(videoEl);
    // etc. for audio, info bar, name, codec/size/bitrate spans, right side icons,
    // user-ts overlay div, and the conditional volume or progress control.
  • Preserve all existing IDs and class names so that later code (querySelector, updateVideoSize, CSS) continues to work unchanged.

  • Ensure that identity is only ever used to form attribute values via id = ... or setAttribute, and never via innerHTML interpolation.

No new imports or helper functions are strictly necessary; we can use standard DOM APIs already available in the browser environment.


Suggested changeset 1
examples/demo/demo.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/examples/demo/demo.ts b/examples/demo/demo.ts
--- a/examples/demo/demo.ts
+++ b/examples/demo/demo.ts
@@ -830,44 +830,100 @@
     div = document.createElement('div');
     div.id = `participant-${identity}`;
     div.className = 'participant';
-    div.innerHTML = `
-      <video id="video-${identity}"></video>
-      <audio id="audio-${identity}"></audio>
-      <div class="info-bar">
-        <div id="name-${identity}" class="name">
-        </div>
-        <div style="text-align: center;">
-          <span id="codec-${identity}" class="codec">
-          </span>
-          <span id="size-${identity}" class="size">
-          </span>
-          <span id="bitrate-${identity}" class="bitrate">
-          </span>
-        </div>
-        <div class="right">
-          <span id="signal-${identity}"></span>
-          <span id="mic-${identity}" class="mic-on"></span>
-          <span id="e2ee-${identity}" class="e2ee-on"></span>
-        </div>
-      </div>
-      <div id="user-ts-${identity}" class="user-ts-overlay">
-      </div>
-      ${
-        !isLocalParticipant(participant)
-          ? `<div class="volume-control">
-        <input id="volume-${identity}" type="range" min="0" max="1" step="0.1" value="1" orient="vertical" />
-      </div>`
-          : `<progress id="local-volume" max="1" value="0" />`
-      }
 
-    `;
+    const videoElm = document.createElement('video');
+    videoElm.id = `video-${identity}`;
+    div.appendChild(videoElm);
+
+    const audioElm = document.createElement('audio');
+    audioElm.id = `audio-${identity}`;
+    div.appendChild(audioElm);
+
+    const infoBar = document.createElement('div');
+    infoBar.className = 'info-bar';
+
+    const nameDiv = document.createElement('div');
+    nameDiv.id = `name-${identity}`;
+    nameDiv.className = 'name';
+    infoBar.appendChild(nameDiv);
+
+    const centerDiv = document.createElement('div');
+    centerDiv.style.textAlign = 'center';
+
+    const codecSpan = document.createElement('span');
+    codecSpan.id = `codec-${identity}`;
+    codecSpan.className = 'codec';
+    centerDiv.appendChild(codecSpan);
+
+    const sizeSpan = document.createElement('span');
+    sizeSpan.id = `size-${identity}`;
+    sizeSpan.className = 'size';
+    centerDiv.appendChild(sizeSpan);
+
+    const bitrateSpan = document.createElement('span');
+    bitrateSpan.id = `bitrate-${identity}`;
+    bitrateSpan.className = 'bitrate';
+    centerDiv.appendChild(bitrateSpan);
+
+    infoBar.appendChild(centerDiv);
+
+    const rightDiv = document.createElement('div');
+    rightDiv.className = 'right';
+
+    const signalSpan = document.createElement('span');
+    signalSpan.id = `signal-${identity}`;
+    rightDiv.appendChild(signalSpan);
+
+    const micSpan = document.createElement('span');
+    micSpan.id = `mic-${identity}`;
+    micSpan.className = 'mic-on';
+    rightDiv.appendChild(micSpan);
+
+    const e2eeSpan = document.createElement('span');
+    e2eeSpan.id = `e2ee-${identity}`;
+    e2eeSpan.className = 'e2ee-on';
+    rightDiv.appendChild(e2eeSpan);
+
+    infoBar.appendChild(rightDiv);
+    div.appendChild(infoBar);
+
+    const userTsDiv = document.createElement('div');
+    userTsDiv.id = `user-ts-${identity}`;
+    userTsDiv.className = 'user-ts-overlay';
+    div.appendChild(userTsDiv);
+
+    if (!isLocalParticipant(participant)) {
+      const volumeContainer = document.createElement('div');
+      volumeContainer.className = 'volume-control';
+
+      const volumeInput = document.createElement('input');
+      volumeInput.id = `volume-${identity}`;
+      volumeInput.type = 'range';
+      volumeInput.min = '0';
+      volumeInput.max = '1';
+      volumeInput.step = '0.1';
+      volumeInput.value = '1';
+      (volumeInput as any).orient = 'vertical';
+
+      volumeContainer.appendChild(volumeInput);
+      div.appendChild(volumeContainer);
+    } else {
+      const progress = document.createElement('progress');
+      progress.id = 'local-volume';
+      progress.max = 1;
+      progress.value = 0;
+      div.appendChild(progress);
+    }
+
     container.appendChild(div);
 
     const sizeElm = container.querySelector(`#size-${identity}`);
-    const videoElm = <HTMLVideoElement>container.querySelector(`#video-${identity}`);
-    videoElm.onresize = () => {
-      updateVideoSize(videoElm!, sizeElm!);
-    };
+    const createdVideoElm = div.querySelector(`#video-${identity}`) as HTMLVideoElement | null;
+    if (createdVideoElm && sizeElm) {
+      createdVideoElm.onresize = () => {
+        updateVideoSize(createdVideoElm, sizeElm);
+      };
+    }
   }
   const videoElm = <HTMLVideoElement>container.querySelector(`#video-${identity}`);
   const audioElm = <HTMLAudioElement>container.querySelector(`#audio-${identity}`);
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
@github-actions
Copy link
Contributor

github-actions bot commented Feb 22, 2026

size-limit report 📦

Path Size
dist/livekit-client.esm.mjs 87.68 KB (+1.05% 🔺)
dist/livekit-client.umd.js 98.24 KB (+0.96% 🔺)

Copy link
Contributor

@1egoman 1egoman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be possible I'm reviewing this prematurely - Shijing had asked I take a look at this. That being said, IMO there's still quite a bit of work to be done here and I'm not sure I'm fully following how this all works right now.

Comment on lines +464 to +489
// Always attempt to strip LKTS packet trailer before any e2ee
// processing. On the send side, the trailer is appended *after* encryption,
// so it must be removed *before* decryption.
if (isVideoFrame(encodedFrame) && encodedFrame.data.byteLength > 0) {
try {
const packetTrailerResult = stripPacketTrailerFromEncodedFrame(
encodedFrame as RTCEncodedVideoFrame,
);
if (packetTrailerResult !== undefined && this.trackId && this.participantIdentity) {
const msg: PacketTrailerMessage = {
kind: 'packetTrailer',
data: {
trackId: this.trackId,
participantIdentity: this.participantIdentity,
timestampUs: packetTrailerResult.timestampUs,
frameId: packetTrailerResult.frameId,
rtpTimestamp: packetTrailerResult.rtpTimestamp,
},
};
postMessage(msg);
}
} catch {
// Best-effort: never break media pipeline if timestamp parsing fails.
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Is the FrameCryptor (which I believe runs as part of the e2ee worker, which may not be loaded if e2ee is not enabled) the best place for this logic to live?

Is there a reason it has to run in a web worker context (ie, maybe it is super computationally expensive / etc)? Is this feature supposed to work if e2ee is disabled?

Comment on lines +244 to +252
private handleUserTimestamp(
trackId: string,
participantIdentity: string,
timestampUs: number,
frameId?: number,
rtpTimestamp?: number,
) {
if (!this.room) {
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Similar to my last comment - I don't know a ton about the user timestamp feature, but my suspicion is that putting handleUserTimestamp inside the E2eeManager is mixing concerns unnecesarily.

Comment on lines +1 to +9
/**
* Lightweight worker for stripping LKTS packet trailers from inbound
* encoded video frames via RTCRtpScriptTransform.
*
* When a valid trailer is found, the metadata is posted back to the main
* thread so the SDK can store the RTP-to-frame-metadata mapping on the
* corresponding RemoteVideoTrack.
*/
import {
Copy link
Contributor

@1egoman 1egoman Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I'm a bit confused now - so this change is introducing a second webworker exclusively for packet trailer stripping, but somehow this worker ties into the existing E2eeManager / e2ee worker? Assuming this isn't in an incomplete state mid migration from one approach to another, I might need some more background context to better understand this.

As a heads up - introducing another webworker is a fairly significant change. I think it might be viable because this packet trailer stuff is fully additive, but it's something we've used sparingly, so I just want to make sure there's a good reason to use it here.

More related background context I wrote in another context where somebody was proposing introducing a new webworker, maybe you would find it useful:

Starting a webworker requires that you pass a url to another script that you want to run to the new Worker("https://path.to/worker/script.js") (docs) constructor when creating the new worker, but because livekit is a library we don't have control over the end user's build process / have no way to force them to generate another file in their end application, let alone get a predictable path to where it might be hosted.

The one place that is using a webworker right now in the web sdk is end to end encryption, and the way it gets around this problem (and the way we'd have to do this for compression) is by bubbling up the web worker construction to the end user, forcing them to build / host a separate "worker script" containing the e2ee worker code, call new Worker("...") + pass the url to this "worker script", and then pass the fully constructed Worker object into the main Livekit SDK. For example, this is how this is done in the client-sdk-js demo app. This is super cumbersome / not an easy thing to instruct a user to do.

*
* @internal
*/
setUserTimestamp(timestampUs: number, rtpTimestamp?: number, frameId?: number) {
Copy link
Contributor

@1egoman 1egoman Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: As a heads up, in data tracks, the userTimestamp field is a bigint. It might make sense to use that here too.

Comment on lines +100 to +103
/**
* Look up the frame metadata associated with a given RTP timestamp.
* The entry is consumed (removed) after lookup, matching the Rust SDK
* behaviour.
Copy link
Contributor

@1egoman 1egoman Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I suspect this was probably LLM generated with a prompt like "port the rust implementation over to the web sdk" - it would be good to strip out some of these egregious comparisons to the rust sdk before this would end up getting merged.

question: Also in this context, is actually the "consuming the entry like rust" type behavior a good idea? In rust there are benefits to avoid extra cloning, but it has the serious tradeoff of making requests non idempotent. Given the web doesn't have these same data ownership concerns it is a fairly serious rough edge for from what I can tell no real benefit, assuming there is some other way the entries in the userTimestampMap are cleared so it doesn't grow in an unbounded fashion.

}

function createPacketTrailerWorker(): Worker {
return new Worker(new URL('./livekit-client.packet-trailer.worker.js', import.meta.url));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: See my comment here - this type of thing isn't viable and won't work for all downstream consumers of the package transparently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants