encoded_video_ingest (sderosa)#1048
Conversation
This project involves API design and changes, and might touch lots of code.

I would think the team will need to align on the API designs and architecture.
// Wrap the real encoder construction in a lazy shim so we can branch
// between passthrough and a real encoder based on the first VideoFrame's
// id. The builder is called at most once and only for non-passthrough
// tracks; passthrough tracks never instantiate the SimulcastEncoderAdapter.
If it's determined at the first VideoFrame, how does libwebrtc know which codec to offer in the SDP?
Ah, good callout. I think this comment is a bit misleading. My understanding is that SDP codec selection happens earlier, from the encoder factory’s advertised capabilities plus the codec preferences we set on the transceiver. For encoded video, the Rust publish path already knows the source codec, overrides the publish option to match it, and then set_codec_preferences narrows the offer to that codec.
The first-frame check here is only deciding which encoder implementation to use for that already-negotiated codec: passthrough for an encoded source, or the normal encoder path for raw frames. The VideoFrame::id() is just how we identify that the frame came from an encoded source.
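The branch described above can be sketched roughly as follows. This is an illustrative Rust sketch, not the SDK's code: the real LazyVideoEncoder is C++, and the names `EncoderImpl`, `pick_encoder`, and `FRAME_ID_NONE` are assumptions made for this example.

```rust
#[derive(Debug, PartialEq)]
enum EncoderImpl {
    /// Frame came from an encoded source: forward its bytes unchanged.
    Passthrough,
    /// Raw frame: construct the real encoder (e.g. SimulcastEncoderAdapter).
    Real,
}

/// Assumption for this sketch: the encoded source stamps a non-zero
/// VideoFrame id as its side channel, and 0 means "no encoded source".
const FRAME_ID_NONE: u16 = 0;

fn pick_encoder(first_frame_id: u16) -> EncoderImpl {
    if first_frame_id != FRAME_ID_NONE {
        EncoderImpl::Passthrough
    } else {
        EncoderImpl::Real
    }
}

fn main() {
    // The choice is made once, on the first frame, for the already
    // negotiated codec; it never affects SDP offer generation.
    assert_eq!(pick_encoder(42), EncoderImpl::Passthrough);
    assert_eq!(pick_encoder(FRAME_ID_NONE), EncoderImpl::Real);
    println!("ok");
}
```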
.set_rotation(webrtc::kVideoRotation_0)
.set_timestamp_us(capture_time_us != 0 ? capture_time_us
                                       : webrtc::TimeMicros())
.set_id(source_id_)
Is there a less hacky way to detect pre-encoded sources?
hm, what do you suggest?
let info = EncodedFrameInfo {
    is_keyframe,
    has_sps_pps: false, // the source scans+prepends SPS/PPS as needed
    width: args.width,
question: What happens if width or height are zero?
If width or height are zero, they’re treated as “no resolution update” in the capture path.
In the normal example path, resolution is created from CLI args and used both to create the source and in EncodedFrameInfo. If either dimension is zero there, the source is also initialized with a zero dimension, so frames may still be accepted, but the queued encoded frames can carry 0 dimensions downstream. This is the same behavior as NativeVideoSource.
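A minimal sketch of the "zero means no resolution update" behavior described above; `Resolution` and `apply_resolution_update` are illustrative names for this example, not the SDK's API.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Resolution {
    width: u32,
    height: u32,
}

/// A zero width or height keeps the last known resolution instead of
/// propagating a 0 dimension downstream.
fn apply_resolution_update(current: Resolution, width: u32, height: u32) -> Resolution {
    if width == 0 || height == 0 {
        current
    } else {
        Resolution { width, height }
    }
}

fn main() {
    let start = Resolution { width: 1280, height: 720 };
    // A real update takes effect; a zero dimension is ignored.
    assert_eq!(
        apply_resolution_update(start, 1920, 1080),
        Resolution { width: 1920, height: 1080 }
    );
    assert_eq!(apply_resolution_update(start, 0, 1080), start);
    println!("ok");
}
```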
Introduce a video track source that accepts pre-encoded frames and a matching WebRTC encoder that forwards them unchanged, bypassing real encoding while preserving RTP, pacing, and congestion control. Per-track routing uses VideoFrame::id() as a side channel plus a global EncodedSourceRegistry. A LazyVideoEncoder picks between the passthrough and the real encoder on the first Encode() call. Single-layer only; callers manage simulcast with multiple sources.
Rust wrapper around webrtc-sys::EncodedVideoTrackSource. Adds the Encoded variant to RtcVideoSource, VideoCodec/EncodedFrameInfo types, and an EncodedVideoSourceObserver trait for keyframe-request callbacks from the C++ side. PeerConnectionFactory gains create_video_track_from_encoded_source.
Dispatch RtcVideoSource::Encoded through the new PCF path in LocalVideoTrack, and normalize TrackPublishOptions for encoded sources in LocalParticipant::publish_track — simulcast is forced off and the codec is pinned to the source's codec, with warnings on override.
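The publish-option normalization described above can be sketched as below. The enum and struct are simplified stand-ins for the SDK's VideoCodec and TrackPublishOptions; field names and the warning text are assumptions for this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum VideoCodec {
    H264,
    H265,
    Vp8,
    Av1,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct TrackPublishOptions {
    simulcast: bool,
    video_codec: VideoCodec,
}

/// For encoded sources, simulcast is forced off and the codec is pinned to
/// the source's codec; caller overrides are surfaced only as warnings.
fn normalize_for_encoded_source(
    mut opts: TrackPublishOptions,
    source_codec: VideoCodec,
) -> TrackPublishOptions {
    if opts.simulcast {
        eprintln!("warning: simulcast is unsupported for encoded sources; disabling");
        opts.simulcast = false;
    }
    if opts.video_codec != source_codec {
        eprintln!(
            "warning: publish codec {:?} overridden by source codec {:?}",
            opts.video_codec, source_codec
        );
        opts.video_codec = source_codec;
    }
    opts
}

fn main() {
    let opts = TrackPublishOptions { simulcast: true, video_codec: VideoCodec::Vp8 };
    let normalized = normalize_for_encoded_source(opts, VideoCodec::H264);
    assert!(!normalized.simulcast);
    assert_eq!(normalized.video_codec, VideoCodec::H264);
    println!("ok");
}
```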
Protobuf: * NewVideoSourceRequest.encoded_options + VideoSourceType.Encoded * CaptureEncodedVideoFrame request/response * EncodedVideoSourceEvent (keyframe requested, target bitrate) * VideoSourceInfo.encoded_source_id Server wires the new variant through FfiVideoSource, forwards observer callbacks to FfiEvent, and rejects capture_frame on encoded sources.
Encoded track source now scans incoming frames for SPS/PPS (H.264) or VPS/SPS/PPS (H.265), caches the latest set seen, and prepends it to any keyframe that arrives without inline parameter sets. This makes hardware encoders and camera feeds that only emit parameter sets at stream start usable as-is, without requiring producers to replicate them on every IDR. Producers still get a clear warning if the very first keyframe has no parameter sets and the cache is empty. The caller-supplied has_sps_pps flag becomes a hint only; the scanner is the source of truth, so double-prepending is impossible.

Also fix a stale `src->get()` reference left over from the SetRates refactor in PassthroughVideoEncoder::Encode.

Examples: H.264, H.265, VP8, AV1. VP9 is not supported yet.
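The scan-cache-prepend scheme described above can be sketched as follows. This is a hedged sketch, limited to H.264 with 4-byte Annex B start codes (the real source also handles H.265 VPS and, presumably, 3-byte start codes); all names are illustrative, not the SDK's.

```rust
fn nal_type(first_byte: u8) -> u8 {
    first_byte & 0x1f // H.264 NAL unit type lives in the low 5 bits
}

/// Split an Annex B access unit into NAL payloads (4-byte start codes only).
fn split_annexb(au: &[u8]) -> Vec<&[u8]> {
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 4 <= au.len() {
        if au[i..i + 4] == [0, 0, 0, 1] {
            starts.push(i + 4);
            i += 4;
        } else {
            i += 1;
        }
    }
    starts
        .iter()
        .enumerate()
        .map(|(k, &s)| {
            let end = if k + 1 < starts.len() { starts[k + 1] - 4 } else { au.len() };
            &au[s..end]
        })
        .collect()
}

/// Collect SPS (type 7) and PPS (type 8) NALs, start codes included.
fn extract_param_sets(au: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for nal in split_annexb(au) {
        if !nal.is_empty() && matches!(nal_type(nal[0]), 7 | 8) {
            out.extend_from_slice(&[0, 0, 0, 1]);
            out.extend_from_slice(nal);
        }
    }
    out
}

#[derive(Default)]
struct ParamSetCache {
    cached: Option<Vec<u8>>,
}

impl ParamSetCache {
    /// On a keyframe: refresh the cache if the frame carries SPS/PPS,
    /// otherwise prepend the cached set. The scan, not a caller flag,
    /// decides, so double-prepending is impossible.
    fn process_keyframe(&mut self, mut au: Vec<u8>) -> Vec<u8> {
        let has_params = split_annexb(&au)
            .iter()
            .any(|n| !n.is_empty() && matches!(nal_type(n[0]), 7 | 8));
        if has_params {
            self.cached = Some(extract_param_sets(&au));
            au
        } else if let Some(ps) = &self.cached {
            let mut out = ps.clone();
            out.append(&mut au);
            out
        } else {
            eprintln!("warning: first keyframe has no parameter sets and the cache is empty");
            au
        }
    }
}

fn main() {
    let mut cache = ParamSetCache::default();
    // SPS + PPS + IDR: passes through unchanged and primes the cache.
    let with_params = vec![0, 0, 0, 1, 0x67, 1, 2, 0, 0, 0, 1, 0x68, 3, 0, 0, 0, 1, 0x65, 9];
    assert_eq!(cache.process_keyframe(with_params.clone()), with_params);
    // Bare IDR: gets the cached SPS/PPS prepended.
    let out = cache.process_keyframe(vec![0, 0, 0, 1, 0x65, 7]);
    assert!(out.starts_with(&[0, 0, 0, 1, 0x67, 1, 2]));
    println!("ok");
}
```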
…rack private helper, use resolution throughout, gate args against 0 dims
Overview
This PR adds encoded video ingest support, allowing callers to publish pre-compressed video frames through a VIDEO_SOURCE_ENCODED source instead of sending raw frames through WebRTC’s normal encoder path.
It introduces the FFI/protobuf surface for creating encoded video sources and pushing encoded access units, wires those sources into WebRTC using a passthrough encoder, and forwards encoder-side feedback such as keyframe requests and target bitrate changes. For H.264/H.265, the native source also caches parameter sets and prepends them to keyframes when needed.
This addresses the need to ingest externally encoded video without decoding and re-encoding it inside the SDK. To reproduce the original limitation, attempt to publish an already-encoded H.264/H.265/VPx/AV1 stream through the existing raw video source APIs: the SDK only accepted raw video frames and would route them through normal encoding.
Breaking changes
None.
MSRV
No MSRV changes.
Testing
Added/updated tests for:
Async
No changes to the runtime model were made; it uses the existing livekit video track mechanisms.
API Examples
There are two producer APIs: the helper TCP ingest API for common “external encoder over TCP” workflows, and the base encoded source API for applications that already own demuxing, frame boundaries, and encoder control.
Base API: NativeEncodedVideoSource
Use the base API when the application already has complete encoded access units and wants to push them directly.
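A hypothetical usage sketch of the base API follows. The names (NativeEncodedVideoSource, EncodedFrameInfo, capture_encoded_frame) are taken from the PR description, but the stub types below stand in for the real SDK so the snippet is self-contained; the final signatures may differ.

```rust
#[derive(Debug)]
struct EncodedFrameInfo {
    is_keyframe: bool,
    has_sps_pps: bool, // hint only; the source's own scanner is authoritative
    width: u32,
    height: u32,
}

#[derive(Default)]
struct NativeEncodedVideoSource {
    queued: Vec<(Vec<u8>, EncodedFrameInfo)>,
}

impl NativeEncodedVideoSource {
    /// Push one complete encoded access unit into the source.
    fn capture_encoded_frame(&mut self, data: Vec<u8>, info: EncodedFrameInfo) {
        self.queued.push((data, info));
    }
}

fn main() {
    let mut source = NativeEncodedVideoSource::default();
    source.capture_encoded_frame(
        vec![0, 0, 0, 1, 0x65, 0xaa], // e.g. an H.264 IDR access unit
        EncodedFrameInfo { is_keyframe: true, has_sps_pps: false, width: 1280, height: 720 },
    );
    assert_eq!(source.queued.len(), 1);
    println!("ok");
}
```

The key contract is that each call carries one complete access unit; the application owns demuxing and frame boundaries.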
For most users, the EncodedTcpIngest helpers are the more complete option.