Add uniform_sample option to VideoReaderDecoder#6258
Add uniform_sample option to VideoReaderDecoder#6258jantonguirao wants to merge 9 commits intoNVIDIA:mainfrom
Conversation
Adds a `uniform_sample` argument that samples exactly `sequence_length` frames uniformly across the full video (or ROI), using linspace rounding. Each video produces one sample per epoch; `stride` and `step` are ignored. Signed-off-by: Joaquin Anton Guirao <janton@nvidia.com>
|
!build |
Greptile SummaryThis PR adds a Concerns raised in prior review rounds have all been addressed in this revision:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[PrepareMetadataImpl] --> B[samples_.clear\nall_frame_idxs_.clear]
B --> C{uniform_sample_?}
C -->|yes| D[all_frame_idxs_.reserve\nvideo_files_info_.size]
C -->|no| E[skip reserve]
D --> F[for each video entry]
E --> F
F --> G{entry.start_frame\n>= entry.end_frame?}
G -->|yes| H[DALI_WARN skip]
H --> F
G -->|no| I{uniform_sample_?}
I -->|yes| J[compute total_frames\nallocate inner vector]
J --> K[for i in 0..seq_len\nt = i / seq_len-1\nidx = start + round\nt * total_frames-1]
K --> L[s.frame_idxs_ = span\nsamples_.emplace_back s]
I -->|no| M[stride-based loop\nsamples_.emplace_back]
L --> F
M --> N[optional padded tail]
N --> F
subgraph Prefetch
P[for each sample] --> Q{frame_idxs_.empty?}
Q -->|no, uniform| R[DecodeFrames span frame_idxs_]
Q -->|yes, stride| S{roi != full video?}
S -->|yes| T[expand to explicit\nframe idx list\nDecodeFrames span]
S -->|no| U[DecodeFrames\nstart end stride]
end
Reviews (7): Last reviewed commit: "Fix test_uniform_sample_sequence_length_..." | Re-trigger Greptile |
| else: | ||
| frame_counts.append(reader.get_metadata()["epoch_size"]) | ||
|
|
||
| # Step 3: run uniform reader, verify one sample per video, check frame indices and pixels. |
There was a problem hiding this comment.
The comment on this line says "# Step 3" but there are no "Step 1" or "Step 2" comments anywhere in the function. This was likely a leftover from an earlier draft of the test. It makes the code harder to read and understand the overall structure.
| # Step 3: run uniform reader, verify one sample per video, check frame indices and pixels. | |
| # Run the uniform reader, verify one sample per video, check frame indices and pixels. |
| if (uniform_sample_) { | ||
| if (spec.HasArgument("stride")) { | ||
| DALI_WARN("uniform_sample=True: the `stride` argument is ignored."); | ||
| } | ||
| if (spec.HasArgument("step")) { | ||
| DALI_WARN("uniform_sample=True: the `step` argument is ignored."); | ||
| } | ||
| } |
There was a problem hiding this comment.
No warning when
pad_mode is non-default with uniform_sample=True
The code correctly warns when stride or step are explicitly provided alongside uniform_sample=True, but no analogous warning is emitted when pad_mode is set to a non-"none" value. When uniform_sample=True, the boundary_type_ derived from pad_mode is passed to DecodeFrames but is effectively a no-op because all computed frame indices are guaranteed to be in-bounds. A user who sets pad_mode='constant' expecting fill-value padding when sequence_length exceeds the number of frames will instead silently get frame repetition (as the docstring states). A warning here would mirror the pattern already established for stride/step:
if (uniform_sample_) {
if (spec.HasArgument("stride")) {
DALI_WARN("uniform_sample=True: the `stride` argument is ignored.");
}
if (spec.HasArgument("step")) {
DALI_WARN("uniform_sample=True: the `step` argument is ignored.");
}
if (spec.HasArgument("pad_mode")) {
DALI_WARN("uniform_sample=True: the `pad_mode` argument is ignored. "
"Frames are repeated when sequence_length exceeds the number of available frames.");
}
}| .AddOptionalArg("uniform_sample", | ||
| R"code(If set to True, uniformly samples ``sequence_length`` frames from the full video | ||
| (or from the video range defined by ``file_list``), regardless of the video length. | ||
|
|
||
| The sampled frame indices correspond to ``numpy.linspace(start, end-1, sequence_length)``, | ||
| rounded to the nearest integer. | ||
|
|
||
| If ``sequence_length`` exceeds the number of frames in the video, frames are repeated | ||
| rather than padded. For example, sampling 5 frames from a 3-frame video yields indices | ||
| ``[0, 1, 1, 2, 2]``. A single-frame video always produces a sequence of identical frames. | ||
|
|
||
| When enabled, each video file produces exactly one sample per epoch. | ||
| The ``stride`` and ``step`` arguments are ignored.)code", | ||
| false) |
There was a problem hiding this comment.
uniform_sample docstring is silent about pad_mode interaction
The docstring says "The stride and step arguments are ignored" when uniform_sample=True, but makes no mention of pad_mode. Since uniform indices are always in-bounds, the pad/boundary logic is a silent no-op. When sequence_length > num_frames, frames are silently repeated (as stated earlier in the docstring), but users reading only the final paragraph about ignored arguments won't know that pad_mode is also irrelevant. Adding a sentence here avoids confusion:
The ``stride``, ``step``, and ``pad_mode`` arguments are ignored.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
CI MESSAGE: [46331662]: BUILD STARTED |
There was a problem hiding this comment.
Pull request overview
Adds a new uniform_sample option to the experimental video reader/decoder so users can request exactly sequence_length frames sampled uniformly across the full video (or the file_list ROI), producing one sample per video per epoch.
Changes:
- Extended the C++ video reader/decoder to support decoding from an explicit frame index list when
uniform_sample=True. - Added operator schema documentation for the new
uniform_sampleargument. - Added Python test coverage validating sampled frame indices (and GPU pixel equivalence vs full decode).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
dali/operators/video/reader/video_reader_decoder_op.cc |
Adds uniform_sample plumbing: stores explicit frame indices in sample metadata and decodes using DecodeFrames(span<const int>); updates schema docs. |
dali/test/python/test_video_reader.py |
Adds test_uniform_sample verifying epoch sizing, sampled indices, and GPU pixel correctness at the sampled positions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -90,7 +91,7 @@ struct VideoSample : public VideoSampleDesc { | |||
| data_.set_pinned(std::is_same_v<Backend, GPUBackend>); | |||
| } | |||
|
|
|||
| VideoSample(const VideoSampleDesc &other) noexcept | |||
| VideoSample(const VideoSampleDesc &other) | |||
| : VideoSampleDesc(other) { | |||
| data_.set_pinned(std::is_same_v<Backend, GPUBackend>); | |||
| } | |||
|
CI MESSAGE: [46331662]: BUILD PASSED |
- Add DALI_WARN when pad_mode is specified alongside uniform_sample=True - Clarify docstring: pad_mode is ignored; rounding uses std::round (floor(x+0.5)), not NumPy banker's rounding - Fix orphaned "# Step 3" comment in test - Add test_uniform_sample_file_list_roi to cover file_list ROI with non-zero start_frame
VideoSampleDesc::frame_idxs_ changes from std::vector<int> to span<const int>. Owned storage moves to all_frame_idxs_ (a vector<vector<int>>) in the loader. PrepareMetadataImpl runs exactly once (guarded by loading_flag_), so all_frame_idxs_ and samples_ are stable for the loader's lifetime. Moving a std::vector preserves data(), so spans into all_frame_idxs_ elements remain valid across any reallocation of the outer vector. The copy in ReadSample is then a cheap 16-byte span copy instead of a heap allocation + memcpy.
|
CI MESSAGE: [46595262]: BUILD STARTED |
| list_file = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) | ||
| list_file.write(f"{video_file} 0 {start_frame} {end_frame}\n") | ||
| list_file.close() |
There was a problem hiding this comment.
Temp file not cleaned up after test
The temporary file is created with delete=False but os.unlink(list_file.name) is never called. If this test fails before reaching the assertions the file will persist. Wrapping in a try/finally ensures cleanup regardless of outcome:
| list_file = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) | |
| list_file.write(f"{video_file} 0 {start_frame} {end_frame}\n") | |
| list_file.close() | |
| list_file = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) | |
| list_file.write(f"{video_file} 0 {start_frame} {end_frame}\n") | |
| list_file.close() | |
| try: |
…and at the end of the function:
finally:
os.unlink(list_file.name)|
CI MESSAGE: [46595878]: BUILD STARTED |
| @cartesian_params( | ||
| devices, | ||
| [5, 1], # sequence lengths including edge case N=1 | ||
| ) | ||
| def test_uniform_sample_file_list_roi(device, sequence_length): | ||
| """Verify uniform_sample with file_list ROI (non-zero start_frame).""" | ||
| video_file = sorted(VIDEO_FILES)[0] | ||
|
|
||
| # Get total frame count. | ||
| reader = ndd.experimental.readers.Video( | ||
| device=device, filenames=[video_file], sequence_length=1, stride=1, step=1 | ||
| ) | ||
| total_frames = reader.get_metadata()["epoch_size"] | ||
|
|
||
| # Define a ROI that excludes the first and last few frames. | ||
| start_frame = max(1, total_frames // 5) | ||
| end_frame = min(total_frames - 1, total_frames * 4 // 5) | ||
| roi_frames = end_frame - start_frame | ||
| assert roi_frames >= sequence_length, "ROI too small for this test" | ||
|
|
||
| # Write a file_list with the ROI. | ||
| list_file = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) | ||
| list_file.write(f"{video_file} 0 {start_frame} {end_frame}\n") | ||
| list_file.close() | ||
|
|
||
| uniform_reader = ndd.experimental.readers.Video( | ||
| device=device, | ||
| file_list=list_file.name, | ||
| file_list_format="frames", | ||
| sequence_length=sequence_length, | ||
| uniform_sample=True, | ||
| enable_frame_num="sequence", | ||
| ) | ||
| samples = list(uniform_reader.next_epoch()) | ||
| assert len(samples) == 1, f"Expected 1 sample (one per video), got {len(samples)}" | ||
|
|
||
| video, frame_num = samples[0] | ||
| fn_arr = np.array(frame_num.evaluate().cpu()).flatten() | ||
| assert len(fn_arr) == sequence_length | ||
|
|
||
| # Frame indices must be absolute (offset from start_frame, not zero). | ||
| assert fn_arr[0] == start_frame, f"First index should be {start_frame}, got {fn_arr[0]}" | ||
| if sequence_length > 1: | ||
| assert ( | ||
| fn_arr[-1] == end_frame - 1 | ||
| ), f"Last index should be {end_frame - 1}, got {fn_arr[-1]}" | ||
|
|
||
| expected_idxs = start_frame + np.floor( | ||
| np.linspace(0, roi_frames - 1, sequence_length) + 0.5 | ||
| ).astype(np.int32) | ||
| np.testing.assert_array_equal( | ||
| fn_arr, | ||
| expected_idxs, | ||
| err_msg=f"Frame index mismatch (start={start_frame}, end={end_frame})", | ||
| ) |
There was a problem hiding this comment.
enable_frame_num="scalar" with uniform_sample is untested
Neither test_uniform_sample nor test_uniform_sample_file_list_roi exercises enable_frame_num="scalar" (or enable_frame_num=True) together with uniform_sample=True. The FrameNumPolicy::Scalar path in Prefetch outputs s.start_ (line 571 in the op file), which for a uniform sample is set to entry.start_frame. This is coincidentally equal to frame_idxs_[0] (since i=0 → t=0 → round(0 * (total_frames-1)) = 0), so the output is correct — but it is undocumented and untested.
A user could reasonably call:
ndd.experimental.readers.Video(
filenames=[...], sequence_length=5, uniform_sample=True,
enable_frame_num=True # "scalar" mode
)and expect the first sampled frame's index. With ROI this would return start_frame, which is the correct first uniform index — but exercising this in a test (especially with a non-zero start_frame) would make the invariant explicit and prevent future regressions if the sampling formula changes.
| samples = list(uniform_reader.next_epoch()) | ||
| assert len(samples) == 1, f"Expected 1 sample (one per video), got {len(samples)}" | ||
|
|
||
| video, frame_num = samples[0] | ||
| fn_arr = np.array(frame_num.evaluate().cpu()).flatten() | ||
| assert len(fn_arr) == sequence_length | ||
|
|
||
| # Frame indices must be absolute (offset from start_frame, not zero). | ||
| assert fn_arr[0] == start_frame, f"First index should be {start_frame}, got {fn_arr[0]}" | ||
| if sequence_length > 1: | ||
| assert ( | ||
| fn_arr[-1] == end_frame - 1 | ||
| ), f"Last index should be {end_frame - 1}, got {fn_arr[-1]}" | ||
|
|
||
| expected_idxs = start_frame + np.floor( | ||
| np.linspace(0, roi_frames - 1, sequence_length) + 0.5 | ||
| ).astype(np.int32) | ||
| np.testing.assert_array_equal( | ||
| fn_arr, | ||
| expected_idxs, | ||
| err_msg=f"Frame index mismatch (start={start_frame}, end={end_frame})", | ||
| ) |
There was a problem hiding this comment.
GPU pixel comparison missing from ROI test
test_uniform_sample performs a pixel-level comparison on GPU to confirm that the uniform reader actually decodes the correct frames (not just reports correct indices). test_uniform_sample_file_list_roi only checks frame indices and does not verify pixel content on GPU, meaning an off-by-one in the ROI offset calculation (e.g., entry.start_frame + in the index computation) could produce wrong pixels while still reporting correct frame_num values.
Adding a GPU pixel check — similar to what test_uniform_sample already does — would give the same confidence for the ROI path as for the filenames path:
if device == "gpu":
uniform_frames = np.array(video.evaluate().cpu())
# Decode the full video with stride=1 and verify specific frames
full_reader = ndd.experimental.readers.Video(
device=device, filenames=[video_file], sequence_length=1, stride=1, step=1
)
all_frames = np.stack([np.array(f.evaluate().cpu())[0] for (f,) in full_reader.next_epoch()])
expected_frames = all_frames[expected_idxs]
np.testing.assert_array_equal(uniform_frames, expected_frames,
err_msg=f"Pixel mismatch with ROI start={start_frame}, end={end_frame}")|
CI MESSAGE: [46595262]: BUILD FAILED |
- Explicitly zero-initialize frame_idxs_ in VideoSampleDesc constructor - Reserve all_frame_idxs_ before PrepareMetadataImpl loop - Add comment explaining HandleBoundary is a no-op for uniform indices - Fix temp file leak in test: use context manager instead of delete=False - Add test_uniform_sample_file_list_roi: scalar frame_num check with non-zero ROI
|
CI MESSAGE: [46604696]: BUILD STARTED |
| std::vector<VideoFileMeta> video_files_info_; | ||
| std::vector<VideoSampleDesc> samples_; | ||
| std::vector<std::vector<int>> all_frame_idxs_; // owns frame index data; samples_ hold spans into these | ||
| CUDAStreamLease cuda_stream_; |
There was a problem hiding this comment.
all_frame_idxs_ declared after samples_, destroyed first
samples_ is declared before all_frame_idxs_, so C++ destructs them in reverse order — all_frame_idxs_ (which owns the heap data the spans point into) is destroyed before samples_. This is currently safe because span<const int>'s destructor is trivial and never dereferences the stored pointer. However, it's a fragile invariant: any future change that causes VideoSampleDesc destruction, copy, or reset to read through frame_idxs_ would silently become use-after-free.
The fix is to declare all_frame_idxs_ before samples_, so it outlives the spans that reference its data:
std::vector<VideoFileMeta> video_files_info_;
std::vector<std::vector<int>> all_frame_idxs_; // owns frame index data; samples_ hold spans into these
std::vector<VideoSampleDesc> samples_;
CUDAStreamLease cuda_stream_;| void PrepareMetadataImpl() override { | ||
| LOG_LINE << "Starting PrepareMetadataImpl" << std::endl; | ||
| if (uniform_sample_) | ||
| all_frame_idxs_.reserve(video_files_info_.size()); |
There was a problem hiding this comment.
all_frame_idxs_ not cleared before rebuilding
PrepareMetadataImpl appends to all_frame_idxs_ (and to samples_) but never clears either before the loop. If PrepareMetadataImpl is ever called more than once — for example, if the loader is reset or re-initialized — all_frame_idxs_ accumulates entries from previous calls. The old spans stored in the old samples_ entries would still point to valid (non-freed) data because all_frame_idxs_ only grows, but the index mapping between samples_[i] and all_frame_idxs_[j] breaks: the reserve on line 262 would use a stale size, and the emplace calls would push to positions misaligned with the newly added samples_.
The same latent issue exists for samples_ in the pre-existing code, but adding all_frame_idxs_ to the mix makes it harder to reason about. A defensive guard is straightforward:
void PrepareMetadataImpl() override {
LOG_LINE << "Starting PrepareMetadataImpl" << std::endl;
if (uniform_sample_) {
all_frame_idxs_.clear();
all_frame_idxs_.reserve(video_files_info_.size());
}| with tempfile.NamedTemporaryFile(mode="w", suffix=".txt") as list_file: | ||
| list_file.write(f"{video_file} 0 {start_frame} {end_frame}\n") | ||
| list_file.flush() |
There was a problem hiding this comment.
NamedTemporaryFile on Windows blocks re-open
tempfile.NamedTemporaryFile(mode="w", ...) without delete=False keeps the file open while the context manager is active. On Linux this is fine — the DALI reader can open the same path while it's already open. On Windows, NamedTemporaryFile holds an exclusive handle and any attempt by another process (or the same process via a different handle) to open the file will fail with PermissionError.
DALI is primarily Linux/CUDA, but the pattern is worth noting since the file_list path reads the file by name during reader construction. Using delete=False with an explicit os.unlink in a try/finally makes the intent clear and avoids the Windows edge case:
list_file = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
try:
list_file.write(f"{video_file} 0 {start_frame} {end_frame}\n")
list_file.flush()
list_file.close()
# ... create readers and run assertions ...
finally:
os.unlink(list_file.name)|
CI MESSAGE: [46595878]: BUILD FAILED |
|
CI MESSAGE: [46604696]: BUILD FAILED |
file_list entries include a label column, so the reader returns (frames, label, frame_num) — a 3-tuple, not 2.
|
CI MESSAGE: [46774439]: BUILD STARTED |
|
CI MESSAGE: [46774439]: BUILD FAILED |
- Use frame_idxs_() instead of frame_idxs_{} for style consistency
- Add DALI_ENFORCE for sequence_length >= 1 when uniform_sample=True
- Add test_uniform_sample_sequence_length_zero_raises: verifies sequence_length=0 raises an error
- Add test_uniform_sample_stride_step_ignored: verifies stride/step are ignored when uniform_sample=True
- Swap samples_/all_frame_idxs_ declaration order so the owner (all_frame_idxs_) is destroyed after samples_, preventing dangling spans on destruction - Clear samples_ and all_frame_idxs_ at the start of PrepareMetadataImpl to avoid stale entries on re-initialization
|
CI MESSAGE: [46871621]: BUILD STARTED |
|
CI MESSAGE: [46871621]: BUILD FAILED |
…d init The ndd API uses lazy backend initialization, so just constructing ndd.experimental.readers.Video() doesn't instantiate the C++ operator. Call get_metadata() to force initialization and trigger the DALI_ENFORCE check.
|
CI MESSAGE: [46886608]: BUILD STARTED |
| reader_default = ndd.experimental.readers.Video( | ||
| device=device, | ||
| filenames=[video_file], | ||
| sequence_length=sequence_length, | ||
| uniform_sample=True, | ||
| enable_frame_num="sequence", | ||
| ) |
There was a problem hiding this comment.
just wanted to say that ndd looks like a game-changer for writing DALI tests :D
| auto& idxs = all_frame_idxs_.back(); | ||
| for (int i = 0; i < sequence_len_; ++i) { | ||
| double t = (sequence_len_ > 1) ? (double)i / (sequence_len_ - 1) : 0.0; | ||
| idxs[i] = entry.start_frame + static_cast<int>(std::round(t * (total_frames - 1))); |
There was a problem hiding this comment.
Just to double check, the entry.end_frame is not a valid index of a last frame in the video but sth like past one?
idxs[sequence_len - 1] = start_frame + int(round(1 * (total_frames - 1))) = start_frame + total_frames - 1 = start_frame + end_frame - start_frame - 1 = end_frame - 1
|
CI MESSAGE: [46886608]: BUILD PASSED |
Category:
New feature (non-breaking change which adds functionality)
Description:
Adds a
uniform_sampleboolean argument tofn.experimental.readers.video(issue #5926).When enabled, exactly
sequence_lengthframes are sampled uniformly across the full video(or the ROI defined by
file_list), using indices equivalent tonumpy.linspace(start, end-1, sequence_length)rounded to the nearest integer(ties round away from zero, matching C++ std::round).
Each video produces exactly one sample per epoch regardless of video length.
The
strideandsteparguments are ignored whenuniform_sample=True.Additional information:
Affected modules and functionalities:
uniform_sampleschemaargument; extended VideoSampleDesc with an explicit frame index list; added uniform
sampling branch in PrepareMetadataImpl; updated Prefetch to dispatch to
DecodeFrames(span) when explicit indices are present.
uniform_sampleoptional argument(default False), documented in the schema docstring.
Key points relevant for the review:
instead of start_/end_/stride_. All existing stride-based paths are unchanged.
Tests:
DALI decoder (stride=1 reader) as ground truth — no reliance on cv2 metadata
seek inaccuracy with H.264 B-frames is a pre-existing decoder limitation,
not a regression)
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4273