[benchmark] Add Video Benchmarks#1430
Conversation
Signed-off-by: Ao Tang <aot@nvidia.com>
Greptile Overview

**Greptile Summary**

This PR adds comprehensive video processing benchmarks to the nightly benchmark suite. The implementation reuses existing tutorial code by extracting the argparser and pipeline-creation functions into reusable components. Key changes:

Minor issue:
Confidence Score: 4/5
Important Files Changed
**Sequence Diagram**

```mermaid
sequenceDiagram
    participant User
    participant Benchmark as video_pipeline_benchmark.py
    participant Utils as utils.py
    participant Tutorial as video_split_clip_example.py
    participant Pipeline as Video Pipeline
    participant Executor as Xenna/RayData Executor

    User->>Benchmark: Run benchmark with args
    Benchmark->>Tutorial: create_video_splitting_argparser()
    Tutorial-->>Benchmark: ArgumentParser
    Benchmark->>Benchmark: Add benchmark args (--benchmark-results-path, --executor)
    Benchmark->>Benchmark: parse_args()
    Benchmark->>Utils: setup_executor(args.executor)
    Utils-->>Benchmark: Executor instance
    Benchmark->>Tutorial: create_video_splitting_pipeline(args)
    Tutorial->>Pipeline: Create Pipeline("video_splitting")
    Tutorial->>Pipeline: Add VideoReader stage
    Tutorial->>Pipeline: Add splitting stage (FixedStride/TransNetV2)
    Tutorial->>Pipeline: Add ClipTranscodingStage
    alt Generate Embeddings
        Tutorial->>Pipeline: Add embedding stage (CosmosEmbed1/InternVideo2)
    end
    alt Generate Captions
        Tutorial->>Pipeline: Add VideoFrameCaptioningStage
        alt Enhance Captions
            Tutorial->>Pipeline: Add LLMCaptionImprovementStage
        end
    end
    alt Motion/Aesthetic Filtering
        Tutorial->>Pipeline: Add VideoMotionFilterStage
        Tutorial->>Pipeline: Add VideoAestheticFilterStage
    end
    Tutorial->>Pipeline: Add ClipWriterStage
    Tutorial-->>Benchmark: Pipeline object
    Benchmark->>Pipeline: pipeline.run(executor)
    Pipeline->>Executor: Process video tasks
    Executor-->>Pipeline: output_tasks
    Pipeline-->>Benchmark: output_tasks
    Benchmark->>Benchmark: Calculate metrics (videos processed, clips generated, throughput)
    Benchmark->>Utils: write_benchmark_results(results, path)
    Utils->>Utils: Write params.json, metrics.json, tasks.pkl
    Utils-->>Benchmark: Success
    Benchmark-->>User: Exit code (0=success, 1=failure)
```
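The parser-reuse step in the diagram can be sketched as below. This is illustrative only: the tutorial parser is stubbed here (the real `create_video_splitting_argparser` lives in `video_split_clip_example.py` and defines many more options), and the `--executor` choice names are assumptions, not taken from the PR.

```python
import argparse


def create_video_splitting_argparser() -> argparse.ArgumentParser:
    # Stand-in for the tutorial's parser; the real one defines the full
    # set of video-splitting options (splitting algorithm, embeddings, ...).
    parser = argparse.ArgumentParser("video_splitting")
    parser.add_argument("--video-limit", type=int, default=None)
    return parser


# The benchmark reuses the tutorial parser and layers its own options on top.
parser = create_video_splitting_argparser()
parser.add_argument("--benchmark-results-path", type=str, required=True)
parser.add_argument("--executor", choices=["xenna", "ray_data"], default="xenna")

args = parser.parse_args(
    ["--benchmark-results-path", "/tmp/results", "--executor", "ray_data"]
)
```

This keeps the benchmark's CLI in lockstep with the tutorial: new tutorial options automatically appear in the benchmark without duplication.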
```yaml
results_path: /raid/aot/output/curator_benchmark
datasets_path: /raid/aot/datasets
```
Check that these paths (/raid/aot/...) are appropriate for the shared benchmark configuration, as they appear specific to a local development environment.
```python
# Calculate metrics from output tasks
# Count unique videos by their input_video path
unique_videos = {task.data.input_video for task in output_tasks if task.data and task.data.input_video}
```
Potential `AttributeError` if `task.data` is `None` or doesn't have an `input_video` attribute.
```suggestion
unique_videos = {task.data.input_video for task in output_tasks if task.data and hasattr(task.data, 'input_video') and task.data.input_video}
```
```python
# Count unique videos by their input_video path
unique_videos = {task.data.input_video for task in output_tasks if task.data and task.data.input_video}
num_videos_processed = len(unique_videos)
num_clips_generated = sum(len(task.data.clips) for task in output_tasks if task.data and task.data.clips)
```
The same defensive check is needed here for the `clips` attribute.
```suggestion
num_clips_generated = sum(len(task.data.clips) for task in output_tasks if task.data and hasattr(task.data, 'clips') and task.data.clips)
```
```python
unique_videos = {task.data.input_video for task in output_tasks if task.data and task.data.input_video}
num_videos_processed = len(unique_videos)
num_clips_generated = sum(len(task.data.clips) for task in output_tasks if task.data and task.data.clips)
```
Defensive checks are needed for `task.data`, `task.data.input_video`, and `task.data.clips` to handle potential `None` values or missing attributes more robustly.
```suggestion
unique_videos = {task.data.input_video for task in output_tasks if task.data and hasattr(task.data, 'input_video') and task.data.input_video}
num_videos_processed = len(unique_videos)
num_clips_generated = sum(len(task.data.clips) for task in output_tasks if task.data and hasattr(task.data, 'clips') and task.data.clips)
```
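Taking the suggestions above together, the metric computation could also be factored into one helper that tolerates both `None` data and missing attributes. This is only a sketch: `compute_video_metrics` and the `SimpleNamespace` tasks are illustrative, not the PR's actual code.

```python
from types import SimpleNamespace


def compute_video_metrics(output_tasks):
    """Count unique source videos and generated clips from pipeline output.

    Defensive: tolerates tasks whose .data is None or whose data lacks
    the input_video / clips attributes, as flagged in review.
    """
    unique_videos = {
        getattr(task.data, "input_video", None)
        for task in output_tasks
        if task.data is not None
    } - {None}
    num_clips_generated = sum(
        len(getattr(task.data, "clips", None) or [])
        for task in output_tasks
        if task.data is not None
    )
    return len(unique_videos), num_clips_generated


# Two tasks from the same video, one empty task, one task with clips=None:
tasks = [
    SimpleNamespace(data=SimpleNamespace(input_video="a.mp4", clips=[1, 2])),
    SimpleNamespace(data=None),
    SimpleNamespace(data=SimpleNamespace(input_video="a.mp4", clips=None)),
]
```

Using `getattr` with a default covers both the `None`-value and missing-attribute cases in one expression, which reads more cleanly than chained `hasattr` checks.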
```diff
 parser.add_argument("--video-limit", type=int, default=None, help="Limit the number of videos to read")
 parser.add_argument("--verbose", action="store_true", default=False)
-parser.add_argument("--output-clip-path", type=str, help="Path to output clips", required=True)
+parser.add_argument("--output-path", type=str, help="Path to output clips", required=True)
```
The argument was renamed from --output-clip-path to --output-path, but README.md in this directory still uses the old name in all examples (lines 20, 36, 47, 80). Update the README to use --output-path instead.
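The README update is a mechanical rename, so a `sed` one-liner covers all occurrences. Shown here on a scratch copy; in the repo you would run the `sed` line against the tutorial directory's `README.md`.

```shell
# Demo on a scratch file standing in for the README
printf 'example: script.py --output-clip-path /data/clips\n' > /tmp/README.md

# Replace every occurrence of the old flag name with the new one
sed -i 's/--output-clip-path/--output-path/g' /tmp/README.md

grep -- '--output-path' /tmp/README.md
```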
```yaml
timeout_s: 1800
ray:
  num_cpus: 64
  num_gpus: 1
requirements:
  # ensure the total number of documents processed is correct
  - metric: num_clips_generated
    exact_value: 300  # TODO: update this value after benchmarking
```
The placeholder value (300) needs updating after actual benchmarking.
Description

Usage

```shell
# Add snippet demonstrating usage
```

Checklist