PyTorch Distributed DDP Benchmark for Run:AI

This repository packages the PyTorch Distributed Data Parallel (DDP) benchmark suite from the official PyTorch repository as a container image for deployment on Run:AI.

What This Benchmark Does

Runs distributed PyTorch training benchmarks using DistributedDataParallel (DDP)
Tests multiple model architectures: ResNet-50, ResNet-101, ResNeXt-50, ResNeXt-101
Measures training iteration time and throughput across different GPU configurations
Supports both single-node multi-GPU and multi-node distributed setups
Outputs performance metrics and can generate JSON reports for analysis
Uses Run:AI for GPU resource management and workload orchestration

Quick Start

Build the container image:
```
./build.sh
```

Optional: Push to a registry (for multi-node clusters). If you do, pass the registry-qualified image name to --image in the commands that follow:

```
podman tag pytorch-ddp-benchmark:latest <your-registry>/pytorch-ddp-benchmark:latest
podman push <your-registry>/pytorch-ddp-benchmark:latest
```

Run a single GPU test:

```
runai submit single-gpu-test --image pytorch-ddp-benchmark:latest --gpu 1 --command '/workspace/test_single_gpu.sh'
```

Run multi-GPU on single node:

```
runai submit multi-gpu-test --image pytorch-ddp-benchmark:latest --gpu 4 -e WORLD_SIZE=4
```

Run distributed multi-node setup:

```
runai submit distributed-test --image pytorch-ddp-benchmark:latest --gpu 8 --workers 2 -e WORLD_SIZE=8
```
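
The WORLD_SIZE value passed with -e above is one of the environment variables that torch.distributed reads when initialized with its env:// method (alongside RANK, MASTER_ADDR, and MASTER_PORT). How the scripts in this container consume it is not shown in this README, so the following is only a minimal sketch of how a launcher might read and sanity-check such variables. The helper name read_dist_env, the DistEnv type, and the single-process defaults are assumptions made for illustration.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistEnv(NamedTuple):
    """Process-group settings taken from environment variables."""
    world_size: int
    rank: int
    local_rank: int


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read WORLD_SIZE, RANK and LOCAL_RANK, defaulting to one process.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))

    # Fail early on settings that could never form a valid process group.
    if world_size < 1:
        raise ValueError(f"WORLD_SIZE must be >= 1, got {world_size}")
    if not 0 <= rank < world_size:
        raise ValueError(f"RANK {rank} is outside [0, {world_size})")
    return DistEnv(world_size, rank, local_rank)


if __name__ == "__main__":
    # Mirrors the 4-GPU single-node example: WORLD_SIZE=4, this process is rank 2.
    print(read_dist_env({"WORLD_SIZE": "4", "RANK": "2", "LOCAL_RANK": "2"}))
```

Passing a plain dictionary instead of os.environ keeps the check easy to exercise outside a cluster.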

Files in This Repository

Dockerfile - Complete container definition with PyTorch, CUDA, and benchmark code (compatible with Podman)
build.sh - Simple script to build the container image with Podman
runai-job-template.yaml - Customizable Run:AI job template for advanced deployments
USAGE.md - Comprehensive usage examples and configuration options
README.md - This file

What's Inside the Container

Base: NVIDIA CUDA 12.1 with Ubuntu 22.04
PyTorch: Latest stable version with CUDA support
Benchmark Code: Full PyTorch DDP benchmark suite from the official repository
Run:AI Integration: Configured for Run:AI workload management
Models: Pre-configured ResNet and ResNeXt models for testing

Key Features

🚀 Easy Setup: Single Run:AI command to launch benchmarks
🔧 Configurable: Environment variables for all benchmark parameters
📊 Comprehensive Output: Detailed performance metrics and JSON export
🌐 Multi-Node Ready: Run:AI handles distributed training orchestration
📋 Multiple Models: Tests various popular deep learning architectures
⚡ GPU-Optimized: Uses Run:AI for GPU resource allocation
🎛️ Resource Management: Automatic scaling and resource optimization

Prerequisites

Podman installed (for building the container image)
Run:AI CLI installed and configured
Access to a Run:AI cluster with GPU nodes
CUDA-compatible GPU(s) in the cluster

Performance Metrics

The benchmark measures:

Iteration Time: Time per training step
Throughput: Images processed per second
Scaling Efficiency: Performance across different GPU counts
Statistical Analysis: P50, P75, P90, P95 percentiles
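
To make the metrics above concrete, here is a minimal sketch of how percentile iteration times and the matching throughput could be derived from a list of per-iteration timings. It assumes throughput is batch size divided by seconds per iteration and uses linear interpolation between ranks; the function names percentile and summarize are hypothetical and this is not necessarily how the benchmark itself computes its figures.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate between the two nearest ranks.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(iter_times: Sequence[float], batch_size: int) -> Dict[str, Dict[str, float]]:
    """Map p50/p75/p90/p95 to sec/iter and examples/sec (assumed batch_size / time)."""
    summary = {}
    for q in (50, 75, 90, 95):
        sec_per_iter = percentile(iter_times, q)
        summary[f"p{q}"] = {
            "sec_per_iter": sec_per_iter,
            "ex_per_sec": batch_size / sec_per_iter,
        }
    return summary


if __name__ == "__main__":
    # Made-up timings, in seconds, purely for demonstration.
    timings = [0.101, 0.099, 0.100, 0.104, 0.098, 0.100, 0.102, 0.100]
    for name, row in summarize(timings, batch_size=32).items():
        print(f"{name}: {row['sec_per_iter']:.3f}s  {row['ex_per_sec']:.0f}/s")
```

Because a slower iteration means lower throughput, higher percentiles of iteration time correspond to lower examples-per-second values.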

Use Cases

Performance Regression Testing: Compare PyTorch versions or configurations
Hardware Evaluation: Benchmark different GPU setups
Network Analysis: Test distributed training across different network topologies
Scaling Studies: Understand how performance scales with GPU count
Research Baselines: Establish performance baselines for new techniques

Example Output

```
Benchmark: resnet50 with batch size 32

                            sec/iter    ex/sec      sec/iter    ex/sec
   1 GPUs --   no ddp:  p50:  0.097s     329/s  p75:  0.097s     329/s
   1 GPUs --    1M/1G:  p50:  0.100s     319/s  p75:  0.100s     318/s
   2 GPUs --    1M/2G:  p50:  0.103s     310/s  p75:  0.103s     310/s
   4 GPUs --    1M/4G:  p50:  0.103s     310/s  p75:  0.103s     310/s
   8 GPUs --    1M/8G:  p50:  0.104s     307/s  p75:  0.104s     307/s
```
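
In the sample above, each ex/sec value is consistent with the batch size divided by sec/iter (32 / 0.097 s is roughly 330), which suggests the figures are per process rather than aggregated over all GPUs. Under that assumption, one way to read the table is to compare each row's p50 throughput with the no-DDP baseline. The sketch below hard-codes the p50 column from the sample; the function name relative_throughput is hypothetical.

```python
from typing import Dict

# p50 ex/sec values copied from the sample output above.
P50_EX_PER_SEC: Dict[str, float] = {
    "no ddp": 329,
    "1M/1G": 319,
    "1M/2G": 310,
    "1M/4G": 310,
    "1M/8G": 307,
}


def relative_throughput(rows: Dict[str, float], baseline: str = "no ddp") -> Dict[str, float]:
    """Return each row's throughput as a fraction of the baseline row."""
    base = rows[baseline]
    return {name: value / base for name, value in rows.items()}


if __name__ == "__main__":
    for name, ratio in relative_throughput(P50_EX_PER_SEC).items():
        print(f"{name:>7}: {ratio:.1%} of baseline")
```

With these numbers, the 8-GPU row retains roughly 93% of the baseline per-process throughput, again assuming the table reports per-process values.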

Advanced Usage

See USAGE.md for detailed examples including:

Multi-node distributed training strategies
Run:AI job templates and resource management
Result analysis and comparison
Custom model testing
Performance tuning and monitoring
Resource quotas and scheduling options

For advanced deployments, customize and use the provided job template:

```
runai submit -f runai-job-template.yaml
```

Contributing

This container is based on the official PyTorch benchmark suite. For issues with the underlying benchmark code, please refer to the PyTorch repository.

For container-specific improvements or issues, please open an issue in this repository.

License

This project follows the same license as PyTorch (BSD-3-Clause). The benchmark code is from the official PyTorch repository.
