---
layout: default
title: Home
nav_order: 1
permalink: /
---
# Mini-ImagePipe
{: .fs-10 .fw-700 .text-center }
High-performance DAG-based GPU Image Processing Pipeline
{: .fs-6 .fw-300 .text-center .text-grey-dk-000 }

High-performance GPU image processing framework with DAG task scheduling, multi-stream execution, and CUDA-accelerated operators. Designed for real-time video and batch image processing workflows.
{: .fs-5 .text-center .mt-4 }
[Get Started]({{ '/docs/getting-started' | relative_url }}){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
[View on GitHub](https://github.com/LessUp/mini-image-pipe){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 }
[API Reference]({{ '/docs/api' | relative_url }}){: .btn .fs-5 .mb-4 .mb-md-0 }
## Feature Comparison
{: .section-title }

| Feature | Mini-ImagePipe | OpenCV GPU | Custom CUDA |
|---|---|---|---|
| DAG Scheduling | ✅ | ❌ | Manual |
| Memory Pool | ✅ | Manual | |
| Multi-Stream | ✅ | Manual | |
| Zero-copy Pipeline | ✅ | ❌ | Manual |
| Easy API | ✅ | ✅ | ❌ |
| Error Propagation | ✅ | ❌ | Manual |
Every operator is implemented in CUDA and launched asynchronously, which lets NVIDIA GPUs process images in real time.

Task dependencies are modeled as a directed acyclic graph (DAG). The scheduler derives a valid execution order from the graph and runs independent tasks in parallel.
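One common way to derive an execution order from a DAG is Kahn's algorithm, grouping nodes into levels whose members do not depend on each other. The sketch below is illustrative only: `topoLevels` and its edge-list input are hypothetical names, and the actual Mini-ImagePipe scheduler may work differently.

```cpp
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the Mini-ImagePipe API.
// Groups nodes 0..numNodes-1 into levels: every node in level i depends only
// on nodes in earlier levels, so nodes within one level can run in parallel.
std::vector<std::vector<int>> topoLevels(
    int numNodes, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> successors(numNodes);
    std::vector<int> inDegree(numNodes, 0);
    for (const auto& e : edges) {
        successors[e.first].push_back(e.second);
        ++inDegree[e.second];
    }

    // Start with all nodes that have no dependencies.
    std::vector<int> ready;
    for (int n = 0; n < numNodes; ++n) {
        if (inDegree[n] == 0) ready.push_back(n);
    }

    std::vector<std::vector<int>> levels;
    int visited = 0;
    while (!ready.empty()) {
        std::vector<int> next;
        for (int n : ready) {
            ++visited;
            // A successor becomes ready once all of its inputs are done.
            for (int s : successors[n]) {
                if (--inDegree[s] == 0) next.push_back(s);
            }
        }
        levels.push_back(std::move(ready));
        ready = std::move(next);
    }
    if (visited != numNodes) {
        throw std::invalid_argument("graph contains a cycle");
    }
    return levels;
}
```

For a diamond-shaped graph with edges 0→1, 0→2, 1→3, 2→3, this yields three levels: node 0, then nodes 1 and 2 together, then node 3.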
Independent tasks run concurrently on separate CUDA streams. The scheduler assigns tasks to streams so that work without mutual dependencies can overlap on the GPU.
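As a rough illustration of stream assignment, the sketch below distributes the tasks of each parallel level round-robin over a fixed number of streams. `assignStreams` is a hypothetical helper, and round-robin is an assumed policy chosen for simplicity; the framework's own assignment logic is not documented here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the Mini-ImagePipe API.
// levels[i] lists the node ids that may run concurrently in step i.
// Returns streamOf[node], a stream index in the range 0..numStreams-1.
// Assumes numStreams > 0 and every node id is below numNodes.
std::vector<int> assignStreams(const std::vector<std::vector<int>>& levels,
                               int numNodes, int numStreams) {
    std::vector<int> streamOf(numNodes, 0);
    const std::size_t streams = static_cast<std::size_t>(numStreams);
    for (const auto& level : levels) {
        // Spread the nodes of one level across the available streams.
        for (std::size_t i = 0; i < level.size(); ++i) {
            streamOf[level[i]] = static_cast<int>(i % streams);
        }
    }
    return streamOf;
}
```

In a real CUDA program, a task that consumes data produced on a different stream still has to wait for it, typically with `cudaEventRecord` on the producer stream and `cudaStreamWaitEvent` on the consumer stream.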
Pinned host memory and device memory are served from pools that use a best-fit allocation strategy, which minimizes allocation overhead across pipeline runs.
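Best-fit allocation picks the smallest free block that can hold a request. The following is a minimal sketch of that idea, assuming a single pre-allocated buffer tracked by offsets. `BestFitPool` is a hypothetical class, not the project's allocator, and it omits coalescing of adjacent free blocks.

```cpp
#include <cstddef>
#include <map>

// Hypothetical sketch; not Mini-ImagePipe's allocator. It only tracks block
// sizes and offsets inside one pre-allocated buffer of `capacity` bytes.
class BestFitPool {
public:
    explicit BestFitPool(std::size_t capacity) { free_.emplace(capacity, 0); }

    // On success, stores the block's offset in `offset` and returns true.
    bool allocate(std::size_t size, std::size_t& offset) {
        // lower_bound finds the smallest free block with size >= request.
        auto it = free_.lower_bound(size);
        if (it == free_.end()) return false;
        const std::size_t blockSize = it->first;
        offset = it->second;
        free_.erase(it);
        if (blockSize > size) {
            // Return the unused tail of the block to the free list.
            free_.emplace(blockSize - size, offset + size);
        }
        return true;
    }

    // Returns a block to the pool (no merging of neighbours in this sketch).
    void release(std::size_t offset, std::size_t size) {
        free_.emplace(size, offset);
    }

private:
    std::multimap<std::size_t, std::size_t> free_;  // block size -> offset
};
```

Keying the free list by size makes the best-fit lookup a single ordered-map query. A production pool would also merge neighbouring free blocks to limit fragmentation.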
Gaussian blur runs as separable horizontal and vertical passes, which reduces the per-pixel work for a k×k kernel from k² to 2k multiply-adds (for example, from 49 to 14 at 7×7).
Task failures propagate automatically to downstream tasks along the DAG, which supports robust error handling and recovery.
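Downstream propagation can be pictured as a breadth-first walk from the failed node. The sketch below assumes an adjacency-list graph; `NodeStatus` and `propagateFailure` are hypothetical names introduced for illustration and do not reflect the library's actual types.

```cpp
#include <queue>
#include <vector>

// Hypothetical status type; the library's real states are not shown here.
enum class NodeStatus { Pending, Failed };

// Marks failedNode and every node reachable from it as Failed.
// successors[n] lists the nodes that consume the output of node n.
void propagateFailure(int failedNode,
                      const std::vector<std::vector<int>>& successors,
                      std::vector<NodeStatus>& status) {
    std::queue<int> pending;
    status[failedNode] = NodeStatus::Failed;
    pending.push(failedNode);
    while (!pending.empty()) {
        const int node = pending.front();
        pending.pop();
        for (int next : successors[node]) {
            // Visit each downstream node once, even if it has several inputs.
            if (status[next] != NodeStatus::Failed) {
                status[next] = NodeStatus::Failed;
                pending.push(next);
            }
        }
    }
}
```

Nodes that are not reachable from the failed node keep their status, so independent branches of the pipeline are unaffected in this model.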
## Benchmarks
{: .performance-section }

| Operator | Image Size | Throughput | Latency |
|---|---|---|---|
| GaussianBlur 5×5 | 1920×1080 | 850+ FPS | ~1.2ms |
| Sobel Edge | 1920×1080 | 1200+ FPS | ~0.8ms |
| Resize (2× down) | 1920×1080 | 1500+ FPS | ~0.7ms |
| ColorConvert | 1920×1080 | 2000+ FPS | ~0.5ms |
| Pipeline (4 ops) | 1920×1080 | 400+ FPS | ~2.5ms |
Benchmarked on NVIDIA RTX 3090. Your results may vary based on GPU model and configuration.
```mermaid
graph LR
    Input[📥 Input Image] --> Resize[🔄 Resize]
    Resize --> Blur[🌀 Gaussian Blur]
    Blur --> Sobel[📐 Sobel Edge]
    Sobel --> Output[📤 Output]
    style Input fill:#76B900,stroke:#5A8F00,color:#1a1a1a
    style Output fill:#76B900,stroke:#5A8F00,color:#1a1a1a
    style Resize fill:#2d2d2d,stroke:#76B900
    style Blur fill:#2d2d2d,stroke:#76B900
    style Sobel fill:#2d2d2d,stroke:#76B900
```
### 📹 Real-time Video Processing
{: .use-case }

Live video filters, effects, and streaming applications

### 🚗 Autonomous Driving
{: .use-case }

Perception pipeline preprocessing for sensor fusion

### 🏥 Medical Imaging
{: .use-case }

DICOM image processing and analysis workflows

### 🤖 Embedded AI
{: .use-case }

Jetson platform deployment for edge computing

```bash
# Clone the repository
git clone https://github.com/LessUp/mini-image-pipe.git
cd mini-image-pipe

# Build with CMake presets (Release)
cmake --preset release
cmake --build --preset release

# Run tests
ctest --preset release

# Run demo
./build/demo_pipeline
```

```cpp
#include <memory>  // std::make_shared

#include "pipeline.h"
#include "operators/resize.h"
#include "operators/gaussian_blur.h"
#include "operators/sobel.h"

using namespace mini_image_pipe;

int main() {
    // Configure pipeline with 4 CUDA streams
    PipelineConfig config;
    config.numStreams = 4;
    Pipeline pipeline(config);

    // Create operators
    auto resize = std::make_shared<ResizeOperator>(320, 240);
    auto blur = std::make_shared<GaussianBlurOperator>(GaussianKernelSize::KERNEL_5x5);
    auto sobel = std::make_shared<SobelOperator>();

    // Build the DAG
    int n1 = pipeline.addOperator("Resize", resize);
    int n2 = pipeline.addOperator("Blur", blur);
    int n3 = pipeline.addOperator("Sobel", sobel);
    pipeline.connect(n1, n2);  // Resize → Blur
    pipeline.connect(n2, n3);  // Blur → Sobel

    // Set input and execute.
    // d_input (device pointer to the input image), width, height, and
    // channels are assumed to be set up by the caller; that code is omitted.
    pipeline.setInput(n1, d_input, width, height, channels);
    pipeline.execute();

    // Get output
    void* output = pipeline.getOutput(n3);

    return 0;
}
```

| Architecture | Compute | Example GPUs |
|---|---|---|
| Volta | sm_70 | V100 |
| Turing | sm_75 | RTX 2080, T4 |
| Ampere | sm_80, sm_86 | A100, RTX 3090 |
| Ada Lovelace | sm_89 | RTX 4090, L40 |
| Hopper | sm_90 | H100 |
| Operator | Function | Features |
|---|---|---|
| GaussianBlur | Gaussian blur | 3×3/5×5/7×7 separable filter, reflection boundary padding |
| Sobel | Edge detection | 3×3 Sobel kernels, gradient magnitude output |
| Resize | Image scaling | Bilinear / nearest-neighbor interpolation |
| ColorConvert | Color conversion | RGB↔Gray, BGR↔RGB, RGBA→RGB |
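As an illustration of the "3×3 Sobel kernels, gradient magnitude output" entry, the sketch below computes the Sobel response for one interior pixel on the CPU. `sobelMagnitude` is a hypothetical reference function, and the Euclidean (L2) magnitude is an assumption; the CUDA operator may use a different norm or data type.

```cpp
#include <cmath>
#include <vector>

// Hypothetical CPU reference for one interior pixel of a row-major,
// single-channel float image. Requires 1 <= x < width-1 and 1 <= y < height-1.
float sobelMagnitude(const std::vector<float>& img, int width, int x, int y) {
    // Reads the neighbour at offset (dx, dy) from the pixel (x, y).
    auto at = [&](int dx, int dy) {
        return img[(y + dy) * width + (x + dx)];
    };
    // Horizontal gradient: right column minus left column, weights 1-2-1.
    const float gx = (at(1, -1) + 2.0f * at(1, 0) + at(1, 1)) -
                     (at(-1, -1) + 2.0f * at(-1, 0) + at(-1, 1));
    // Vertical gradient: bottom row minus top row, weights 1-2-1.
    const float gy = (at(-1, 1) + 2.0f * at(0, 1) + at(1, 1)) -
                     (at(-1, -1) + 2.0f * at(0, -1) + at(1, -1));
    return std::sqrt(gx * gx + gy * gy);
}
```

On a 3×3 patch whose right column is 1 and whose other pixels are 0, the center pixel gets a horizontal gradient of 4 and a vertical gradient of 0, so the magnitude is 4.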
- CMake >= 3.18
- CUDA Toolkit >= 11.0
- C++17 compatible compiler
- GTest v1.14.0 (auto-fetched via FetchContent)
This project is licensed under the MIT License.
We welcome contributions! Please see our Contributing Guide for details.