---
layout: default
title: Home
nav_order: 1
permalink: /
---

Mini-ImagePipe

{: .fs-10 .fw-700 .text-center }

High-performance DAG-based GPU Image Processing Pipeline {: .fs-6 .fw-300 .text-center .text-grey-dk-000 }

High-performance GPU image processing framework with DAG task scheduling, multi-stream execution, and CUDA-accelerated operators. Designed for real-time video and batch image processing workflows. {: .fs-5 .text-center .mt-4 }

[Get Started]({{ '/docs/getting-started' | relative_url }}){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 } [View on GitHub](https://github.com/LessUp/mini-image-pipe){: .btn .fs-5 .mb-4 .mb-md-0 .mr-2 } [API Reference]({{ '/docs/api' | relative_url }}){: .btn .fs-5 .mb-4 .mb-md-0 }


Why Mini-ImagePipe?

{: .section-title } Feature Comparison

Feature Mini-ImagePipe OpenCV GPU Custom CUDA
DAG Scheduling Manual
Memory Pool ⚠️ Limited Manual
Multi-Stream ⚠️ Limited Manual
Zero-copy Pipeline Manual
Easy API
Error Propagation Manual

Features

🚀 GPU Accelerated

Full CUDA implementation with asynchronous kernel execution, so the host can queue further work while kernels run. Built for real-time image processing on NVIDIA GPUs.

🕸️ DAG Scheduling

Task dependencies are expressed as a directed acyclic graph (DAG). The scheduler derives the execution order from the graph and parallelizes independent tasks automatically.
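
As an illustration of how an execution order can be derived from a DAG, here is a minimal host-side sketch using Kahn's algorithm. The function name `topologicalOrder` and the adjacency-list layout are assumptions made for this example; they are not part of the Mini-ImagePipe API, and the project's scheduler may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper, not part of the Mini-ImagePipe API.
// edges[u] lists the nodes that consume the output of node u.
// Returns the nodes in an order where every producer precedes its consumers,
// or an empty vector if the graph contains a cycle.
std::vector<int> topologicalOrder(const std::vector<std::vector<int>>& edges) {
    const std::size_t n = edges.size();
    std::vector<int> inDegree(n, 0);
    for (const auto& consumers : edges) {
        for (int v : consumers) {
            assert(v >= 0 && static_cast<std::size_t>(v) < n);
            ++inDegree[v];
        }
    }

    std::queue<int> ready;  // nodes whose dependencies are all satisfied
    for (std::size_t u = 0; u < n; ++u) {
        if (inDegree[u] == 0) ready.push(static_cast<int>(u));
    }

    std::vector<int> order;
    order.reserve(n);
    while (!ready.empty()) {
        const int u = ready.front();
        ready.pop();
        order.push_back(u);
        for (int v : edges[u]) {
            if (--inDegree[v] == 0) ready.push(v);
        }
    }

    if (order.size() != n) order.clear();  // leftover nodes imply a cycle
    return order;
}
```

For a Resize → Blur → Sobel chain, `edges` would be `{{1}, {2}, {}}` and the returned order visits the nodes from first to last.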

⚡ Multi-Stream Execution

Independent tasks run concurrently on separate CUDA streams. Stream assignment aims to keep the GPU busy whenever the DAG has parallel branches.
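
One simple way to picture stream assignment is to group nodes by their depth in the DAG and spread the nodes of each depth across the available streams. The sketch below does this on the host with plain integers standing in for streams. `StreamPlan` and `assignStreams` are hypothetical names, and round-robin per level is an assumed policy, not necessarily the one Mini-ImagePipe uses.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct StreamPlan {
    std::vector<int> level;   // longest distance from any source node
    std::vector<int> stream;  // stream index assigned to each node
};

// Hypothetical sketch, not the Mini-ImagePipe scheduler.
// edges[u] lists the consumers of node u; the graph is assumed to be acyclic.
StreamPlan assignStreams(const std::vector<std::vector<int>>& edges, int numStreams) {
    assert(numStreams > 0);
    const std::size_t n = edges.size();
    std::vector<int> inDegree(n, 0);
    for (const auto& consumers : edges) {
        for (int v : consumers) ++inDegree[v];
    }

    StreamPlan plan{std::vector<int>(n, 0), std::vector<int>(n, 0)};
    std::queue<int> ready;
    for (std::size_t u = 0; u < n; ++u) {
        if (inDegree[u] == 0) ready.push(static_cast<int>(u));
    }

    std::vector<int> usedInLevel;  // number of nodes already placed per level
    while (!ready.empty()) {
        const int u = ready.front();
        ready.pop();
        // A node's level is final once all of its producers were processed.
        const std::size_t lvl = static_cast<std::size_t>(plan.level[u]);
        if (usedInLevel.size() <= lvl) usedInLevel.resize(lvl + 1, 0);
        plan.stream[u] = usedInLevel[lvl]++ % numStreams;  // round-robin
        for (int v : edges[u]) {
            if (plan.level[v] < plan.level[u] + 1) plan.level[v] = plan.level[u] + 1;
            if (--inDegree[v] == 0) ready.push(v);
        }
    }
    return plan;
}
```

Nodes that share a level have no edges between them, so they can safely run on different streams. In a CUDA implementation each index would typically map to a `cudaStream_t`, with cross-stream dependencies enforced through events (`cudaEventRecord` and `cudaStreamWaitEvent`).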

🧠 Memory Efficient

Pinned and device memory pools with best-fit allocation strategy. Minimize allocation overhead across pipeline runs.
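
To make the best-fit idea concrete, the following sketch manages byte offsets inside one pre-allocated buffer and always serves a request from the smallest free block that is large enough. `BestFitPool` is a hypothetical class written for this page; it leaves out alignment, block merging, and thread safety, and it is not the project's pool implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>

// Hypothetical best-fit sub-allocator over one pre-allocated buffer.
// It hands out byte offsets rather than pointers.
class BestFitPool {
public:
    explicit BestFitPool(std::size_t capacity) {
        if (capacity > 0) freeBySize_.emplace(capacity, 0);
    }

    // Returns the offset of a block of at least `size` bytes, or nullopt
    // if no free block is large enough.
    std::optional<std::size_t> allocate(std::size_t size) {
        assert(size > 0);
        // lower_bound finds the smallest free block that still fits.
        auto it = freeBySize_.lower_bound(size);
        if (it == freeBySize_.end()) return std::nullopt;
        const std::size_t blockSize = it->first;
        const std::size_t offset = it->second;
        freeBySize_.erase(it);
        if (blockSize > size) {
            // Keep the unused tail of the block available.
            freeBySize_.emplace(blockSize - size, offset + size);
        }
        return offset;
    }

    // Returns a block to the pool. Adjacent free blocks are not merged here.
    void release(std::size_t offset, std::size_t size) {
        freeBySize_.emplace(size, offset);
    }

private:
    std::multimap<std::size_t, std::size_t> freeBySize_;  // size -> offset
};
```

The same bookkeeping could sit on top of a single `cudaMalloc` or `cudaMallocHost` allocation, so that repeated pipeline runs reuse memory instead of calling the CUDA allocator each time.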

🔧 Separable Filtering

Gaussian blur runs as separate horizontal and vertical 1D passes. For a k×k kernel this cuts per-pixel work from k² to 2k multiply-adds, which matters most for large kernels.
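
A CPU reference can help show what the two passes compute. The sketch below applies a 1D kernel horizontally and then vertically to a single-channel float image, mirroring indices at the border. The helper names, the float image layout, and the exact mirroring rule (edge pixel not repeated) are assumptions for illustration; the CUDA operator may differ in all three.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirror an out-of-range index back into [0, n) without repeating the edge
// pixel (..., 2, 1, 0, 1, 2, ...). Valid while the overshoot is below n.
inline int reflectIndex(int i, int n) {
    if (i < 0) return -i;
    if (i >= n) return 2 * n - 2 - i;
    return i;
}

// One 1D pass over a single-channel, row-major image.
// `horizontal` selects the axis along which the kernel is applied.
std::vector<float> convolve1D(const std::vector<float>& src, int width, int height,
                              const std::vector<float>& kernel, bool horizontal) {
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int radius = static_cast<int>(kernel.size()) / 2;
    assert(radius < width && radius < height);
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                const int sx = horizontal ? reflectIndex(x + k, width) : x;
                const int sy = horizontal ? y : reflectIndex(y + k, height);
                sum += kernel[k + radius] * src[sy * width + sx];
            }
            dst[y * width + x] = sum;
        }
    }
    return dst;
}

// Separable blur: a horizontal pass followed by a vertical pass.
std::vector<float> separableBlur(const std::vector<float>& src, int width, int height,
                                 const std::vector<float>& kernel) {
    return convolve1D(convolve1D(src, width, height, kernel, true),
                      width, height, kernel, false);
}
```

With a 5-tap kernel each output pixel costs 10 multiply-adds across the two passes, compared with 25 for a direct 5×5 convolution.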

🛡️ Error Propagation

Task failures automatically propagate downstream along the DAG, so a failure in one operator is visible to every task that depends on it.
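
Downstream propagation can be sketched as a breadth-first walk from the failed node. The `TaskState` enum and `propagateFailure` function below are hypothetical and only illustrate the idea; how Mini-ImagePipe represents task states or reports errors is not described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

enum class TaskState { Pending, Failed, Skipped };

// Hypothetical sketch: mark `failedNode` as Failed and every node reachable
// from it as Skipped. edges[u] lists the consumers of node u.
std::vector<TaskState> propagateFailure(const std::vector<std::vector<int>>& edges,
                                        int failedNode) {
    assert(failedNode >= 0 && static_cast<std::size_t>(failedNode) < edges.size());
    std::vector<TaskState> state(edges.size(), TaskState::Pending);
    state[failedNode] = TaskState::Failed;

    std::queue<int> frontier;
    frontier.push(failedNode);
    while (!frontier.empty()) {
        const int u = frontier.front();
        frontier.pop();
        for (int v : edges[u]) {
            if (state[v] == TaskState::Pending) {  // visit each node once
                state[v] = TaskState::Skipped;
                frontier.push(v);
            }
        }
    }
    return state;
}
```

Nodes on branches that do not depend on the failed task stay `Pending` and can still run.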


Performance Benchmarks

{: .performance-section } Benchmarks

| Operator | Image Size | Throughput | Latency |
|:---------|:-----------|:-----------|:--------|
| GaussianBlur 5×5 | 1920×1080 | 850+ FPS | ~1.2 ms |
| Sobel Edge | 1920×1080 | 1200+ FPS | ~0.8 ms |
| Resize (2× down) | 1920×1080 | 1500+ FPS | ~0.7 ms |
| ColorConvert | 1920×1080 | 2000+ FPS | ~0.5 ms |
| Pipeline (4 ops) | 1920×1080 | 400+ FPS | ~2.5 ms |

Benchmarked on NVIDIA RTX 3090. Your results may vary based on GPU model and configuration.


Pipeline Architecture

```mermaid
graph LR
    Input[📥 Input Image] --> Resize[🔄 Resize]
    Resize --> Blur[🌀 Gaussian Blur]
    Blur --> Sobel[📐 Sobel Edge]
    Sobel --> Output[📤 Output]

    style Input fill:#76B900,stroke:#5A8F00,color:#1a1a1a
    style Output fill:#76B900,stroke:#5A8F00,color:#1a1a1a
    style Resize fill:#2d2d2d,stroke:#76B900
    style Blur fill:#2d2d2d,stroke:#76B900
    style Sobel fill:#2d2d2d,stroke:#76B900
```

Use Cases

📹 Real-time Video Processing {: .use-case } Live video filters, effects, and streaming applications

🚗 Autonomous Driving {: .use-case } Perception pipeline preprocessing for sensor fusion

🏥 Medical Imaging {: .use-case } DICOM image processing and analysis workflows

🤖 Embedded AI {: .use-case } Jetson platform deployment for edge computing


Quick Start

```bash
# Clone the repository
git clone https://github.com/LessUp/mini-image-pipe.git
cd mini-image-pipe

# Build with CMake presets (Release)
cmake --preset release
cmake --build --preset release

# Run tests
ctest --preset release

# Run demo
./build/demo_pipeline
```

Usage Example

```cpp
#include <memory>  // std::make_shared

#include "pipeline.h"
#include "operators/resize.h"
#include "operators/gaussian_blur.h"
#include "operators/sobel.h"

using namespace mini_image_pipe;

int main() {
    // Configure pipeline with 4 CUDA streams
    PipelineConfig config;
    config.numStreams = 4;
    Pipeline pipeline(config);

    // Create operators
    auto resize = std::make_shared<ResizeOperator>(320, 240);
    auto blur   = std::make_shared<GaussianBlurOperator>(GaussianKernelSize::KERNEL_5x5);
    auto sobel  = std::make_shared<SobelOperator>();

    // Build the DAG
    int n1 = pipeline.addOperator("Resize", resize);
    int n2 = pipeline.addOperator("Blur",   blur);
    int n3 = pipeline.addOperator("Sobel",  sobel);

    pipeline.connect(n1, n2);  // Resize → Blur
    pipeline.connect(n2, n3);  // Blur → Sobel

    // Set input and execute.
    // d_input (device pointer to the source image), width, height, and
    // channels are assumed to be provided by the caller; they are not
    // defined in this snippet.
    pipeline.setInput(n1, d_input, width, height, channels);
    pipeline.execute();

    // Get output of the last node (Sobel)
    void* output = pipeline.getOutput(n3);
    return 0;
}
```

GPU Architecture Support

| Architecture | Compute | Example GPUs |
|:-------------|:--------|:-------------|
| Volta | sm_70 | V100 |
| Turing | sm_75 | RTX 2080, T4 |
| Ampere | sm_80, sm_86 | A100, RTX 3090 |
| Ada Lovelace | sm_89 | RTX 4090, L40 |
| Hopper | sm_90 | H100 |

Available Operators

| Operator | Function | Features |
|:---------|:---------|:---------|
| GaussianBlur | Gaussian blur | 3×3/5×5/7×7 separable filter, reflection boundary padding |
| Sobel | Edge detection | 3×3 Sobel kernels, gradient magnitude output |
| Resize | Image scaling | Bilinear / nearest-neighbor interpolation |
| ColorConvert | Color conversion | RGB↔Gray, BGR↔RGB, RGBA→RGB |
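
For readers who want a reference point for the Sobel row above, here is a small CPU sketch of 3×3 Sobel gradient magnitude on a single-channel float image. The function name `sobelMagnitude`, the float layout, and leaving border pixels at zero are assumptions made for this example; the CUDA operator's data types and border handling are not specified on this page.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for 3x3 Sobel gradient magnitude on a single-channel,
// row-major image. Border pixels are left at zero (an assumed choice).
std::vector<float> sobelMagnitude(const std::vector<float>& src, int width, int height) {
    assert(src.size() == static_cast<std::size_t>(width) * height);
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            float gx = 0.0f;  // horizontal gradient
            float gy = 0.0f;  // vertical gradient
            for (int j = -1; j <= 1; ++j) {
                for (int i = -1; i <= 1; ++i) {
                    const float p = src[(y + j) * width + (x + i)];
                    gx += kx[j + 1][i + 1] * p;
                    gy += ky[j + 1][i + 1] * p;
                }
            }
            dst[y * width + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return dst;
}
```

A reference like this is mainly useful in tests, where GPU output can be compared against it within a small tolerance.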

Documentation

[📖 Getting Started]({{ '/docs/getting-started' | relative_url }}) [💡 Usage Examples]({{ '/docs/usage' | relative_url }}) [🏗️ Architecture]({{ '/docs/architecture' | relative_url }}) [📚 API Reference]({{ '/docs/api' | relative_url }}) [ℹ️ About]({{ '/about' | relative_url }})

Requirements

  • CMake >= 3.18
  • CUDA Toolkit >= 11.0
  • C++17 compatible compiler
  • GTest v1.14.0 (auto-fetched via FetchContent)

License

This project is licensed under the MIT License.


Contributing

We welcome contributions! Please see our Contributing Guide for details.