A DAG-based heterogeneous image processing pipeline with multi-stream scheduling and a pinned memory pool, built with CUDA C++.
- GPU Accelerated: Full CUDA implementation with async execution
- DAG Scheduling: Directed acyclic graph-based task dependency management
- Multi-Stream Execution: Concurrent CUDA stream execution for independent tasks
- Memory Management: Pinned / device memory pools with best-fit allocation, automatic reuse
- Separable Filtering: Gaussian blur optimized with separable horizontal + vertical passes
- Error Propagation: Task failures automatically propagate downstream along the DAG
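
The error propagation idea from the list above can be illustrated on the host without any CUDA calls. The following is a minimal sketch, assuming the DAG is stored as an adjacency list of successors; `TaskState` and `propagateFailure` are hypothetical names for illustration and are not taken from the project's headers.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical task states for illustration; not the project's actual enum.
enum class TaskState { Pending, Failed, Skipped };

// Marks the failed task as Failed and every task reachable from it as Skipped,
// using a breadth-first walk. successors[i] lists the tasks that consume the
// output of task i.
std::vector<TaskState> propagateFailure(const std::vector<std::vector<int>>& successors,
                                        int failed) {
    assert(failed >= 0 && static_cast<std::size_t>(failed) < successors.size());
    std::vector<TaskState> state(successors.size(), TaskState::Pending);
    state[failed] = TaskState::Failed;

    std::queue<int> frontier;
    frontier.push(failed);
    while (!frontier.empty()) {
        const int node = frontier.front();
        frontier.pop();
        for (int next : successors[node]) {
            if (state[next] == TaskState::Pending) {  // visit each task once
                state[next] = TaskState::Skipped;
                frontier.push(next);
            }
        }
    }
    return state;
}
```

Tasks that are not downstream of the failure stay `Pending`, so independent branches of the graph could still run.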
| Operator | Function | Features |
|---|---|---|
| GaussianBlur | Gaussian blur | 3×3/5×5/7×7 separable filter, reflection boundary padding |
| Sobel | Edge detection | 3×3 Sobel kernels, gradient magnitude output |
| Resize | Image scaling | Bilinear / nearest-neighbor interpolation |
| ColorConvert | Color conversion | RGB↔Gray, BGR↔RGB, RGBA→RGB |
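
As a CPU reference for the separable filter listed in the table, the sketch below runs a horizontal pass followed by a vertical pass with mirrored borders. It assumes a single-channel, row-major float image and the mirror variant that does not repeat the edge pixel; the project's kernels may use a different reflection variant and different weights. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors an out-of-range index back into [0, n) without repeating the edge
// sample (an assumed convention; the project may use another variant).
int reflectIndex(int i, int n) {
    if (n == 1) return 0;
    while (i < 0 || i >= n) {
        i = (i < 0) ? -i : 2 * (n - 1) - i;
    }
    return i;
}

// Applies a 1D kernel along rows (horizontal = true) or columns (horizontal = false).
std::vector<float> convolve1D(const std::vector<float>& src, int width, int height,
                              const std::vector<float>& kernel, bool horizontal) {
    assert(kernel.size() % 2 == 1);  // odd length so the kernel has a center tap
    const int radius = static_cast<int>(kernel.size()) / 2;
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                const int sx = horizontal ? reflectIndex(x + k, width) : x;
                const int sy = horizontal ? y : reflectIndex(y + k, height);
                sum += kernel[k + radius] * src[static_cast<std::size_t>(sy) * width + sx];
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum;
        }
    }
    return dst;
}

// Separable blur: one horizontal pass, then one vertical pass over its result.
std::vector<float> separableBlur(const std::vector<float>& src, int width, int height,
                                 const std::vector<float>& kernel) {
    return convolve1D(convolve1D(src, width, height, kernel, true),
                      width, height, kernel, false);
}
```

For a k×k filter, the two passes cost 2k multiplications per pixel instead of k², for example 10 instead of 25 for the 5×5 case.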
- CMake >= 3.18
- CUDA Toolkit >= 11.0
- GTest v1.14.0 (auto-fetched via FetchContent)
# Debug build
cmake --preset default
cmake --build --preset default
# Release build
cmake --preset release
cmake --build --preset release
# Native GPU arch only (faster compile)
cmake --preset minimal
cmake --build --preset minimal

# Manual build without presets
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(nproc)

# Run the demo
./build/demo_pipeline

# Run the tests
ctest --preset default
# or
./build/mini_image_pipe_tests

| Architecture | Compute Capability | Example GPUs |
|---|---|---|
| Volta | sm_70 | V100 |
| Turing | sm_75 | RTX 2080, T4 |
| Ampere | sm_80, sm_86 | A100, RTX 3090 |
| Ada Lovelace | sm_89 | RTX 4090, L40 |
| Hopper | sm_90 | H100 |
#include <memory>  // std::make_shared
#include "pipeline.h"
#include "operators/resize.h"
#include "operators/color_convert.h"
#include "operators/gaussian_blur.h"
#include "operators/sobel.h"
using namespace mini_image_pipe;
int main() {
    PipelineConfig config;
    config.numStreams = 4;
    Pipeline pipeline(config);

    // Add operators
    auto resize = std::make_shared<ResizeOperator>(320, 240, InterpolationMode::BILINEAR);
    auto gray = std::make_shared<ColorConvertOperator>(ColorConversionType::RGB_TO_GRAY);
    auto blur = std::make_shared<GaussianBlurOperator>(GaussianKernelSize::KERNEL_5x5);
    auto sobel = std::make_shared<SobelOperator>();
    int n1 = pipeline.addOperator("Resize", resize);
    int n2 = pipeline.addOperator("Gray", gray);
    int n3 = pipeline.addOperator("Blur", blur);
    int n4 = pipeline.addOperator("Sobel", sobel);

    // Connect: Resize -> Gray -> Blur -> Sobel
    pipeline.connect(n1, n2);
    pipeline.connect(n2, n3);
    pipeline.connect(n3, n4);

    // Set input and execute.
    // d_input, width, height, and channels are placeholders supplied by the caller;
    // d_input is presumably a device pointer to the source image.
    pipeline.setInput(n1, d_input, width, height, channels);
    pipeline.execute();
    void* output = pipeline.getOutput(n4);
    return 0;
}

mini-image-pipe/
├── include/
│   ├── types.h               # Data types, enums, KernelConfig
│   ├── operator.h            # IOperator abstract base class
│   ├── memory_manager.h      # Pinned/Device memory pool manager
│   ├── task_graph.h          # DAG task graph (topological sort, cycle detection)
│   ├── scheduler.h           # CUDA multi-stream DAG scheduler
│   ├── pipeline.h            # Pipeline builder and execution entry
│   └── operators/
│       ├── color_convert.h   # Color space conversion operator
│       ├── resize.h          # Image resize operator
│       ├── sobel.h           # Sobel edge detection operator
│       └── gaussian_blur.h   # Gaussian blur operator (separable filter)
├── src/
│   ├── memory_manager.cu     # Memory pool (best-fit strategy)
│   ├── task_graph.cpp        # Kahn topological sort, DFS cycle detection
│   ├── scheduler.cu          # Stream assignment, event sync, error propagation
│   ├── pipeline.cpp          # Buffer allocation, dimension inference, batch processing
│   └── operators/
│       ├── color_convert.cu  # RGB/BGR/RGBA/Gray conversion kernels
│       ├── resize.cu         # Nearest-neighbor / bilinear interpolation kernels
│       ├── sobel.cu          # 3×3 Sobel gradient kernel (__constant__ weights)
│       └── gaussian_blur.cu  # Separable Gaussian kernel (horizontal + vertical pass)
├── tests/                    # GTest property tests (100 random iterations per operator)
├── examples/
│   └── demo_pipeline.cpp     # End-to-end pipeline demo
├── .clang-format             # Code format rules
├── .editorconfig             # Editor format rules
├── CMakeLists.txt            # Build configuration
└── CMakePresets.json         # CMake presets (default/release/minimal)

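
The tree above notes that `task_graph.cpp` uses a Kahn topological sort. A minimal sketch of that algorithm follows, assuming an adjacency list of successors; `topologicalOrder` is a hypothetical name. Note that this sketch reports a cycle through Kahn's own node count check, whereas the project lists a separate DFS for cycle detection.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Fills `order` with a topological order of the graph and returns true.
// Returns false if the graph contains a cycle, in which case `order` holds
// only the nodes that could be scheduled before the cycle blocked progress.
bool topologicalOrder(const std::vector<std::vector<int>>& successors,
                      std::vector<int>& order) {
    const std::size_t n = successors.size();
    std::vector<int> inDegree(n, 0);
    for (const auto& outs : successors) {
        for (int v : outs) {
            assert(v >= 0 && static_cast<std::size_t>(v) < n);
            ++inDegree[v];
        }
    }

    // Start with every node that has no unmet dependency.
    std::queue<int> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (inDegree[i] == 0) ready.push(static_cast<int>(i));
    }

    order.clear();
    while (!ready.empty()) {
        const int u = ready.front();
        ready.pop();
        order.push_back(u);
        for (int v : successors[u]) {
            if (--inDegree[v] == 0) ready.push(v);  // last dependency satisfied
        }
    }
    return order.size() == n;  // fewer nodes emitted means a cycle exists
}
```

For a linear chain such as Resize -> Gray -> Blur -> Sobel, the order is simply the chain itself.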
┌───────────────────────────────────────────────────────┐
│                     Pipeline API                      │
├───────────────────────────────────────────────────────┤
│       TaskGraph │ DAGScheduler │ MemoryManager        │
├───────────────────────────────────────────────────────┤
│  Operators: Gaussian │ Sobel │ Resize │ ColorConvert  │
├───────────────────────────────────────────────────────┤
│      CUDA Streams │ CUDA Events │ Shared Memory       │
└───────────────────────────────────────────────────────┘

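
One way a scheduler layer like the one in the diagram could map independent tasks to streams is by dependency level: two tasks on the same level cannot depend on each other, so they may run concurrently. The sketch below shows that policy only as an illustration; the project's actual stream assignment may differ, and `assignStreams` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a stream index for each task. Assumes tasks are numbered in a valid
// topological order, i.e. every edge u -> v satisfies u < v.
std::vector<int> assignStreams(const std::vector<std::vector<int>>& successors,
                               int numStreams) {
    assert(numStreams > 0);
    const std::size_t n = successors.size();

    // level[v] = length of the longest dependency chain ending at v.
    std::vector<int> level(n, 0);
    for (std::size_t u = 0; u < n; ++u) {
        for (int v : successors[u]) {
            level[v] = std::max(level[v], level[u] + 1);
        }
    }

    // Round-robin the tasks of each level over the available streams.
    std::vector<int> nextSlot(n, 0);  // one counter per level; at most n levels
    std::vector<int> stream(n, 0);
    for (std::size_t u = 0; u < n; ++u) {
        stream[u] = nextSlot[level[u]]++ % numStreams;
    }
    return stream;
}
```

In a real scheduler, tasks placed on different streams would still need event-based synchronization wherever an edge crosses streams.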
- Modern CMake: `target_include_directories`, generator expressions, FetchContent, MSVC compatibility
- CI: GitHub Actions (CUDA container build + clang-format check + ctest)
- Memory Safety: Pooled memory management, best-fit allocation, automatic reuse
- Error Handling: Full CUDA API error checking, DAG failure propagation
- Code Standards: `.clang-format` (Google style, 4-space indent, 100 col)
- Test Coverage: 100-iteration randomized property tests per operator/component
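
The best-fit allocation with reuse mentioned in the list above can be modeled on the host with an ordered container. This is a minimal sketch that tracks only block ids and sizes; it assumes nothing about the project's `MemoryManager` interface, and `BestFitPool` is a hypothetical class for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical host-side model of a best-fit pool of reusable blocks.
class BestFitPool {
public:
    // Returns a block to the pool so later requests can reuse it.
    void release(int blockId, std::size_t size) {
        assert(size > 0);
        freeBlocks_.emplace(size, blockId);
    }

    // Returns the id of the smallest free block whose size is at least
    // `requested`, or -1 if none fits (a real pool would allocate a new block).
    int acquire(std::size_t requested) {
        auto it = freeBlocks_.lower_bound(requested);  // first size >= requested
        if (it == freeBlocks_.end()) return -1;
        const int id = it->second;
        freeBlocks_.erase(it);
        return id;
    }

    std::size_t freeCount() const { return freeBlocks_.size(); }

private:
    std::multimap<std::size_t, int> freeBlocks_;  // free blocks ordered by size
};
```

Keeping the free list ordered by size makes each lookup logarithmic and leaves larger blocks available for larger requests.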
MIT License