Merged

51 commits
84eac06
Initial set of bug-fixes and cosmetic changes
ChSonnabend Mar 15, 2025
2191649
Please consider the following formatting changes
alibuild Mar 15, 2025
5be779c
Merge pull request #18 from alibuild/alibot-cleanup-14069
ChSonnabend Mar 15, 2025
b742c50
Adjusting eval sizes. Makes code neater and avoids some calculations
ChSonnabend Mar 15, 2025
c0bc918
Merge branch 'dev' into gpu_clusterizer_bug_fixes
ChSonnabend Mar 19, 2025
0c1cfb7
Adding separate functions. Now the host process only needs one instan…
ChSonnabend Mar 20, 2025
83c004f
First version of CCDB implementation
ChSonnabend Mar 22, 2025
d767ed1
Working CCDB API calls (tested with test-ccdb)
ChSonnabend Mar 23, 2025
ad4b22b
Improve fetching, but have to pass settings by value, not const ref
ChSonnabend Mar 24, 2025
81c646b
Using const ref and moving CCDB calls to host initialization
ChSonnabend Mar 24, 2025
566ddb7
Simplifications and renaming
ChSonnabend Mar 25, 2025
a9c33b5
Please consider the following formatting changes
alibuild Mar 25, 2025
0ed7d25
Merge pull request #19 from alibuild/alibot-cleanup-14069
ChSonnabend Mar 25, 2025
9037ea6
First version of GPU stream implementation. Still needs testing.
ChSonnabend Mar 27, 2025
64c19d5
Fixes
ChSonnabend Mar 27, 2025
8a5bb69
Please consider the following formatting changes
alibuild Mar 27, 2025
e657928
Merge pull request #20 from alibuild/alibot-cleanup-14117
ChSonnabend Mar 27, 2025
46fb1e1
Adding the lane variable. This PR will in any case conflict with #14069
ChSonnabend Mar 27, 2025
70320c3
Compiles on EPNs. Need to add shadow processors next. But for this, I…
ChSonnabend Mar 29, 2025
3174e39
Merge branch 'gpu_clusterizer_bug_fixes' into onnx_gpu_implementation
ChSonnabend Mar 29, 2025
9d9267f
Adding shadow instance. Not sure if this correctly allocates GPU memo…
ChSonnabend Mar 29, 2025
007a4a1
This runs, but will eventually fill up the VRAM. Need to include a me…
ChSonnabend Apr 1, 2025
4ef35fc
Found the stream allocation issue. Now starting optimizations
ChSonnabend Apr 1, 2025
4faaa4a
Improve readability and adapt for some comments
ChSonnabend Apr 1, 2025
2801c2e
Fixing memory assignment issue. Reconstruction runs through with FP32…
ChSonnabend Apr 2, 2025
1dcb1da
Major reworkings to add FP16 support
ChSonnabend Apr 2, 2025
7da3793
Merge branch 'dev' into onnx_gpu_implementation
ChSonnabend Apr 2, 2025
381955a
Bug-fixes
ChSonnabend Apr 3, 2025
19b5bd5
Improved data filling speeds by factor 3
ChSonnabend Apr 3, 2025
83d0257
Limiting threads for ONNX evaluation
ChSonnabend Apr 3, 2025
fff6dc3
Bug-fix for correct thread assignment and input data filling
ChSonnabend Apr 3, 2025
b437e38
Minor changes
ChSonnabend Apr 4, 2025
710993a
Adding I** inference, potentially needed for CNN + FC inference
ChSonnabend Apr 5, 2025
77c1691
CCDB fetching of NNs ported to GPUWorkflowSpec
ChSonnabend Apr 7, 2025
a985798
Adjusting CPU threads and ORT compile definitions
ChSonnabend Apr 10, 2025
fb08f18
About 10x speed-up due to explicit I/O binding
ChSonnabend Apr 10, 2025
b1c88f0
Changes for synchronization and consistency. No performance loss.
ChSonnabend Apr 11, 2025
32cab70
Please consider the following formatting changes
alibuild Apr 11, 2025
5f741fc
Merge pull request #21 from alibuild/alibot-cleanup-14117
ChSonnabend Apr 11, 2025
70907aa
Fixing warnings (errors due to size_t)
ChSonnabend Apr 11, 2025
e46cdfa
Fixing linker issues
ChSonnabend Apr 13, 2025
37955fa
Merge branch 'dev' into onnx_gpu_implementation
ChSonnabend Apr 15, 2025
4b0825a
Adding volatile memory allocation and MockedOrtAllocator. Removing pr…
ChSonnabend Apr 16, 2025
497a9d4
Please consider the following formatting changes
alibuild Apr 16, 2025
aabddb7
Merge pull request #22 from alibuild/alibot-cleanup-14117
ChSonnabend Apr 16, 2025
cfdc15f
Merge dev + fixes
ChSonnabend Apr 16, 2025
a67b634
Circumvent "unused result" warning and build failure
ChSonnabend Apr 16, 2025
938a1ed
Adjust for comments
ChSonnabend Apr 19, 2025
7b07496
Please consider the following formatting changes
alibuild Apr 19, 2025
4d3f54d
Merge pull request #23 from alibuild/alibot-cleanup-14117
ChSonnabend Apr 19, 2025
af89c9a
Fixing build flags
ChSonnabend Apr 20, 2025
21 changes: 7 additions & 14 deletions Common/ML/CMakeLists.txt
@@ -9,21 +9,14 @@
# granted to it by virtue of its status as an Intergovernmental Organization
# or submit itself to any jurisdiction.

# Pass ORT variables as a preprocessor definition
if(ORT_ROCM_BUILD)
add_compile_definitions(ORT_ROCM_BUILD=1)
endif()
if(ORT_CUDA_BUILD)
add_compile_definitions(ORT_CUDA_BUILD=1)
endif()
if(ORT_MIGRAPHX_BUILD)
add_compile_definitions(ORT_MIGRAPHX_BUILD=1)
endif()
if(ORT_TENSORRT_BUILD)
add_compile_definitions(ORT_TENSORRT_BUILD=1)
endif()

o2_add_library(ML
SOURCES src/OrtInterface.cxx
TARGETVARNAME targetName
PRIVATE_LINK_LIBRARIES O2::Framework ONNXRuntime::ONNXRuntime)

# Pass ORT variables as a preprocessor definition
target_compile_definitions(${targetName} PRIVATE
$<$<BOOL:${ORT_ROCM_BUILD}>:ORT_ROCM_BUILD>
$<$<BOOL:${ORT_CUDA_BUILD}>:ORT_CUDA_BUILD>
$<$<BOOL:${ORT_MIGRAPHX_BUILD}>:ORT_MIGRAPHX_BUILD>
$<$<BOOL:${ORT_TENSORRT_BUILD}>:ORT_TENSORRT_BUILD>)

Review comment (Collaborator): Instead of setting 0/1 definitions, I would set only the =1 definition if the CMake variable is set. Then, in the code further below, you don't need #if defined(FOO) && FOO == 1; you can simply use #ifdef FOO.
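A sketch of the pattern the reviewer describes (illustrative only, not code from this PR; ORT_CUDA_BUILD stands in for any of the four macros):

// Old style: the macro is always defined, to 0 or 1, so its value must be tested.
#if defined(ORT_CUDA_BUILD) && ORT_CUDA_BUILD == 1
// CUDA-specific ONNX Runtime setup
#endif

// New style: the generator expression defines the macro only when the CMake
// variable is truthy, so a simple presence check is sufficient.
#ifdef ORT_CUDA_BUILD
// CUDA-specific ONNX Runtime setup
#endif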
2 changes: 1 addition & 1 deletion Common/ML/include/ML/3rdparty/GPUORTFloat16.h
@@ -882,4 +882,4 @@ static_assert(sizeof(BFloat16_t) == sizeof(uint16_t), "Sizes must match");
} // namespace OrtDataType

} // namespace o2
#endif
#endif
86 changes: 63 additions & 23 deletions Common/ML/include/ML/OrtInterface.h
@@ -26,6 +26,13 @@
// O2 includes
#include "Framework/Logger.h"

namespace Ort
{
struct SessionOptions;
struct MemoryInfo;
struct Env;
} // namespace Ort

namespace o2
{

@@ -36,14 +43,52 @@ class OrtModel
{

public:
// Constructor
// Constructors & destructors
OrtModel() = default;
OrtModel(std::unordered_map<std::string, std::string> optionsMap) { reset(optionsMap); }
void init(std::unordered_map<std::string, std::string> optionsMap) { reset(optionsMap); }
void reset(std::unordered_map<std::string, std::string>);
OrtModel(std::unordered_map<std::string, std::string> optionsMap) { init(optionsMap); }
void init(std::unordered_map<std::string, std::string> optionsMap)
{
initOptions(optionsMap);
initEnvironment();
}
virtual ~OrtModel() = default;

// General purpose
void initOptions(std::unordered_map<std::string, std::string> optionsMap);
void initEnvironment();
void initSession();
void memoryOnDevice(int32_t = 0);
bool isInitialized() { return mInitialized; }
void resetSession();

virtual ~OrtModel() = default;
// Getters
std::vector<std::vector<int64_t>> getNumInputNodes() const { return mInputShapes; }
std::vector<std::vector<int64_t>> getNumOutputNodes() const { return mOutputShapes; }
std::vector<std::string> getInputNames() const { return mInputNames; }
std::vector<std::string> getOutputNames() const { return mOutputNames; }
Ort::SessionOptions* getSessionOptions();
Ort::MemoryInfo* getMemoryInfo();
Ort::Env* getEnv();
int32_t getIntraOpNumThreads() const { return intraOpNumThreads; }
int32_t getInterOpNumThreads() const { return interOpNumThreads; }

// Setters
void setDeviceId(int32_t id) { deviceId = id; }
void setIO();
void setActiveThreads(int threads) { intraOpNumThreads = threads; }
void setIntraOpNumThreads(int threads)
{
if (deviceType == "CPU") {
intraOpNumThreads = threads;
}
}
void setInterOpNumThreads(int threads)
{
if (deviceType == "CPU") {
interOpNumThreads = threads;
}
}
void setEnv(Ort::Env*);

// Conversion
template <class I, class O>
@@ -53,41 +98,36 @@ class OrtModel
template <class I, class O> // class I is the input data type, e.g. float, class O is the output data type, e.g. OrtDataType::Float16_t from O2/Common/ML/include/ML/GPUORTFloat16.h
std::vector<O> inference(std::vector<I>&);

template <class I, class O> // class I is the input data type, e.g. float, class O is the output data type, e.g. O2::gpu::OrtDataType::Float16_t from O2/GPU/GPUTracking/ML/convert_float16.h
template <class I, class O>
std::vector<O> inference(std::vector<std::vector<I>>&);

template <class I, class O> // class I is the input data type, e.g. float, class O is the output data type, e.g. OrtDataType::Float16_t from O2/Common/ML/include/ML/GPUORTFloat16.h
void inference(I*, size_t, O*);

// template<class I, class T, class O> // class I is the input data type, e.g. float, class T the throughput data type and class O is the output data type
// std::vector<O> inference(std::vector<I>&);

// Reset session
void resetSession();
template <class I, class O>
void inference(I*, int64_t, O*);

std::vector<std::vector<int64_t>> getNumInputNodes() const { return mInputShapes; }
std::vector<std::vector<int64_t>> getNumOutputNodes() const { return mOutputShapes; }
std::vector<std::string> getInputNames() const { return mInputNames; }
std::vector<std::string> getOutputNames() const { return mOutputNames; }
template <class I, class O>
void inference(I**, int64_t, O*);

void setActiveThreads(int threads) { intraOpNumThreads = threads; }
void release(bool = false);

private:
// ORT variables -> need to be hidden as Pimpl
// ORT variables -> need to be hidden as pImpl
struct OrtVariables;
OrtVariables* pImplOrt;

// Input & Output specifications of the loaded network
std::vector<const char*> inputNamesChar, outputNamesChar;
std::vector<std::string> mInputNames, mOutputNames;
std::vector<std::vector<int64_t>> mInputShapes, mOutputShapes;
std::vector<std::vector<int64_t>> mInputShapes, mOutputShapes, inputShapesCopy, outputShapesCopy; // Original and working copies of the input/output shapes
std::vector<int64_t> inputSizePerNode, outputSizePerNode; // Flattened element count per input/output node
int32_t mInputsTotal = 0, mOutputsTotal = 0; // Total number of inputs and outputs

// Environment settings
bool mInitialized = false;
std::string modelPath, device = "cpu", dtype = "float", thread_affinity = ""; // device options should be cpu, rocm, migraphx, cuda
int intraOpNumThreads = 1, interOpNumThreads = 1, deviceId = 0, enableProfiling = 0, loggingLevel = 0, allocateDeviceMemory = 0, enableOptimizations = 0;
std::string modelPath, envName = "", deviceType = "CPU", thread_affinity = ""; // device options should be cpu, rocm, migraphx, cuda
int32_t intraOpNumThreads = 1, interOpNumThreads = 1, deviceId = -1, enableProfiling = 0, loggingLevel = 0, allocateDeviceMemory = 0, enableOptimizations = 0;

std::string printShape(const std::vector<int64_t>&);
std::string printShape(const std::vector<std::vector<int64_t>>&, std::vector<std::string>&);
};

} // namespace ml
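For orientation, a minimal usage sketch of the reworked OrtModel interface. The option keys, model path, and input length are illustrative assumptions; only the member functions themselves appear in the diff above.

#include "ML/OrtInterface.h"

#include <string>
#include <unordered_map>
#include <vector>

void runModel()
{
  // Option keys are assumed for illustration; the diff does not show them.
  std::unordered_map<std::string, std::string> opts{{"model-path", "model.onnx"}};

  o2::ml::OrtModel model(opts);  // constructor runs initOptions() + initEnvironment()
  model.setIntraOpNumThreads(1); // only takes effect when deviceType == "CPU"
  model.initSession();           // builds the ONNX Runtime session

  std::vector<float> input(128, 0.f); // placeholder size; must match the model's input shape
  std::vector<float> output = model.inference<float, float>(input);
}

For half-precision models, the same call can be instantiated with O = OrtDataType::Float16_t (from GPUORTFloat16.h), as noted in the header comments.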