* Copying kernels to implement NN clusterizer
* First version of clusterizer in GPU code
* Adding a compiling and running version with single-threaded ONNX model executions. Clusters are not getting published yet (FIXME)
* Clusters now working by a hack
* Working implementation of settings via GPUSettings.h and --configKeyValues "GPU_proc.[setting]=...;..."
* Modifying the onnx_interface to include the right headers
* Adjusting initialization for new ONNXRuntime version
* Adjusting global settings and CF code for several settings
* Adding return statement if cluster is rejected
* Adding some statements back
* Update to latest status of gpu clusterization
* Fixing uchar -> uint8_t
* Adding utils header
* Updating kernels.cmake to uint8_t
* Please consider the following formatting changes
* Adding an ONNX CPU library in the O2 framework
* Please consider the following formatting changes
* Fixing macOS build issues with calling O*.data()
* Fixing compiler issues and char -> uint8_t
* Fixing curly braces
* Fixing std::make_shared
* Changing order for <CommonUtils/StringUtils.h>
* Bug-fixing file name
* Making NN clusterizer more efficient
* Changing constexpr
* Fixing build issues
* Major changes to make clusterizer parallelizable. Remaining problem: different values of nnClusterizerBatchedMode lead to a different number of clusters if nnClusterizerBatchedMode < clusterer.mPmemory->counters.nClusters
* Adjusting for default CF regression
* Bug-fix for application of CF regression and logging message
* Adding is_boundary check earlier to avoid out-of-bounds access
* Bug-fixes for boundary reading
* Updating to use explicit calls to kernels instead of if-statements
* Bug-fix for class label application
* Explicit casting solves regression issues. To be done: Correct publishing for class2 regression
* Bug-fixes
* Adding some documentation
* Please consider the following formatting changes
* Modifying for David's comments
* Modifications from comments on PR
* Please consider the following formatting changes
* iSlice -> iSector
* mISlice -> mISector
* Minor bug-fixes
* Adjusting for comments
* Bug-fix for fullCI build
* Adding GPUd() for on-device functions
* Fixing compile issues, only thing missing: conversion of float to float16
* Let's see if this does the trick
* Making functions (constructors) GPUd() (GPUdDefault())
* GPU kernels should now be findable
* Adding ifdefs for standalone build and header exclusions in GPUORTFloat16
* Modifying the approach to not use std:: types. Still needs to be tested and need to do proper memory allocation
* New version of clusterizer. Compiles locally, but segfaults in fillInput kernel. Testing with the CI now.
* Please consider the following formatting changes
* Adjust for comments
* Please consider the following formatting changes
* Merging dev and fixing build issues
* Adjusting for comments
* Fixing incorrect #endif
* Please consider the following formatting changes
* Fix indentation, remove duplicate define
* Fixing one memory issue. Segfault / memory leak persists
* Adjusting for new toNative function
* Fixing .finalize
* Fixing CMakeLists and other bugs
* Adding GPUCA_HAS_ONNX only to tracking
* Changing to fixed size for number of clusters
* Fixed segfault. Not producing the right number of clusters yet.
* Network now accepts clusters over all sectors
* Whitespaces...
* Some weird formatting
* Please consider the following formatting changes
* Removing white-spaces
* Adding necessary if-statement to avoid automatic model loading
* Removing GPUConstantMem, adding interOpNumThreads option
* Found the bug where I lose clusters
* Editor configured for whitespaces at EOF
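Several of the commits above revolve around converting float inputs to half precision (float16) for the ONNX model, via the GPUORTFloat16 header. As a rough illustration only, and not the O2 implementation, a minimal float32-to-IEEE-754-half bit conversion can be sketched as follows; the function name and its simplifications (truncating rounding, subnormal results flushed to zero) are assumptions of this sketch:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a Float16_t conversion: produce the raw IEEE 754
// binary16 bit pattern for a float. Simplified sketch: mantissa is truncated
// (no round-to-nearest-even) and subnormal results flush to signed zero.
static uint16_t floatToHalf(float f)
{
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));           // type-pun without UB
  uint16_t sign = (bits >> 16) & 0x8000u;         // keep the sign bit
  int32_t exp = ((bits >> 23) & 0xFF) - 127 + 15; // rebias exponent: 8-bit -> 5-bit
  uint16_t mant = (bits >> 13) & 0x3FFu;          // keep the top 10 mantissa bits
  if (exp <= 0) {
    return sign;                                  // underflow: flush to signed zero
  }
  if (exp >= 31) {
    return sign | 0x7C00u;                        // overflow: signed infinity
  }
  return sign | static_cast<uint16_t>(exp << 10) | mant;
}
```

For example, 1.0f maps to the bit pattern 0x3C00 and -2.0f to 0xC000. A production conversion (as in GPUORTFloat16) additionally handles rounding, subnormals, and NaN payloads.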
---------
Co-authored-by: ALICE Action Bot <alibuild@cern.ch>
Co-authored-by: David Rohr <github@jwdt.org>
template <class I, class O> // class I is the input data type, e.g. float; class O is the output data type, e.g. OrtDataType::Float16_t from O2/Common/ML/include/ML/GPUORTFloat16.h
void inference(I*, size_t, O*);

(pImplOrt->session)->Run(pImplOrt->runOptions, inputNamesChar.data(), &inputTensor, 1, outputNamesChar.data(), &outputTensor, outputNamesChar.size()); // TODO: Not sure if 1 is correct here