[tmva][sofie] Add support for parsing external ONNX data file for weights#21048

Open
lmoneta wants to merge 14 commits intoroot-project:masterfrom
lmoneta:tmva_sofie_fix_parser_external_data

Conversation

@lmoneta
Member

@lmoneta lmoneta commented Jan 27, 2026

Support external_data when parsing the initialized tensors from an ONNX file. To test the feature, run the tutorial RMVA_SOFIE_ONNX.py, changing external_data=False to external_data=True when exporting to the ONNX file.
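To make the mechanism concrete: for tensors whose data location is external, the ONNX file itself only stores a reference (file location, byte offset, byte length), and the raw little-endian tensor bytes live in a side file next to the model. Below is a minimal, self-contained Python sketch of that read/write round trip, purely illustrative of the file layout the parser must handle; the function names are hypothetical and this is not the SOFIE parser code.

```python
# Illustrative sketch of the ONNX external-data layout: raw little-endian
# tensor bytes in a side file, located via (offset, length) pairs.
# Hypothetical helpers, NOT the actual SOFIE/ONNX API.
import os
import struct
import tempfile

def write_external_weights(path, tensors):
    """Append each tensor's float32 bytes to `path`; return {name: (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        for name, values in tensors.items():
            offset = f.tell()
            data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
            f.write(data)
            index[name] = (offset, len(data))
    return index

def read_external_weights(path, offset, length):
    """Read one tensor back, given the (offset, length) stored in the model proto."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return list(struct.unpack(f"<{length // 4}f", data))

tmp = os.path.join(tempfile.mkdtemp(), "weights.bin")
index = write_external_weights(tmp, {"w1": [1.0, 2.0], "w2": [3.0, 4.0, 5.0]})
print(read_external_weights(tmp, *index["w2"]))  # [3.0, 4.0, 5.0]
```

In the real format the reference (location, offset, length) is stored in the TensorProto's external_data field, and the parser resolves it relative to the model file's directory.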

@lmoneta lmoneta requested a review from sanjibansg January 27, 2026 15:17
@lmoneta lmoneta self-assigned this Jan 27, 2026
@github-actions

Test Results

    22 files      22 suites   3d 14h 51m 37s ⏱️
 3 770 tests  3 770 ✅ 0 💤 0 ❌
75 890 runs  75 890 ✅ 0 💤 0 ❌

Results for commit 9e5074b.

Collaborator

@sanjibansg sanjibansg left a comment


Thanks for this implementation, a couple of comments on using a different local for-loop variable.

Comment on lines +616 to +645
 for (int i = 0; i < graph.initializer_size(); i++) {
    onnx::TensorProto *tensorproto = const_cast<onnx::TensorProto *>(&graph.initializer(i));
    std::vector<std::size_t> shape;
-   std::size_t fLength = 1;
+   std::size_t tensor_length = 1;
    for (int j = 0; j < tensorproto->dims_size(); j++) {
       shape.push_back(tensorproto->dims(j));
-      fLength *= tensorproto->dims(j);
+      tensor_length *= tensorproto->dims(j);
    }
    // in case of scalars keep an empty shape but with length = 1

-   std::string input_name = graph.initializer(i).name();
+   std::string tensor_name = graph.initializer(i).name();

    if (verbose)
-      std::cout << "\t initializer " << i << " name " << input_name << " type " << graph.initializer(i).data_type()
-                << std::endl;
+      std::cout << "\t initializer " << i << " name " << tensor_name << " type " << graph.initializer(i).data_type()
+                << " and length " << tensor_length << std::endl;

    // register also the initialized tensors
    auto tensor_type = static_cast<ETensorType>(graph.initializer(i).data_type());
-   RegisterTensorType(input_name, tensor_type);
-
-   switch (tensor_type) {
-      case ETensorType::FLOAT: {
-         std::shared_ptr<void> data = GetInitializedTensorData<float>(tensorproto, fLength);
-         if (verbose) std::cout << "add FLOAT initialized tensor " << input_name << " shape " << ConvertShapeToString(shape) << std::endl;
-         rmodel.AddInitializedTensor(input_name, ETensorType::FLOAT, shape, data);
-         allInitializedTensors[input_name] = i;
-         break;
-      }
-      case ETensorType::DOUBLE: {
-         std::shared_ptr<void> data = GetInitializedTensorData<double>(tensorproto, fLength);
-         if (verbose) std::cout << "add DOUBLE initialized tensor " << input_name << " shape " << ConvertShapeToString(shape) << std::endl;
-         rmodel.AddInitializedTensor(input_name, ETensorType::DOUBLE, shape, data);
-         allInitializedTensors[input_name] = i;
-         break;
-      }
-      case ETensorType::INT32: {
-         std::shared_ptr<void> data = GetInitializedTensorData<int32_t>(tensorproto, fLength);
-         if (verbose) std::cout << "add INT32 initialized tensor " << input_name << " shape " << ConvertShapeToString(shape) << std::endl;
-         rmodel.AddInitializedTensor(input_name, ETensorType::INT32, shape, data);
-         allInitializedTensors[input_name] = i;
-         break;
-      }
-      case ETensorType::INT64: {
-         std::shared_ptr<void> data = GetInitializedTensorData<int64_t>(tensorproto, fLength);
-         if (verbose) std::cout << "add INT64 initialized tensor " << input_name << " shape " << ConvertShapeToString(shape) << std::endl;
-         rmodel.AddInitializedTensor(input_name, ETensorType::INT64, shape, data);
-         allInitializedTensors[input_name] = i;
-         break;
-      }
-      default:
-         throw std::runtime_error("Data type in weight tensor " + graph.initializer(i).name() + " not supported!\n");
-   }
+   RegisterTensorType(tensor_name, tensor_type);
+
+   std::shared_ptr<void> data = GetInitializedTensorData(tensorproto, tensor_length * GetTypeSize(tensor_type), tensor_type);
+   rmodel.AddInitializedTensor(tensor_name, tensor_type, shape, data);
+   allInitializedTensors[tensor_name] = i;
+
+   if (verbose) {
+      std::cout << "add initialized tensor " << tensor_name << " with shape " << ConvertShapeToString(shape) << " and ";
+      if (tensor_type == ETensorType::FLOAT) {
+         std::cout << " float data: ";
+         for (int i = 0; i < std::min(int(tensor_length), 3); i++) std::cout << static_cast<float*>(data.get())[0] << " ";
Collaborator


As the warning says, it is probably better to use something other than i for this for-loop variable, since it shadows the outer one and might cause errors in the future.

+      }
+      else if (tensor_type == ETensorType::INT64) {
+         std::cout << " int64 data: ";
+         for (int i = 0; i < std::min(int(tensor_length), 3); i++) std::cout << static_cast<int64_t*>(data.get())[0] << " ";
Collaborator


Same comment on using a different loop variable here.

lmoneta added 8 commits March 31, 2026 10:13
Add an implementation of ScatterND operator which is needed to parse the MLPF model from CMS

Include also 3 tests to probe the different types of scattering which can be performed
- Fix in operator Reduce to return a scalar and not a tensor of shape [1]
- Fix handling of output boolean type in Cast. Do not convert the type to a string,
  because a boolean is converted to a uint8_t, which can be a native uint8_t or a bool. Avoid then calling the function ConvertStringToType if possible
 - Fix fusion of operators. Perform fusion not at the first op encountered but at the last one, in order to first parse all operators which can provide an input to the last fused one. This was the case in MLPF, where there was MatMul + Constant + Add, with Constant being an input to Add.
- remove check in Generate on empty shapes, because scalar tensors have empty shapes
Fix also a bug when doing Gemm and applying the bias in case of stacked matrix multiplications. The bias was not correctly broadcast in this case
The exponential in the softmax (and also the log in the logsoftmax) can be implemented using the vdt fast_exp and fast_log functions, which are faster
Apply fixes to the new implementation of Where added in one of the previous commits
When doing Transpose do not loop on a global index, but per dimension,
and compute the correct input/output indices using the strides. This is much more efficient than computing the coordinate indices using modulo and divisions

Add also a fast case when the last dimensions are contiguous

Clean up also some other files by removing some debug statements
Fix some bugs in those operators used by MLPF model.
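The stride-based transpose described in the commits above can be sketched as follows. This is an illustrative pure-Python version (the actual SOFIE implementation is C++): instead of decoding every flat output index with modulo and division, it walks the output per dimension, odometer-style, advancing precomputed input strides.

```python
# Illustrative sketch of a stride-based transpose (not the SOFIE C++ code):
# per-dimension index bookkeeping replaces modulo/division per element.
def strides_of(shape):
    """Row-major strides: stride of the last dimension is 1."""
    s = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        s[i] = s[i + 1] * shape[i + 1]
    return s

def transpose(data, shape, perm):
    """Transpose a flat row-major buffer `data` of `shape` by permutation `perm`."""
    out_shape = [shape[p] for p in perm]
    in_strides = strides_of(shape)
    # stride of each *output* dimension, measured in the input buffer
    in_step = [in_strides[p] for p in perm]
    out = [0] * len(data)
    idx = [0] * len(out_shape)  # odometer over the output coordinates
    src = 0
    for dst in range(len(out)):
        out[dst] = data[src]
        # increment the innermost output coordinate; carry on overflow
        for d in range(len(out_shape) - 1, -1, -1):
            idx[d] += 1
            src += in_step[d]
            if idx[d] < out_shape[d]:
                break
            idx[d] = 0
            src -= in_step[d] * out_shape[d]
    return out, out_shape

# 2x3 matrix [[1,2,3],[4,5,6]] -> 3x2 transpose [[1,4],[2,5],[3,6]]
print(transpose([1, 2, 3, 4, 5, 6], [2, 3], [1, 0])[0])  # [1, 4, 2, 5, 3, 6]
```

The inner carry loop runs rarely (only when a coordinate wraps), so the per-element cost is a few additions, versus one modulo and one division per dimension per element in the naive scheme.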
@guitargeek guitargeek changed the title [tmv][sofie] Add support for parsing external ONNX data file for weights [tmva][sofie] Add support for parsing external ONNX data file for weights Mar 31, 2026
@guitargeek
Copy link
Copy Markdown
Contributor

This is great! It tripped me up several times that you get silently wrong results with SOFIE when you use external ONNX data files.

Is there anything blocking this PR?

lmoneta added 4 commits April 1, 2026 15:30
Add a new test where the matrix B is changing for each batch

Do not use empty scopes in Gemm (it is a problem for CLAD on MacOS) and instead configure the spacing depending on whether stacked multiplication is used
Add Clip operator and a test for it
In case of ParT the Gemm is computed using as input a bias vector (constructed from the interaction features) which is not an initialized tensor and has rank > 2. Add in SOFIE support for a bias vector with larger rank and for dynamic shapes

This fixes parsing of the ParT model
Add new GELU operator and corresponding tests
Add also Atan and Floor operators as new unary operators

Include in ONNX parser support for bool initialized data type when reading the model initialized data
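The stacked-Gemm bias fix mentioned in the commits above boils down to re-applying the bias for every matrix in the stack, not just the first. A minimal pure-Python sketch of the intended broadcasting (illustrative only; the SOFIE code generator emits C++, and the function name here is hypothetical):

```python
# Illustrative sketch of bias broadcasting in a stacked (batched) Gemm:
# C[b] = A[b] @ B[b] + bias, with bias broadcast over the batch and rows.
# NOT the SOFIE implementation.
def batched_gemm(A, B, bias):
    """A: [batch][m][k], B: [batch][k][n], bias: [n]; returns [batch][m][n]."""
    out = []
    for a, b in zip(A, B):                 # loop over the stack
        m, k, n = len(a), len(b), len(b[0])
        # bias[j] is added for every batch and every output row,
        # which is exactly the broadcasting the fix restores
        c = [[bias[j] + sum(a[i][t] * b[t][j] for t in range(k))
              for j in range(n)] for i in range(m)]
        out.append(c)
    return out

A = [[[1.0, 0.0]], [[0.0, 1.0]]]       # two stacked 1x2 matrices
B = [[[2.0], [3.0]], [[2.0], [3.0]]]   # two stacked 2x1 matrices
print(batched_gemm(A, B, [10.0]))      # [[[12.0]], [[13.0]]]
```

The bug described above corresponds to adding the bias only for the first element of the stack; with the fix, every batch entry receives it.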
@lmoneta lmoneta force-pushed the tmva_sofie_fix_parser_external_data branch 2 times, most recently from b30d212 to 941b3d1 Compare April 10, 2026 13:32
Collaborator

@sanjibansg sanjibansg left a comment


LGTM! Thanks for this implementation.

lmoneta added 2 commits April 10, 2026 18:07
…ghts

Support external_data when parsing the initialized tensors from an ONNX file.
To test the feature, run the tutorial RMVA_SOFIE_ONNX.py, changing external_data=False to external_data=True when exporting to the ONNX file.
@lmoneta lmoneta force-pushed the tmva_sofie_fix_parser_external_data branch from 941b3d1 to 986600e Compare April 10, 2026 16:09