Skip to content

Adding option for multithreaded query mode#27

Open
GeorgiLH wants to merge 6 commits intocompiler-research:developfrom
GeorgiLH:rdataframe
Open

Adding option for multithreaded query mode#27
GeorgiLH wants to merge 6 commits intocompiler-research:developfrom
GeorgiLH:rdataframe

Conversation

@GeorgiLH
Copy link
Copy Markdown

@GeorgiLH GeorgiLH commented Mar 7, 2026

Currently only prints details about the supplied file.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread inc/ramcore/RDF_RAMNTupleView.h Outdated
@@ -0,0 +1,9 @@
#ifndef RAMCORE_RDF_RAMNTUPLEVIEW_H
#define RAMCORE_RDF_RAMNTUPLEVIEW_H
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: header guard does not follow preferred style [llvm-header-guard]

Suggested change
#define RAMCORE_RDF_RAMNTUPLEVIEW_H
#ifndef GITHUB_WORKSPACE_INC_RAMCORE_RDF_RAMNTUPLEVIEW_H
#define GITHUB_WORKSPACE_INC_RAMCORE_RDF_RAMNTUPLEVIEW_H

inc/ramcore/RDF_RAMNTupleView.h:7:

- #endif  //RAMCORE_RDF_RAMNTUPLEVIEW_H
+ #endif // GITHUB_WORKSPACE_INC_RAMCORE_RDF_RAMNTUPLEVIEW_H

Comment thread inc/ramcore/RDF_RAMNTupleView.h Outdated
@@ -0,0 +1,9 @@
#ifndef RAMCORE_RDF_RAMNTUPLEVIEW_H
#define RAMCORE_RDF_RAMNTUPLEVIEW_H
#include <RtypesCore.h>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'RtypesCore.h' file not found [clang-diagnostic-error]

#include <RtypesCore.h>
         ^

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
@@ -0,0 +1,13 @@
#include <ROOT/RDataFrame.hxx>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'ROOT/RDataFrame.hxx' file not found [clang-diagnostic-error]

#include <ROOT/RDataFrame.hxx>
         ^

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
#include <ROOT/RDataFrame.hxx>
#include <ramcore/RDF_RAMNTupleView.h>

ULong64_t rdf_ramntupleview(const char *file, const char *query, bool cache, bool perfstats, const char *perfstatsfilename){
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: function 'rdf_ramntupleview' can be made static or moved into an anonymous namespace to enforce internal linkage [misc-use-internal-linkage]

Suggested change
ULong64_t rdf_ramntupleview(const char *file, const char *query, bool cache, bool perfstats, const char *perfstatsfilename){
ULong64_t rdf_ramntupleviewstatic (const char *file, const char *query, bool cache, bool perfstats, const char *perfstatsfilename){

Comment thread tools/rdf_ramntupleview.cxx Outdated


#include <iostream>
#include <ramcore/RDF_RAMNTupleview.h>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'ramcore/RDF_RAMNTupleview.h' file not found [clang-diagnostic-error]

#include <ramcore/RDF_RAMNTupleview.h>
         ^

Comment thread tools/rdf_ramntupleview.cxx Outdated
int main(int argc, char **argv){

if (argc < 2){
std::cerr << "Usage: " << argv[0] << " <file.root> [rname:start-end]\n";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]

      std::cerr << "Usage: " << argv[0] << " <file.root> [rname:start-end]\n";
                                ^

Comment thread tools/rdf_ramntupleview.cxx Outdated

if (argc < 2){
std::cerr << "Usage: " << argv[0] << " <file.root> [rname:start-end]\n";
std::cerr << "Example: " << argv[0] << " output.root chr1:1000-2000\n";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]

      std::cerr << "Example: " << argv[0] << " output.root chr1:1000-2000\n";
                                  ^

Comment thread tools/rdf_ramntupleview.cxx Outdated
return 1;
}

const char* file = argv[1];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]

const char* file = argv[1];
                   ^

Comment thread tools/rdf_ramntupleview.cxx Outdated

const char* file = argv[1];

const char* query = argv[2];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]

const char* query = argv[2];
                    ^

@AdityaPandeyCN
Copy link
Copy Markdown

@GeorgiLH We can make changes in the existing RAMNTupleview(https://github.com/compiler-research/ramtools/blob/develop/src/ramcore/RAMNTupleView.cxx) tool file with this approach, it would be better to have a single tool for viewing and querying( condition applied that this approach is faster than the previous one). Also make sure that the tests pass locally and also apply clang formatting by running this git-clang-format HEAD~

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread inc/ramcore/RDF_RAMNTupleView.h Outdated
@@ -0,0 +1,8 @@
#ifndef RAMCORE_RDF_RAMNTUPLEVIEW_H
#define RAMCORE_RDF_RAMNTUPLEVIEW_H
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: header guard does not follow preferred style [llvm-header-guard]

Suggested change
#define RAMCORE_RDF_RAMNTUPLEVIEW_H
#ifndef GITHUB_WORKSPACE_INC_RAMCORE_RDF_RAMNTUPLEVIEW_H
#define GITHUB_WORKSPACE_INC_RAMCORE_RDF_RAMNTUPLEVIEW_H

inc/ramcore/RDF_RAMNTupleView.h:7:

- #endif // RAMCORE_RDF_RAMNTUPLEVIEW_H
+ #endif // GITHUB_WORKSPACE_INC_RAMCORE_RDF_RAMNTUPLEVIEW_H

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
#include <ramcore/RDF_RAMNTupleView.h>

ULong64_t
rdf_ramntupleview(const char *file, const char *query, bool cache, bool perfstats, const char *perfstatsfilename)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: function 'rdf_ramntupleview' can be made static or moved into an anonymous namespace to enforce internal linkage [misc-use-internal-linkage]

Suggested change
rdf_ramntupleview(const char *file, const char *query, bool cache, bool perfstats, const char *perfstatsfilename)
rdf_ramntupleviewstatic (const char *file, const char *query, bool cache, bool perfstats, const char *perfstatsfilename)

Comment thread tools/ramntupleview.cxx
@@ -1,4 +1,5 @@
#include "ramcore/RAMNTupleView.h"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'ramcore/RAMNTupleView.h' file not found [clang-diagnostic-error]

#include "ramcore/RAMNTupleView.h"
         ^

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 14, 2026

Codecov Report

❌ Patch coverage is 0% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.19%. Comparing base (5cec39c) to head (dc72994).

Files with missing lines Patch % Lines
src/ramcore/RAMNTupleView.cxx 0.00% 31 Missing ⚠️
tools/ramntupleview.cxx 0.00% 11 Missing ⚠️

❌ Your patch status has failed because the patch coverage (0.00%) is below the target coverage (85.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #27      +/-   ##
===========================================
- Coverage    64.78%   63.19%   -1.60%     
===========================================
  Files           16       16              
  Lines         1525     1565      +40     
  Branches       632      659      +27     
===========================================
+ Hits           988      989       +1     
- Misses         462      502      +40     
+ Partials        75       74       -1     
Flag Coverage Δ
unittests 63.19% <0.00%> (-1.60%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
tools/ramntupleview.cxx 0.00% <0.00%> (ø)
src/ramcore/RAMNTupleView.cxx 64.70% <0.00%> (-22.80%) ⬇️

... and 1 file with indirect coverage changes

Files with missing lines Coverage Δ
tools/ramntupleview.cxx 0.00% <0.00%> (ø)
src/ramcore/RAMNTupleView.cxx 64.70% <0.00%> (-22.80%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@GeorgiLH GeorgiLH marked this pull request as ready for review March 25, 2026 17:58
@GeorgiLH
Copy link
Copy Markdown
Author

Gives the same results as samtools but uses multiple threads

@GeorgiLH GeorgiLH changed the title The RDataFrame query tool made for concurrent io Adding option for multithreaded query mode Mar 25, 2026
@vgvassilev
Copy link
Copy Markdown

@AdityaPandeyCN, can you take a look?

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 10 out of 26. Check the log or trigger a new build to see more.

Comment thread inc/ramcore/RDF_RAMNTupleView.h Outdated
#define RAMCORE_RDF_RAMNTUPLEVIEW_H
#include <RtypesCore.h>

Long64_t rdf_ramntupleview(const int num_threads, const char *file, const char *query = "", bool cache = true,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: parameter 'num_threads' is const-qualified in the function declaration; const-qualification of parameters only has an effect in function definitions [readability-avoid-const-params-in-decls]

Suggested change
Long64_t rdf_ramntupleview(const int num_threads, const char *file, const char *query = "", bool cache = true,
Long64_t rdf_ramntupleview(int num_threads, const char *file, const char *query = "", bool cache = true,

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
using namespace ROOT::RDF;
namespace rdf {

static int GetRefId(ROOT::RDataFrame &df, const std::string &rname)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::string" is directly included [misc-include-cleaner]

src/ramcore/RDF_RAMNTupleView.cxx:2:

- using namespace ROOT::RDF;
+ #include <string>
+ using namespace ROOT::RDF;

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
if (rname == "*")
return -1;

auto refs = df.Take<std::vector<std::string>>("rname_refs");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::vector" is directly included [misc-include-cleaner]

src/ramcore/RDF_RAMNTupleView.cxx:2:

- using namespace ROOT::RDF;
+ #include <vector>
+ using namespace ROOT::RDF;

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
auto refs = df.Take<std::vector<std::string>>("rname_refs");
const auto &vec = refs.GetValue()[0];

auto it = std::find(vec.begin(), vec.end(), rname);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::find" is directly included [misc-include-cleaner]

src/ramcore/RDF_RAMNTupleView.cxx:1:

- #include <ramcore/RDF_RAMNTupleView.h>
+ #include <algorithm>
+ #include <ramcore/RDF_RAMNTupleView.h>

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
const auto &vec = refs.GetValue()[0];

auto it = std::find(vec.begin(), vec.end(), rname);
return (it == vec.end()) ? -1 : std::distance(vec.begin(), it);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: narrowing conversion from 'typename iterator_traits<__normal_iterator<const basic_string *, vector<basic_string>>>::difference_type' (aka 'long') to signed type 'int' is implementation-defined [cppcoreguidelines-narrowing-conversions]

   return (it == vec.end()) ? -1 : std::distance(vec.begin(), it);
                                   ^

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
const auto &vec = refs.GetValue()[0];

auto it = std::find(vec.begin(), vec.end(), rname);
return (it == vec.end()) ? -1 : std::distance(vec.begin(), it);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::distance" is directly included [misc-include-cleaner]

src/ramcore/RDF_RAMNTupleView.cxx:1:

- #include <ramcore/RDF_RAMNTupleView.h>
+ #include <iterator>
+ #include <ramcore/RDF_RAMNTupleView.h>

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
}
} // namespace rdf

Long64_t rdf_ramntupleview(const int num_threads, const char *file, const char *query, bool cache, bool perfstats,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "Long64_t" is directly included [misc-include-cleaner]

src/ramcore/RDF_RAMNTupleView.cxx:1:

- #include <ramcore/RDF_RAMNTupleView.h>
+ #include <RtypesCore.h>
+ #include <ramcore/RDF_RAMNTupleView.h>

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
Long64_t rdf_ramntupleview(const int num_threads, const char *file, const char *query, bool cache, bool perfstats,
const char *perfstatsfilename)
{
TStopwatch ts;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "TStopwatch" is directly included [misc-include-cleaner]

src/ramcore/RDF_RAMNTupleView.cxx:0:

- #include <ROOT/RDataFrame.hxx>
+ #include <TStopwatch.h>
+ #include <ROOT/RDataFrame.hxx>

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
TStopwatch ts;
ts.Start();
std::string region = query;
int chrDelimiterPos = region.find(":");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'find' called with a string literal consisting of a single character; consider using the more effective overload accepting a character [performance-faster-string-find]

Suggested change
int chrDelimiterPos = region.find(":");
int chrDelimiterPos = region.find(':');

Comment thread src/ramcore/RDF_RAMNTupleView.cxx Outdated
TStopwatch ts;
ts.Start();
std::string region = query;
int chrDelimiterPos = region.find(":");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: narrowing conversion from 'size_type' (aka 'unsigned long') to signed type 'int' is implementation-defined [cppcoreguidelines-narrowing-conversions]

   int chrDelimiterPos = region.find(":");
                         ^

@AdityaPandeyCN
Copy link
Copy Markdown

@GeorgiLH We can make changes in the existing RAMNTupleview(https://github.com/compiler-research/ramtools/blob/develop/src/ramcore/RAMNTupleView.cxx) tool file with this approach, it would be better to have a single tool for viewing and querying( condition

We should have a single file for viewing

@GeorgiLH
Copy link
Copy Markdown
Author

GeorgiLH commented Apr 6, 2026

@GeorgiLH We can make changes in the existing RAMNTupleview(https://github.com/compiler-research/ramtools/blob/develop/src/ramcore/RAMNTupleView.cxx) tool file with this approach, it would be better to have a single tool for viewing and querying( condition

We should have a single file for viewing

Forgot about that. All done now

@GeorgiLH
Copy link
Copy Markdown
Author

@AdityaPandeyCN, @vgvassilev i made some changes; could you check them and run the workflows

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Comment thread inc/ramcore/RAMNTupleView.h Outdated
Long64_t ramntupleview(const char *file, const char *query = "", bool cache = true, bool perfstats = false,
const char *perfstatsfilename = "perf.root");

ULong64_t mt_ramntupleview(int numthreads, const char *file, const char *query = "", bool cache = true, bool perfstats = false,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "ULong64_t" is directly included [misc-include-cleaner]

ULong64_t mt_ramntupleview(int numthreads, const char *file, const char *query = "", bool cache = true, bool perfstats = false,
^

const auto &refids = refs.GetValue()[0];

auto it = std::find(refids.begin(), refids.end(), rname);
return (it == refids.end()) ? -1 : std::distance(refids.begin(), it);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: narrowing conversion from 'typename iterator_traits<__normal_iterator<const basic_string *, vector<basic_string>>>::difference_type' (aka 'long') to signed type 'int' is implementation-defined [cppcoreguidelines-narrowing-conversions]

   return (it == refids.end()) ? -1 : std::distance(refids.begin(), it);
                                      ^

const auto &refids = refs.GetValue()[0];

auto it = std::find(refids.begin(), refids.end(), rname);
return (it == refids.end()) ? -1 : std::distance(refids.begin(), it);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::distance" is directly included [misc-include-cleaner]

src/ramcore/RAMNTupleView.cxx:9:

- #include <string>
+ #include <iterator>
+ #include <string>

Comment thread src/ramcore/RAMNTupleView.cxx Outdated
} No newline at end of file
}
//NOLINTNEXTLINE(misc-use-internal-linkage)
ULong64_t mt_ramntupleview(const int numthreads, const char *file, const char *query, bool /*cache*/, bool /*perfstats*/,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "ULong64_t" is directly included [misc-include-cleaner]

ge)
    ^

Comment thread tools/ramntupleview.cxx
bool mt_set = false;
int num_threads = 0;
if (argc > 3) {
std::string mt_flag = argv[3];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]

      std::string mt_flag = argv[3];
                            ^

Comment thread tools/ramntupleview.cxx
bool mt_set = false;
int num_threads = 0;
if (argc > 3) {
std::string mt_flag = argv[3];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::string" is directly included [misc-include-cleaner]

tools/ramntupleview.cxx:4:

- int main(int argc, char *argv[])
+ #include <string>
+ int main(int argc, char *argv[])

Comment thread tools/ramntupleview.cxx
std::string mt_flag = argv[3];
if (mt_flag.find("-m") != std::string::npos) {
mt_set = true;
num_threads = std::atoi(mt_flag.substr(2).data());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "std::atoi" is directly included [misc-include-cleaner]

tools/ramntupleview.cxx:1:

- #include <iostream>
+ #include <cstdlib>
+ #include <iostream>

Comment thread tools/ramntupleview.cxx
Long64_t read_count = ramntupleview(file, region_str);

printf("Found %lld records in region %s\n", read_count, region_str);
ULong64_t read_count;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: no header providing "ULong64_t" is directly included [misc-include-cleaner]

tools/ramntupleview.cxx:1:

- #include <iostream>
+ #include <RtypesCore.h>
+ #include <iostream>

Comment thread tools/ramntupleview.cxx
Long64_t read_count = ramntupleview(file, region_str);

printf("Found %lld records in region %s\n", read_count, region_str);
ULong64_t read_count;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variable 'read_count' is not initialized [cppcoreguidelines-init-variables]

Suggested change
ULong64_t read_count;
ULong64_t read_count = 0;

Comment thread tools/ramntupleview.cxx
} else {
read_count = ramntupleview(file, region_str);
}
printf("Found %lld records in region %s", read_count, region_str);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not call c-style vararg functions [cppcoreguidelines-pro-type-vararg]

   printf("Found %lld records in region %s", read_count, region_str);
   ^

@GeorgiLH
Copy link
Copy Markdown
Author

Squashed last two commits and applied clang-format

@AdityaPandeyCN
Copy link
Copy Markdown

Can you attach a result showing the speedups?

@GeorgiLH
Copy link
Copy Markdown
Author

This version does not show speedups because it does not have indexing. I wanted to get this merged before making pull requests about flag filtering and indexing although I could just push the commits to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants