Skip to content

Commit d4deade

Browse files
miranov25miranov25
andauthored
Adding makeDiffWorkflow using path to local or alien workflow.config + treeFastCache.C+treeFastCacheND.C (#1964)
* ATO-648 - adding diff of workflow from the alien * Moving treeFastCache.C and treeFastCacheND.C from TPC gitlab to the O2DPG * Updated READMEtreeFastCache.md * Adding unit test for the tree diff and time series comparison * Adding getStat --------- Co-authored-by: miranov25 <marian.ivanov@cern.cg>
1 parent 7b07caf commit d4deade

File tree

4 files changed

+673
-0
lines changed

4 files changed

+673
-0
lines changed
Lines changed: 126 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,126 @@
1+
# 📦 ROOT Tree Fast Cache System
2+
3+
This repository contains ROOT macros for fast lookup and interpolation of values from TTrees, using:
4+
5+
- `treeFastCache1D.C`: 1D cache with nearest-neighbor lookup
6+
- `treeFastCacheND.C`: ND cache with exact match in N–1 dimensions and nearest-neighbor in 1 dimension
7+
8+
They are designed for interactive analysis with `TTree::Draw`, e.g., QA plots, calibration overlays, or smoothed time series.
9+
10+
---
11+
12+
## 🔹 `treeFastCache1D.C`
13+
14+
### ✅ Features
15+
16+
- Register 1D lookup maps from TTrees
17+
- Nearest-neighbor lookup from `std::map<double, float>`
18+
- Can register by ID or string name
19+
- Fast evaluation inside `TTree::Draw`
20+
- Alias integration for interactive sessions
21+
22+
### 🧪 Example
23+
24+
```cpp
25+
TTree* tree = ...;
26+
int mapID = registerMap1DByName("dcar_vs_time", "time", "dcar_value", tree, "subentry==127");
27+
28+
tree->SetAlias("dcar_smooth", ("getNearest1D(time," + std::to_string(mapID) + ")").c_str());
29+
tree->Draw("dcar_value:dcar_smooth", "indexType==1", "colz", 10000);
30+
```
31+
32+
---
33+
34+
## 🔸 `treeFastCacheND.C`
35+
36+
### ✅ Features
37+
38+
- ND caching with:
39+
- **Exact match** in N–1 dimensions
40+
- **Nearest-neighbor** lookup in 1 dimension (e.g. `time`)
41+
- Uses full `double` precision for all keys
42+
- Alias support for `TTree::Draw`
43+
- Registration by name with hashed map ID
44+
- Variadic interface for direct use
45+
46+
### 🧪 Example: Time Series
47+
48+
```cpp
49+
TTree* tree = ...;
50+
int mapID = registerMapND("dcar_vs_time", tree, {"subentry"}, "time", "mTSITSTPC.mDCAr_A_NTracks_median", "1");
51+
setNearestNDAlias(tree, "dcar_smooth", "dcar_vs_time", "time", {"subentry"});
52+
53+
tree->Draw("mTSITSTPC.mDCAr_A_NTracks_median:dcar_smooth", "indexType==1", "colz", 10000);
54+
```
55+
56+
### 🖊️ Parameters for `registerMapND`
57+
```cpp
58+
int registerMapND(
59+
const std::string& name, // Unique name of the map
60+
TTree* tree, // Source TTree
61+
const std::vector<std::string>& exactDims, // Exact-match coordinate names
62+
const std::string& nearestDim, // Nearest-match dimension (e.g. time)
63+
const std::string& valueVar, // Variable to interpolate
64+
const std::string& selection // TTree selection
65+
);
66+
```
67+
68+
### 🖊️ Parameters for `setNearestNDAlias`
69+
```cpp
70+
void setNearestNDAlias(
71+
TTree* tree, // Target tree
72+
const std::string& aliasName, // Alias to create
73+
const std::string& mapName, // Name used in registration
74+
const std::string& nearestCoordExpr, // Nearest-match expression
75+
const std::vector<std::string>& exactCoordExprs // Exact match expressions
76+
);
77+
```
78+
79+
### ⚡️ Alternative: Direct expression
80+
```cpp
81+
tree->Draw("val:getNearestND(time,mapID,subentry)", ...);
82+
```
83+
84+
---
85+
86+
## 📊 Internal Storage
87+
88+
### 1D:
89+
```cpp
90+
std::map<int, std::map<double, float>> registeredMaps;
91+
std::map<std::string, int> nameToMapID;
92+
```
93+
94+
### ND:
95+
```cpp
96+
std::map<int, std::map<std::vector<double>, std::map<double, double>>> ndCaches;
97+
std::map<std::string, int> ndNameToID;
98+
```
99+
100+
---
101+
102+
## 📌 Best Practices
103+
104+
- Use aliases to simplify `TTree::Draw` expressions
105+
- Use double precision for stability in nearest search
106+
- Store maps by string name to simplify re-registration
107+
- Prefer `setNearestNDAlias()` over manual `getNearestND(...)` for readability
108+
109+
---
110+
111+
## 📤 Future Ideas
112+
113+
- Optional interpolation (linear, spline)
114+
- Graceful handling of unmatched keys
115+
- Caching diagnostics and summary statistics
116+
- C++ class wrapper for lifecycle + reusability
117+
118+
---
119+
120+
## 📜 License
121+
122+
Intended for use in internal physics analyses. No warranty implied.
123+
124+
---
125+
126+
For more details, see comments and examples inside `treeFastCache1D.C` and `treeFastCacheND.C`.

UTILS/Parsers/treeFastCache.C

Lines changed: 258 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,258 @@
1+
/*
2+
.L $O2DPG/UTILS/Parsers/treeFastCache.C
3+
*/
4+
5+
/*
6+
treeFastCache.C
7+
Simple caching system for fast lookup of 1D values from a TTree, using nearest-neighbor interpolation.
8+
This utility allows registration of (X, Y) pairs from a TTree into a std::map,
9+
indexed by a user-defined mapID or map name. The lookup function `getNearest1D(x, mapID)`
10+
retrieves the Y value for the X closest to the query.
11+
Features:
12+
- Register maps via string name or numeric ID
13+
- Query nearest-neighbor value for any X
14+
- Graceful error handling and range checking
15+
- Base for future ND extension
16+
*/
17+
18+
#include <TTree.h>
19+
#include <TTreeFormula.h>
20+
#include <map>
21+
#include <string>
22+
#include <cmath>
23+
#include <iostream>
24+
#include <functional>
25+
26+
using namespace std;
27+
28+
// Map: mapID -> map<X, Y>
29+
std::map<int, std::map<double, float>> registeredMaps;
30+
std::map<std::string, int> nameToMapID;
31+
32+
/// Hash a string to create a deterministic mapID
33+
int hashMapName(const std::string& name) {
34+
std::hash<std::string> hasher;
35+
return static_cast<int>(hasher(name));
36+
}
37+
38+
/// Register a 1D lookup map from TTree (X -> Y)
39+
/// @param valX Name of the X-axis variable (lookup key)
40+
/// @param valY Name of the Y-axis variable (value to retrieve)
41+
/// @param tree Pointer to TTree to extract data from
42+
/// @param selection Selection string (TTree::Draw-compatible)
43+
/// @param mapID Integer ID to associate with this map
44+
void registerMap1D(const std::string& valX, const std::string& valY, TTree* tree, const std::string& selection, int mapID) {
45+
if (!tree) {
46+
std::cerr << "[registerMap1D] Null TTree pointer." << std::endl;
47+
return;
48+
}
49+
50+
int entries = tree->Draw((valY + ":" + valX).c_str(), selection.c_str(), "goff");
51+
if (entries <= 0) {
52+
std::cerr << "[registerMap1D] No entries matched for mapID=" << mapID << std::endl;
53+
return;
54+
}
55+
56+
if (!tree->GetV1() || !tree->GetV2()) {
57+
std::cerr << "[registerMap1D] Internal Draw buffer pointers are null." << std::endl;
58+
return;
59+
}
60+
61+
std::map<double, float> newMap;
62+
for (int i = 0; i < entries; ++i) {
63+
if (i >= tree->GetSelectedRows()) {
64+
std::cerr << "[registerMap1D] Index out of range at i=" << i << std::endl;
65+
break;
66+
}
67+
double x = tree->GetV2()[i]; // valX
68+
float y = tree->GetV1()[i]; // valY
69+
newMap[x] = y;
70+
}
71+
72+
registeredMaps[mapID] = std::move(newMap);
73+
std::cout << "[registerMap1D] Registered map " << mapID << " with " << entries << " entries." << std::endl;
74+
}
75+
76+
/// Register by name; returns mapID computed from name
77+
int registerMap1DByName(const std::string& mapName, const std::string& valX, const std::string& valY, TTree* tree, const std::string& selection) {
78+
int mapID = hashMapName(mapName);
79+
nameToMapID[mapName] = mapID;
80+
registerMap1D(valX, valY, tree, selection, mapID);
81+
return mapID;
82+
}
83+
84+
/// Get the nearest Y for a given X from the map registered with mapID
85+
/// @param x Query value along X axis
86+
/// @param mapID Map identifier used in registration
87+
/// @return Y value corresponding to nearest X in the map
88+
float getNearest1D(float x, int mapID) {
89+
const auto itMap = registeredMaps.find(mapID);
90+
if (itMap == registeredMaps.end()) {
91+
std::cerr << "[getNearest1D] Map ID " << mapID << " not found." << std::endl;
92+
return NAN;
93+
}
94+
95+
const auto& map = itMap->second;
96+
if (map.empty()) {
97+
std::cerr << "[getNearest1D] Map ID " << mapID << " is empty." << std::endl;
98+
return NAN;
99+
}
100+
101+
auto it = map.lower_bound(x);
102+
if (it == map.begin()) return it->second;
103+
if (it == map.end()) return std::prev(it)->second;
104+
105+
auto prev = std::prev(it);
106+
return (std::abs(prev->first - x) < std::abs(it->first - x)) ? prev->second : it->second;
107+
}
108+
109+
/// Convenience version: lookup by name
110+
float getNearest1DByName(float x, const std::string& mapName) {
111+
auto it = nameToMapID.find(mapName);
112+
if (it == nameToMapID.end()) {
113+
std::cerr << "[getNearest1DByName] Map name \"" << mapName << "\" not found." << std::endl;
114+
return NAN;
115+
}
116+
return getNearest1D(x, it->second);
117+
}
118+
119+
/// Example usage
120+
void example1D() {
121+
TFile *f = TFile::Open("timeSeries10000_apass5.root");
122+
TTree * tree0=(TTree*)f->Get("timeSeries");
123+
// Fill tree here or load from file
124+
int mapID = registerMap1DByName("dcar_vs_time", "time", "mTSITSTPC.mDCAr_A_NTracks_median", tree0, "subentry==127");
125+
tree0->SetAlias("mDCAr_A_NTracks_median_All" ,("getNearest1D(time, " + std::to_string(mapID) + ")").data());
126+
tree0->Draw("mTSITSTPC.mDCAr_A_NTracks_median:mDCAr_A_NTracks_median_All","indexType==1","",10000);
127+
}
128+
/* ------------------------------------------------------------------
129+
Statistics extension (non‑breaking) -------------------------------
130+
Added without changing previous API.
131+
132+
New options:
133+
• Enum‑based interface for better ROOT compatibility
134+
enum StatKind { kMean=0, kMedian=1, kStd=2 };
135+
float getStat(double x,int mapID,StatKind kind,double dx);
136+
137+
• Convenience thin wrappers for ROOT aliases
138+
getMean1D , getMedian1D , getStd1D
139+
140+
• cacheStat unchanged (uses strings internally)
141+
142+
------------------------------------------------------------------*/
143+
144+
#include <vector>
145+
#include <algorithm>
146+
#include <numeric>
147+
148+
// --- enum for faster numeric calls --------------------------------
149+
enum StatKind { kMean=0, kMedian=1, kStd=2 };
150+
151+
// Cache: stat → mapID → dx → (x → value)
152+
static std::map<int, std::map<double, std::map<double,float>>> cacheMean;
153+
static std::map<int, std::map<double, std::map<double,float>>> cacheMedian;
154+
static std::map<int, std::map<double, std::map<double,float>>> cacheStd;
155+
156+
static float _mean(const std::vector<float>& v){ return v.empty()?NAN:std::accumulate(v.begin(),v.end(),0.0f)/v.size(); }
157+
static float _median(std::vector<float> v){ if(v.empty()) return NAN; size_t n=v.size()/2; std::nth_element(v.begin(),v.begin()+n,v.end()); return v[n]; }
158+
static float _std(const std::vector<float>& v){ if(v.size()<2) return NAN; float m=_mean(v); double s2=0; for(float e:v){ double d=e-m; s2+=d*d;} return std::sqrt(s2/(v.size()-1)); }
159+
160+
//--------------------------------------------------------------------
161+
static float _computeStat(double x,int mapID,double dx,StatKind k){
162+
const auto itM=registeredMaps.find(mapID);
163+
if(itM==registeredMaps.end()||itM->second.empty()) return NAN;
164+
const auto &mp=itM->second;
165+
std::vector<float> buf;
166+
for(auto it=mp.lower_bound(x-dx); it!=mp.end()&&it->first<=x+dx; ++it) buf.push_back(it->second);
167+
if(buf.empty()) return NAN;
168+
switch(k){
169+
case kMean: return _mean(buf);
170+
case kMedian: return _median(buf);
171+
case kStd: return _std(buf);
172+
}
173+
return NAN;
174+
}
175+
176+
//--------------------------------------------------------------------
177+
/**
178+
* @brief Return a local statistic (mean / median / std) around a query point.
179+
*
180+
* This version is preferred inside **TTree::Draw** because it uses an enum
181+
* instead of a string literal.
182+
*
183+
* @param x Center of the window (same coordinate used in the cache)
184+
* @param mapID ID returned by registerMap1D / registerMap1DByName
185+
* @param kind kMean (0), kMedian (1) or kStd (2)
186+
* @param dx Half‑window size: the statistic is computed from all points
187+
* with X in [x − dx, x + dx]
188+
*
189+
* Internally the first request builds (and caches) a map x → stat(x)
190+
* for the given (mapID, dx, kind). Subsequent calls are O(log N).
191+
*/
192+
// Fast numeric interface (enum) ------------------------------------
193+
float getStat(double x,int mapID,StatKind kind,double dx){
194+
auto *pcache = (kind==kMean? &cacheMean : (kind==kMedian? &cacheMedian : &cacheStd));
195+
auto &byMap = (*pcache)[mapID];
196+
auto &byDx = byMap[dx];
197+
if(byDx.empty()){
198+
// build lazily for this dx
199+
const auto itM=registeredMaps.find(mapID);
200+
if(itM==registeredMaps.end()) return NAN;
201+
for(const auto &kv: itM->second){ double cx=kv.first; byDx[cx]=_computeStat(cx,mapID,dx,kind);} }
202+
const auto &statMap = byDx;
203+
auto it=statMap.lower_bound(x);
204+
if(it==statMap.begin()) return it->second;
205+
if(it==statMap.end()) return std::prev(it)->second;
206+
auto prev=std::prev(it);
207+
return (fabs(prev->first-x)<fabs(it->first-x)?prev->second:it->second);
208+
}
209+
210+
// String interface kept for backward compat.
211+
float getStat(double x,int mapID,const char* st,double dx){
212+
std::string s(st);
213+
if(s=="mean") return getStat(x,mapID,kMean ,dx);
214+
if(s=="median") return getStat(x,mapID,kMedian,dx);
215+
if(s=="std"||s=="sigma") return getStat(x,mapID,kStd,dx);
216+
std::cerr<<"[getStat] Unknown statType="<<s<<std::endl; return NAN;
217+
}
218+
219+
//--------------------------------------------------------------------
220+
// Convenience wrappers for ROOT Draw / numeric helpers -----------------
221+
inline float getMean1D (double x,int id,double dx){ return getStat(x,id,kMean ,dx);} // mean
222+
inline float getMedian1D(double x,int id,double dx){ return getStat(x,id,kMedian,dx);} // median
223+
inline float getStd1D (double x,int id,double dx){ return getStat(x,id,kStd ,dx);} // stddev
224+
225+
// Integer overload for ROOT (fix #2) --------------------------------
226+
inline float getStat(double x,int id,int kind,double dx){
227+
return getStat(x,id,static_cast<StatKind>(kind),dx);
228+
}
229+
230+
//--------------------------------------------------------------------
231+
// Pre‑cache requested stats (by enum) ------------------------------- (by enum) -------------------------------
232+
bool cacheStat(int mapID,const std::vector<std::string>& stats,double dx){
233+
for(const std::string &s:stats){
234+
if(s=="mean") getStat(0,mapID,kMean ,dx); // lazy build
235+
else if(s=="median") getStat(0,mapID,kMedian,dx);
236+
else if(s=="std"||s=="sigma") getStat(0,mapID,kStd,dx);
237+
}
238+
return true;
239+
}
240+
241+
//--------------------------------------------------------------------
242+
/// Example: statistics with enum wrappers
243+
void exampleStat1D(){
244+
TFile *f=TFile::Open("timeSeries10000_apass5.root");
245+
TTree *t=(TTree*)f->Get("timeSeries");
246+
int id = registerMap1DByName("dcar_time_stat","time","mTSITSTPC.mDCAr_A_NTracks_median",t,"subentry==127");
247+
248+
// Pre‑cache mean & std for ±200 window
249+
cacheStat(id,{"mean","std"},200);
250+
251+
// Use integer selector (0 = mean, 2 = std). This avoids any ROOT
252+
// overload ambiguity and works in TTree::Draw directly.
253+
t->SetAlias("dcar_mean", Form("getStat(time,%d,0,200)", id)); // 0 → kMean
254+
t->SetAlias("dcar_sigma", Form("getStat(time,%d,2,200)", id)); // 2 → kStd
255+
256+
t->Draw("mTSITSTPC.mDCAr_A_NTracks_median:dcar_mean","indexType==1","colz",10000);
257+
t->Draw("getStat(time,591487517, 0 ,10000+0):getStat(time,591487517, 1 ,10000+0)","indexType==1","colz",100000);
258+
}

0 commit comments

Comments
 (0)