32 commits
3b962d7
Initial commit generated by copilot
AndrewEdmonds11 Apr 1, 2026
48ed70d
An example macro
AndrewEdmonds11 Apr 1, 2026
2026c40
Added compilation step with copilot
AndrewEdmonds11 Apr 1, 2026
b2688c8
Manual edit to add include paths
AndrewEdmonds11 Apr 1, 2026
3f4c4f5
Now runs a ROOT macro that is compiled
AndrewEdmonds11 Apr 1, 2026
60b95f6
Add a first_filestem pattern with copilot
AndrewEdmonds11 Apr 1, 2026
019671f
Add n threads parameter
AndrewEdmonds11 Apr 1, 2026
b19f770
Add auto-detection of compilation skipping
AndrewEdmonds11 Apr 1, 2026
6887fb1
Add a max files parameter
AndrewEdmonds11 Apr 1, 2026
bcf1e4e
Flag missing files errors as errors from jobs
AndrewEdmonds11 Apr 1, 2026
3ed2df9
Adds an hadd option to produce a merged file
AndrewEdmonds11 Apr 1, 2026
a75a336
Have the individual job files removed when using hadd and also add op…
AndrewEdmonds11 Apr 1, 2026
2e07d9b
Update README
AndrewEdmonds11 Apr 1, 2026
5ec1b65
Automatically try to evenly distribute files across workers
AndrewEdmonds11 Apr 1, 2026
5a03459
Update README and add copilot-instructions
AndrewEdmonds11 Apr 1, 2026
41e2abe
Rename script to roodask
AndrewEdmonds11 Apr 1, 2026
fcd29b2
Add roodask into rooutil/roodask/
AndrewEdmonds11 Apr 1, 2026
a1ebd56
Make it so that roodask is a command line tool in the path
AndrewEdmonds11 Apr 1, 2026
d2ab8cd
Add force compilation option and also fix paths
AndrewEdmonds11 Apr 1, 2026
5c27e09
Update the text output of roodask
AndrewEdmonds11 Apr 1, 2026
1708220
Put hadded file into output/ dir
AndrewEdmonds11 Apr 1, 2026
b504ebd
Update READMEs
AndrewEdmonds11 Apr 1, 2026
45393aa
Allow for using ROOT macros in the example folder
AndrewEdmonds11 Apr 1, 2026
f849408
Updating README
AndrewEdmonds11 Apr 1, 2026
dde4c27
Allow for arguments to be passed to binaries
AndrewEdmonds11 Apr 2, 2026
041e573
Thought we had fixed this
AndrewEdmonds11 Apr 2, 2026
c5efbc4
Add a couple of example manifests
AndrewEdmonds11 Apr 2, 2026
04298da
Update readme
AndrewEdmonds11 Apr 2, 2026
19a03a6
Update these READMEs too
AndrewEdmonds11 Apr 2, 2026
830d016
Updating README
AndrewEdmonds11 Apr 2, 2026
83a604c
Add an optional post-processing step
AndrewEdmonds11 Apr 2, 2026
3fb54b0
Update README
AndrewEdmonds11 Apr 2, 2026
1 change: 1 addition & 0 deletions .muse
Original file line number Diff line number Diff line change
@@ -3,4 +3,5 @@ ROOT_LIBRARY_PATH

PYTHONPATH utils
PYTHONPATH helper
PYTHONPATH rooutil/roodask/
PATH bin
6 changes: 6 additions & 0 deletions bin/roodask
@@ -0,0 +1,6 @@
#!/usr/bin/env python3

from roodask import main

if __name__ == "__main__":
    main()
54 changes: 51 additions & 3 deletions rooutil/README.md
@@ -10,9 +10,10 @@
7. [Common Cut Functions](#Common-Cut-Functions)
8. [Combining Cut Functions](#Combining-Cut-Functions)
9. [Creating Ntuples From EventNtuple](#Creating-Ntuples-From-EventNtuple)
10. [Speed Optimizations](#Speed-Optimizations)
11. [Debugging](#Debugging)
12. [For Developers](#For-Developers)
10. [```roodask```](#roodask)
11. [Speed Optimizations](#Speed-Optimizations)
12. [Debugging](#Debugging)
13. [For Developers](#For-Developers)

## Introduction

@@ -212,6 +213,53 @@ If you want to also remove tracks from the event, you should use ```SelectTracks
### Non event-based ntuples
It's also possible to use RooUtil to create a new ntuple with a different structure (e.g. one entry per track). See an example in [CreateTrackNtuple.C](./examples/CreateTrackNtuple.C) for how this can be done

## ```roodask```
```roodask``` allows you to run a RooUtil-based macro or program in parallel over multiple files using dask. Here we present two examples: running a ROOT macro from the examples folder and running the RooCount reference analysis.

### Setting Up

We need to set up the environment and get a filelist:
```
cd /to/your/work/area/
mu2einit
muse setup AnalysisMusingMDC2025
pyenv ana

mkdir filelists
metacat query files from mu2e:nts.mu2e.ensembleMDS3aMix1BBTriggered.MDC2025-001.root | mdh print-url -l disk -s path - > filelists/nts.mu2e.ensembleMDS3aMix1BBTriggered.MDC2025-001.root.list
```

You will also need to write a "manifest" JSON file. We will use the following examples:
* for running a ROOT macro use [this manifest file](./roodask/macro_manifest.json)
* for running a program use [this manifest file](./roodask/refana_manifest.json)

### Example 1: Running a ROOT Macro

This will run the example macro: PlotEntranceMomentumResolution_roodask.C

```
roodask --manifest EventNtuple/rooutil/roodask/macro_manifest.json --filelist filelists/nts.mu2e.ensembleMDS3aMix1BBTriggered.MDC2025-001.root.list --n-workers 2 --threads-per-worker 1 --max-files=3
```

This will produce an ```output/``` directory with one output file per job.
* you can add the ```--hadd merged.root``` option if you want an hadded output (if you want to run ```hadd``` with multi-threading, add the ```--hadd-j``` option)
* you can run a program on the hadded file with the ```--post-hadd``` option (see Example 2 for a concrete example)

### Example 2: Running a Program

This will run the RooCount RefAna:

```
roodask --manifest EventNtuple/rooutil/roodask/refana_manifest.json --filelist filelist.txt --n-workers 2 --threads-per-worker 1 --max-files=3 --hadd merged.root --post-hadd 'RooCountAna {merged} MDS3a'
```

This will produce a single output file (```output/merged.root```) and then run the ```RooCountAna``` program on the merged output.

### Additional Information:
* There is a dedicated [README](./roodask/README.md) for technical details
* There is a ```--scheduler``` option that will be important when we get things up and running on EAF


## Speed Optimizations
By default, RooUtil will read all the branches for every entry. If you are finding that this is too slow, then you can explicitly turn on only the branches that you will be reading. This can increase the speed by as much as a factor of 10.

50 changes: 50 additions & 0 deletions rooutil/examples/PlotEntranceMomentumResolution_roodask.C
@@ -0,0 +1,50 @@
//
// An example of how to plot the momentum resolution of electrons at the tracker entrance
// This uses cut functions defined in common_cuts.hh
//

#include "EventNtuple/rooutil/inc/RooUtil.hh"
#include "EventNtuple/rooutil/inc/common_cuts.hh"

#include "TH1F.h"

using namespace rooutil;
void PlotEntranceMomentumResolution_roodask(std::string filename, std::string outfilename) {

  // Create the histogram you want to fill
  TH1F* hRecoMomRes = new TH1F("hRecoMomRes", "Momentum Resolution at Tracker Entrance", 200,-10,10);
  hRecoMomRes->SetDirectory(0);

  // Set up RooUtil
  RooUtil util(filename);

  // Loop through the events
  for (int i_event = 0; i_event < util.GetNEvents(); ++i_event) {
    // Get the next event
    auto& event = util.GetEvent(i_event);

    // Get the e_minus tracks from the event
    auto e_minus_tracks = event.GetTracks(is_e_minus);

    // Loop through the e_minus tracks
    for (auto& track : e_minus_tracks) {

      // Get the track segments at the tracker entrance that have both an MC step and a reco step
      auto trk_ent_segments = track.GetSegments([](TrackSegment& segment){ return tracker_entrance(segment) && has_mc_step(segment) && has_reco_step(segment); });

      // Loop through the tracker entrance track segments
      for (auto& segment : trk_ent_segments) {

        // Fill the histogram with (reco momentum - MC momentum)
        hRecoMomRes->Fill(segment.trkseg->mom.R() - segment.trksegmc->mom.R());
      }
    }
  }

  // Draw the histogram
  // hRecoMomRes->Draw("HIST E");
  TFile* outfile = new TFile(outfilename.c_str(), "RECREATE");
  hRecoMomRes->Write();
  outfile->Write();
  outfile->Close();
}
106 changes: 106 additions & 0 deletions rooutil/roodask/.github/copilot-instructions.md
@@ -0,0 +1,106 @@
# Copilot Instructions — roodask (Dask Distributed C++ Job Runner)

## Project Overview

This is a Python + C++ project for distributing ROOT macro execution across
Dask workers on a shared filesystem (Mu2e experiment at Fermilab).

**Main script:** `roodask.py` — a single-file Python CLI tool that:
1. Reads a JSON manifest (`jobs.json`) defining the C++ source, include paths,
libraries, and output pattern.
2. Reads a filelist (text file, one ROOT file per line).
3. Auto-generates a `main()` C++ wrapper, compiles the ROOT macro into a
standalone binary using `g++` + `root-config`.
4. Distributes input files across Dask workers (local or remote cluster).
5. Each worker runs the compiled binary via `subprocess`, passing a per-job
filelist and output path.
6. Collects results (stdout, stderr, returncode, timing) into `results.json`.
7. Optionally merges output ROOT files with `hadd`.
8. Optionally runs a post-hadd command on the merged file (`--post-hadd`).
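
Step 3 above (auto-generating a `main()` wrapper) could look roughly like the sketch below. This is only an illustration under stated assumptions: the template text, the usage message, and the helper name `make_wrapper_text` are hypothetical and are not taken from `roodask.py`. The only facts it relies on are the ones listed in this file: the wrapper `#include`s the macro by the manifest `"source"` name and calls an entry function named after the file stem with `argv[1]` and `argv[2]`.

```python
from pathlib import Path

# Hypothetical template; the real wrapper text in roodask.py may differ.
WRAPPER_TEMPLATE = """\
// Auto-generated wrapper (illustrative sketch)
#include <iostream>
#include <string>
#include "{source}"

int main(int argc, char** argv) {{
  if (argc < 3) {{
    std::cerr << "usage: " << argv[0] << " <filelist> <output>" << std::endl;
    return 1;
  }}
  {entry}(std::string(argv[1]), std::string(argv[2]));
  return 0;
}}
"""


def make_wrapper_text(source: str) -> str:
    """Return C++ wrapper text that includes the macro and calls its entry function."""
    entry = Path(source).stem  # entry function name = file stem of the macro
    return WRAPPER_TEMPLATE.format(source=source, entry=entry)


if __name__ == "__main__":
    print(make_wrapper_text("EventNtuple/rooutil/examples/PlotEntranceMomentum.C"))
```

The generated text would then be written to `work/<MacroName>_main.cpp` and handed to the compiler.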

## Environment

- **OS:** Scientific Linux / RHEL on Fermilab machines
- **Filesystem:** Shared NFS (`/exp/mu2e/app/...`) — all workers see the same files
- **Python:** 3.12+ from `/cvmfs/mu2e.opensciencegrid.org/env/ana/current/`
- **Key packages:** `dask`, `distributed` (2026.1.x), `uproot` (available but not used here)
- **ROOT:** Available via Mu2e environment setup (provides `root-config`, `hadd`, etc.)
- **Compiler:** `g++` from PATH or `$CXX` env var; ROOT include/lib paths from `root-config`
- **Important env vars:** `ROOT_INCLUDE_PATH` (colon-separated), `LD_LIBRARY_PATH`

## Architecture Details

### Compilation (`compile_source()`)
- Generates `work/<MacroName>_main.cpp` that `#include`s the macro using the
exact name from the manifest `"source"` field and calls its entry function
(name = file stem) with `argv[1]` (filelist) and `argv[2]` (output).
The macro must be findable via `include_dirs`.
- `source` path resolves from **cwd** (not manifest directory).
- Uses `$CXX` or `g++`, prints compiler version before building.
- `root-config --cflags --libs` provides ROOT flags.
- `LD_LIBRARY_PATH` entries are added as `-L` and `-Wl,-rpath` flags.
- `-Wl,--enable-new-dtags` ensures RUNPATH (not RPATH) so `LD_LIBRARY_PATH` wins at runtime.
- `-ltbb` is added (required by ROOT's libImt, not in `root-config --libs`).
- `include_dirs` values are split on `:` (os.pathsep) for colon-separated env vars.
- **Incremental:** only recompiles when source mtime > binary mtime.
- **Force recompile:** `--force-compile` skips the mtime check and always recompiles.
Prints a distinct message ("Forcing recompilation (--force-compile)") vs the
timestamp-triggered message ("Source is newer than binary, recompiling...").
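
As a rough sketch of two of the rules above (the mtime-based incremental check and turning `LD_LIBRARY_PATH` entries into linker flags), something like the following could work. The function names `needs_recompile` and `library_path_flags` are hypothetical, and treating a missing binary as "must compile" is an assumption rather than something stated here.

```python
import os
from pathlib import Path


def needs_recompile(source: Path, binary: Path, force: bool = False) -> bool:
    """Decide whether to rebuild: forced, binary missing, or source newer than binary."""
    if force:
        return True  # corresponds to --force-compile
    if not binary.exists():
        return True  # assumption: a first build always compiles
    return source.stat().st_mtime > binary.stat().st_mtime


def library_path_flags(ld_library_path: str) -> list[str]:
    """Turn a colon-separated library path into -L and -Wl,-rpath flags."""
    flags: list[str] = []
    for entry in ld_library_path.split(os.pathsep):
        if entry:  # skip empty entries such as a trailing ':'
            flags += [f"-L{entry}", f"-Wl,-rpath,{entry}"]
    # RUNPATH instead of RPATH, so LD_LIBRARY_PATH still wins at runtime
    flags.append("-Wl,--enable-new-dtags")
    return flags


if __name__ == "__main__":
    print(library_path_flags(os.environ.get("LD_LIBRARY_PATH", "")))
```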

### Path Resolution
- `source` from manifest: relative paths resolve from **cwd** (not manifest directory).
- `output_dir` from manifest: relative paths resolve from **cwd** (not manifest directory).
- `--work-dir`: relative paths resolve from **cwd**.
- `--hadd` target: relative paths resolve into the output directory.
- All printed paths and paths in `results.json` are absolute.

### Job Execution (`run_cpp_job()`)
- Each job receives a batch of input files, writes them to `work/<job_id>_filelist.txt`.
- If manifest has `"args"`, builds command as `<binary> [args...]` with `{filelist}` and
`{output}` placeholders substituted per job.
- If no `"args"`, defaults to: `<binary> <filelist_path> <output_file>`
- The submitting shell's full environment is captured and passed to workers via `env=`.
- Success = returncode 0 AND no "Error" in stderr (catches ROOT silent failures).

### Manifest (`jobs.json`)
- All string values undergo `${VAR}` expansion via `os.path.expandvars()` at load time.
- `output_pattern` supports `{job_id}` and `{first_filestem}` placeholders.
- `"args"` field: list of strings with `{filelist}` and `{output}` placeholders,
allowing the user to control the full command-line argument order.
- `config_template` field was removed — the script generates the wrapper automatically.
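
The manifest handling above might be sketched as follows. This is an illustration only: `expand_strings` and `output_name` are hypothetical helper names, applying the expansion recursively to nested lists and dicts is an assumption, and the manifest keys in the demo are examples rather than a complete schema.

```python
import os
from pathlib import Path


def expand_strings(value):
    """Apply os.path.expandvars to every string in a nested manifest structure."""
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, list):
        return [expand_strings(v) for v in value]
    if isinstance(value, dict):
        return {k: expand_strings(v) for k, v in value.items()}
    return value  # numbers, booleans, None are left untouched


def output_name(pattern: str, job_id, first_file: str) -> str:
    """Fill the {job_id} and {first_filestem} placeholders of output_pattern."""
    stem = Path(first_file).stem  # file name without its final extension
    return pattern.replace("{job_id}", str(job_id)).replace("{first_filestem}", stem)


if __name__ == "__main__":
    os.environ["ROODASK_DEMO_DIR"] = "/tmp/demo"  # hypothetical variable for the demo
    manifest = {"source": "${ROODASK_DEMO_DIR}/macro.C", "include_dirs": ["${ROODASK_DEMO_DIR}/inc"]}
    print(expand_strings(manifest))
    print(output_name("{first_filestem}_{job_id}.root", 7, "/data/nts.run1.root"))
```

Note that `os.path.expandvars` leaves references to undefined variables unchanged, so a misspelled `${VAR}` would survive into the path.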

### File Batching
- If `--files-per-job` is given, uses that exact batch size.
- If omitted, auto-distributes: `ceil(n_files / (n_workers * threads_per_worker))`.

## Key Design Decisions

1. **Single file script** — `roodask.py` is intentionally one file for easy copying/sharing.
2. **No config template needed** — the main() wrapper is auto-generated, not templated.
3. **Binary invocation:** Default is `<binary> <filelist> <output>`. If `"args"` is set
in the manifest, the user controls the full argument list with `{filelist}` and
`{output}` placeholders (e.g. `["--input", "{filelist}", "--out", "{output}", "flag"]`).
4. **Shared filesystem assumed** — workers read/write the same paths as the submitter.
5. **Environment propagation** — `os.environ` is serialized and passed to each worker
subprocess to ensure consistent library resolution.

## Common Tasks

- **Adding a new CLI flag:** Edit `parse_args()`, then use `args.<flag>` in `main()`.
- **Changing the binary interface:** Edit the wrapper template in `compile_source()`
(the `wrapper_path.write_text(...)` block) and update `run_cpp_job()` subprocess call.
- **Adding post-processing:** Add after the post-hadd section in `main()`, before `client.close()`.
- **Supporting a new manifest field:** Add to `jobs.json`, access via `manifest.get("field")`.
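
As an illustration of the first task, adding a flag with `argparse` might look like the sketch below. The `--dry-run` flag is hypothetical and does not exist in `roodask`; the defaults shown are also assumptions, and the real `parse_args()` defines more options than the ones listed here.

```python
import argparse


def parse_args(argv=None):
    """Parse a reduced set of roodask-style options plus one hypothetical new flag."""
    parser = argparse.ArgumentParser(prog="roodask")
    parser.add_argument("--manifest", required=True, help="path to the JSON manifest")
    parser.add_argument("--n-workers", type=int, default=1)  # default is an assumption
    # Hypothetical new flag, shown only to illustrate the pattern
    parser.add_argument("--dry-run", action="store_true",
                        help="print the planned jobs without running them")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--manifest", "jobs.json", "--dry-run"])
    # argparse maps "--dry-run" to the attribute name "dry_run"
    print(args.manifest, args.n_workers, args.dry_run)
```

In `main()` the new option would then be read as `args.dry_run`.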

## Files

| File | Purpose |
|-------------------------|------------------------------------------------------|
| `roodask.py` | Main script (compilation, batching, execution, merge)|
| `jobs.json` | Manifest (source, includes, output pattern) |
| `filelist.txt` | Input file paths (one per line) |
| `PlotEntranceMomentum.C`| Example ROOT macro |
| `README.md` | User-facing documentation |
| `work/` | Generated: compiled binary, wrapper, per-job filelists|
| `output/` | Generated: per-job output ROOT files |
| `results.json` | Generated: job outcomes (success, timing, stderr) |
49 changes: 49 additions & 0 deletions rooutil/roodask/PlotEntranceMomentum.C
@@ -0,0 +1,49 @@
//
// An example of how to plot the momentum of electrons at the tracker entrance
// This uses cut functions defined in common_cuts.hh
//

#include "EventNtuple/rooutil/inc/RooUtil.hh"
#include "EventNtuple/rooutil/inc/common_cuts.hh"

#include "TH1F.h"

using namespace rooutil;
void PlotEntranceMomentum(std::string filename, std::string outfilename) {

  // Create the histogram you want to fill
  TH1F* hRecoMom = new TH1F("hRecoMom", "Reconstructed Momentum at Tracker Entrance", 50,95,110);
  hRecoMom->SetDirectory(0);

  // Set up RooUtil
  RooUtil util(filename);

  // Loop through the events
  for (int i_event = 0; i_event < util.GetNEvents(); ++i_event) {
    // Get the next event
    auto& event = util.GetEvent(i_event);

    // Get the e_minus tracks from the event
    auto e_minus_tracks = event.GetTracks(is_e_minus);

    // Loop through the e_minus tracks
    for (auto& track : e_minus_tracks) {

      // Get the track segments at the tracker entrance
      auto trk_ent_segments = track.GetSegments([](TrackSegment& segment){ return tracker_entrance(segment) && has_reco_step(segment); });

      // Loop through the tracker entrance track segments
      for (auto& segment : trk_ent_segments) {

        // Fill the histogram
        hRecoMom->Fill(segment.trkseg->mom.R());
      }
    }
  }

  // Draw the histogram
  // hRecoMom->Draw("HIST E");
  TFile* outfile = new TFile(outfilename.c_str(), "RECREATE");
  hRecoMom->Write();
  outfile->Close();
}