
Commit 8625857

miranov25 committed
docs(quantile_fit_nd): add contextLLM.md (cold-start guide + policies)

- One-page snapshot of goals, assumptions, API, commands
- Documents discrete-input policy (PIT/mid-rank) and monotonicity
- Links code, tests, and benchmark usage with scaling expectations

PWGPP-643
1 parent 1b2ed00 commit 8625857


2 files changed (+101, -1 lines)


UTILS/dfextensions/quantile_fit_nd/bench_quantile_fit_nd.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -314,7 +314,8 @@ def main():
     # Plots
     if args.plot:
         _plot_scaling(res, dists)
-        print("Saved PNG plots: bench_scaling_{dist}.png")
+        print("Saved PNG plots:", ", ".join(f"bench_scaling_{d}.png" for d in dists))
+
 
     # Checks (warn by default; --strict to raise)
     for dist in dists:
```

contextLLM.md (new file)

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@

# contextLLM.md — ND Quantile Linear Fit (quick context)

## TL;DR

We fit a **local linear inverse-quantile model** per channel and per nuisance-grid bin, where $n$ denotes the nuisance coordinates:

$$
X(q,n) \approx a(q_0,n) + b(q_0,n)\,\underbrace{(q - q_0)}_{\Delta q}, \quad b > 0
$$

* Monotonic in **q** by enforcing $b \ge b_\text{min}$ (with $b_\text{min} > 0$).
* Smooth in the nuisance axes (e.g., **z**, later **η** and **time**) via separable interpolation.
* **Discrete inputs** (tracks/clusters/Poisson): convert to **continuous ranks** (PIT or mid-ranks) *before* fitting.
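
The per-window fit can be illustrated with a minimal NumPy sketch, assuming one channel and one nuisance bin given as arrays of ranks `Q` and values `X`. The function `fit_local_linear`, its `min_spread` threshold and the `b_clipped` flag are hypothetical and not taken from `quantile_fit_nd.py`, whose internals are not shown here; only the `low_Q_spread` flag name is borrowed from the Known Limitations list. The sketch keeps events with $|Q - q_0| \le \Delta q$, solves OLS for $a$ and $b$, floors the slope at $b_\text{min}$, and converts the residual scatter into $\sigma_Q \approx \sigma_{X|Q}/|b|$.

```python
import numpy as np


def fit_local_linear(Q, X, q0, dq=0.05, b_min=1e-6, min_spread=1e-3):
    """Hypothetical sketch: OLS fit of X ~ a + b*(Q - q0) on events with |Q - q0| <= dq.

    Returns (a, b, sigma_Q, flags); a is the fitted X at the window center q0.
    """
    Q = np.asarray(Q, dtype=float)
    X = np.asarray(X, dtype=float)
    sel = np.abs(Q - q0) <= dq
    dQ = Q[sel] - q0
    Xw = X[sel]
    flags = []
    # Too few events, or ranks too concentrated to constrain the slope
    if dQ.size < 3 or np.std(dQ) < min_spread:
        flags.append("low_Q_spread")
        return np.nan, np.nan, np.nan, flags
    # Design matrix with columns [1, Q - q0]
    A = np.column_stack([np.ones_like(dQ), dQ])
    (a, b), *_ = np.linalg.lstsq(A, Xw, rcond=None)
    if b < b_min:
        # Monotonicity floor: fix the slope at b_min and refit only the intercept
        b = b_min
        a = float(np.mean(Xw - b * dQ))
        flags.append("b_clipped")
    resid = Xw - (a + b * dQ)
    sigma_X = np.std(resid, ddof=2)  # scatter of X at fixed Q (two fitted parameters)
    sigma_Q = sigma_X / abs(b)       # propagate to rank space: sigma_Q ~ sigma_X|Q / |b|
    return float(a), float(b), float(sigma_Q), flags


# Toy data: X = 10 + 4*Q + Gaussian noise, so a(0.5) = 12, b = 4, sigma_Q ~ 0.2 / 4 = 0.05
rng = np.random.default_rng(123456)
Q = rng.uniform(0.0, 1.0, 20000)
X = 10.0 + 4.0 * Q + rng.normal(0.0, 0.2, Q.size)
a, b, sigma_Q, flags = fit_local_linear(Q, X, q0=0.5)
print(f"a={a:.3f} b={b:.3f} sigma_Q={sigma_Q:.4f} flags={flags}")
```

Refitting only the intercept after clipping keeps the window's mean level correct while guaranteeing the slope stays positive.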

## Key Files

* `dfextensions/quantile_fit_nd/quantile_fit_nd.py` — core fitter + evaluator
* `dfextensions/quantile_fit_nd/utils.py` — discrete→uniform helpers (PIT/mid-rank)
* `dfextensions/quantile_fit_nd/test_quantile_fit_nd.py` — unit tests + rich diagnostics
* `dfextensions/quantile_fit_nd/bench_quantile_fit_nd.py` — speed & precision benchmark, scaling plots
* `dfextensions/quantile_fit_nd/quantile_fit_nd.md` — full spec (math, API, guarantees)

## Core Assumptions & Policies

* **Δq-centered OLS** per window: only events with $|Q - q_0| \le \Delta q$ enter the fit; default $\Delta q = 0.05$.
* **Monotonicity**: enforce $b \ge b_\text{min}$ (configurable: `"auto"` heuristic or `"fixed"`).
* **Nuisance interpolation**: separable (linear now, PCHIP later); monotonicity is required only along q.
* **Discrete inputs**, with $k$ the observed count and $F$ its CDF:

  * Prefer **randomized PIT**: $U = F(k-1) + V\,[F(k) - F(k-1)]$, with $V \sim \text{Unif}(0,1)$.
  * Or **mid-ranks**: $U = \tfrac{F(k-1) + F(k)}{2}$ (deterministic).
  * Helpers: `discrete_to_uniform_rank_poisson`, `discrete_to_uniform_rank_empirical`.
* **Uncertainty**: $\sigma_Q \approx \sigma_{X|Q}/|b|$; the split into irreducible and reducible parts is available downstream.
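
The two discrete-input policies can be sketched in NumPy, assuming counts arrive as an integer array. `poisson_cdf`, `empirical_cdf` and `counts_to_uniform` are hypothetical stand-ins, not the repository's `discrete_to_uniform_rank_poisson` / `discrete_to_uniform_rank_empirical`, whose signatures are not shown here. The randomized PIT gives exactly uniform ranks when $F$ is the true CDF; mid-ranks are deterministic but leave ties in place.

```python
import math

import numpy as np


def poisson_cdf(k, lam):
    """F(k) = P(K <= k) for K ~ Poisson(lam), lam > 0; F(k) = 0 for k < 0."""
    k = np.asarray(k, dtype=int)
    kmax = max(int(k.max()), 0)
    j = np.arange(kmax + 1)
    # pmf in log space for numerical stability at large lam
    log_pmf = j * math.log(lam) - lam - np.array([math.lgamma(i + 1.0) for i in j])
    cdf = np.cumsum(np.exp(log_pmf))
    return np.where(k >= 0, cdf[np.clip(k, 0, kmax)], 0.0)


def empirical_cdf(sample):
    """Return F(k) = fraction of sample values <= k (step-function CDF)."""
    s = np.sort(np.asarray(sample))
    return lambda k: np.searchsorted(s, k, side="right") / s.size


def counts_to_uniform(k, cdf, mode="pit", rng=None):
    """Map integer counts k to continuous ranks U in [0, 1].

    mode="pit": randomized PIT, U = F(k-1) + V * (F(k) - F(k-1)), V ~ Unif(0, 1)
    mode="mid": mid-rank,       U = (F(k-1) + F(k)) / 2 (deterministic)
    """
    k = np.asarray(k, dtype=int)
    lo, hi = cdf(k - 1), cdf(k)
    if mode == "mid":
        return 0.5 * (lo + hi)
    rng = np.random.default_rng() if rng is None else rng
    return lo + rng.uniform(0.0, 1.0, size=k.shape) * (hi - lo)


rng = np.random.default_rng(123456)
k = rng.poisson(50, 100_000)
u_pit = counts_to_uniform(k, lambda kk: poisson_cdf(kk, 50.0), mode="pit", rng=rng)
u_mid = counts_to_uniform(k, empirical_cdf(k), mode="mid")
print(u_pit.mean(), u_pit.var())  # close to 1/2 and 1/12 for a uniform variable
```

Passing an explicit seeded `rng` keeps the randomized PIT reproducible across runs.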

## Public API (stable)

```python
import numpy as np
from dfextensions.quantile_fit_nd.quantile_fit_nd import fit_quantile_linear_nd, QuantileEvaluator

# df: input DataFrame with columns channel_id, Q, X, nuisance cols (e.g. z_vtx), is_outlier (optional)
table = fit_quantile_linear_nd(
    df,
    channel_key="channel_id",
    q_centers=np.arange(0, 1.0001, 0.025),  # 41 centers, 0.0 to 1.0 inclusive
    dq=0.05,                                # half-width of the rank window |Q - q0| <= dq
    nuisance_axes={"z": "z_vtx"},           # later: {"z": "z_vtx", "eta": "eta", "time": "timestamp"}
    n_bins_axes={"z": 20},
    mask_col="is_outlier",
    b_min_option="auto",                    # or "fixed"
)

evalr = QuantileEvaluator(table)
q_hat = evalr.invert_rank(X=123.0, channel_id="ch0", z=1.2)  # rank for value X at z = 1.2
a, b, sigmaQ = evalr.params(channel_id="ch0", q=0.4, z=0.0)  # local model parameters
```

### Output table (columns)

`channel_id, q_center, <axis>_center..., a, b, sigma_Q, sigma_Q_irr (optional), dX_dN (optional), db_d<axis>..., fit_stats(json), timestamp(optional)`

## Quickstart (clean run)

```bash
# 1) Unit tests with diagnostics
pytest -q -s dfextensions/quantile_fit_nd/test_quantile_fit_nd.py

# 2) Benchmark speed + precision + scaling (and plots)
python dfextensions/quantile_fit_nd/bench_quantile_fit_nd.py --plot \
  --dists uniform,poisson,gaussian --Ns 2000,5000,10000,20000,50000 --lam 50
```

* **Interpretation**: writing each error as $\text{rms} \propto N^{\alpha}$, expect `rms_b ~ N^{-1/2}` (α ≈ −0.5) and `rms_rt ~ const` (α ≈ 0). The slope error averages down with statistics; the round-trip error is a per-event quantity, so it does not.
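
The exponent α can be estimated by a straight-line fit of log(rms) against log(N). The sketch below is a stand-in, not the benchmark's actual `rms_b`/`rms_rt` computation: it uses the sample mean of Gaussian noise as the aggregate quantity and single-event residuals as the per-event one, and `scaling_exponent` is a hypothetical helper.

```python
import numpy as np


def scaling_exponent(Ns, rms):
    """Slope alpha of a least-squares line in log-log space, assuming rms ~ C * N**alpha."""
    alpha, _log_c = np.polyfit(np.log(Ns), np.log(rms), 1)
    return float(alpha)


rng = np.random.default_rng(123456)
Ns = np.array([2000, 5000, 10000, 20000, 50000])
trials = 200
# Aggregate estimate (the sample mean): its spread over trials shrinks like N^-1/2
rms_agg = [np.std([rng.normal(size=n).mean() for _ in range(trials)]) for n in Ns]
# Per-event quantity: the spread of individual residuals does not shrink with N
rms_evt = [np.std(rng.normal(size=n)) for n in Ns]

alpha_agg = scaling_exponent(Ns, rms_agg)
alpha_evt = scaling_exponent(Ns, rms_evt)
print(alpha_agg, alpha_evt)  # near -0.5 and near 0
```

Comparing the fitted α to its expected value within a tolerance is one plausible way to turn this into a pass/fail check.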

## Reproducibility knobs

* RNG seed is fixed in tests and benchmark (`RNG = np.random.default_rng(123456)`).
* Poisson rank mode: randomized PIT (default) or mid-rank (deterministic), switchable in the test/bench helpers.
* Scaling tolerances: `--scaling_tol` and `--rt_tol` in the benchmark.

## Known Limitations

* q windows at the very edges (near 0 or 1) can be data-sparse; we store `fit_stats` for each window and may skip non-informative ones.
* With extremely discrete or tied ranks (e.g., raw counts without PIT), the Q spread inside a window collapses and OLS becomes degenerate; the fitter flags this as `low_Q_spread`.
* Interpolation is currently linear; PCHIP (shape-preserving) can be enabled later.
* Inversion uses the local linear model with bracketing; it works inside the grid and clips at the edges.
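
Separable linear interpolation along one nuisance axis, followed by bracketed local-linear inversion with edge clipping, can be sketched as below. It assumes the fit results for one channel are dense arrays `a_grid` and `b_grid` of shape (number of q centers, number of z centers) and that `a` increases with q; `interp_along_z` and `invert_rank` are hypothetical functions, and the actual `QuantileEvaluator.invert_rank` may bracket and interpolate differently.

```python
import numpy as np


def interp_along_z(grid, z_centers, z):
    """Linearly interpolate each q row of grid (shape nq x nz) at z.

    np.interp clamps z to the first/last z center outside the grid range.
    """
    return np.array([np.interp(z, z_centers, row) for row in grid])


def invert_rank(X, q_centers, a_grid, b_grid, z_centers, z):
    """Solve a(q_i) + b(q_i) * (q - q_i) = X in the bracketing q cell; clip at the grid edges."""
    a = interp_along_z(a_grid, z_centers, z)
    b = interp_along_z(b_grid, z_centers, z)
    if X <= a[0]:
        return float(q_centers[0])
    if X >= a[-1]:
        return float(q_centers[-1])
    # With b > 0, a increases with q, so searchsorted gives the bracket a[i] <= X < a[i + 1]
    i = int(np.searchsorted(a, X, side="right")) - 1
    q = q_centers[i] + (X - a[i]) / b[i]
    return float(np.clip(q, q_centers[i], q_centers[i + 1]))


# Toy grid with known truth X(q, z) = 100 + 20*q + 2*z, so a = X(q_center, z) and b = 20
q_centers = np.arange(0, 1.0001, 0.025)
z_centers = np.linspace(-10.0, 10.0, 21)
a_grid = 100.0 + 20.0 * q_centers[:, None] + 2.0 * z_centers[None, :]
b_grid = np.full_like(a_grid, 20.0)

q_hat = invert_rank(110.4, q_centers, a_grid, b_grid, z_centers, z=1.2)
print(q_hat)  # about 0.4, since 100 + 20*0.4 + 2*1.2 = 110.4
```

Clamping inside the bracket keeps the result consistent with the neighbouring grid nodes even if a local slope is poorly estimated.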

## Next Steps (nice-to-have)

* Optional robust fit (`fit_mode="huber"`) once the outlier flags stabilize.
* Add time as a nuisance axis, or run time-sliced fits in parallel and chain them.
* Export ROOT trees consistently (Parquet/Arrow already supported).
* Add ML-friendly derivative grids (db/dz, db/dη) at higher resolution.

## Troubleshooting

* **ImportError in tests**: ensure `dfextensions/quantile_fit_nd/__init__.py` exists and run from the repo root.
* **.idea committed**: add `.idea/` to the repo-level `.gitignore` to avoid IDE noise.
* **Poisson results look like “nonsense”**: confirm that the counts were converted with PIT or mid-rank before calling `fit_*`.

---
