- Added new "Performance & Benchmarking" section describing benchmark usage, results, and interpretation
- Included CLion-compatible Markdown tables for output columns, example results, and recommendations
- Documented benchmark command line and sample outputs (50k rows / 10k groups)
- Clarified how sigmaCut and parallelization affect runtime
- Minor formatting and readability improvements across the file
* The **OLS path** scales linearly with group count.
* **Parallelization** provides 4–5× acceleration for thousands of small groups.
* Current synthetic *y‑only* outliers do **not** trigger re‑fitting overhead.
* Real‑data slowdowns (up to 25×) occur when **sigmaCut** forces iterative robust refits.
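The cost difference between a single OLS pass and sigmaCut-driven refits can be illustrated with a minimal sketch. It is a generic residual-clipping loop written with NumPy, not the code used by `GroupByRegressor`; the helper name `sigma_cut_fit` and its parameters `sigma_cut` and `max_iter` are assumptions made for illustration. Each pass repeats the full fit, so runtime grows with the number of passes.

```python
import numpy as np


def sigma_cut_fit(x, y, sigma_cut=3.0, max_iter=10):
    """Hypothetical helper: straight-line OLS with iterative residual clipping."""
    mask = np.ones(x.size, dtype=bool)  # points currently used in the fit
    n_fits = 0
    slope = intercept = np.nan
    for _ in range(max_iter):
        slope, intercept = np.polyfit(x[mask], y[mask], 1)
        n_fits += 1
        resid = y - (slope * x + intercept)
        sigma = resid[mask].std()
        if sigma == 0.0:
            break  # perfect fit, nothing to clip
        new_mask = np.abs(resid) <= sigma_cut * sigma
        if new_mask.sum() < 2 or np.array_equal(new_mask, mask):
            break  # too few points left, or no point changed status
        mask = new_mask
    return slope, intercept, n_fits


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y_clean = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 200)
y_outl = y_clean.copy()
y_outl[:10] += 5.0  # shift ten points far from the line

slope_c, icpt_c, fits_clean = sigma_cut_fit(x, y_clean)
slope_o, icpt_o, fits_outl = sigma_cut_fit(x, y_outl)
print("passes on clean data:", fits_clean)
print("passes with outliers:", fits_outl)
```

Whether a particular outlier pattern triggers extra passes depends on the fitter; this toy loop refits whenever any point changes status.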
-In the `make_parallel_fit` and `make_linear_fit` functions, the `cast_dtype` parameter ensures consistent numeric precision for slope, intercept, and error terms. This is useful for long pipelines or for memory-sensitive applications.
A future extension will introduce **leverage‑outlier** generation (outliers in X and Y) to replicate the observed 25× slowdown and allow comparative testing of different robust fitters.
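As a rough illustration of the distinction between *y‑only* and leverage outliers, the sketch below contaminates a synthetic sample in either mode. The function `add_outliers` and its parameters (`frac`, `shift`, `leverage`, `seed`) are hypothetical and are not part of the benchmark code; the sketch only assumes that a leverage outlier is a point displaced in both X and Y.

```python
import numpy as np


def add_outliers(x, y, frac=0.05, shift=5.0, leverage=False, seed=0):
    """Hypothetical helper: return contaminated copies of x and y plus outlier indices."""
    rng = np.random.default_rng(seed)
    x_out, y_out = x.copy(), y.copy()
    n_out = int(frac * x.size)
    idx = rng.choice(x.size, size=n_out, replace=False)
    y_out[idx] += shift  # displace in Y in both modes
    if leverage:
        # Also displace in X, in units of the spread of x
        x_out[idx] += shift * x.std()
    return x_out, y_out, idx


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 1000)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 1000)

x_y, y_y, idx_y = add_outliers(x, y, leverage=False)  # y-only outliers
x_l, y_l, idx_l = add_outliers(x, y, leverage=True)   # leverage outliers
```

Points displaced in X lie outside the bulk of the data and so pull harder on a fitted slope than points displaced in Y alone.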
## Tips
💡 Use `cast_dtype='float16'` for storage savings, but ensure it is compatible with downstream numerical precision requirements.
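The trade-off behind this tip can be checked with plain NumPy, independently of the library. The sketch below casts a few made-up fit values (the array `slopes` is illustrative, not benchmark output) to `float16` and compares storage size and rounding error.

```python
import numpy as np

# Made-up example values standing in for fitted slopes
slopes = np.array([1.234567, 0.000012345, 70000.0], dtype=np.float64)

with np.errstate(over="ignore"):  # silence the overflow warning from the cast
    as_f16 = slopes.astype(np.float16)

print("bytes float64:", slopes.nbytes)   # 8 bytes per value
print("bytes float16:", as_f16.nbytes)   # 2 bytes per value
print("float16 eps:", np.finfo(np.float16).eps)
print("float16 max:", np.finfo(np.float16).max)
print("absolute error:", np.abs(as_f16.astype(np.float64) - slopes))
```

`float16` carries about three significant decimal digits and overflows to `inf` above 65504, so the last value here does not survive the cast.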
### Usage Example for `cast_dtype`
```python
import pandas as pd
@@ -146,24 +192,24 @@ from dfextensions.groupby_regression import GroupByRegressor