feat(evaluation): Add Pareto-Optimal Evaluation #213
Conversation
Force-pushed 4328af7 to 00dbac1
nasaul
left a comment
The core Pareto logic looks algorithmically sound, but the PR needs some changes, and tests should be added for ParetoFrontier:
- Add `evaluate` and `ParetoFrontier` to `__init__.py` `__all__`
- Use narwhals-native DataFrame filtering in `ParetoFrontier`
- Properly extract `evaluate()` model columns in `plot_pareto_2d`
- Add tests for `ParetoFrontier`
Thanks for the updates — the core Pareto dominance algorithm is correct, and the narwhals-based approach is a clear improvement. Remaining issues:

Bugs

1. `plot_pareto_2d` uses pandas-only methods, even though the method signature accepts `AnyDFType`:

   ```python
   pareto_sorted = pareto_df.sort_values(metric_x)  # pandas only
   ax.scatter(pareto_df[metric_x], ...)             # pandas-only indexing
   for _, row in plot_df.iterrows():                # pandas only
   ```

2. `plot_df["model"] = plot_df.index.astype(str)` modifies the passed-in DataFrame in place when it's pandas. Use a copy instead.

Design Issues

3. …

4. Confusing parameter naming. In the `else` branch:

   ```python
   models = metrics  # misleading: metrics is used as model names here
   ```

   A user will naturally pass metric names like …

Minor
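For reference, the dominance rule the review signs off on — row j dominates row i when it is less than or equal in every metric and strictly less in at least one, with all metrics minimized — can be sketched in plain numpy. This is an illustrative stand-alone version, not the PR's actual `is_dominated`/`find_non_dominated` implementation:

```python
import numpy as np

def non_dominated_mask(points: np.ndarray) -> np.ndarray:
    """Boolean mask of Pareto-optimal rows (all metrics minimized).

    Row i is dominated if some other row is <= in every metric
    and strictly < in at least one.
    """
    n = points.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(points, i, axis=0)
        dominated = np.any(
            np.all(others <= points[i], axis=1)
            & np.any(others < points[i], axis=1)
        )
        mask[i] = not dominated
    return mask

# (rmse, mae) per model: the third row is dominated by the first
# (1.0 <= 2.0 and 3.0 <= 3.5, strictly better on both).
scores = np.array([[1.0, 3.0], [3.0, 1.0], [2.0, 3.5], [2.0, 2.0]])
print(non_dominated_mask(scores))  # [ True  True False  True]
```

The O(n²) pairwise scan is fine for the handful of models a typical `evaluate()` summary contains.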
…r naming
- Rewrite plot_pareto_2d to use narwhals/numpy instead of pandas-only methods
- Stop mutating caller-provided DataFrame in plot_pareto_2d
- Remove evaluate from package-level __init__.py exports (unintended API change)
- Add model_subset parameter to find_non_dominated, separate from metrics
- Add ParetoFrontier to evaluation.py __all__
- Fix PEP 8 blank lines before ParetoFrontier class
- Remove stray debug comment in test
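The `sort_values`/`iterrows` replacement described above can be done with numpy on extracted columns, which works for any backend whose columns convert via `np.asarray`. A hedged sketch — the helper name `frontier_line_coords` is illustrative, not the PR's code:

```python
import numpy as np
import pandas as pd

def frontier_line_coords(df, metric_x: str, metric_y: str):
    """Return frontier coordinates ordered by metric_x without
    pandas-only methods: columns are pulled out as numpy arrays
    and ordered with argsort instead of df.sort_values()."""
    x = np.asarray(df[metric_x], dtype=float)
    y = np.asarray(df[metric_y], dtype=float)
    order = np.argsort(x)  # replaces df.sort_values(metric_x)
    return x[order], y[order]

df = pd.DataFrame({"rmse": [3.0, 1.0, 2.0], "mae": [1.0, 3.0, 2.0]})
xs, ys = frontier_line_coords(df, "rmse", "mae")
print(xs.tolist(), ys.tolist())  # [1.0, 2.0, 3.0] [3.0, 2.0, 1.0]
```

Because only arrays leave the function, the plotting code never mutates or even indexes the caller's DataFrame, which also resolves the in-place mutation bug flagged in review.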
Force-pushed dc0f69b to 9052afd
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
Hello @ankitlade12, thanks for the update on this contribution. I've cleaned up the implementation a little bit and added a tutorial. Let me know if this tutorial is what you had in mind for the ParetoFrontier functionality. If you're okay with it, we can merge the PR.
Hi @nasaul — thanks a lot for taking the time to clean this up and put together the tutorial! The refactor is a clear improvement over what I had.

The tutorial is exactly the kind of walkthrough I had in mind — it motivates the multi-metric problem clearly and shows the standard … All good on my end — happy to merge whenever you are. Thanks again!
nasaul
left a comment
Thanks for this contribution @ankitlade12!
Description
This pull request introduces multi-objective evaluation capabilities to `utilsforecast`. It adds a robust `ParetoFrontier` class directly within `evaluation.py`, providing a model-agnostic and dataframe-agnostic way to identify the best-performing models across conflicting metrics (e.g., minimizing RMSE while minimizing MAE, or minimizing latency while maximizing accuracy).

Since `utilsforecast` acts as the foundational evaluation layer for Nixtla's ecosystem, integrating Pareto selection natively enables downstream libraries (like `mlforecast` and `statsforecast`) to leverage multi-objective model benchmarking out-of-the-box using the standard output of `evaluate()`.

Key Changes

- `ParetoFrontier` class in `evaluation.py`: includes mathematically validated `is_dominated` bounding logic and exposed `find_non_dominated` routines.
- `plot_pareto_2d()` to visually inspect the trade-off frontier. Matplotlib is lazily imported and handled gracefully with an explicit `warnings.warn` if missing, addressing reviewer feedback to avoid raw `print` statements.
- Dataframe-agnostic typing (`AnyDFType`): ensures `pandas` and `polars` frames are passed cleanly through the mathematical logic, addressing previous maintainer concerns surrounding hard `pandas` dependencies.
- `__init__.py`: integrated `evaluate` and `ParetoFrontier` into `__all__` for easy top-level access (`from utilsforecast import ParetoFrontier`).

Example Usage
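The example code itself is missing here, so the following is only a hedged sketch of the workflow the description implies. The `ParetoFrontier` class below is a minimal hypothetical stand-in (its `find_non_dominated` method name is taken from the changes listed above), not the library's actual implementation, and the model names and metric values are made up:

```python
import pandas as pd

class ParetoFrontier:
    """Minimal stand-in mirroring the PR's described API (hypothetical)."""

    def __init__(self, metrics):
        self.metrics = metrics  # metric column names, all minimized

    def find_non_dominated(self, summary: pd.DataFrame) -> list:
        """Return index labels of models no other model dominates."""
        vals = summary[self.metrics].to_numpy()
        keep = []
        for i, row in enumerate(vals):
            dominated = any(
                all(o <= r for o, r in zip(other, row))
                and any(o < r for o, r in zip(other, row))
                for j, other in enumerate(vals)
                if j != i
            )
            if not dominated:
                keep.append(summary.index[i])
        return keep

# Per-model metric summary, shaped like an aggregated evaluate() output.
summary = pd.DataFrame(
    {"rmse": [1.0, 3.0, 2.0], "mae": [3.0, 1.0, 3.5]},
    index=["AutoARIMA", "Naive", "SeasonalNaive"],
)
pf = ParetoFrontier(metrics=["rmse", "mae"])
print(pf.find_non_dominated(summary))  # ['AutoARIMA', 'Naive']
```

SeasonalNaive is dropped because AutoARIMA beats it on both metrics; AutoARIMA and Naive survive because each wins on one metric, which is exactly the trade-off the frontier is meant to surface.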