feat(evaluation): Add Pareto-Optimal Evaluation by ankitlade12 · Pull Request #213 · Nixtla/utilsforecast

ankitlade12 · 2026-03-02T05:14:48Z

Description

This pull request introduces multi-objective evaluation capabilities to utilsforecast. It adds a robust ParetoFrontier class directly within evaluation.py, providing a model-agnostic and dataframe-agnostic way to identify the best-performing models across conflicting metrics (e.g., minimizing RMSE while minimizing MAE, or minimizing latency while maximizing accuracy).

Since utilsforecast acts as the foundational evaluation layer for Nixtla's ecosystem, integrating Pareto selection natively enables downstream libraries (like mlforecast and statsforecast) to leverage model multi-objective benchmarking out-of-the-box using the standard output of evaluate().

Key Changes

Added ParetoFrontier class in evaluation.py: Includes is_dominated mathematically validated bounding logic and exposed find_non_dominated routines.
Built-in 2D Plotting: Implemented plot_pareto_2d() to visually inspect the trade-off frontier. Matplotlib is lazily imported and handled gracefully with an explicit warnings.warn if missing, addressing reviewer feedback to avoid raw print statements.
Dataframe Agnosticism (AnyDFType): Ensures pandas and polars arrays are passed cleanly through the mathematical logic, addressing previous maintainer concerns surrounding hard pandas dependencies.
Exposed in __init__.py: Integrated evaluate and ParetoFrontier into __all__ for easy top-level access (from utilsforecast import ParetoFrontier).

Example Usage

from utilsforecast.evaluation import evaluate, ParetoFrontier
from utilsforecast.losses import mae, rmse

# 1. Run standard evaluation
performance_df = evaluate(cv_results, metrics=[mae, rmse], agg_fn="mean")

# 2. Get Pareto Optimal Models
pareto_optimal_df = ParetoFrontier.find_non_dominated(performance_df)

# 3. Visualize Trade-offs
ax = ParetoFrontier.plot_pareto_2d(performance_df, metric_x='rmse', metric_y='mae')

CLAassistant · 2026-03-02T05:14:55Z

All committers have signed the CLA.

nasaul

The core Pareto logic looks algorithmically sound, but the PR needs some changes and add tests for ParetoFrontier

- Add evaluate and ParetoFrontier to __init__.py __all__ - Use narwhals native DataFrame filtering in ParetoFrontier - Properly extract evaluate() model columns in plot_pareto_2d - Add tests for ParetoFrontier

nasaul · 2026-04-05T21:54:02Z

Thanks for the updates — the core Pareto dominance algorithm is correct and the narwhals-based approach in find_non_dominated is the right direction. There are a few bugs and design issues that need to be addressed before merging.

Bugs

1. plot_pareto_2d breaks polars (hard pd.DataFrame dependency)

In the "metric" column branch, a plain pd.DataFrame is constructed internally, and then pandas-only methods are called on the result of find_non_dominated:

pareto_sorted = pareto_df.sort_values(metric_x)  # pandas only
ax.scatter(pareto_df[metric_x], ...)              # pandas-only indexing
for _, row in plot_df.iterrows():                 # pandas only

The method signature accepts AnyDFType but breaks silently for polars input. Either convert to pandas explicitly at the start of the plotting method, or use narwhals throughout.

2. plot_pareto_2d mutates a caller-provided DataFrame

plot_df["model"] = plot_df.index.astype(str)

This modifies the passed-in DataFrame in place when it's pandas. Use .copy() or .assign() instead.

Design Issues

3. __init__.py unexpectedly exports evaluate at package level

evaluate was not previously exported from utilsforecast.__init__. Adding it here is an unintended API surface change and causes eager import of all of evaluation.py on every import utilsforecast. Only ParetoFrontier (if desired) should be added.

4. Confusing metrics parameter in find_non_dominated

In the "metric" column branch (evaluate() output format), when metrics is passed it is actually used as model names, not metric names:

else:
    models = metrics  # misleading: metrics is used as model names here

A user will naturally pass metric names like ["rmse", "mae"] and get unexpected behavior. Please rename the parameter (e.g., model_subset) or document this clearly, and consider whether the evaluate() format branch even needs this override.

Minor

evaluation.py module-level __all__ still only has ['evaluate'] — add ParetoFrontier.
Missing two blank lines before the ParetoFrontier class definition (PEP 8).
Test has a stray # wait: debug comment that should be removed.

…r naming - Rewrite plot_pareto_2d to use narwhals/numpy instead of pandas-only methods - Stop mutating caller-provided DataFrame in plot_pareto_2d - Remove evaluate from package-level __init__.py exports (unintended API change) - Add model_subset parameter to find_non_dominated, separate from metrics - Add ParetoFrontier to evaluation.py __all__ - Fix PEP 8 blank lines before ParetoFrontier class - Remove stray debug comment in test

review-notebook-app · 2026-04-24T01:44:05Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

nasaul · 2026-04-24T01:45:28Z

Hello @ankitlade12 thanks for the update on this contribution. I've cleaned up a little bit the implementation and added a tutorial. Let me know if this tutorial is what you had in mind for the ParetoFrontier functionality. If your okay, we can merge the PR.

ankitlade12 · 2026-04-24T19:54:19Z

Hi @nasaul — thanks a lot for taking the time to clean this up and put together the tutorial!

The refactor is a clear improvement over what I had:

Moving ParetoFrontier into its own model_selection.py module is the right call — it keeps evaluation.py focused and gives the functionality a more natural home.
The vectorized is_dominated (broadcasting over the full (n, m) matrix) is much cleaner and faster than my per-model loop.
Using nw.from_dict(..., backend=nw.get_native_namespace(...)) in plot_pareto_2d is a nicer way to keep polars input as polars all the way through — better than my workaround.
Adding id_col / cutoff_col parameters to find_non_dominated is a good catch for users who use non-default column names in evaluate().
The two extra tests (test_plot_pareto_2d_no_model_column, test_plot_pareto_2d_polars) cover cases I had missed.

The tutorial is exactly the kind of walkthrough I had in mind — it motivates the multi-metric problem clearly, shows the standard evaluate() → ParetoFrontier flow, and the CV section is a nice bonus.

All good on my end — happy to merge whenever you are. Thanks again!

nasaul

Thanks for this contribution @ankitlade12!

feat(evaluation): Add Pareto-Optimal Evaluation

00dbac1

ankitlade12 force-pushed the feat/multi-objective-eval branch from 4328af7 to 00dbac1 Compare March 4, 2026 07:58

nasaul requested changes Mar 9, 2026

View reviewed changes

Comment thread utilsforecast/evaluation.py Outdated

Comment thread utilsforecast/__init__.py Outdated

Comment thread utilsforecast/evaluation.py Outdated

fix(pareto): address PR review comments

d70f884

- Add evaluate and ParetoFrontier to __init__.py __all__ - Use narwhals native DataFrame filtering in ParetoFrontier - Properly extract evaluate() model columns in plot_pareto_2d - Add tests for ParetoFrontier

ankitlade12 requested a review from nasaul March 10, 2026 04:26

ankitlade12 added 4 commits March 11, 2026 16:18

chore: fix ruff import order

2ff1c85

Merge branch 'main' into feat/multi-objective-eval

40a22c5

Merge branch 'main' into feat/multi-objective-eval

c14cf60

Merge branch 'main' into feat/multi-objective-eval

b0b0737

ankitlade12 added 2 commits April 8, 2026 09:08

Merge branch 'main' into feat/multi-objective-eval

0a8fe56

ankitlade12 force-pushed the feat/multi-objective-eval branch from dc0f69b to 9052afd Compare April 8, 2026 14:15

nasaul added 5 commits April 23, 2026 16:41

Merge branch 'main' into feat/multi-objective-eval

a3fc420

Addresses code quality

546be43

Adds extra columns in find_non_dominated

22fb512

Refactor

2fedae0

Update docs

0692a4f

nasaul approved these changes Apr 24, 2026

View reviewed changes

nasaul merged commit 22ae12e into Nixtla:main Apr 24, 2026
18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(evaluation): Add Pareto-Optimal Evaluation#213

feat(evaluation): Add Pareto-Optimal Evaluation#213
nasaul merged 13 commits intoNixtla:mainfrom
ankitlade12:feat/multi-objective-eval

ankitlade12 commented Mar 2, 2026

Uh oh!

CLAassistant commented Mar 2, 2026 •

edited

Loading

Uh oh!

nasaul left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nasaul commented Apr 5, 2026

Uh oh!

review-notebook-app Bot commented Apr 24, 2026

Uh oh!

nasaul commented Apr 24, 2026

Uh oh!

ankitlade12 commented Apr 24, 2026

Uh oh!

nasaul left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ankitlade12 commented Mar 2, 2026

Description

Key Changes

Example Usage

Uh oh!

CLAassistant commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nasaul left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nasaul commented Apr 5, 2026

Bugs

Design Issues

Minor

Uh oh!

review-notebook-app Bot commented Apr 24, 2026

Uh oh!

nasaul commented Apr 24, 2026

Uh oh!

ankitlade12 commented Apr 24, 2026

Uh oh!

nasaul left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

CLAassistant commented Mar 2, 2026 •

edited

Loading