This repository was archived by the owner on Mar 30, 2026. It is now read-only.

Tagging system, autoplot, and benchmark helpers for work-precision diagrams #177

@ChrisRackauckas-Claude

Description

Motivation

Exemplar benchmarks such as linear_wpd, allen_cahn_fdm_wpd, and BCR follow a common pattern of within-family and between-family comparisons (e.g., all Rosenbrock methods, all 5th-order methods, then a best-of-each-family comparison). Achieving this currently requires significant manual boilerplate in each benchmark file: creating a separate WorkPrecisionSet per family, manually selecting the best methods, repeating tolerance ranges and error modes, and manually composing the comparison plots.

This issue proposes a set of infrastructure improvements to DiffEqDevTools that would:

  1. Make it trivial to tag methods and generate family/cross-family comparison plots
  2. Run all methods once and generate many views from the same data
  3. Automatically identify "best of family" methods for cross-family comparisons
  4. Support interactive plots for dense comparison diagrams
  5. Eventually simplify SciMLBenchmarks code significantly via these helpers

Planned Features

Phase 1: Core tagging infrastructure

Add a tags field to WorkPrecision:

mutable struct WorkPrecision
    # ... existing fields ...
    tags::Vector{Symbol}  # NEW: e.g., [:rosenbrock, :stiff, :4th_order, :autodiff]
end

Tags are specified via the setups dict (backward compatible — no tags = empty vector):

setups = [
    Dict(:alg => Rosenbrock23(), :tags => [:rosenbrock, :2nd_order, :stiff]),
    Dict(:alg => Rodas5P(),      :tags => [:rosenbrock, :5th_order, :stiff]),
    Dict(:alg => TRBDF2(),       :tags => [:bdf, :2nd_order, :stiff]),
    Dict(:alg => KenCarp4(),     :tags => [:sdirk, :4th_order, :stiff, :imex]),
    Dict(:alg => Tsit5(),        :tags => [:rk, :5th_order, :nonstiff, :reference]),
    Dict(:alg => CVODE_BDF(),    :tags => [:bdf, :sundials, :reference]),
]

Filtering helpers:

# Get subset matching ALL specified tags (AND logic)
filter_by_tags(wp_set, :rosenbrock)           # all rosenbrock methods
filter_by_tags(wp_set, :5th_order, :stiff)    # 5th order AND stiff
filter_by_tags(wp_set, :reference)            # just reference methods

# Exclude by tags
exclude_by_tags(wp_set, :reference)           # everything except reference
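
A minimal sketch of how these helpers could work, assuming the Phase 1 design where each WorkPrecision in the set carries a `tags::Vector{Symbol}` field (the iteration protocol over `wp_set` is an assumption here, not the current API):

```julia
# AND logic: keep methods carrying ALL requested tags
filter_by_tags(wp_set, tags::Symbol...) =
    [wp for wp in wp_set if all(t -> t in wp.tags, tags)]

# drop methods carrying ANY of the requested tags
exclude_by_tags(wp_set, tags::Symbol...) =
    [wp for wp in wp_set if !any(t -> t in wp.tags, tags)]
```

Because these are pure post-hoc filters over already-computed results, they satisfy the zero-cost-at-solve-time constraint from the Design Constraints section.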

SDE/DAE compatibility: Tags are purely additive metadata. The existing WorkPrecisionSet constructors for AbstractRODEProblem, AbstractEnsembleProblem, AbstractBVProblem, etc., just need to pass tags through. No changes to numruns_error, error_estimate = :weak_final, prob_choice, or any SDE/DAE-specific parameters.

DAE formalism comparisons can use tags naturally:

setups = [
    Dict(:alg => Rodas5P(), :prob_choice => 1, :tags => [:mass_matrix, :rosenbrock]),
    Dict(:alg => Rodas5P(), :prob_choice => 2, :tags => [:sparse, :rosenbrock]),
    Dict(:alg => IDA(),     :prob_choice => 3, :tags => [:sundials, :dae_residual]),
]

Phase 2: Multi-error-mode runs

Currently each WorkPrecisionSet uses a single error_estimate. To avoid re-running everything for each error mode, support computing multiple error metrics in one pass:

wp_set = WorkPrecisionSet(prob, abstols, reltols, setups;
    error_estimates = [:final, :l2, :L2],  # compute all three
    appxsol = test_sol)

The errors StructArray in WorkPrecision already stores all computed error types as a NamedTuple, so this is mainly a matter of requesting timeseries_errors = true and dense_errors = true together and storing a vector of active error estimates for plotting.

For weak SDE errors (:weak_final, :weak_l2, etc.), the same mechanism applies — these are already stored in the errors dict. The expensive part (ensemble runs via numruns_error) only needs to happen once regardless of how many weak error metrics are extracted from the results.
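
Since the errors are already stored as a NamedTuple per solve, extraction is pure post-processing. A hypothetical helper (the name `extract_errors` is illustrative, not an existing function):

```julia
# Pull several error metrics out of one stored errors NamedTuple
# without re-running any solves; silently skip metrics that were
# not computed for this problem type.
function extract_errors(errors_nt::NamedTuple, modes::Vector{Symbol})
    Dict(m => getproperty(errors_nt, m) for m in modes if haskey(errors_nt, m))
end

errs = (final = 1.2e-6, l2 = 3.4e-6, L2 = 3.1e-6)
extract_errors(errs, [:final, :l2, :weak_final])  # :weak_final absent, so skipped
```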

Phase 3: Tag-based and reference-method plotting

Extend the plot recipe to support tag-based subsetting and reference method overlays:

# Plot only rosenbrock family
plot(wp_set, tags = [:rosenbrock])

# Plot 5th order methods with reference methods always included
plot(wp_set, tags = [:5th_order], include_tags = [:reference])

# Reference methods get distinct styling (dashed, thinner)
plot(wp_set, tags = [:imex], reference_tags = [:reference],
     reference_style = (linestyle = :dash, linewidth = 1, alpha = 0.5))

When reference methods fall far outside the frame of the main methods, options include:

  • Auto-adjust axis limits to include them
  • Or clip/omit them with a warning
  • Or use a secondary inset plot
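
One way the recipe could decide per-series styling, sketched here with hypothetical names (`series_attributes` does not exist; the attribute NamedTuples are placeholders for whatever the recipe merges over its defaults):

```julia
# Pick plot attributes by tag membership: reference methods get the
# de-emphasized style, everything else gets the default emphasis.
function series_attributes(tags::Vector{Symbol}, reference_tags::Vector{Symbol};
                           reference_style = (linestyle = :dash, linewidth = 1, alpha = 0.5),
                           main_style = (linestyle = :solid, linewidth = 3, alpha = 1.0))
    any(t -> t in reference_tags, tags) ? reference_style : main_style
end

series_attributes([:bdf, :reference], [:reference])  # dashed reference styling
```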

Phase 4: Best-of-family helpers

Automatically identify standout methods per family:

# Get the best method per family tag (by Pareto efficiency on the error-time curve)
best = best_by_tag(wp_set, :rosenbrock; n = 2, error_estimate = :final)

# Create a "best of all families" WorkPrecisionSet
families = [:rosenbrock, :bdf, :sdirk, :rk, :imex, :exponential]
best_of = best_of_families(wp_set, families; n = 2)
plot(best_of)  # cross-family comparison with the top 2 from each

"Best" should be determined by Pareto efficiency on the work-precision curve (not just minimum error or minimum time, but the overall curve quality). Could use area under the log-log curve, or minimum time at a reference error level, or a combination.
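
As a concrete sketch of the area-under-the-curve option: score each method by the trapezoidal area under its log-log time-vs-error curve and keep the lowest scores. The field names `errors` and `times` are assumptions about WorkPrecision internals, not the current API:

```julia
# Smaller area under the log10-log10 time-vs-error curve means a better
# overall work-precision tradeoff across the whole tolerance sweep.
function loglog_auc(errors, times)
    keep = [i for i in eachindex(errors)
            if isfinite(errors[i]) && isfinite(times[i]) &&
               errors[i] > 0 && times[i] > 0]          # NaNs mark failed solves
    length(keep) < 2 && return Inf                     # too few points to rank
    x, y = log10.(errors[keep]), log10.(times[keep])
    p = sortperm(x)
    x, y = x[p], y[p]
    sum((x[i+1] - x[i]) * (y[i] + y[i+1]) / 2 for i in 1:length(x)-1)  # trapezoid rule
end

# hypothetical ranking helper: keep the n family members with the lowest score
best_by_auc(wps, n) = sort(wps; by = wp -> loglog_auc(wp.errors, wp.times))[1:min(n, end)]
```

This rewards methods that are fast across the whole curve rather than at a single tolerance, which matches the "overall curve quality" goal above; a reference-error-level criterion could be layered on as a tiebreaker.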

Phase 5: Time cutoff for slow methods

Some methods are extremely slow at certain tolerances. The current NaN-filtering handles crashes, but not the case where a method takes 100x longer than others. Options:

  1. Process-level timeout via Distributed: Run each solve in a worker process with a timeout. If it exceeds the cutoff, kill it and mark as NaN.

    WorkPrecisionSet(prob, abstols, reltols, setups;
        timeout = 300.0,  # seconds per solve
        parallel = :distributed)  # use Distributed workers
  2. Relative timeout: Set timeout as a multiple of the fastest solve at each tolerance level (e.g., 50x the fastest).

  3. Callback-based: Use a DiscreteCallback that checks wall-clock time and terminates.

The Distributed approach is cleanest for hard timeouts but adds a dependency. The callback approach works within a single process. We should support both.
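
The callback option can be sketched with the real DiscreteCallback/terminate! interface from the SciML callback API; the helper name `walltime_cutoff` is hypothetical:

```julia
using OrdinaryDiffEq

# Terminate a solve once wall-clock time exceeds `limit_seconds`.
# The clock starts in `initialize` so precompilation/setup time is excluded.
function walltime_cutoff(limit_seconds)
    t0 = Ref(0.0)
    initialize = (c, u, t, integrator) -> (t0[] = time())
    condition  = (u, t, integrator) -> time() - t0[] > limit_seconds
    affect!    = integrator -> terminate!(integrator)
    DiscreteCallback(condition, affect!;
                     initialize = initialize, save_positions = (false, false))
end

# usage sketch: solve(prob, Rodas5P(); callback = walltime_cutoff(300.0))
```

A solve cut off this way returns with `sol.retcode == ReturnCode.Terminated`, which the benchmarking loop could translate into the same NaN marking used for crashes. Note the condition only fires between accepted steps, so a single pathologically long step can still overshoot the limit; only the process-level approach gives a hard guarantee.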

For SDE weak benchmarks (which are already very expensive with numruns_error = 1000), the timeout should apply per-trajectory or per-ensemble, not per-individual-solve.

Phase 6: AutoDiff on/off comparison helpers

# Automatically create AD vs no-AD variants of setups
setups_with_ad = with_autodiff_variants(setups;
    ad_backends = [AutoForwardDiff(), AutoFiniteDiff()],
    methods = [:best, 3]  # only for top 3 methods or :all
)

# Or tag-based: only show AD comparison for reference + best methods
plot(wp_set, tags = [:autodiff_forward], compare_tags = [:autodiff_finite])
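
A rough sketch of how `with_autodiff_variants` could generate the variant setups. Everything here is hypothetical: the caller-supplied `rebuild` function stands in for whatever mechanism reconstructs an algorithm with a given AD backend (e.g., via the `autodiff` keyword that OrdinaryDiffEq's implicit solvers accept):

```julia
# Duplicate each setup once per AD backend, tagging the variants so the
# Phase 3 tag-based plotting can compare them.
function with_autodiff_variants(setups, rebuild; ad_backends, ad_tags)
    variants = empty(setups)
    for setup in setups, (backend, tag) in zip(ad_backends, ad_tags)
        s = copy(setup)
        s[:alg]  = rebuild(setup[:alg], backend)
        s[:tags] = vcat(get(setup, :tags, Symbol[]), tag)  # tags default to empty
        push!(variants, s)
    end
    vcat(setups, variants)
end
```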

Phase 7: Autoplot — generate comprehensive plot sets

A single function that generates all standard comparison plots:

plots = autoplot(wp_set;
    families = [:rosenbrock, :bdf, :sdirk, :rk, :imex],
    tolerance_ranges = Dict(:low => (1e-3, 1e-8), :high => (1e-8, 1e-13)),
    error_modes = [:final, :l2, :L2],
    reference_tags = [:reference],
    autodiff_compare = true,
    best_n = 2,
    backend = :gr  # or :plotlyjs for interactive
)

Returns a structured collection of plots:

  • Per-family plots (within-family comparison)
  • Cross-family "best of" plots
  • AD on/off comparison (with best methods only)
  • Low tolerance and high tolerance versions of each
  • Each error mode version
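
Structurally, autoplot is a Cartesian product over families, tolerance ranges, and error modes. A sketch of that loop, keyed for later lookup (the `tags` and `error_estimate` plot keywords are the proposed Phase 3 API, not current ones):

```julia
using Plots

# One plot per (family, tolerance range, error mode) combination.
function autoplot_sketch(wp_set; families, tolerance_ranges, error_modes)
    plots = Dict{Tuple{Symbol,Symbol,Symbol},Any}()
    for fam in families, (range_name, _) in tolerance_ranges, mode in error_modes
        plots[(fam, range_name, mode)] =
            plot(wp_set; tags = [fam], error_estimate = mode)
    end
    plots
end

# plots[(:rosenbrock, :low, :final)] would then hold one standard panel
```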

Phase 8: Interactive Plotly support

For plots with many overlapping curves, interactive Plotly plots would help:

  • Hover to see method name and exact values
  • Click legend to toggle individual methods
  • Zoom into regions of interest

This could be a separate package extension (DiffEqDevToolsPlotlyExt) or just work via PlotlyJS backend for Plots.jl. The plot recipes should degrade gracefully — same recipe, different backend.
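
With Plots.jl, backend switching already requires no recipe changes, which is why the graceful-degradation requirement is achievable:

```julia
using Plots

plotlyjs()     # switch Plots.jl to the interactive PlotlyJS backend
plot(wp_set)   # same recipe: hover, legend toggling, and zoom now work
gr()           # back to static GR output for publication figures
```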

Phase 9: SciMLBenchmarks migration

After the DiffEqDevTools infrastructure is in place, update SciMLBenchmarks to use it:

  • Replace manual family-grouping with tags
  • Replace repeated WorkPrecisionSet calls with single tagged runs
  • Use autoplot to generate the standard comparison plots
  • Dramatically reduce per-benchmark boilerplate

Design Constraints

  • Full backward compatibility: All new fields have defaults, all new parameters are keyword-only with defaults matching current behavior.
  • SDE weak benchmarks: The most expensive benchmarks (1000+ trajectories × multiple methods). Tagging/filtering must be zero-cost at solve time — it's purely metadata for post-hoc plot generation.
  • DAE problem formalism: prob_choice pattern must continue to work. Tags complement it by adding semantic meaning (:mass_matrix vs :dae_residual vs :mtk_reduced).
  • No new hard dependencies: Plotly support via package extension. Distributed timeout via package extension or optional import.

Implementation Plan

  • PR 1: Phase 1 — Core tagging infrastructure (tags field, filtering helpers)
  • PR 2: Phase 2 — Multi-error-mode support
  • PR 3: Phase 3 — Tag-based plotting and reference method overlays
  • PR 4: Phase 4 — Best-of-family helpers
  • PR 5: Phase 5 — Time cutoff mechanism
  • PR 6: Phase 6 — AutoDiff comparison helpers
  • PR 7: Phase 7 — Autoplot
  • PR 8: Phase 8 — Plotly extension
  • PR 9+: Phase 9 — SciMLBenchmarks migration (multiple PRs, one per benchmark category)

PRs 1–4 are the core value and can be done incrementally. PRs 5–8 are enhancements. PR 9+ is the payoff in simplified benchmark code.
