
Conversation

@michael-johnston (Member) commented Jan 14, 2026

This PR adds a new stopper, BayesianDifferenceStopper, to the ray_tune operator.

This stopper enables stopping sampling when the mean difference between two metrics is determined to be above or below a threshold with a target certainty.

Use cases include determining if a model is drifting, i.e. whether the mean difference is above or below a tolerance threshold given new data, and detecting performance regressions, i.e. whether, given a new software version, performance has changed substantially (beyond a threshold).
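A minimal sketch of the stop test, for illustration only: it assumes Ray Tune's Stopper interface and a Student-t posterior for the mean difference via scipy. The class name, constructor arguments, and the use of the signed (rather than absolute) difference are illustrative choices, not necessarily what this PR implements.

```python
# Illustrative sketch only, not the PR's implementation.
import numpy as np
from scipy import stats
from ray.tune.stopper import Stopper


class BayesianDifferenceStopperSketch(Stopper):
    """Stop once P(mean difference above/below threshold) reaches a target probability."""

    def __init__(self, metric_a, metric_b, threshold, target_probability=0.95, min_samples=10):
        self.metric_a = metric_a
        self.metric_b = metric_b
        self.threshold = threshold
        self.target_probability = target_probability
        self.min_samples = min_samples
        self._diffs = []

    def __call__(self, trial_id, result):
        # Each trial result is assumed to report both metrics; record their difference.
        if self.metric_a in result and self.metric_b in result:
            self._diffs.append(result[self.metric_a] - result[self.metric_b])
        return self.stop_all()

    def stop_all(self):
        if len(self._diffs) < self.min_samples:
            return False
        diffs = np.asarray(self._diffs, dtype=float)
        # Posterior of the mean difference under a flat prior: Student-t centred on the sample mean.
        scale = max(stats.sem(diffs), 1e-12)
        posterior = stats.t(df=len(diffs) - 1, loc=diffs.mean(), scale=scale)
        p_above = posterior.sf(self.threshold)   # P(mean difference > threshold)
        p_below = posterior.cdf(self.threshold)  # P(mean difference < threshold)
        # Stop once either side of the threshold is resolved with the target certainty.
        return max(p_above, p_below) >= self.target_probability
```

Whether the actual implementation uses the signed or absolute difference, and exactly how it pools per-trial values, is defined by the code in this PR; the sketch only illustrates the Bayesian stop test described above.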

If an operation that created a nested op was interrupted during the nested op, the relationship between the parent and child would not be captured.

This change fixes this.
Stopper which stops when the difference between two metrics is greater/less than a threshold with some probability

e.g. stop if the difference between perf_version_a and perf_version_b is < 10 tokens/sec with 95% probability

also add tests
@AlessandroPomponio changed the title from "feat: difference stopper" to "feat(ray_tune): difference stopper" Jan 14, 2026
@AlessandroPomponio (Member) left a comment


I will need to check again tomorrow, but these are some issues I see

@michael-johnston (Member, Author)

@AlessandroPomponio hold off on review as I noticed a more fundamental issue I need to fix

will convert to draft and ping you when ready

@michael-johnston marked this pull request as draft January 14, 2026 18:45
michael-johnston and others added 2 commits January 15, 2026 14:38
Deprecated mode parameter

Removes cases where the condition specified by the user is known to never be satisfied but the run did not stop, e.g. 95% probability that the mean difference > 10, but the mean difference is 2: this would not stop even when the probability that the mean difference < 10 is 1.0.
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
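As a concrete illustration of the case described in the commit message above (the posterior shape is assumed purely for illustration): if the user asks for 95% probability that the mean difference exceeds 10, but the posterior concentrates around a mean difference of 2, the requested condition can never be satisfied, while the complementary probability that the difference is below 10 is essentially 1, so the run can stop.

```python
# Hypothetical posterior matching the example in the commit message:
# mean difference around 2, user condition "mean difference > 10 with 95% probability".
from scipy import stats

posterior = stats.norm(loc=2.0, scale=0.5)      # assumed posterior for the mean difference
p_above = posterior.sf(10.0)                    # ~0.0: requested condition will never be met
p_below = posterior.cdf(10.0)                   # ~1.0: the complementary condition is certain
should_stop = max(p_above, p_below) >= 0.95     # stop instead of exhausting the trial budget
print(p_above, p_below, should_stop)            # ~0.0 ~1.0 True
```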
@DRL-NextGen (Member) commented Jan 15, 2026

Checks Summary

Last run: 2026-01-16T11:23:13.286Z

Code Risk Analyzer vulnerability scan found 2 vulnerabilities:

| Severity | Identifier | Package | Details | Fix |
| --- | --- | --- | --- | --- |
| 🔷 Medium | CVE-2026-22773 | vllm (vllm:0.11.1) | vLLM is vulnerable to DoS in Idefics3 vision models via image payload with ambiguous dimensions (GHSA-grg2-63fw-f2qr) | 0.12.0 |
| ◻ Unknown | CVE-2025-53000 | nbconvert (nbconvert:7.16.6->ado-core:1.3.3) | nbconvert has an uncontrolled search path that leads to unauthorized code execution on Windows (GHSA-xm59-rqc7-hhvf) | >7.16.6 |

Mend Unified Agent vulnerability scan found 1 vulnerability:

| Severity | Identifier | Package | Details | Fix |
| --- | --- | --- | --- | --- |
| 🔺 High | CVE-2025-53000 | nbconvert-7.16.6-py3-none-any.whl | The nbconvert tool, jupyter nbconvert, converts Jupyter notebooks to various other formats via Jinja templates. Versions of nbconvert up to and including 7.16.6 on Windows have a vulnerability in which converting a notebook containing SVG output to a PDF results in unauthorized code execution. Specifically, a third party can create an "inkscape.bat" file that defines a Windows batch script, capable of arbitrary code execution. When a user runs "jupyter nbconvert --to pdf" on a notebook containing SVG output on a Windows platform from this directory, the "inkscape.bat" file is run unexpectedly. As of time of publication, no known patches exist. | Not Available |

michael-johnston and others added 6 commits January 15, 2026 19:05
Now allows setting whether the identifier is in target or observed format. The default is either (the existing behaviour).
MeasurementSpace.propertyWithIdentifierInSpace
…erved format

target is default, keeping existing behaviour
@michael-johnston marked this pull request as ready for review January 15, 2026 19:33
michael-johnston and others added 4 commits January 16, 2026 10:12
Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com>
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com>
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com>
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com>
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com>
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
@michael-johnston (Member, Author) commented Jan 16, 2026

@VassilisVassiliadis here is a suggested operation for testing with sfttrainer. Change fields as desired.

metadata:
  description: "Perform latin hypercube sampling with difference stopper for space using sfttrainer lora benchmark experiment"
  name: "lhc-difference-sfttrainer-lora"
operation:
  module:
    operatorName: "ray_tune"
    operationType: "search"
  parameters:
    runtimeConfig:
      stop:
      - name: "BayesianMetricDifferenceStopper"
        keywordParams:
          metric_a: "finetune_lora_benchmark-v1.0.0-fms_hf_tuning_version.2.6.0-dataset_tokens_per_second_per_gpu"  # v1 measurement
          metric_b: "finetune_lora_benchmark-v1.0.0-fms_hf_tuning_version.3.0.0-dataset_tokens_per_second_per_gpu"  # v2 measurement
          threshold: 100                  # Stop when we know |v1-v2| > or < 100 with target probability
          target_probability: 0.95        # 95% confidence
          min_samples: 10                 # Wait for 10 trials minimum
    orchestratorConfig:
      metric_format: "observed" # We need to use observed property value as the target property id is the same for both experiment versions
    tuneConfig:
      metric: "finetune_lora_benchmark-v1.0.0-fms_hf_tuning_version.2.6.0-dataset_tokens_per_second_per_gpu" #ray tune needs primary metric to track
      max_concurrent_trials: 1 # This is set for debugging. Increase if you want multiple measurements at once.
      mode: min 
      num_samples: 32
      search_alg:
        name: lhu_sampler
spaces:
  - space-60b5c0-12e5dd
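In other words: Ray Tune will draw up to 32 Latin hypercube samples one at a time (max_concurrent_trials: 1), and the BayesianMetricDifferenceStopper can end the run early once at least 10 trials have reported and it is at least 95% certain whether the mean tokens-per-second-per-GPU difference between the fms_hf_tuning 2.6.0 and 3.0.0 measurements is above or below 100.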

config.py: import typing
operator.py: import Literal from typing
Suppresses errors in the IDE if the plugin is not installed
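Presumably the change is along these lines (illustrative only; the actual diff may differ): using only standard-library typing imports for annotations, so IDEs and type checkers can resolve them even when the optional plugin packages are not installed locally.

```python
# Illustrative sketch of the imports described in the commit messages above.
import typing                      # config.py: reference types via typing.*
from typing import Literal         # operator.py: e.g. a Literal-typed option

# Hypothetical alias for illustration; a Literal type lets IDEs validate the
# allowed values without needing the ray_tune plugin installed locally.
MetricFormat = Literal["target", "observed"]
```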
@michael-johnston (Member, Author)

@VassilisVassiliadis we will wait for your update confirming that the above YAML works before merging.

@VassilisVassiliadis (Member)

I'll reply here when my test is over.
