feat(ray_tune): difference stopper #412
base: main
If an operation that created a nested op was interrupted during the nested op, the relationship between the parent and child would not be captured. This change fixes this.
Stopper which stops when the difference between two metrics is greater/less than a threshold with some probability, e.g. stop if the difference between perf_version_a and perf_version_b is < 10 tokens/sec with 95% probability. Also adds tests.
AlessandroPomponio
left a comment
I will need to check again tomorrow, but these are some issues I see
@AlessandroPomponio hold off on review, as I noticed a more fundamental issue I need to fix. I will convert to draft and ping you when ready.
Deprecated mode parameter. Removes cases where the condition specified by the user is known to never be satisfied but the run did not stop, e.g. the condition is 95% probability that the mean difference > 10, but the mean difference is 2: this would not stop even when the probability that the mean difference < 10 is 1.0.
Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
Checks Summary Last run: 2026-01-16T11:23:13.286Z Code Risk Analyzer vulnerability scan found 2 vulnerabilities:
Mend Unified Agent vulnerability scan found 1 vulnerabilities:
Changes addressed
Now allows setting if the identifier is in target or observed format. Default is either (the existing behaviour)
MeasurementSpace.propertyWithIdentifierInSpace
…erved format target is default, keeping existing behaviour
Co-authored-by: Alessandro Pomponio <10339005+AlessandroPomponio@users.noreply.github.com> Signed-off-by: Michael Johnston <66301584+michael-johnston@users.noreply.github.com>
@VassilisVassiliadis suggested operation for testing with sfttrainer. Change fields as desired.

    metadata:
      description: "Perform latin hypercube sampling with difference stopper for space using sfttrainer lora benchmark experiment"
      name: "lhc-difference-sfttrainer-lora"
    operation:
      module:
        operatorName: "ray_tune"
        operationType: "search"
      parameters:
        runtimeConfig:
          stop:
            - name: "BayesianMetricDifferenceStopper"
              keywordParams:
                metric_a: "finetune_lora_benchmark-v1.0.0-fms_hf_tuning_version.2.6.0-dataset_tokens_per_second_per_gpu" # v1 measurement
                metric_b: "finetune_lora_benchmark-v1.0.0-fms_hf_tuning_version.3.0.0-dataset_tokens_per_second_per_gpu" # v2 measurement
                threshold: 100 # Stop when we know |v1-v2| > or < 100 with target probability
                target_probability: 0.95 # 95% confidence
                min_samples: 10 # Wait for 10 trials minimum
        orchestratorConfig:
          metric_format: "observed" # We need to use the observed property value as the target property id is the same for both experiment versions
        tuneConfig:
          metric: "finetune_lora_benchmark-v1.0.0-fms_hf_tuning_version.2.6.0-dataset_tokens_per_second_per_gpu" # ray tune needs a primary metric to track
          max_concurrent_trials: 1 # This is set for debugging. Increase if you want multiple measurements at once.
          mode: min
          num_samples: 32
          search_alg:
            name: lhu_sampler
    spaces:
      - space-60b5c0-12e5dd
config.py: import typing; operator.py: import Literal from typing
Suppresses errors in IDE if plugin is not installed
Changes addressed
@VassilisVassiliadis We will wait for your confirmation that the above YAML works before merging.
I'll reply here when my test is over.
This PR adds a new stopper, BayesianDifferenceStopper, to the ray_tune operator.
This stopper enables stopping sampling when the mean difference between two metrics is determined to be above or below a threshold with a target certainty.
Use cases include determining if a model is drifting, i.e. if the mean difference is above or below a tolerance threshold given new data; and detecting performance regressions, i.e. if performance has changed substantially (beyond a threshold) given a new software version.
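
For readers who want a feel for the decision rule, here is a minimal sketch. It is not the code in this PR. It assumes paired samples of the two metrics and approximates the posterior of the mean difference with a normal distribution centred on the sample mean, with the standard error as its spread. The function names `prob_abs_mean_diff_exceeds` and `should_stop` are hypothetical; the parameter names `threshold`, `target_probability` and `min_samples` mirror the stopper's keyword parameters.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def prob_abs_mean_diff_exceeds(a, b, threshold):
    """Approximate P(|mean(a - b)| > threshold) from paired samples.

    Assumption: a normal approximation to the posterior of the mean
    difference (the stopper in this PR may use a different model).
    """
    diffs = [x - y for x, y in zip(a, b)]
    m = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean
    if se == 0.0:
        # All differences identical: the estimate is a point mass.
        return 1.0 if abs(m) > threshold else 0.0
    posterior = NormalDist(mu=m, sigma=se)
    # Mass below -threshold plus mass above +threshold.
    return posterior.cdf(-threshold) + (1.0 - posterior.cdf(threshold))


def should_stop(a, b, threshold, target_probability=0.95, min_samples=10):
    """Stop once we are confident the difference is above OR below the threshold."""
    n = min(len(a), len(b))
    if n < max(min_samples, 2):  # stdev needs at least two samples
        return False
    p_above = prob_abs_mean_diff_exceeds(a, b, threshold)
    p_below = 1.0 - p_above
    return p_above >= target_probability or p_below >= target_probability
```

Checking both `p_above` and `p_below` reflects the two-sided behaviour described in this PR: if the mean difference is around 2 and the threshold is 10, the probability of being below the threshold is close to 1, so sampling stops instead of waiting for a condition that will never be met.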