Conversation
Pull request overview
Adds a new warmup-stable-decay (WSD) learning-rate scheduler to the backend-agnostic BaseLR registry, wires it into argument checking and backend exports, and expands cross-backend test coverage to validate behavior and consistency.
Changes:
- Implement `LearningRateWSD` (`type="wsd"`) with warmup support plus configurable decay modes (`inverse_linear`, `cosine`, `linear`).
- Extend CLI/config arg validation to accept and validate WSD-specific parameters.
- Add/extend tests across universal, TF, PT, PD, and consistency suites for WSD behavior.
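For readers unfamiliar with the schedule shape, the three WSD phases (linear warmup, constant plateau, decay to `stop_lr`) can be sketched in plain NumPy. This is a minimal illustration, not the deepmd implementation; the parameter names simply mirror the ones mentioned in this PR, and the `inverse_linear` branch is one plausible 1/x-style interpolation that may differ from the actual formula:

```python
import numpy as np

def wsd_lr(step, start_lr=1e-3, stop_lr=1e-5, num_steps=10000,
           warmup_steps=1000, decay_phase_ratio=0.1, decay_type="linear"):
    """Illustrative warmup-stable-decay schedule (not the deepmd code)."""
    decay_steps = max(1, int(decay_phase_ratio * num_steps))
    stable_end = num_steps - decay_steps  # step where the decay phase starts
    if step < warmup_steps:
        # linear warmup from 0 up to start_lr
        return start_lr * step / warmup_steps
    if step < stable_end:
        # stable phase: hold start_lr constant
        return start_lr
    # decay phase: normalized progress tau in [0, 1]
    tau = min(1.0, (step - stable_end) / decay_steps)
    if decay_type == "linear":
        return start_lr + (stop_lr - start_lr) * tau
    if decay_type == "cosine":
        return stop_lr + (start_lr - stop_lr) * 0.5 * (1.0 + np.cos(np.pi * tau))
    if decay_type == "inverse_linear":
        # 1/x-style interpolation between start_lr and stop_lr (assumed form)
        return 1.0 / ((1.0 - tau) / start_lr + tau / stop_lr)
    raise ValueError(f"unknown decay_type {decay_type!r}")
```

With the defaults above, steps 0-999 warm up toward `1e-3`, steps 1000-8999 hold it, and steps 9000-10000 decay toward `1e-5`, which also explains why the consistency tests sample a late step to hit the decay phase.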
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `deepmd/dpmodel/utils/learning_rate.py` | Adds the `LearningRateWSD` scheduler implementation and registers it under `BaseLR`. |
| `deepmd/utils/argcheck.py` | Adds WSD-specific validation and exposes WSD arguments via the lr args plugin registry. |
| `deepmd/pt/utils/learning_rate.py` | Re-exports `LearningRateWSD` for the PyTorch backend utilities. |
| `deepmd/pd/utils/learning_rate.py` | Re-exports `LearningRateWSD` for the Paddle backend utilities. |
| `source/tests/universal/dpmodel/utils/test_learning_rate.py` | Adds extensive unit tests for WSD (modes, warmup, array input, boundary conditions). |
| `source/tests/tf/test_lr.py` | Adds TF wrapper build/value tests for WSD (default + cosine). |
| `source/tests/pt/test_lr.py` | Adds PT-side WSD curve tests for all decay modes. |
| `source/tests/pd/test_lr.py` | Adds PD-side WSD curve tests for all decay modes. |
| `source/tests/consistent/test_learning_rate.py` | Extends consistency parameterization to include WSD and adjusts the sampling step to hit the decay phase. |
📝 Walkthrough
Adds a new plugin-registered learning rate schedule.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (2)
source/tests/tf/test_lr.py (1)
106-140: These assertions don't exercise the TF graph path.
`lr_schedule.value()` just delegates to `base_lr.value()`, so both new tests still pass if `LearningRateSchedule.build()` is wrong for `type="wsd"`. Please evaluate the tensor returned by `build()` at a mid-decay step and compare that result instead.
🧪 Suggested coverage improvement

```diff
-        global_step = tf.constant(0, dtype=tf.int64)
-        lr_schedule.build(global_step, num_steps=10000)
+        g = tf.Graph()
+        with g.as_default():
+            global_step = tf.placeholder(shape=[], dtype=tf.int64)
+            lr_tensor = lr_schedule.build(global_step, num_steps=10000)
         self.assertIsInstance(lr_schedule.base_lr, LearningRateWSD)
-        np.testing.assert_allclose(
-            lr_schedule.value(9500), lr_schedule.base_lr.value(9500), rtol=1e-10
-        )
+        with tf.Session(graph=g) as sess:
+            tensor_value = sess.run(lr_tensor, feed_dict={global_step: 9500})
+            np.testing.assert_allclose(
+                tensor_value, lr_schedule.base_lr.value(9500), rtol=1e-10
+            )
```

Apply the same pattern to the cosine WSD variant as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/tf/test_lr.py` around lines 106 - 140: The tests are only exercising the Python path because lr_schedule.value(9500) passes a plain int; update both tests to pass a TF tensor so the built TF graph is evaluated: after calling LearningRateSchedule.build(global_step, num_steps=10000), call lr_schedule.value(tf.constant(9500, dtype=tf.int64)) and compare it to lr_schedule.base_lr.value(tf.constant(9500, dtype=tf.int64)) (and do the same change in test_wsd_cosine_build_and_value) to ensure LearningRateSchedule.build, base_lr and LearningRateWSD TF-paths are exercised.

source/tests/universal/dpmodel/utils/test_learning_rate.py (1)
142-165: Consider splitting mixed validation assertions into separate tests.
`test_invalid_decay_phase_ratio` also checks invalid `decay_type`; splitting improves failure localization.
♻️ Suggested test split

```diff
-    def test_invalid_decay_phase_ratio(self) -> None:
-        """Test invalid WSD decay_phase_ratio values."""
+    def test_invalid_decay_phase_ratio(self) -> None:
+        """Test invalid WSD decay_phase_ratio values."""
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_phase_ratio=0.0,
             )
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_phase_ratio=1.1,
             )
+
+    def test_invalid_decay_type(self) -> None:
+        """Test invalid WSD decay_type values."""
         with self.assertRaises(ValueError):
             LearningRateWSD(
                 start_lr=1e-3,
                 stop_lr=1e-5,
                 num_steps=10000,
                 decay_type="bad_mode",
             )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@source/tests/universal/dpmodel/utils/test_learning_rate.py` around lines 142 - 165, test_invalid_decay_phase_ratio currently asserts multiple unrelated invalid cases (invalid decay_phase_ratio and invalid decay_type) in one test; split it into separate tests so each assertion checks a single validation rule. Create one test (e.g., test_invalid_decay_phase_ratio_values) that calls LearningRateWSD with decay_phase_ratio=0.0 and decay_phase_ratio=1.1 and asserts ValueError, and another test (e.g., test_invalid_decay_type) that calls LearningRateWSD with decay_type="bad_mode" and asserts ValueError; keep function/class names LearningRateWSD, decay_phase_ratio, and decay_type to locate the code. Ensure each new test has a clear name and only one responsibility for better failure localization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/dpmodel/utils/learning_rate.py`:
- Around line 514-525: The current int(self.decay_phase_ratio * self.num_steps)
can produce 0 or exceed post-warmup steps; change the computation of
decay_phase_steps to derive from and clamp against decay_num_steps instead of
raising: compute e.g. desired = max(0, int(round(self.decay_phase_ratio *
self.num_steps))) (or 0 if you prefer), then set self.decay_phase_steps =
min(desired, self.decay_num_steps) and if self.decay_phase_steps == 0 and
self.decay_num_steps > 0 set it to 1 (or otherwise handle the zero-decay case),
replacing the two ValueError checks; update code around decay_phase_steps,
decay_phase_ratio, num_steps and decay_num_steps to use this clamped value so
short runs or heavy-warmup configs don’t raise at runtime.
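The clamp-and-bump logic this prompt describes can be sketched as a standalone helper. This is illustrative only, not the actual patch; `decay_num_steps` here stands for the post-warmup step budget, as in the prompt:

```python
def clamp_decay_phase_steps(decay_phase_ratio, num_steps, decay_num_steps):
    """Clamp instead of raising, per the suggestion above (illustrative sketch)."""
    desired = max(0, int(round(decay_phase_ratio * num_steps)))
    # never exceed the steps actually available after warmup
    decay_phase_steps = min(desired, decay_num_steps)
    if decay_phase_steps == 0 and decay_num_steps > 0:
        # guarantee at least one decay step when there is room for decay
        decay_phase_steps = 1
    return decay_phase_steps
```

With this shape, short runs or heavy-warmup configs silently get a shorter decay phase instead of raising at runtime.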
---
Nitpick comments:
In `@source/tests/tf/test_lr.py`:
- Around line 106-140: The tests are only exercising the Python path because
lr_schedule.value(9500) passes a plain int; update both tests to pass a TF
tensor so the built TF graph is evaluated: after calling
LearningRateSchedule.build(global_step, num_steps=10000), call
lr_schedule.value(tf.constant(9500, dtype=tf.int64)) and compare it to
lr_schedule.base_lr.value(tf.constant(9500, dtype=tf.int64)) (and do the same
change in test_wsd_cosine_build_and_value) to ensure LearningRateSchedule.build,
base_lr and LearningRateWSD TF-paths are exercised.
In `@source/tests/universal/dpmodel/utils/test_learning_rate.py`:
- Around line 142-165: test_invalid_decay_phase_ratio currently asserts multiple
unrelated invalid cases (invalid decay_phase_ratio and invalid decay_type) in
one test; split it into separate tests so each assertion checks a single
validation rule. Create one test (e.g., test_invalid_decay_phase_ratio_values)
that calls LearningRateWSD with decay_phase_ratio=0.0 and decay_phase_ratio=1.1
and asserts ValueError, and another test (e.g., test_invalid_decay_type) that
calls LearningRateWSD with decay_type="bad_mode" and asserts ValueError; keep
function/class names LearningRateWSD, decay_phase_ratio, and decay_type to
locate the code. Ensure each new test has a clear name and only one
responsibility for better failure localization.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a283f5a5-f445-4ee4-858e-4f089ee4287d
📒 Files selected for processing (9)
- deepmd/dpmodel/utils/learning_rate.py
- deepmd/pd/utils/learning_rate.py
- deepmd/pt/utils/learning_rate.py
- deepmd/utils/argcheck.py
- source/tests/consistent/test_learning_rate.py
- source/tests/pd/test_lr.py
- source/tests/pt/test_lr.py
- source/tests/tf/test_lr.py
- source/tests/universal/dpmodel/utils/test_learning_rate.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```diff
@@ Coverage Diff @@
##           master    #5326      +/-   ##
==========================================
+ Coverage   82.29%   82.32%   +0.02%
==========================================
  Files         775      777       +2
  Lines       77627    77877     +250
  Branches     3675     3678       +3
==========================================
+ Hits        63887    64112     +225
- Misses      12566    12590      +24
- Partials     1174     1175       +1
```

☔ View full report in Codecov by Sentry.
♻️ Duplicate comments (1)
deepmd/dpmodel/utils/learning_rate.py (1)
514-518: ⚠️ Potential issue | 🟡 Minor
Handle edge case when `decay_num_steps` is zero.
The clamping logic at lines 515-516 ensures `decay_phase_steps >= 1`, but when `decay_num_steps = 0` (i.e., `num_steps == warmup_steps`), this results in `stable_steps = -1`. While the `_decay_value` logic handles this gracefully by returning `stop_lr` immediately, having a negative `stable_steps` is semantically incorrect.
Consider adding early validation or adjusting the clamping:
🛡️ Suggested fix
```diff
         # === Derive stable and decay phase lengths ===
         self.decay_phase_ratio = decay_phase_ratio
         self.decay_type = decay_type
+        # When decay_num_steps is 0, there's no room for decay
+        if self.decay_num_steps == 0:
+            self.decay_phase_steps = 0
+            self.stable_steps = 0
+            return
         # Clamp decay_phase_steps to valid range [1, decay_num_steps]
         self.decay_phase_steps = max(
             1, min(int(self.decay_phase_ratio * self.num_steps), self.decay_num_steps)
         )
         self.stable_steps = self.decay_num_steps - self.decay_phase_steps
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/utils/learning_rate.py` around lines 514 - 518, The current clamping forces decay_phase_steps >= 1 which yields stable_steps negative when decay_num_steps == 0; update the logic so decay_phase_steps is clamped to the range [0, decay_num_steps] (allowing zero) or explicitly handle the decay_num_steps == 0 case by setting decay_phase_steps = 0 and stable_steps = 0 and/or validating decay_num_steps >= 0; adjust the block that sets self.decay_phase_steps and self.stable_steps (and any callers like _decay_value) to rely on non-negative stable_steps and ensure no negative values are produced.
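The early-return variant described here can be sketched as a small helper (illustrative, not the actual patch; the real code sets these as attributes in `__init__`):

```python
def derive_phase_lengths(decay_phase_ratio, num_steps, decay_num_steps):
    """Sketch of the suggested fix: handle decay_num_steps == 0 explicitly."""
    if decay_num_steps == 0:
        # no room for decay at all: both the decay and stable phases are empty
        return 0, 0
    decay_phase_steps = max(
        1, min(int(decay_phase_ratio * num_steps), decay_num_steps)
    )
    stable_steps = decay_num_steps - decay_phase_steps  # now always >= 0
    return decay_phase_steps, stable_steps
```

The key property is that `stable_steps` can no longer go negative in the `num_steps == warmup_steps` corner case.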
🧹 Nitpick comments (1)
deepmd/dpmodel/utils/learning_rate.py (1)
559-562: Redundant `xp.asarray` wrapping.
Since `tau` is already an array with `step_dtype`, the expression `xp.pi * tau` will produce an array of the same dtype. The `xp.asarray(..., dtype=step_dtype)` call is redundant.
♻️ Suggested simplification
```diff
         elif self.decay_type == "cosine":
             decay_lr = stop_lr + (start_lr - stop_lr) * 0.5 * (
-                one + xp.cos(xp.asarray(xp.pi * tau, dtype=step_dtype))
+                one + xp.cos(xp.pi * tau)
             )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/utils/learning_rate.py` around lines 559 - 562, In the cosine decay branch (where self.decay_type == "cosine") the expression xp.asarray(xp.pi * tau, dtype=step_dtype) is redundant because xp.pi * tau already yields an array with the correct dtype; replace that subexpression with xp.pi * tau so decay_lr is computed as stop_lr + (start_lr - stop_lr) * 0.5 * (one + xp.cos(xp.pi * tau)), keeping references to decay_lr, start_lr, stop_lr, tau, xp.cos and step_dtype unchanged.
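The dtype-preservation claim is easy to check with NumPy directly (assuming the `xp` namespace follows NumPy's promotion rules for Python scalars, as NumPy itself does):

```python
import numpy as np

# tau stands in for the decay-progress array from the snippet above
tau = np.asarray([0.0, 0.5, 1.0], dtype=np.float32)

# Multiplying by the Python float np.pi preserves the array dtype,
# so wrapping the product in asarray(..., dtype=...) is a no-op here.
product = np.pi * tau
print(product.dtype)  # float32
```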
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@deepmd/dpmodel/utils/learning_rate.py`:
- Around line 514-518: The current clamping forces decay_phase_steps >= 1 which
yields stable_steps negative when decay_num_steps == 0; update the logic so
decay_phase_steps is clamped to the range [0, decay_num_steps] (allowing zero)
or explicitly handle the decay_num_steps == 0 case by setting decay_phase_steps
= 0 and stable_steps = 0 and/or validating decay_num_steps >= 0; adjust the
block that sets self.decay_phase_steps and self.stable_steps (and any callers
like _decay_value) to rely on non-negative stable_steps and ensure no negative
values are produced.
---
Nitpick comments:
In `@deepmd/dpmodel/utils/learning_rate.py`:
- Around line 559-562: In the cosine decay branch (where self.decay_type ==
"cosine") the expression xp.asarray(xp.pi * tau, dtype=step_dtype) is redundant
because xp.pi * tau already yields an array with the correct dtype; replace that
subexpression with xp.pi * tau so decay_lr is computed as stop_lr + (start_lr -
stop_lr) * 0.5 * (one + xp.cos(xp.pi * tau)), keeping references to decay_lr,
start_lr, stop_lr, tau, xp.cos and step_dtype unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1363b61c-f8fb-4804-b834-420b56c86723
📒 Files selected for processing (2)
- deepmd/dpmodel/utils/learning_rate.py
- source/tests/universal/dpmodel/utils/test_learning_rate.py
🚧 Files skipped from review as they are similar to previous changes (1)
- source/tests/universal/dpmodel/utils/test_learning_rate.py
The doc will be added through PR #5276
Summary by CodeRabbit
New Features
Tests