refactor(optimizer): finalize optimizer schema and backend handling #5157
base: master
Conversation
Pull request overview
This pull request refactors the optimizer configuration schema to promote it from being nested within the training section to a dedicated top-level optimizer section in the input JSON configuration. This change improves consistency and clarity across the DeePMD-kit codebase.
Changes:
- Introduced a new top-level `optimizer` section in the configuration schema with support for multiple optimizer types (Adam, AdamW, LKF, AdaMuon, HybridMuon)
- Updated all backend implementations (TensorFlow, PyTorch, Paddle) to extract optimizer parameters from the new location
- Updated documentation to reference the new `optimizer` section across multiple training guides
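To illustrate the relocation, here is a minimal sketch of the before/after config layout as plain Python dictionaries; keys other than `opt_type`, `type`, and the optimizer hyperparameters named in this PR (e.g. `numb_steps`) are shown only for context and are assumptions, not part of this change:

```python
# Old layout: optimizer options nested inside the "training" section.
old_jdata = {
    "training": {
        "opt_type": "Adam",        # legacy key handled by the compat converter
        "numb_steps": 100000,      # unrelated training key, kept for context
    },
}

# New layout: a dedicated top-level "optimizer" section with a "type" field.
new_jdata = {
    "optimizer": {
        "type": "Adam",
        # per-optimizer options such as adam_beta1 / adam_beta2 / weight_decay
        # live here as well, per the schema added in this PR
    },
    "training": {
        "numb_steps": 100000,
    },
}
```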
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| doc/train/training-advanced.md | Added comprehensive documentation for the new optimizer section with examples and backend-specific support information |
| doc/model/train-fitting-tensor.md | Updated to reference the new optimizer section in the configuration structure |
| doc/model/train-fitting-property.md | Updated to reference the new optimizer section in the configuration structure |
| doc/model/train-fitting-dos.md | Updated to reference the new optimizer section in the configuration structure |
| deepmd/utils/argcheck.py | Defined optimizer schema with the ArgsPlugin pattern, moved optimizer definitions from training_args to a standalone optimizer_args, and added support for all optimizer types with proper defaults |
| deepmd/tf/train/trainer.py | Updated to extract optimizer parameters from the new optimizer section, added early validation for TensorFlow-unsupported features (weight_decay, non-Adam optimizers), and implemented the beta1/beta2 parameters |
| deepmd/pt/train/training.py | Updated to extract optimizer parameters from the new optimizer section, refactored get_opt_param to include optimizer-specific default values, and expanded optimizer initialization with explicit beta parameters |
| deepmd/pd/train/training.py | Updated to extract optimizer parameters from the new optimizer section and added beta1/beta2/weight_decay parameters to the Adam optimizer initialization |
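For orientation only, a hypothetical sketch of how a variant-based `optimizer` argument could be declared with dargs (the library argcheck.py builds on); the field names, defaults, and doc strings here are illustrative and not the actual schema added in this PR:

```python
from dargs import Argument, Variant


def sketch_optimizer_args() -> Argument:
    # Illustrative choices only; the real per-optimizer options live in
    # deepmd/utils/argcheck.py (optimizer_args).
    adam = Argument("Adam", dict, [
        Argument("adam_beta1", float, optional=True, default=0.9,
                 doc="Decay rate for the first moment estimates."),
        Argument("adam_beta2", float, optional=True, default=0.999,
                 doc="Decay rate for the second moment estimates."),
        Argument("weight_decay", float, optional=True, default=0.0,
                 doc="Weight decay coefficient."),
    ], doc="Adam optimizer options (illustrative defaults).")
    lkf = Argument("LKF", dict, [
        Argument("kf_blocksize", int, optional=True, default=5120,
                 doc="Block size of the layer-wise Kalman filter."),
    ], doc="Kalman-filter optimizer options (illustrative).")
    return Argument(
        "optimizer", dict,
        sub_fields=[],
        sub_variants=[Variant("type", [adam, lkf], optional=True, default_tag="Adam")],
        optional=True,
        doc="Top-level optimizer settings.",
    )
```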
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
📝 Walkthrough
Centralizes optimizer configuration under a top-level `optimizer` section.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User as User/config
    participant Compat as compat.convert_optimizer_to_new_format
    participant Entryp as Entrypoint/train()
    participant Trainer as Trainer (PT/PD/TF)
    participant Optim as Optimizer factory
    participant Sched as LR Scheduler
    User->>Entryp: provide input JSON
    Entryp->>Compat: convert_optimizer_to_new_format(jdata)
    Compat-->>Entryp: normalized jdata with top-level optimizer
    Entryp->>Trainer: start training with normalized config
    Trainer->>Optim: read optimizer.type and params
    Optim-->>Trainer: instantiate optimizer (Adam/AdamW/LKF/AdaMuon/HybridMuon)
    Trainer->>Sched: create LambdaLR warm-up (when applicable)
    Sched-->>Trainer: return scheduler attached to optimizer
    Trainer-->>Entryp: training loop proceeds using optimizer+scheduler
```
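Sketched in Python, the flow in the diagram amounts to roughly the following; `convert_optimizer_to_new_format` and the `optimizer`/`type` keys come from this PR, while the helper name `train_sketch` and the final print are purely illustrative:

```python
# Rough sketch of the entry flow above; error handling, multi-task branches,
# and backend-specific trainer construction are omitted.
import json

from deepmd.utils.compat import convert_optimizer_to_new_format  # added by this PR


def train_sketch(config_path: str) -> None:
    with open(config_path) as f:
        jdata = json.load(f)

    # 1. Normalize legacy configs: training.opt_type and friends are moved
    #    into a top-level "optimizer" section (a no-op for new-format configs).
    jdata = convert_optimizer_to_new_format(jdata)

    # 2. The backend trainer reads optimizer.type and its parameters.
    opt_cfg = jdata.get("optimizer", {"type": "Adam"})
    opt_type = opt_cfg.get("type", "Adam")

    # 3. It then instantiates the matching optimizer (Adam/AdamW/LKF/AdaMuon/
    #    HybridMuon), attaches a LambdaLR warm-up scheduler when applicable,
    #    and runs the training loop.
    print(f"would build a '{opt_type}' optimizer from keys {sorted(opt_cfg)}")
```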
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@doc/model/train-fitting-dos.md`:
- Line 19: The README uses non-descriptive link text "here"; change it to a
descriptive phrase and update the inline reference so readers know what they’re
clicking. Replace "here" with something like "training parameters reference" or
"training keyword definitions" while keeping the same target file
(train-se-e2-a.md), and ensure the sentence still mentions the relevant keywords
{ref}`model[standard]/fitting_net <model[standard]/fitting_net>` and {ref}`loss
<loss>` so readers can find the training parameter explanations for fitting the
dos.
In `@doc/model/train-fitting-property.md`:
- Line 17: Replace the ambiguous link text "here" with a descriptive phrase so
the sentence reads clearly and improves accessibility; specifically update the
sentence referencing train-se-atten.md to use descriptive link text such as "SE
attention training documentation" (or similar) instead of "here", keeping the
surrounding refs ({ref}`model`, {ref}`learning_rate`, {ref}`optimizer`,
{ref}`loss`, {ref}`training`) and the callouts to
{ref}`model[standard]/fitting_net` and {ref}`loss` intact.
In `@doc/model/train-fitting-tensor.md`:
- Line 33: Replace the non-descriptive link text "here" in the sentence
describing training JSON (the line referencing `input.json`, `ener` mode and the
sections {ref}`model <model>`, {ref}`learning_rate <learning_rate>`,
{ref}`optimizer <optimizer>`, {ref}`loss <loss>` and {ref}`training <training>`)
with a descriptive link label that reflects the target content (e.g., "training
keywords and meanings" or "training keyword reference") and point it to
train-se-e2-a.md so the link text clearly conveys what the reader will find.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- deepmd/pd/train/training.py
- deepmd/pt/train/training.py
- deepmd/tf/train/trainer.py
- deepmd/utils/argcheck.py
- doc/model/train-fitting-dos.md
- doc/model/train-fitting-property.md
- doc/model/train-fitting-tensor.md
- doc/train/training-advanced.md
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test PyTorch training with `cd examples/water/se_e2_a && dp --pt train input_torch.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation
Applied to files:
doc/model/train-fitting-tensor.md
📚 Learning: 2025-12-12T13:40:14.334Z
Learnt from: CR
Repo: deepmodeling/deepmd-kit PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-12T13:40:14.334Z
Learning: Test TensorFlow training with `cd examples/water/se_e2_a && dp train input.json --skip-neighbor-stat` and allow timeout of 60+ seconds for validation
Applied to files:
doc/model/train-fitting-tensor.md
📚 Learning: 2026-01-10T04:28:18.703Z
Learnt from: OutisLi
Repo: deepmodeling/deepmd-kit PR: 5130
File: deepmd/pt/optimizer/adamuon.py:40-109
Timestamp: 2026-01-10T04:28:18.703Z
Learning: The coefficients (3.4445, -4.7750, 2.0315) in the `zeropower_via_newtonschulz5` function in `deepmd/pt/optimizer/adamuon.py` are standard across AdaMuon implementations and should not be modified for consistency with the reference implementations.
Applied to files:
deepmd/pt/train/training.py
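For context, the coefficients referenced in that learning belong to the quintic Newton-Schulz iteration used for approximate orthogonalization in Muon-family optimizers. The sketch below mirrors common reference implementations and is not the exact `zeropower_via_newtonschulz5` in deepmd/pt/optimizer/adamuon.py:

```python
import torch


def newton_schulz5_sketch(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2D gradient with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # the standard coefficients noted above
    X = G.to(torch.bfloat16)
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    X = X / (X.norm() + 1e-7)  # scale so the spectral norm is at most ~1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)
```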
🧬 Code graph analysis (2)
deepmd/utils/argcheck.py (1)
deepmd/utils/plugin.py (2)
register(41-59), register(122-141)
deepmd/pd/train/training.py (1)
deepmd/pt/train/training.py (1)
get_opt_param(160-197)
🪛 markdownlint-cli2 (0.18.1)
doc/model/train-fitting-property.md
17-17: Link text should be descriptive
(MD059, descriptive-link-text)
🪛 Ruff (0.14.11)
deepmd/pt/train/training.py
178-178: Avoid specifying long messages outside the exception class
(TRY003)
deepmd/tf/train/trainer.py
130-132: Avoid specifying long messages outside the exception class
(TRY003)
134-137: Avoid specifying long messages outside the exception class
(TRY003)
325-327: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (41)
- GitHub Check: Agent
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Test Python (8, 3.13)
- GitHub Check: Test C++ (false, false, false, true)
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Test Python (11, 3.10)
- GitHub Check: Test Python (11, 3.13)
- GitHub Check: Test Python (10, 3.10)
- GitHub Check: Test Python (9, 3.13)
- GitHub Check: Test Python (10, 3.13)
- GitHub Check: Test Python (8, 3.10)
- GitHub Check: Test Python (3, 3.13)
- GitHub Check: Test Python (9, 3.10)
- GitHub Check: Test Python (4, 3.10)
- GitHub Check: Test Python (7, 3.13)
- GitHub Check: Test Python (6, 3.13)
- GitHub Check: Test Python (12, 3.10)
- GitHub Check: Test Python (12, 3.13)
- GitHub Check: Test Python (1, 3.13)
- GitHub Check: Test Python (3, 3.10)
- GitHub Check: Test Python (6, 3.10)
- GitHub Check: Test Python (5, 3.13)
- GitHub Check: Test Python (7, 3.10)
- GitHub Check: Test Python (4, 3.13)
- GitHub Check: Test Python (2, 3.13)
- GitHub Check: Test Python (2, 3.10)
- GitHub Check: Test Python (5, 3.10)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
🔇 Additional comments (12)
deepmd/pt/train/training.py (4)
126-129: Verify legacy optimizer keys are handled. The trainer now reads only the top-level `optimizer` section when `optim_dict` is absent. If existing configs still set `training.opt_type`/`weight_decay`/`adam_beta2`, those values will be ignored. Please confirm this is an intentional breaking change, or add a fallback/deprecation error to make it explicit.
Also applies to: 319-320
160-197: LGTM: per-optimizer defaults centralized. Nice consolidation of default hyperparameters and validation in `get_opt_param`.
775-793: Nice: AdaMuon now restores optimizer state + scheduler. Keeps resume behavior consistent with Adam/AdamW.
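A minimal sketch of that resume pattern, assuming the checkpoint stores the states under `optimizer_state_dict` / `scheduler_state_dict` (the actual key names in the trainer may differ):

```python
import torch


def restore_training_state_sketch(optimizer, scheduler, checkpoint_path, device="cpu"):
    # Restore optimizer and scheduler state so a resumed run keeps its momentum
    # buffers and warm-up position instead of starting from scratch.
    state = torch.load(checkpoint_path, map_location=device)
    if "optimizer_state_dict" in state:  # assumed key name
        optimizer.load_state_dict(state["optimizer_state_dict"])
    if scheduler is not None and "scheduler_state_dict" in state:  # assumed key name
        scheduler.load_state_dict(state["scheduler_state_dict"])
```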
733-764: No changes needed: the `fused` parameter is guaranteed in all supported PyTorch versions. The project requires PyTorch ≥2.7 (GPU) or ≥2.8 (CPU) per pyproject.toml. Both `torch.optim.Adam` (fused since 1.13) and `torch.optim.AdamW` (fused since 2.0) fully support the `fused` parameter across all pinned versions. The current implementation is correct and doesn't require guards.
Likely an incorrect or invalid review comment.
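For reference, a hedged sketch of the kind of construction this comment refers to, assuming the hyperparameters extracted by `get_opt_param` and a known device; it is not the trainer's exact code:

```python
import torch


def build_adamw_sketch(model: torch.nn.Module, lr: float, opt_param: dict,
                       device: torch.device) -> torch.optim.Optimizer:
    betas = (opt_param.get("adam_beta1", 0.9), opt_param.get("adam_beta2", 0.999))
    weight_decay = float(opt_param.get("weight_decay") or 0.0)
    # fused=True keeps the parameter update on the GPU; fall back on CPU.
    return torch.optim.AdamW(
        model.parameters(),
        lr=lr,
        betas=betas,
        weight_decay=weight_decay,
        fused=(device.type != "cpu"),
    )
```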
deepmd/utils/argcheck.py (3)
2568-2852: LGTM: optimizer schema centralized under `optimizer`. Clear variant-based definitions and docstrings.
3683-3684: No review comment needed.
3754-3768: LGTM: optimizer args wired into `gen_args` for single/multi-task.
deepmd/pd/train/training.py (2)
116-163: LGTM: optimizer params now sourced from the top-level config. Parsing the `optimizer` block once keeps the trainer aligned with the new schema.
Also applies to: 250-266
595-606: LGTM: optimizer hyperparams are now plumbed through.
deepmd/tf/train/trainer.py (2)
123-137: Optimizer parsing/validation looks solid.
323-348: Optimizer construction is consistent across branches.
doc/train/training-advanced.md (1)
48-63: Optimizer section is clear and well-scoped.
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #5157      +/-   ##
==========================================
- Coverage   81.92%   81.92%   -0.01%
==========================================
  Files         714      714
  Lines       73301    73371      +70
  Branches     3616     3616
==========================================
+ Hits        60055    60106      +51
- Misses      12083    12103      +20
+ Partials     1163     1162       -1
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@deepmd/pt/train/training.py`:
- Around line 160-189: get_opt_param currently silently defaults to "Adam" when
callers use the legacy key "opt_type" or omit "type"; change get_opt_param to
check for a legacy "opt_type" key first and if present copy it into the
canonical "type" with a deprecation warning (use warnings.warn(...,
DeprecationWarning) or the module logger) before validating opt_type, and if
neither key is present raise a ValueError instead of silently defaulting to
"Adam"; reference the function get_opt_param and the symbols opt_type,
params["type"], and params["opt_type"] when implementing this guard and warning.
In `@deepmd/utils/compat.py`:
- Around line 426-477: The conversion currently only runs when "opt_type" is
present so legacy optimizer keys left in training_cfg (e.g., adam_beta1,
weight_decay) are ignored; update the logic in the block around
training_cfg/optimizer_keys (symbols: training_cfg, optimizer_keys,
extracted_cfg, optimizer_cfg) to trigger extraction when any legacy key is
present (not just opt_type), include the legacy alias "muon_min_2d_dim" by
adding it to optimizer_keys and mapping it to "min_2d_dim" when moving keys,
then proceed to convert opt_type->type if present and merge into optimizer_cfg
(preserving conversion precedence) and keep the existing defaulting to
default_optimizer/type behavior.
🧹 Nitpick comments (2)
deepmd/pd/train/training.py (1)
152-173: Consider removing unused kf_* parameters from extraction. The Paddle backend only supports the Adam optimizer (validated at lines 161-162), but `get_opt_param` still extracts Kalman filter parameters (`kf_blocksize`, `kf_start_pref_e`, etc.) that are only relevant for the LKF optimizer. While this doesn't cause functional issues, removing these unused extractions would improve code clarity and align better with the backend's actual capabilities. The early validation for unsupported optimizer types is good (fail-fast approach).
♻️ Optional: Remove unused kf_* parameters
```diff
 opt_param = {
-    "kf_blocksize": params.get("kf_blocksize"),
-    "kf_start_pref_e": params.get("kf_start_pref_e"),
-    "kf_limit_pref_e": params.get("kf_limit_pref_e"),
-    "kf_start_pref_f": params.get("kf_start_pref_f"),
-    "kf_limit_pref_f": params.get("kf_limit_pref_f"),
     "adam_beta1": params.get("adam_beta1"),
     "adam_beta2": params.get("adam_beta2"),
     "weight_decay": params.get("weight_decay"),
 }
```
324-349: Minor: Duplicate optimizer type validation. The optimizer type check at lines 325-328 is redundant since the same validation already occurs in `_init_param` (lines 130-133), which runs before `_build_optimizer` is called. The second check could be removed for cleaner code, though keeping it as a defensive guard is also acceptable.
♻️ Optional: Remove redundant validation
```diff
     def _build_optimizer(self):
-        if self.optimizer_type != "Adam":
-            raise RuntimeError(
-                f"Unsupported optimizer type {self.optimizer_type} for TensorFlow backend."
-            )
         if self.run_opt.is_distrib:
```
Diff of `get_opt_param` in deepmd/pt/train/training.py:

```diff
 def get_opt_param(params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
-    opt_type = params.get("opt_type", "Adam")
+    """
+    Extract optimizer parameters.
+    Note: Default values are already filled by argcheck.normalize()
+    before this function is called, so we use params.get() without
+    fallback defaults here.
+    """
+    opt_type = params.get("type", "Adam")
+    if opt_type not in ("Adam", "AdamW", "LKF", "AdaMuon", "HybridMuon"):
+        raise ValueError(f"Not supported optimizer type '{opt_type}'")
     opt_param = {
         # LKF parameters
-        "kf_blocksize": params.get("kf_blocksize", 5120),
-        "kf_start_pref_e": params.get("kf_start_pref_e", 1),
-        "kf_limit_pref_e": params.get("kf_limit_pref_e", 1),
-        "kf_start_pref_f": params.get("kf_start_pref_f", 1),
-        "kf_limit_pref_f": params.get("kf_limit_pref_f", 1),
+        "kf_blocksize": params.get("kf_blocksize"),
+        "kf_start_pref_e": params.get("kf_start_pref_e"),
+        "kf_limit_pref_e": params.get("kf_limit_pref_e"),
+        "kf_start_pref_f": params.get("kf_start_pref_f"),
+        "kf_limit_pref_f": params.get("kf_limit_pref_f"),
         # Common parameters
-        "weight_decay": params.get("weight_decay", 0.001),
+        "weight_decay": params.get("weight_decay"),
         # Muon/AdaMuon parameters
-        "momentum": params.get("momentum", 0.95),
-        "adam_beta1": params.get("adam_beta1", 0.9),
-        "adam_beta2": params.get("adam_beta2", 0.95),
-        "lr_adjust": params.get("lr_adjust", 10.0),
-        "lr_adjust_coeff": params.get("lr_adjust_coeff", 0.2),
-        "muon_2d_only": params.get("muon_2d_only", True),
-        "min_2d_dim": params.get("min_2d_dim", 1),
+        "momentum": params.get("momentum"),
+        "adam_beta1": params.get("adam_beta1"),
+        "adam_beta2": params.get("adam_beta2"),
+        "lr_adjust": params.get("lr_adjust"),
+        "lr_adjust_coeff": params.get("lr_adjust_coeff"),
+        "muon_2d_only": params.get("muon_2d_only"),
+        "min_2d_dim": params.get("min_2d_dim"),
     }
     return opt_type, opt_param
```
Avoid silent fallback to Adam when legacy opt_type is provided.
If a caller bypasses the compatibility converter and still uses opt_type, the current logic defaults to Adam without warning, which can silently change training behavior. Consider mapping the legacy key (with a warning) or raising a targeted error.
🔧 Suggested guard for legacy key
```diff
-    opt_type = params.get("type", "Adam")
+    opt_type = params.get("type")
+    if opt_type is None and "opt_type" in params:
+        log.warning(
+            "optimizer.opt_type is deprecated; use optimizer.type or run convert_optimizer_to_new_format."
+        )
+        opt_type = params["opt_type"]
+    if opt_type is None:
+        opt_type = "Adam"
```
🧰 Tools
+ opt_type = "Adam"🧰 Tools
🪛 Ruff (0.14.13)
170-170: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@deepmd/pt/train/training.py` around lines 160 - 189, get_opt_param currently
silently defaults to "Adam" when callers use the legacy key "opt_type" or omit
"type"; change get_opt_param to check for a legacy "opt_type" key first and if
present copy it into the canonical "type" with a deprecation warning (use
warnings.warn(..., DeprecationWarning) or the module logger) before validating
opt_type, and if neither key is present raise a ValueError instead of silently
defaulting to "Adam"; reference the function get_opt_param and the symbols
opt_type, params["type"], and params["opt_type"] when implementing this guard
and warning.
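A compact sketch of the guard described in this prompt, using `warnings.warn` and a `ValueError` as suggested; the helper name and message wording are illustrative, not the final implementation:

```python
import warnings
from typing import Any


def resolve_opt_type_sketch(params: dict[str, Any]) -> str:
    """Resolve the optimizer type, accepting the legacy 'opt_type' key with a warning."""
    if "type" in params:
        opt_type = params["type"]
    elif "opt_type" in params:
        warnings.warn(
            "optimizer 'opt_type' is deprecated; use 'type' or run "
            "convert_optimizer_to_new_format on the input config.",
            DeprecationWarning,
            stacklevel=2,
        )
        opt_type = params["opt_type"]
    else:
        raise ValueError("optimizer 'type' is missing from the configuration")
    if opt_type not in ("Adam", "AdamW", "LKF", "AdaMuon", "HybridMuon"):
        raise ValueError(f"Not supported optimizer type '{opt_type}'")
    return opt_type
```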
The conversion logic in deepmd/utils/compat.py (lines 426-477):

```python
# Case 1: Old format - optimizer params in training section
if "opt_type" in training_cfg:
    # Optimizer parameters that may be in the training section
    optimizer_keys = [
        "opt_type",
        "kf_blocksize",
        "kf_start_pref_e",
        "kf_limit_pref_e",
        "kf_start_pref_f",
        "kf_limit_pref_f",
        "weight_decay",
        "momentum",
        "muon_momentum",
        "adam_beta1",
        "adam_beta2",
        "lr_adjust",
        "lr_adjust_coeff",
        "muon_2d_only",
        "min_2d_dim",
    ]

    # Extract optimizer parameters from training section
    extracted_cfg = {}
    for key in optimizer_keys:
        if key in training_cfg:
            extracted_cfg[key] = training_cfg.pop(key)

    # Convert opt_type to type for new format
    if "opt_type" in extracted_cfg:
        extracted_cfg["type"] = extracted_cfg.pop("opt_type")

    # Merge with existing optimizer config (conversion takes precedence)
    optimizer_cfg = {**optimizer_cfg, **extracted_cfg}

    if warning:
        warnings.warn(
            "Placing optimizer parameters (opt_type, kf_blocksize, etc.) in the training section "
            "is deprecated. Use a separate 'optimizer' section with 'type' field instead.",
            DeprecationWarning,
            stacklevel=2,
        )

# Case 2: Fill in missing defaults
# If type is not specified, default to Adam
if "type" not in optimizer_cfg:
    optimizer_cfg["type"] = default_optimizer["type"]

# Fill in defaults for Adam optimizer type
if optimizer_cfg["type"] in ("Adam", "AdamW"):
    for key, value in default_optimizer.items():
        if key not in optimizer_cfg:
            optimizer_cfg[key] = value
```
Handle legacy optimizer keys even when opt_type is absent (and include muon_min_2d_dim).
Conversion only runs when opt_type exists, so older configs that set adam_beta1 / weight_decay (but relied on default optimizer) will leave those keys in training and fail normalization. Also the legacy alias muon_min_2d_dim isn’t moved. Consider converting whenever any legacy optimizer key is present and defaulting type to Adam if missing.
🛠️ Suggested fix
```diff
-# Case 1: Old format - optimizer params in training section
-if "opt_type" in training_cfg:
-    # Optimizer parameters that may be in the training section
-    optimizer_keys = [
-        "opt_type",
-        "kf_blocksize",
-        "kf_start_pref_e",
-        "kf_limit_pref_e",
-        "kf_start_pref_f",
-        "kf_limit_pref_f",
-        "weight_decay",
-        "momentum",
-        "muon_momentum",
-        "adam_beta1",
-        "adam_beta2",
-        "lr_adjust",
-        "lr_adjust_coeff",
-        "muon_2d_only",
-        "min_2d_dim",
-    ]
-
-    # Extract optimizer parameters from training section
-    extracted_cfg = {}
-    for key in optimizer_keys:
-        if key in training_cfg:
-            extracted_cfg[key] = training_cfg.pop(key)
+# Optimizer parameters that may be in the training section
+optimizer_keys = [
+    "opt_type",
+    "kf_blocksize",
+    "kf_start_pref_e",
+    "kf_limit_pref_e",
+    "kf_start_pref_f",
+    "kf_limit_pref_f",
+    "weight_decay",
+    "momentum",
+    "muon_momentum",
+    "adam_beta1",
+    "adam_beta2",
+    "lr_adjust",
+    "lr_adjust_coeff",
+    "muon_2d_only",
+    "muon_min_2d_dim",
+    "min_2d_dim",
+]
+
+# Case 1: Old format - optimizer params in training section
+legacy_keys = [k for k in optimizer_keys if k in training_cfg]
+if legacy_keys:
+    # Extract optimizer parameters from training section
+    extracted_cfg = {k: training_cfg.pop(k) for k in legacy_keys}
     # Convert opt_type to type for new format
     if "opt_type" in extracted_cfg:
         extracted_cfg["type"] = extracted_cfg.pop("opt_type")
+    elif "type" not in optimizer_cfg:
+        extracted_cfg["type"] = default_optimizer["type"]
```
🤖 Prompt for AI Agents
In `@deepmd/utils/compat.py` around lines 426 - 477, The conversion currently only
runs when "opt_type" is present so legacy optimizer keys left in training_cfg
(e.g., adam_beta1, weight_decay) are ignored; update the logic in the block
around training_cfg/optimizer_keys (symbols: training_cfg, optimizer_keys,
extracted_cfg, optimizer_cfg) to trigger extraction when any legacy key is
present (not just opt_type), include the legacy alias "muon_min_2d_dim" by
adding it to optimizer_keys and mapping it to "min_2d_dim" when moving keys,
then proceed to convert opt_type->type if present and merge into optimizer_cfg
(preserving conversion precedence) and keep the existing defaulting to
default_optimizer/type behavior.
From `get_opt_param`:

```python
opt_type = params.get("type", "Adam")
if opt_type not in ("Adam", "AdamW", "LKF", "AdaMuon", "HybridMuon"):
    raise ValueError(f"Not supported optimizer type '{opt_type}'")
opt_param = {
```
I would suggest not formatting opt_param here. For example, when using the Adam optimizer, the KF parameters are not applicable and therefore should not be retrieved from the params dictionary.
One may simply assign params to opt_param.
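A minimal sketch of the simplification being suggested, assuming `argcheck.normalize()` has already validated the section so only keys relevant to the chosen variant remain; the helper name is illustrative:

```python
from typing import Any


def get_opt_param_passthrough_sketch(params: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    # The normalized "optimizer" section is passed through as-is instead of
    # enumerating every possible key; only the discriminating "type" is split off.
    opt_type = params.get("type", "Adam")
    opt_param = {k: v for k, v in params.items() if k != "type"}
    return opt_type, opt_param
```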
```python
    weight_decay=float(self.opt_param["weight_decay"]),
    fused=False if DEVICE.type == "cpu" else True,
)
if self.opt_type == "Adam":
```
Here, get the parameters from `self.opt_param` based on the type of the optimizer.
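A sketch of that idea for the Adam/AdamW branch, pulling only the keys relevant to the selected optimizer; the function shape and the non-Adam branches are illustrative placeholders, not the trainer's final code:

```python
import torch


def build_optimizer_sketch(params, opt_type: str, opt_param: dict, lr: float,
                           device: torch.device) -> torch.optim.Optimizer:
    """Select only the hyperparameters relevant to the chosen optimizer type."""
    if opt_type in ("Adam", "AdamW"):
        cls = torch.optim.AdamW if opt_type == "AdamW" else torch.optim.Adam
        return cls(
            params,
            lr=lr,
            betas=(opt_param["adam_beta1"], opt_param["adam_beta2"]),
            weight_decay=float(opt_param.get("weight_decay") or 0.0),
            fused=(device.type != "cpu"),
        )
    if opt_type == "LKF":
        # A Kalman-filter optimizer would consume kf_blocksize and kf_*_pref_* here.
        raise NotImplementedError("LKF construction is omitted in this sketch")
    # AdaMuon / HybridMuon would consume momentum, lr_adjust, lr_adjust_coeff,
    # muon_2d_only, min_2d_dim, etc.
    raise NotImplementedError(f"{opt_type} construction is omitted in this sketch")
```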
Summary by CodeRabbit
New Features
Documentation
Tests
Chores