Mix batch 0429 #5439

Open

littlepeachs wants to merge 4 commits into deepmodeling:master from littlepeachs:mix_batch_0429

Conversation


littlepeachs commented May 9, 2026

Add mixed-nloc LMDB batching for PyTorch training

Summary

This PR adds optional mixed_batch support to the PyTorch LMDB training and validation data path.

When mixed_batch is enabled, one LMDB batch may contain frames with different atom counts / nloc values. Atom-wise fields are flattened across the batch, frame ownership is tracked with batch and ptr, and the model runs through a flat forward path without padding.

The default behavior is unchanged. mixed_batch is false by default, so LMDB batches still use the existing same-nloc sampler unless the option is explicitly enabled.
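
To make the batch / ptr bookkeeping concrete, here is a minimal sketch (illustrative only, not the PR's code; build_batch_ptr is a hypothetical helper):

```python
# Hypothetical sketch of the batch/ptr layout used for mixed-nloc batches.
# Each frame i owns atoms ptr[i]:ptr[i+1] in the flattened arrays, and
# batch[j] gives the frame index of flattened atom j.

def build_batch_ptr(nloc_per_frame):
    """Return (batch, ptr) lists for frames with the given atom counts."""
    ptr = [0]
    batch = []
    for frame_idx, nloc in enumerate(nloc_per_frame):
        ptr.append(ptr[-1] + nloc)
        batch.extend([frame_idx] * nloc)
    return batch, ptr

# Three frames with 2, 3, and 1 atoms flatten to 6 atoms total.
batch, ptr = build_batch_ptr([2, 3, 1])
# batch == [0, 0, 1, 1, 1, 2]; ptr == [0, 2, 5, 6]
```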

Motivation

LMDB batches previously required all frames in a batch to have the same nloc. This can make batching inefficient for datasets that mix systems with different atom counts. Mixed batching lets those frames train together while keeping atom indexing explicit in a flattened graph layout.

Main Changes

  • Add mixed_batch to PyTorch LMDB training_data and validation_data.
  • Keep mix_batch as an alias for the same option.
  • In mixed_batch=true mode, interpret batch_size as the number of frames/systems in one mixed batch.
  • Add a mixed LMDB collate path that concatenates atom-wise fields, stacks frame-wise fields, and emits batch and ptr for frame ownership.
  • Precompute flat graph metadata and indices during collation for the DPA3/RepFlow path.
  • Add flat forward support through the training wrapper, energy model, atomic model, descriptors, fitting net, and energy loss.
  • Keep atom-wise outputs flat and reduce frame-wise quantities using the global atom-to-frame mapping.
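
A hedged sketch of what such a collate path might look like, with illustrative field names (collate_mixed, ATOM_WISE, and FRAME_WISE are hypothetical, not the PR's identifiers):

```python
# Hypothetical sketch of a mixed-nloc collate: atom-wise fields are
# concatenated across frames, frame-wise fields are kept one-per-frame,
# and batch/ptr record frame ownership.

ATOM_WISE = {"coord", "atype"}   # flattened across the batch
FRAME_WISE = {"energy", "box"}   # one entry per frame

def collate_mixed(frames):
    out = {"batch": [], "ptr": [0]}
    for i, frame in enumerate(frames):
        nloc = len(frame["atype"])
        out["ptr"].append(out["ptr"][-1] + nloc)
        out["batch"].extend([i] * nloc)
    for key in ATOM_WISE:
        out[key] = [v for frame in frames for v in frame[key]]
    for key in FRAME_WISE:
        out[key] = [frame[key] for frame in frames]
    return out

frames = [
    {"atype": [0, 1], "coord": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
     "energy": -3.2, "box": "box0"},
    {"atype": [1], "coord": [(0.5, 0.5, 0.5)], "energy": -1.1, "box": "box1"},
]
merged = collate_mixed(frames)
# merged["ptr"] == [0, 2, 3]; merged["batch"] == [0, 0, 1]
# len(merged["atype"]) == 3 (flattened); len(merged["energy"]) == 2 (per frame)
```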

Usage

{
  "training": {
    "training_data": {
      "systems": "path/to/train.lmdb",
      "batch_size": 8,
      "mixed_batch": true
    },
    "validation_data": {
      "systems": "path/to/valid.lmdb",
      "batch_size": 8,
      "mixed_batch": true
    }
  }
}

With mixed_batch=true, batch_size controls how many frames/systems are merged into one flattened batch.

With mixed_batch=false, the existing same-nloc LMDB batching behavior is preserved.

Tests

  • Add unit tests for mixed-nloc LMDB collation.
  • Cover flattened batch / ptr layout and flat graph precomputation.
  • Cover the mix_batch config alias.
  • Add a DPA3 LMDB mixed-batch example config.
  • Add test_mixed_batch.sh for end-to-end training coverage.

Scope

  • This change applies to the PyTorch LMDB data path.
  • The flat graph path is currently used by the DPA3/RepFlow descriptor flow.

Summary by CodeRabbit

  • New Features

    • Added support for mixed-batch LMDB training, enabling efficient training on datasets with frames containing varying numbers of atoms.
    • Introduced mixed_batch configuration option for training and validation datasets.
    • Enhanced model architecture with flat-batch forward paths for improved performance with heterogeneous systems.
  • Improvements

    • Improved energy loss normalization for mixed-batch inputs.
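
As a hedged illustration of this kind of normalization (not the PR's actual loss code), the per-frame atom counts fall directly out of ptr:

```python
# Illustrative sketch: normalize per-frame energies by each frame's own
# atom count, recovered as successive differences of ptr.

def per_frame_natoms(ptr):
    return [ptr[i + 1] - ptr[i] for i in range(len(ptr) - 1)]

def energy_per_atom(energies, ptr):
    natoms = per_frame_natoms(ptr)
    return [e / n for e, n in zip(energies, natoms)]

# ptr for frames with 2, 3, and 1 atoms
vals = energy_per_atom([-4.0, -9.0, -2.0], [0, 2, 5, 6])
# vals == [-2.0, -3.0, -2.0]
```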

Review Change Stack

github-actions Bot added the Python label May 9, 2026
dosubot Bot added the new feature label May 9, 2026

coderabbitai Bot commented May 9, 2026

📝 Walkthrough

Walkthrough

This PR implements mixed-batch LMDB training for DeepMD-kit's PyTorch backend, enabling frames with differing atom counts in single batches. New flat-batch forward paths flatten per-atom tensors across frames, precompute graph structures (neighbor lists, edge/angle indices), and reorganize model inference around flattened atom representations. The LMDB collator flattens atom-wise data and optionally precomputes graph metadata. Loss normalization, data loading, and training infrastructure are updated to detect and handle mixed batches via batch/ptr bookkeeping.

Changes

Mixed-batch LMDB Training Pipeline

Layer / File(s) Summary
Configuration & Type Contracts
deepmd/utils/argcheck.py, deepmd/pt/utils/nlist.py, deepmd/pt/utils/lmdb_dataset.py
Added mixed_batch (alias mix_batch) boolean config options to training_data and validation_data with documentation. Introduced FlatGraphData type alias and collate-related type exports for backward compatibility.
Graph Precomputation Utilities
deepmd/pt/utils/nlist.py, deepmd/pt/model/network/graph_utils_flat.py
New build_precomputed_flat_graph computes flattened neighbor/angle lists from per-frame coordinates and pointers. Added rebuild_extended_coord_from_flat_graph, get_central_ext_index, and extend_input_and_build_neighbor_list_with_images for coordinate/index transformations. Refactored ghost-extension logic into _extend_coord_with_ghosts_impl with optional image tracking. New get_graph_index_flat derives edge and angle indices from flat neighbor lists.
Descriptor Flat-Batch Paths
deepmd/pt/model/descriptor/env_mat.py, deepmd/pt/model/descriptor/repflows.py, deepmd/pt/model/descriptor/dpa3.py
Added prod_env_mat_flat to compute smooth environment matrices from flattened coordinates and neighbor lists. DescrptBlockRepflows.forward_flat processes flat-graph precomputed neighbor data and optional dynamic edge/angle indices through embeddings and RepFlow layers. DescrptDPA3.forward_flat casts extended coordinates, computes type embeddings, optionally adds charge/spin embeddings, and delegates to RepFlow descriptor.
Fitting Network & Atomic Model
deepmd/pt/model/task/invar_fitting.py, deepmd/pt/model/atomic_model/dp_atomic_model.py
InvarFitting.forward_flat converts flat atom inputs (batch/ptr) into padded per-frame tensors, calls the existing forward, then re-flattens outputs. DPAtomicModel.forward_common_atomic_flat runs descriptor and fitting flat paths, derives central atom indices, applies per-atom masking via atom_excl, and stores mask/predictions.
Energy Model & Model Initialization
deepmd/pt/model/model/ener_model.py, deepmd/pt/model/model/__init__.py, deepmd/pt/model/model/make_model.py
EnergyModel.forward now routes to forward_common_flat when batch and ptr are provided. Added child_seed import and default seed initialization for descriptors/fitting nets. CM model class gained forward_common_flat_native (rebuilds extended coordinates), forward_common_lower_flat (atomic inference + energy reduction), _compute_derivatives_flat (autograd-based gradients), and forward_common_flat (public wrapper).
Loss & Training Wrapper
deepmd/pt/loss/ener.py, deepmd/pt/train/wrapper.py
EnergyStdLoss.forward detects mixed batches via ptr and computes per-frame-averaged atom normalization. ModelWrapper.forward signature extended with optional flat-graph tensors; when batch and ptr are present, augments input_dict with extended geometry and precomputed graph structures.
LMDB Collation & Dataset
deepmd/pt/utils/lmdb_dataset.py
New _collate_lmdb_mixed_batch flattens atom-wise tensors across frames while preserving per-frame structure for energy/box; constructs batch/ptr for bookkeeping. make_lmdb_mixed_batch_collate factory optionally precomputes flat-graph data via build_precomputed_flat_graph. LmdbDataset detects mixed atom counts and routes to mixed-batch collator; when mixed_batch=True, creates internal dataloader with mixed-batch collation instead of per-nloc sampler.
Training Entrypoint & Trainer
deepmd/pt/entrypoints/main.py, deepmd/pt/train/training.py
get_trainer passes mixed_batch parameter from training/validation data config into LmdbDataset. Trainer's get_dataloader_and_iter_lmdb implements mixed-batch path: selects sampler (random/sequential by shuffle), derives graph_config from model descriptor, uses make_lmdb_mixed_batch_collate collator. Trainer.get_data() detects mixed-batch format via batch/ptr, moves those tensors to device, and extends input keys with precomputed flat-graph fields.
Tests, Examples & Configuration
source/tests/pt/test_lmdb_dataloader.py, test_mixed_batch.sh, test_mptraj/lmdb_baseline.json, test_mptraj/lmdb_mixed_batch.json, .gitignore
Added test_mixed_batch_init to verify mixed-batch dataset setup, test_collate_mixed_nloc_flattens_atomwise to check collation output structure, test_mixed_batch_collate_precomputes_graph for graph precomputation, and test_mix_batch_arg_alias for config alias normalization. Bash script test_mixed_batch.sh runs mixed-batch training. Created lmdb_baseline.json and lmdb_mixed_batch.json configs with full model/training specs. Updated .gitignore to skip temporary directories.
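
A minimal sketch of the index-offset idea behind flat-graph precomputation, assuming a brute-force neighbor search without periodic boundary conditions (function names are hypothetical, not the module's API):

```python
# Hypothetical sketch of flat-graph precomputation: neighbor lists are
# built per frame, then neighbor indices are shifted by each frame's
# atom offset so they address the globally flattened atom array.

def neighbors_within(coords, rcut):
    """Brute-force local neighbor lists for one frame (no PBC)."""
    nlist = []
    for i, ci in enumerate(coords):
        neigh = [j for j, cj in enumerate(coords)
                 if j != i
                 and sum((a - b) ** 2 for a, b in zip(ci, cj)) <= rcut ** 2]
        nlist.append(neigh)
    return nlist

def flat_neighbor_list(frames_coords, rcut):
    flat = []
    offset = 0
    for coords in frames_coords:
        for neigh in neighbors_within(coords, rcut):
            flat.append([j + offset for j in neigh])
        offset += len(coords)
    return flat

frames = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],                    # frame 0: atoms 0-1
          [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]]   # frame 1: atoms 2-4
nlist = flat_neighbor_list(frames, rcut=1.5)
# nlist == [[1], [0], [3], [2], []]  (atom 4 is isolated; no cross-frame pairs)
```

The key property is that frame blocks stay disjoint: offsetting per-frame indices can never create an edge that crosses a frame boundary.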

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • deepmodeling/deepmd-kit#5408: Refactors LMDB collate logic with shared helper functions; this PR implements the actual mixed-batch collation and LmdbDataset changes.
  • deepmodeling/deepmd-kit#5283: Prior PyTorch LMDB data-loading additions to same modules (lmdb_dataset.py, training.py, main.py); this PR extends with mixed-batch support.
  • deepmodeling/deepmd-kit#4661: Modifies PyTorch training/inference pipeline and ModelWrapper data flow; this PR adds flat-graph tensors to the same wrapper/entrypoint interface.

Suggested labels

new feature, Python, Examples

Suggested reviewers

  • iProzd
  • wanghan-iapcm

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 55.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title 'Mix batch 0429' is vague and generic, referring only to a feature name and a date code without clearly conveying the primary change or scope of the pull request. Resolution: revise the title to be more descriptive and specific, e.g. 'Add mixed-nloc batch support for LMDB training with flat graph precomputation' or 'Implement optional mixed-batch data loading for PyTorch LMDB path'.
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found for this pull request.





coderabbitai Bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepmd/pt/loss/ener.py (1)

412-424: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Generalized-force loss still assumes a single natoms for every frame.

This branch reshapes force and drdq with natoms * 3, which only works for uniform-size batches. In mixed-batch mode those tensors are flattened across frames, so enabling generalized-force loss here will mis-shape the data or crash.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/loss/ener.py` around lines 412 - 424, The generalized-force branch
in ener.py assumes a single natoms by reshaping force_pred, force_label and drdq
with natoms * 3, which breaks for mixed-size batches; instead, use frame-aware
shapes: do not flatten all frames into one dimension—reshape or iterate using
per-frame atom counts (or keep forces as (nframes, natoms_i, 3) and drdq as
(nframes, natoms_i*3, n_gcoord)) and compute gen_force and gen_force_label
per-frame (or via batched operations that use the original nframes dimension) so
tensor dims align; also apply find_drdq/pref_gf per-frame. Locate symbols:
has_gf, drdq, find_drdq, pref_gf, force_pred, force_label,
force_reshape_nframes, force_label_reshape_nframes, drdq_reshape, gen_force,
gen_force_label and update the reshaping/Einstein-summation to operate over the
nframes dimension (or loop over frames) instead of using natoms * 3 across the
whole batch.
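
A hedged sketch of the frame-aware alternative, with plain lists standing in for tensors (split_by_ptr is a hypothetical helper, not the proposed fix itself):

```python
# Illustrative sketch of frame-aware handling for flattened per-atom
# tensors: split by ptr boundaries instead of reshaping the whole batch
# with a single natoms * 3.

def split_by_ptr(flat_forces, ptr):
    """flat_forces: one 3-vector per flattened atom; returns per-frame lists."""
    return [flat_forces[ptr[i]:ptr[i + 1]] for i in range(len(ptr) - 1)]

forces = [(0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.3, 0.0, 0.0)]
per_frame = split_by_ptr(forces, [0, 2, 3])
# per_frame[0] has 2 atoms, per_frame[1] has 1 atom
```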
🧹 Nitpick comments (13)
deepmd/pt/model/descriptor/repflows.py (1)

840-874: 💤 Low value

Make the disjoint-frame invariant explicit.

The synthetic batch=1 wrap is only correct because RepFlowLayer.forward is strictly per-atom + per-neighbor, and because nlist/a_nlist here never cross frame boundaries (the LMDB collator builds per-frame extended atoms and offsets neighbor indices accordingly, then maps back via per-frame mapping). A brief comment near this block stating that invariant — and noting that any future RepFlowLayer change introducing global ops over the batch axis would silently mix frames — will save a future debugging session.

🤖 Prompt for AI Agents

In `@deepmd/pt/model/descriptor/repflows.py` around lines 840 - 874, Add a short
explanatory comment above the synthetic batch=1 wrapping that states the
disjoint-frame invariant: explain that the temporary batching of tensors like
node_ebd_batched, edge_ebd_batched, h2_batched, angle_ebd_batched,
nlist_batched, a_nlist_batched, etc. is safe because RepFlowLayer.forward
operates strictly per-atom and per-neighbor and the LMDB collator guarantees
nlist/a_nlist never reference atoms across different frames (mapping/offsets
keep frames disjoint); also warn that if RepFlowLayer.forward (or any layer in
self.layers) later introduces global ops over the batch axis the synthetic batch
will mix frames and break correctness, so such changes must preserve the
per-frame isolation or remove this batching trick.
deepmd/pt/utils/nlist.py (2)

101-170: 🏗️ Heavy lift

Per-frame Python loop with .item() calls forces a CPU/GPU sync each iteration.

For large batch_size (e.g., 128 in lmdb_baseline.json), this loop incurs 2 * batch_size device syncs from int(ptr[...].item()) plus per-frame kernel launches for extend_coord_with_ghosts_with_images and build_neighbor_list. Even though it runs in dataloader workers, this can dominate end-to-end throughput when frames have small atom counts. Two relatively cheap mitigations:

  • Pull ptr to CPU once before the loop (ptr_cpu = ptr.cpu().tolist()) and index lists directly, eliminating per-iteration syncs.
  • If the data loader ultimately runs on CPU tensors (typical for collate_fn), the per-frame work is largely free of device sync but the loop still costs Python overhead — consider a vectorized batched ghost-extension that operates on a padded tensor and then re-flattens, mirroring the existing extend_input_and_build_neighbor_list_with_images.

Not blocking, but worth measuring before this lands as the default mixed-batch path.

🤖 Prompt for AI Agents

In `@deepmd/pt/utils/nlist.py` around lines 101 - 170, The loop is causing
repeated device syncs via int(ptr[frame_idx].item()) and per-frame kernel
overhead in extend_coord_with_ghosts_with_images and build_neighbor_list; fix by
materializing ptr on CPU once (e.g., ptr_cpu = ptr.cpu().tolist()) and use
ptr_cpu[frame_idx]/ptr_cpu[frame_idx+1] for start_idx/end_idx to eliminate
.item() calls, and where possible call
normalize_coord/extend_coord_with_ghosts_with_images/build_neighbor_list on
already-CPU tensors or add a batched/padded variant (similar to
extend_input_and_build_neighbor_list_with_images) to process frames in a
vectorized way to reduce Python per-frame overhead.
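
The first mitigation can be sketched as follows, with a plain list standing in for the host copy obtained via ptr.cpu().tolist() (frame_slices is a hypothetical helper):

```python
# Sketch of the suggested mitigation: materialize ptr on the host once,
# then slice with plain Python ints so the per-frame loop performs no
# per-iteration device synchronization.

def frame_slices(ptr_cpu):
    """Yield (start, end) flattened-atom index pairs, one per frame."""
    for i in range(len(ptr_cpu) - 1):
        yield ptr_cpu[i], ptr_cpu[i + 1]

# In the real code: ptr_cpu = ptr.cpu().tolist(), done once before the loop.
slices = list(frame_slices([0, 2, 5, 6]))
# slices == [(0, 2), (2, 5), (5, 6)]
```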

262-281: 💤 Low value

Document the "first nloc extended atoms per frame are local" assumption.

get_central_ext_index relies on the layout produced by _extend_coord_with_ghosts_impl, where shift indices are sorted by L2 norm and (0,0,0) is therefore the first image — making the first nloc extended atoms in each frame the local ones. This invariant is implicit; a one-line comment will save a future maintainer who changes the ghost layout from silently breaking this helper.

🤖 Prompt for AI Agents

In `@deepmd/pt/utils/nlist.py` around lines 262 - 281, Add a one-line doc comment
inside get_central_ext_index stating the invariant it relies on: that the
extended layout produced by _extend_coord_with_ghosts_impl places the unshifted
image (shift (0,0,0)) first for each frame and thus the first nloc extended
atoms per frame are the local atoms; mention that shift indices are sorted by L2
norm so the (0,0,0) image is first and changing that ordering will break
get_central_ext_index.
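
A minimal sketch of this invariant, assuming locals precede ghosts in each frame's extended block (central_ext_index here is an illustrative reimplementation, not the module's function):

```python
# Illustrative sketch of the invariant described above: within each
# frame's extended block, the first nloc entries are the local atoms,
# so the flat index of local atom k of frame i is ext_offset[i] + k.

def central_ext_index(nloc_per_frame, nall_per_frame):
    ext_offset = 0
    index = []
    for nloc, nall in zip(nloc_per_frame, nall_per_frame):
        index.extend(range(ext_offset, ext_offset + nloc))
        ext_offset += nall  # skip this frame's ghost atoms
    return index

# Two frames: (nloc=2, nall=5) and (nloc=1, nall=3)
idx = central_ext_index([2, 1], [5, 3])
# idx == [0, 1, 5]
```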
.gitignore (1)

81-81: 💤 Low value

deepmd-kit/ ignore pattern is overly broad.

Ignoring a directory named after the project itself can mask legitimate content if the repo or a checkout is ever placed under a deepmd-kit/ subdirectory (e.g., nested workspaces, tooling that clones into a project-named folder). Consider narrowing or scoping the rule to only the artifact path you intend to ignore.

🤖 Prompt for AI Agents

In @.gitignore at line 81, The .gitignore entry "deepmd-kit/" is too broad and
may unintentionally ignore legitimately nested repos; update the ignore rule
that currently uses deepmd-kit/ to a more specific pattern (for example scope it
to the repo root by changing to "/deepmd-kit/" or better yet target only
build/dist/artifact paths like "/deepmd-kit/build", "/deepmd-kit/dist", or
specific filenames/patterns such as "deepmd-kit/*.egg-info") so only the
intended artifact directories/files are ignored; locate the deepmd-kit/ entry in
the .gitignore and replace it with one or more narrow, explicit patterns.
deepmd/pt/model/task/invar_fitting.py (2)

299-315: 💤 Low value

Atom-wise output detection is shape-heuristic; consider being explicit.

The current rule un-pads any tensor whose first two dims happen to be (nframes, max_nloc) and leaves everything else untouched. That works for the current outputs (var_name, middle_output), but a future output that returns frame-level shape (nframes, max_nloc) would be silently un-padded, and any per-atom output whose first axis is something other than nframes would be silently passed through. Consider keying off output_def() (which already declares atomic vs. frame-level variables) instead of tensor shape.

🤖 Prompt for AI Agents

In `@deepmd/pt/model/task/invar_fitting.py` around lines 299 - 315, The tensor
un-padding currently uses a shape heuristic in the loop that builds result_flat;
instead use the model's declared output types from output_def() to decide which
keys are atom-wise. Replace the shape-based condition in the for key, value in
result_batch loop with a lookup like output_def()[key].is_atomic (or equivalent
in your model API) and only apply valid_atom_mask to values for keys declared
atomic; leave frame-level or other outputs untouched and handle non-tensor
values as before (keep variable names result_batch, result_flat,
valid_atom_mask, and call output_def() to determine atomic vs frame-level).

259-264: 💤 Low value

Padding atype = -1 silently wraps to the last type in PyTorch indexing.

Per-type lookups inside self.forward (bias_atom_e/case embedding/etc.) use tensor[atype], where -1 does not error out — it wraps to the last type. Outputs are masked away later, so this does not corrupt the result, but it does waste compute on the padding rows and can interact unexpectedly with any branch that special-cases negative types. Using a valid sentinel like 0 is safer and semantically clearer; the valid_atom_mask continues to discard the rows.

♻️ Suggested change
-        atype_batch = torch.full(
-            (nframes, max_nloc),
-            -1,
-            dtype=atype.dtype,
-            device=device,
-        )
+        # Use 0 as a safe dummy type for padding rows; outputs are discarded
+        # by valid_atom_mask below so the value is not observable.
+        atype_batch = torch.zeros(
+            (nframes, max_nloc),
+            dtype=atype.dtype,
+            device=device,
+        )
🤖 Prompt for AI Agents

In `@deepmd/pt/model/task/invar_fitting.py` around lines 259 - 264, The padding
value for atype_batch is set to -1 which causes PyTorch indexing (e.g., in
self.forward when doing per-type lookups like bias_atom_e[atype] or embeddings)
to wrap to the last type; change the sentinel to a valid non-negative index
(e.g., 0 or a dedicated PAD type index) when creating atype_batch so lookups
won't accidentally reference real types, and keep using valid_atom_mask to
discard those rows downstream; update places that assume negative sentinel
behavior to use the new PAD index (refer to atype_batch, self.forward,
bias_atom_e, and valid_atom_mask).
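
A quick demonstration of the wrap-around with plain Python sequences, which index negatively the same way torch integer indexing does for -1 (the per-type table here is hypothetical):

```python
# Minimal demonstration of the wrap-around behavior discussed above:
# a -1 padding sentinel silently selects the *last* entry of a per-type
# table instead of raising an error.

bias_per_type = [0.5, 1.5, 9.9]  # hypothetical per-type biases

pad_sentinel = -1
picked = bias_per_type[pad_sentinel]
# picked == 9.9 -> the padding row silently reads the last type's bias

safe_sentinel = 0  # a valid dummy type; its output is masked out later
picked_safe = bias_per_type[safe_sentinel]
# picked_safe == 0.5
```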
deepmd/pt/model/network/graph_utils_flat.py (1)

41-89: 💤 Low value

LGTM.

Edge index construction (n2e_index / n_ext2e_index) and the angle index mapping via edge_lookup are consistent with the consumers in repflows.py and the dynamic-selection path. The a_nlist_mask_3d logic correctly requires both neighbors of an angle to be valid before producing a triplet. The single CPU sync via n_edge = ... .item() is necessary for torch.arange.

One small optional suggestion: an assert a_sel <= nnei near the top would surface mis-configured inputs as a clear error rather than a downstream broadcast failure (DescrptDPA3 already enforces e_sel >= a_sel, so this is mostly defensive).

🤖 Prompt for AI Agents

In `@deepmd/pt/model/network/graph_utils_flat.py` around lines 41 - 89, Add a
defensive check to guard against mis-configured inputs by asserting that a_sel
(from a_nlist_mask.shape[1]) does not exceed nnei (from nlist_flat.shape[1])
near the top of the function; insert something like an assert or raise
ValueError after computing nnei and a_sel so that functions using n2e_index,
edge_lookup_a, and the a_nlist_mask_3d broadcasts fail fast with a clear message
instead of producing downstream broadcast errors.
deepmd/pt/model/model/make_model.py (3)

520-599: 💤 Low value

forward_common_flat is a pure passthrough to forward_common_flat_native.

The wrapper adds no behavior — same parameters, same return — and just forwards every argument. Two near-identical method definitions (with duplicated docstrings) is a maintenance hazard and adds friction for readers tracing the call. Either (a) inline forward_common_flat_native into forward_common_flat and drop the _native variant, or (b) document a clear reason for the split (e.g. JIT export boundary, planned non-native variant) so the duplication is justified.

🤖 Prompt for AI Agents

In `@deepmd/pt/model/model/make_model.py` around lines 520 - 599,
forward_common_flat is a pure passthrough to forward_common_flat_native with
duplicated docstrings, creating maintenance overhead; either remove the
redundant wrapper by merging forward_common_flat_native into forward_common_flat
(rename/remove the _native variant and keep a single method and docstring) or
add a concise comment/docstring to forward_common_flat explaining the explicit
reason for the split (e.g., JIT/export boundary or future divergence). Update
any call sites that reference forward_common_flat_native to use the single
chosen method name (forward_common_flat) and ensure tests/type hints still pass.

296-309: 💤 Low value

Move the helper import to module level and tidy the validation order.

Two minor suggestions:

  1. from deepmd.pt.utils.nlist import rebuild_extended_coord_from_flat_graph is imported every call inside the hot forward path (line 296). Promote it to the existing nlist import block at the top of the file.
  2. The requires_grad_(True) work on coord/box at lines 280-282 happens before the field-presence validation at lines 283-309. If the validation raises, the clone+detach is wasted. Reorder so the validation runs first.
🤖 Prompt for AI Agents

In `@deepmd/pt/model/model/make_model.py` around lines 296 - 309, Move the helper
import for rebuild_extended_coord_from_flat_graph out of the hot forward path
and add it to the module-level nlist import block with the other
deepmd.pt.utils.nlist imports so it is imported once at load time; also change
the validation order inside the flat mixed-batch branch so you check for
presence of the required graph fields (mapping, extended_batch, extended_image,
etc.) and raise the RuntimeError before mutating tensors, and only call
coord.requires_grad_(True) / box.requires_grad_(True) (and any clone+detach)
after the validation passes to avoid wasted work when validation fails.

311-314: 💤 Low value

Redundant assertions after the explicit None-check.

Lines 283-295 already verify extended_atype, extended_batch, mapping, nlist, nlist_ext, etc. are not None (otherwise the else raises), so the four assert statements at lines 311-314 are redundant at runtime. If they exist solely for type-narrowing under static checkers, a comment to that effect would help future readers. Otherwise they can be removed.

🤖 Prompt for AI Agents

In `@deepmd/pt/model/model/make_model.py` around lines 311 - 314, The four assert
statements for extended_atype, extended_batch, mapping, and nlist are redundant
because the preceding explicit None-check/else already guarantees they are not
None; either remove those assert lines or replace them with a short explanatory
comment indicating they exist purely for static type-narrowing (for tools like
mypy) so future readers understand their purpose; locate the checks around the
make_model function where extended_atype, extended_batch, mapping, and nlist are
validated and update accordingly.
deepmd/pt/train/training.py (1)

1690-1723: 💤 Low value

Minor structural cleanup in get_data for mixed-batch handling.

The current loop skips batch/ptr (lines 1697-1698) only to transfer them separately at lines 1722-1723. Since they are tensors, they would be handled correctly by the generic not isinstance(..., list) branch at line 1699-1701. You can drop the explicit skip and the post-loop transfer, then just extend input_keys when is_mixed_batch is true:

♻️ Proposed simplification
         for key in batch_data.keys():
             if key == "sid" or key == "fid" or "find_" in key:
                 continue
-            # Skip batch and ptr for now, will handle them separately
-            elif key == "batch" or key == "ptr":
-                continue
             elif not isinstance(batch_data[key], list):
                 if batch_data[key] is not None:
                     batch_data[key] = batch_data[key].to(DEVICE, non_blocking=True)
@@
         # Mixed-nloc LMDB batches include precomputed flat-graph tensors.
         if is_mixed_batch:
             input_keys = input_keys + list(_FLAT_GRAPH_INPUT_KEYS)
-            batch_data["batch"] = batch_data["batch"].to(DEVICE, non_blocking=True)
-            batch_data["ptr"] = batch_data["ptr"].to(DEVICE, non_blocking=True)
🤖 Prompt for AI Agents

In `@deepmd/pt/train/training.py` around lines 1690 - 1723, In get_data, remove
the special-case skipping of "batch" and "ptr" inside the batch_data loop and
delete the separate post-loop device transfers for batch_data["batch"] and
batch_data["ptr"]; instead let the existing non-list tensor branch handle them
(they'll be moved to DEVICE automatically), and only keep the is_mixed_batch
detection so you can extend input_keys with list(_FLAT_GRAPH_INPUT_KEYS) when
is_mixed_batch is true (symbols: get_data, batch_data, is_mixed_batch,
input_keys, _FLAT_GRAPH_INPUT_KEYS, DEVICE).
deepmd/pt/model/model/ener_model.py (1)

92-176: ⚡ Quick win

Extract the shared output-assembly block to avoid maintaining two near-identical paths.

Lines 117-143 (flat path) and 153-174 (regular path) build model_predict in essentially the same way (atom_energy/energy, conditional force/dforce, virials, mask, hessian). Two parallel implementations are a maintenance hazard — future output keys or shape adjustments must be mirrored in both, and any drift becomes silent. Consider extracting a small helper like _assemble_energy_predict(model_ret, do_atomic_virial) and calling it from both branches, with the flat branch additionally adding model_predict["updated_coord"] += coord in the no-fitting-net case.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/model/model/ener_model.py` around lines 92 - 176, The two branches
in forward (flat and regular) duplicate the same output-assembly logic for
model_predict; extract that into a helper method (e.g.
_assemble_energy_predict(model_ret, do_atomic_virial)) that encapsulates setting
"atom_energy", "energy", choosing "force" vs "dforce" using do_grad_r("energy"),
adding "virial" and "atom_virial" via do_grad_c("energy") and do_atomic_virial,
including "mask" and "hessian" when present/when _hessian_enabled; then call
this helper from both the flat-path and regular-path after receiving model_ret,
and keep the flat/regular-specific behavior (the no-fitting-net case that does
model_predict["updated_coord"] += coord) in the respective branch only. Ensure
the helper references self.do_grad_r, self.do_grad_c, self._hessian_enabled and
self.get_fitting_net() behavior remains consistent.
deepmd/pt/train/wrapper.py (1)

237-247: 💤 Low value

natoms semantics now diverge between flat and regular batches; document the contract.

For regular batches atype.dim() > 1 and natoms = atype.shape[-1] is per-frame nloc, but for mixed batches atype is 1-D and natoms = atype.shape[0] is the total flattened atom count across all frames. Downstream loss code must distinguish these by inspecting ptr in input_dict to normalize correctly — that's a fragile, implicit contract spread across wrapper.py, ener.py, and make_model.py.

Consider either (a) renaming/splitting the argument so callees see total_atoms vs nloc_per_frame explicitly, or (b) adding a docstring/inline note here that clearly states the dual semantics and that the loss must branch on ptr. The current inline comment on lines 237-238 helps but does not surface the downstream consumer expectation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/train/wrapper.py` around lines 237 - 247, The natoms value
currently has dual semantics (per-frame nloc when atype.dim()>1 vs total
flattened atom count when 1-D) which is implicit and fragile; fix by making the
contract explicit: compute both nloc_per_frame (e.g., atype.shape[-1] when
atype.dim()>1 else None) and total_atoms (atype.numel() or atype.shape[0]) in
wrapper.py and change the call to self.loss[task_key](input_dict,
self.model[task_key], label, nloc_per_frame=nloc_per_frame,
total_atoms=total_atoms, learning_rate=cur_lr), then update all loss
implementations (ener.py) and any factory in make_model.py to accept these new
named args (or provide backward-compatible handling) so downstream code no
longer must infer semantics from ptr.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.gitignore:
- Line 78: The .gitignore entry "test_mptraj/" conflicts with tracked files
committed in this PR (test_mptraj/lmdb_baseline.json and
test_mptraj/lmdb_mixed_batch.json); fix by either removing the directory-level
ignore line "test_mptraj/" or by adding explicit negation patterns for the
tracked files (e.g., "!test_mptraj/lmdb_baseline.json" and
"!test_mptraj/lmdb_mixed_batch.json") so future edits to those config files are
not hidden and git add won't require -f; update .gitignore accordingly and
ensure the two committed files remain tracked.

In `@deepmd/pt/loss/ener.py`:
- Around line 228-240: The mixed-batch normalization currently computes a single
scalar atom_norm via atom_norm = mean(1.0 / natoms_per_frame) which misweights
frames with different atom counts; instead compute per-frame normalizers and
apply them before reducing l2_ener_loss and l2_virial_loss: when is_mixed_batch
(detect via ptr and natoms_per_frame derived from ptr), build per-frame
atom_norms = 1.0 / natoms_per_frame.float() and multiply each frame's loss by
its corresponding atom_norm before taking the final mean/sum (or perform a
weighted reduction using these per-frame weights) so that l2_ener_loss and
l2_virial_loss are normalized per-frame correctly rather than using a single
averaged scalar.

In `@deepmd/pt/model/atomic_model/dp_atomic_model.py`:
- Around line 378-390: The flat-path call to descriptor.forward_flat currently
passes fparam unmodified and thus skips the fallback logic used in
forward_atomic; update the flat-path before calling descriptor.forward_flat (the
block around descriptor.forward_flat) to mirror forward_atomic's behavior: if
self.add_chg_spin_ebd (or the equivalent flag) is True and fparam is None,
compute the default via self.get_default_fparam(...) using the same inputs/shape
logic used in forward_atomic and assign it to fparam, then pass that fparam into
descriptor.forward_flat so mixed-batch runs use the same default frame
parameters.

In `@deepmd/pt/model/model/__init__.py`:
- Around line 90-98: The hybrid-descriptor logic in
_set_default_descriptor_init_seed (and the analogous block later) always derives
child seeds from the passed-in default `seed` instead of the effective parent
config seed; update the logic to first determine the parent_seed =
params.get("seed", seed) (i.e., use params["seed"] when present, otherwise fall
back to the incoming seed) and then call child_seed(parent_seed, idx) when
recursing into each hybrid child (preserving the existing
isinstance(descriptor_params, dict) check and recursive call to
_set_default_descriptor_init_seed).

In `@deepmd/pt/model/model/make_model.py`:
- Around line 497-518: The flat-path branch in make_model.py quietly skips
atomic virial when do_grad_c("energy") is true and do_atomic_virial is true (the
pass under if do_atomic_virial), causing fit_ret to miss
atom_virial/energy_derv_c keys and silently breaking callers (e.g.,
EnergyModel.forward reading model_ret["energy_derv_c"]); update the branch in
the block handling energy_derv_c_redu so that if do_atomic_virial is true you
either (A) compute and populate the appropriate atomic virial / energy_derv_c
entries in fit_ret consistent with the reduced-to-atom shape used elsewhere (use
energy_redu, energy_derv_c_redu and ptr to derive per-atom contributions), or if
that is nontrivial, (B) raise a clear NotImplementedError (or emit a one-time
logged warning and raise) referencing do_atomic_virial and the flat-path so
callers know the feature is unsupported; ensure the error/warning mentions
make_model.make_model (or the surrounding function) and that fit_ret must
include the keys model_predict/EnergyModel.forward expects (e.g.,
"energy_derv_c" / atom_virial) to avoid silent failures.
- Around line 217-352: The flat forward path forward_common_flat_native is
missing the same input/output precision normalization used in forward_common;
call the model's _input_type_cast on incoming tensors (coord, box, fparam,
aparam, atype, batch, mapping, extended_* inputs as appropriate) at the start so
coord/box are promoted to global_pt_float_precision (and energy-related inputs
to global_pt_ener_float_precision if required) before doing
clone().detach().requires_grad_(True) or rebuilding extended_coord, and after
computing model_predict_lower run _output_type_cast to promote outputs
(especially energy_redu) back to global_pt_ener_float_precision; ensure you
reference the existing helpers _input_type_cast and _output_type_cast and keep
their casting behavior identical to forward_common.

In `@deepmd/pt/train/training.py`:
- Around line 264-299: When _data.mixed_batch is true but the model descriptor
lacks repflows, raise an immediate, clear error instead of constructing a
dataloader with graph_config=None; inside the mixed_batch branch (where
descriptor is obtained from model_for_graph.atomic_model.descriptor and
make_lmdb_mixed_batch_collate is called), check hasattr(descriptor, "repflows")
and if False raise a ValueError/RuntimeError with a message like
"mixed_batch=True requires a flat-graph capable descriptor with 'repflows' (used
by make_lmdb_mixed_batch_collate and
EnergyModel.forward/forward_common_flat_native) — disable mixed_batch or provide
a compatible descriptor." This ensures failure occurs at dataloader construction
rather than later in EnergyModel.forward.

In `@deepmd/pt/utils/lmdb_dataset.py`:
- Around line 42-49: The __all__ export list is not alphabetically sorted which
triggers Ruff RUF022; reorder the list items in the __all__ variable so they are
lexicographically sorted (e.g., "_collate_lmdb_batch",
"_collate_lmdb_mixed_batch", "LmdbDataset", "LmdbTestData", "is_lmdb",
"make_lmdb_mixed_batch_collate") while preserving the existing string entries,
quotes, and list structure so the symbol names (LmdbDataset, LmdbTestData,
_collate_lmdb_batch, _collate_lmdb_mixed_batch, make_lmdb_mixed_batch_collate,
is_lmdb) remain unchanged.
- Around line 247-255: Mixed batching currently replaces the shuffle-based
DataLoader and ignores computed block weights, so when mixed_batch is True you
must construct _inner_dataloader to honor _block_targets computed by
compute_block_targets() rather than using DataLoader(..., shuffle=True). Replace
the plain shuffle=True approach in the mixed_batch branch by creating and
passing a sampler that uses _block_targets (e.g.,
torch.utils.data.WeightedRandomSampler or an equivalent custom sampler) to
DataLoader, keep num_workers=0 and collate_fn=_collate_lmdb_mixed_batch, and
ensure _inner_dataloader is assigned the DataLoader configured with that sampler
so auto_prob_style/block weighting still affects sampling.
- Around line 51-61: The whitelisted atom-wise keys in
_ATOMWISE_MIXED_BATCH_KEYS are missing supported labels (e.g. "atom_ener",
"drdq"), causing those fields to be handled by collate_tensor_fn and break
mixed-batch logic; update the frozenset declaration named
_ATOMWISE_MIXED_BATCH_KEYS to include all atom-wise labels used elsewhere (at
minimum "atom_ener" and "drdq") and make the same addition to the duplicate
whitelist further down (the other frozenset at lines ~99-104) so mixed-batch
flattening consistently handles these fields instead of falling back to
collate_tensor_fn.
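For the sampler suggestion above, the with-replacement, weight-proportional semantics of `torch.utils.data.WeightedRandomSampler` can be sketched in pure Python (the block weights here are made up for illustration):

```python
import random

def weighted_sample_indices(weights, num_samples, seed=0):
    """Draw indices with replacement, probability proportional to weight --
    the same contract as torch.utils.data.WeightedRandomSampler."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)

# Hypothetical block weights: frames 0-1 (block A) get 3x the weight of 2-3.
weights = [3.0, 3.0, 1.0, 1.0]
draws = weighted_sample_indices(weights, 10_000, seed=42)
frac_block_a = sum(1 for i in draws if i < 2) / len(draws)
# Expected fraction for block A is 6/8 = 0.75.
```

Handing such a sampler to `DataLoader(..., sampler=..., collate_fn=_collate_lmdb_mixed_batch)` would preserve the `_block_targets` weighting that plain `shuffle=True` discards.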

In `@deepmd/pt/utils/nlist.py`:
- Around line 246-259: The code currently inverts atom_cell (atom_cell =
cell[extended_batch]) per extended atom which is wasteful; instead compute the
inverse once per frame by calling torch.linalg.inv_ex on cell (after cell =
box.reshape(-1,3,3)) and then index the resulting rec_cell with extended_batch
(i.e., rec_cell_full, _ = torch.linalg.inv_ex(cell); rec_cell =
rec_cell_full[extended_batch]) so subsequent einsum uses the indexed per-atom
inverses without repeating the matrix inversion.
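The invert-once-then-gather pattern suggested here can be illustrated with a NumPy analogue (`torch.linalg.inv_ex` behaves similarly; shapes and data are illustrative):

```python
import numpy as np

# Invert each frame's cell once, then gather per extended atom.
rng = np.random.default_rng(0)
nframes = 2
cell = rng.normal(size=(nframes, 3, 3)) + 3.0 * np.eye(3)  # well-conditioned
extended_batch = np.array([0, 0, 1, 1, 1])   # extended atom -> frame

rec_cell_full = np.linalg.inv(cell)          # one inversion per frame
rec_cell = rec_cell_full[extended_batch]     # gather: [n_ext, 3, 3]

# Numerically identical to the wasteful per-atom inversion it replaces.
per_atom = np.linalg.inv(cell[extended_batch])
assert np.allclose(rec_cell, per_atom)
```

The gather is a cheap indexing op, so the cost drops from one inversion per extended atom to one per frame.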

In `@test_mixed_batch.sh`:
- Around line 15-19: The script prints "Check mixed_batch_train.log for details"
but never writes that file; fix by redirecting the training command's
stdout/stderr to mixed_batch_train.log (e.g., change the dp --pt train
test_mptraj/lmdb_mixed_batch.json --skip-neighbor-stat invocation to append or
write both stdout and stderr into mixed_batch_train.log) or, if you prefer not
to create a log, update the echo lines to not reference mixed_batch_train.log;
modify the dp invocation or the echo accordingly to keep the message accurate.

In `@test_mptraj/lmdb_baseline.json`:
- Around line 195-205: The baseline config currently hardcodes
developer-specific absolute paths in the keys stat_file, training_data.systems,
and validation_data.systems which breaks portability; change these values to
reusable placeholders or environment-variable expansions (e.g. ${DATASET_PATH})
or point them to a small example dataset shipped in the repo (mirror other
example configs), update any code that loads the config to resolve those env
vars, and add a short note in the config or README explaining how
test_mixed_batch.sh sets or substitutes those variables before running.
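One way to realize the `${DATASET_PATH}` placeholder idea is `os.path.expandvars` at config-load time; a minimal sketch (the key layout mirrors the JSON fragment, the path and variable name are hypothetical):

```python
import json
import os

# Hypothetical config fragment with an env-var placeholder; the loader
# resolves it instead of baking in machine-specific absolute paths.
os.environ["DATASET_PATH"] = "/tmp/example_dataset"
raw = '{"training_data": {"systems": ["${DATASET_PATH}/train.lmdb"]}}'

cfg = json.loads(raw)
cfg["training_data"]["systems"] = [
    os.path.expandvars(p) for p in cfg["training_data"]["systems"]
]
assert cfg["training_data"]["systems"] == ["/tmp/example_dataset/train.lmdb"]
```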

In `@test_mptraj/lmdb_mixed_batch.json`:
- Around line 74-84: The fixture uses hard-coded absolute paths in stat_file and
LMDB system entries (stat_file, training_data.systems, validation_data.systems)
which prevents CI portability; update the test_mptraj/lmdb_mixed_batch.json
fixture to reference repo-managed test data or injected paths by replacing
absolute paths with relative paths inside the repo (e.g., a test-data directory)
or with placeholders/env variables read by the test runner, and ensure the test
harness supplies those fixtures before running so the config uses
repo-controlled files rather than machine-specific absolute paths.

---

Outside diff comments:
In `@deepmd/pt/loss/ener.py`:
- Around line 412-424: The generalized-force branch in ener.py assumes a single
natoms by reshaping force_pred, force_label and drdq with natoms * 3, which
breaks for mixed-size batches; instead, use frame-aware shapes: do not flatten
all frames into one dimension—reshape or iterate using per-frame atom counts (or
keep forces as (nframes, natoms_i, 3) and drdq as (nframes, natoms_i*3,
n_gcoord)) and compute gen_force and gen_force_label per-frame (or via batched
operations that use the original nframes dimension) so tensor dims align; also
apply find_drdq/pref_gf per-frame. Locate symbols: has_gf, drdq, find_drdq,
pref_gf, force_pred, force_label, force_reshape_nframes,
force_label_reshape_nframes, drdq_reshape, gen_force, gen_force_label and update
the reshaping/Einstein-summation to operate over the nframes dimension (or loop
over frames) instead of using natoms * 3 across the whole batch.
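The frame-aware reshaping described above can be sketched with NumPy on two frames of different sizes (shapes are illustrative; the real code operates on torch tensors):

```python
import numpy as np

# Compute gen_force_i = drdq_i^T @ force_i per frame instead of one
# global natoms * 3 reshape across the whole mixed batch.
rng = np.random.default_rng(1)
natoms_per_frame = [3, 5]
n_gcoord = 4
forces = [rng.normal(size=(n, 3)) for n in natoms_per_frame]
drdqs = [rng.normal(size=(n * 3, n_gcoord)) for n in natoms_per_frame]

gen_forces = [
    np.einsum("ij,i->j", drdq, f.reshape(-1))   # [n_gcoord] for this frame
    for f, drdq in zip(forces, drdqs)
]
gen_force = np.stack(gen_forces)                # [nframes, n_gcoord]
assert gen_force.shape == (2, n_gcoord)
```

Because each frame's `drdq` has its own `natoms_i * 3` leading dimension, the per-frame contraction always aligns, whereas a single batch-wide reshape cannot.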

---

Nitpick comments:
In @.gitignore:
- Line 81: The .gitignore entry "deepmd-kit/" is too broad and may
unintentionally ignore legitimately nested repos; update the ignore rule that
currently uses deepmd-kit/ to a more specific pattern (for example scope it to
the repo root by changing to "/deepmd-kit/" or better yet target only
build/dist/artifact paths like "/deepmd-kit/build", "/deepmd-kit/dist", or
specific filenames/patterns such as "deepmd-kit/*.egg-info") so only the
intended artifact directories/files are ignored; locate the deepmd-kit/ entry in
the .gitignore and replace it with one or more narrow, explicit patterns.

In `@deepmd/pt/model/descriptor/repflows.py`:
- Around line 840-874: Add a short explanatory comment above the synthetic
batch=1 wrapping that states the disjoint-frame invariant: explain that the
temporary batching of tensors like node_ebd_batched, edge_ebd_batched,
h2_batched, angle_ebd_batched, nlist_batched, a_nlist_batched, etc. is safe
because RepFlowLayer.forward operates strictly per-atom and per-neighbor and the
LMDB collator guarantees nlist/a_nlist never reference atoms across different
frames (mapping/offsets keep frames disjoint); also warn that if
RepFlowLayer.forward (or any layer in self.layers) later introduces global ops
over the batch axis the synthetic batch will mix frames and break correctness,
so such changes must preserve the per-frame isolation or remove this batching
trick.

In `@deepmd/pt/model/model/ener_model.py`:
- Around line 92-176: The two branches in forward (flat and regular) duplicate
the same output-assembly logic for model_predict; extract that into a helper
method (e.g. _assemble_energy_predict(model_ret, do_atomic_virial)) that
encapsulates setting "atom_energy", "energy", choosing "force" vs "dforce" using
do_grad_r("energy"), adding "virial" and "atom_virial" via do_grad_c("energy")
and do_atomic_virial, including "mask" and "hessian" when present/when
_hessian_enabled; then call this helper from both the flat-path and regular-path
after receiving model_ret, and keep the flat/regular-specific behavior (the
no-fitting-net case that does model_predict["updated_coord"] += coord) in the
respective branch only. Ensure the helper references self.do_grad_r,
self.do_grad_c, self._hessian_enabled and self.get_fitting_net() behavior
remains consistent.

In `@deepmd/pt/model/model/make_model.py`:
- Around line 520-599: forward_common_flat is a pure passthrough to
forward_common_flat_native with duplicated docstrings, creating maintenance
overhead; either remove the redundant wrapper by merging
forward_common_flat_native into forward_common_flat (rename/remove the _native
variant and keep a single method and docstring) or add a concise
comment/docstring to forward_common_flat explaining the explicit reason for the
split (e.g., JIT/export boundary or future divergence). Update any call sites
that reference forward_common_flat_native to use the single chosen method name
(forward_common_flat) and ensure tests/type hints still pass.
- Around line 296-309: Move the helper import for
rebuild_extended_coord_from_flat_graph out of the hot forward path and add it to
the module-level nlist import block with the other deepmd.pt.utils.nlist imports
so it is imported once at load time; also change the validation order inside the
flat mixed-batch branch so you check for presence of the required graph fields
(mapping, extended_batch, extended_image, etc.) and raise the RuntimeError
before mutating tensors, and only call coord.requires_grad_(True) /
box.requires_grad_(True) (and any clone+detach) after the validation passes to
avoid wasted work when validation fails.
- Around line 311-314: The four assert statements for extended_atype,
extended_batch, mapping, and nlist are redundant because the preceding explicit
None-check/else already guarantees they are not None; either remove those assert
lines or replace them with a short explanatory comment indicating they exist
purely for static type-narrowing (for tools like mypy) so future readers
understand their purpose; locate the checks around the make_model function where
extended_atype, extended_batch, mapping, and nlist are validated and update
accordingly.

In `@deepmd/pt/model/network/graph_utils_flat.py`:
- Around line 41-89: Add a defensive check to guard against mis-configured
inputs by asserting that a_sel (from a_nlist_mask.shape[1]) does not exceed nnei
(from nlist_flat.shape[1]) near the top of the function; insert something like
an assert or raise ValueError after computing nnei and a_sel so that functions
using n2e_index, edge_lookup_a, and the a_nlist_mask_3d broadcasts fail fast
with a clear message instead of producing downstream broadcast errors.
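The fail-fast guard suggested here is small; a sketch with assumed shapes (a `[n_atoms, nnei]` radial neighbor list and a `[n_atoms, a_sel]` angular mask):

```python
import numpy as np

def check_angle_sel(nlist_flat: np.ndarray, a_nlist_mask: np.ndarray) -> None:
    """Fail fast when the angular selection exceeds the radial one."""
    nnei = nlist_flat.shape[1]
    a_sel = a_nlist_mask.shape[1]
    if a_sel > nnei:
        raise ValueError(
            f"a_sel ({a_sel}) must not exceed nnei ({nnei}): angular "
            "neighbors must be a subset of radial neighbors."
        )

check_angle_sel(np.zeros((10, 8), dtype=int), np.zeros((10, 4), dtype=bool))
```

With the check in place, a mis-configured `a_sel` raises a clear `ValueError` instead of surfacing later as an opaque broadcast failure.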

In `@deepmd/pt/model/task/invar_fitting.py`:
- Around line 299-315: The tensor un-padding currently uses a shape heuristic in
the loop that builds result_flat; instead use the model's declared output types
from output_def() to decide which keys are atom-wise. Replace the shape-based
condition in the for key, value in result_batch loop with a lookup like
output_def()[key].is_atomic (or equivalent in your model API) and only apply
valid_atom_mask to values for keys declared atomic; leave frame-level or other
outputs untouched and handle non-tensor values as before (keep variable names
result_batch, result_flat, valid_atom_mask, and call output_def() to determine
atomic vs frame-level).
- Around line 259-264: The padding value for atype_batch is set to -1 which
causes PyTorch indexing (e.g., in self.forward when doing per-type lookups like
bias_atom_e[atype] or embeddings) to wrap to the last type; change the sentinel
to a valid non-negative index (e.g., 0 or a dedicated PAD type index) when
creating atype_batch so lookups won't accidentally reference real types, and
keep using valid_atom_mask to discard those rows downstream; update places that
assume negative sentinel behavior to use the new PAD index (refer to
atype_batch, self.forward, bias_atom_e, and valid_atom_mask).
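The wrap-around hazard with a `-1` sentinel is easy to demonstrate; a NumPy sketch of the pad-then-mask alternative (names mirror the comment; `PAD = 0` is one possible choice, and the bias values are illustrative):

```python
import numpy as np

bias_atom_e = np.array([0.1, 0.2, 0.3])   # one bias per real atom type
atype_padded = np.array([0, 2, -1])       # -1 used as the pad sentinel

# NumPy (and PyTorch) indexing wraps -1 to the last row, so the pad atom
# silently picks up the bias of the last real type.
assert bias_atom_e[atype_padded][-1] == bias_atom_e[-1]

# Safer: pad with a valid index (PAD = 0 here) and mask rows afterwards.
PAD = 0
valid_atom_mask = atype_padded >= 0
atype_safe = np.where(valid_atom_mask, atype_padded, PAD)
biases = bias_atom_e[atype_safe] * valid_atom_mask   # pad row zeroed out
```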

In `@deepmd/pt/train/training.py`:
- Around line 1690-1723: In get_data, remove the special-case skipping of
"batch" and "ptr" inside the batch_data loop and delete the separate post-loop
device transfers for batch_data["batch"] and batch_data["ptr"]; instead let the
existing non-list tensor branch handle them (they'll be moved to DEVICE
automatically), and only keep the is_mixed_batch detection so you can extend
input_keys with list(_FLAT_GRAPH_INPUT_KEYS) when is_mixed_batch is true
(symbols: get_data, batch_data, is_mixed_batch, input_keys,
_FLAT_GRAPH_INPUT_KEYS, DEVICE).

In `@deepmd/pt/train/wrapper.py`:
- Around line 237-247: The natoms value currently has dual semantics (per-frame
nloc when atype.dim()>1 vs total flattened atom count when 1-D) which is
implicit and fragile; fix by making the contract explicit: compute both
nloc_per_frame (e.g., atype.shape[-1] when atype.dim()>1 else None) and
total_atoms (atype.numel() or atype.shape[0]) in wrapper.py and change the call
to self.loss[task_key](input_dict, self.model[task_key], label,
nloc_per_frame=nloc_per_frame, total_atoms=total_atoms, learning_rate=cur_lr),
then update all loss implementations (ener.py) and any factory in make_model.py
to accept these new named args (or provide backward-compatible handling) so
downstream code no longer must infer semantics from ptr.

In `@deepmd/pt/utils/nlist.py`:
- Around line 101-170: The loop is causing repeated device syncs via
int(ptr[frame_idx].item()) and per-frame kernel overhead in
extend_coord_with_ghosts_with_images and build_neighbor_list; fix by
materializing ptr on CPU once (e.g., ptr_cpu = ptr.cpu().tolist()) and use
ptr_cpu[frame_idx]/ptr_cpu[frame_idx+1] for start_idx/end_idx to eliminate
.item() calls, and where possible call
normalize_coord/extend_coord_with_ghosts_with_images/build_neighbor_list on
already-CPU tensors or add a batched/padded variant (similar to
extend_input_and_build_neighbor_list_with_images) to process frames in a
vectorized way to reduce Python per-frame overhead.
- Around line 262-281: Add a one-line doc comment inside get_central_ext_index
stating the invariant it relies on: that the extended layout produced by
_extend_coord_with_ghosts_impl places the unshifted image (shift (0,0,0)) first
for each frame and thus the first nloc extended atoms per frame are the local
atoms; mention that shift indices are sorted by L2 norm so the (0,0,0) image is
first and changing that ordering will break get_central_ext_index.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b6ad87fb-4157-40de-8f34-510d8232bbef

📥 Commits

Reviewing files that changed from the base of the PR and between 78bffde and 9f4ea0a.

📒 Files selected for processing (21)
  • .gitignore
  • deepmd/pt/entrypoints/main.py
  • deepmd/pt/loss/ener.py
  • deepmd/pt/model/atomic_model/dp_atomic_model.py
  • deepmd/pt/model/descriptor/dpa3.py
  • deepmd/pt/model/descriptor/env_mat.py
  • deepmd/pt/model/descriptor/repflows.py
  • deepmd/pt/model/model/__init__.py
  • deepmd/pt/model/model/ener_model.py
  • deepmd/pt/model/model/make_model.py
  • deepmd/pt/model/network/graph_utils_flat.py
  • deepmd/pt/model/task/invar_fitting.py
  • deepmd/pt/train/training.py
  • deepmd/pt/train/wrapper.py
  • deepmd/pt/utils/lmdb_dataset.py
  • deepmd/pt/utils/nlist.py
  • deepmd/utils/argcheck.py
  • source/tests/pt/test_lmdb_dataloader.py
  • test_mixed_batch.sh
  • test_mptraj/lmdb_baseline.json
  • test_mptraj/lmdb_mixed_batch.json

Comment thread .gitignore
system/

temp/
test_mptraj/

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Ignoring test_mptraj/ conflicts with tracked files in this PR.

This PR commits test_mptraj/lmdb_baseline.json and test_mptraj/lmdb_mixed_batch.json under test_mptraj/, but the new test_mptraj/ ignore rule will hide any future additions/edits in that directory and forces git add -f. Either drop the directory-level ignore or add explicit negations for the tracked configs.

🛠️ Suggested fix
 temp/
-test_mptraj/
+test_mptraj/*
+!test_mptraj/*.json
+!test_mptraj/.gitignore
 pkl/
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.gitignore at line 78, The .gitignore entry "test_mptraj/" conflicts with
tracked files committed in this PR (test_mptraj/lmdb_baseline.json and
test_mptraj/lmdb_mixed_batch.json); fix by either removing the directory-level
ignore line "test_mptraj/" or by adding explicit negation patterns for the
tracked files (e.g., "!test_mptraj/lmdb_baseline.json" and
"!test_mptraj/lmdb_mixed_batch.json") so future edits to those config files are
not hidden and git add won't require -f; update .gitignore accordingly and
ensure the two committed files remain tracked.

Comment thread deepmd/pt/loss/ener.py
Comment on lines +228 to +240
# Detect mixed batch format
is_mixed_batch = "ptr" in input_dict and input_dict["ptr"] is not None

# For mixed batch, compute per-frame atom_norm and average
if is_mixed_batch:
    ptr = input_dict["ptr"]
    nframes = ptr.numel() - 1
    # Compute natoms for each frame
    natoms_per_frame = ptr[1:] - ptr[:-1]  # [nframes]
    # Average atom_norm across frames
    atom_norm = torch.mean(1.0 / natoms_per_frame.float())
else:
    atom_norm = 1.0 / natoms

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This mixed-batch normalization misweights frames with different atom counts.

l2_ener_loss/l2_virial_loss are reduced over frames first, then multiplied by mean(1 / N_i). That is not the same as applying each frame’s own 1 / N_i normalization before the reduction, so mixed batches with small and large frames get the wrong loss weighting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/loss/ener.py` around lines 228 - 240, The mixed-batch normalization
currently computes a single scalar atom_norm via atom_norm = mean(1.0 /
natoms_per_frame) which misweights frames with different atom counts; instead
compute per-frame normalizers and apply them before reducing l2_ener_loss and
l2_virial_loss: when is_mixed_batch (detect via ptr and natoms_per_frame derived
from ptr), build per-frame atom_norms = 1.0 / natoms_per_frame.float() and
multiply each frame's loss by its corresponding atom_norm before taking the
final mean/sum (or perform a weighted reduction using these per-frame weights)
so that l2_ener_loss and l2_virial_loss are normalized per-frame correctly
rather than using a single averaged scalar.
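A toy example makes the misweighting concrete: reducing first and then scaling by mean(1/N_i) is not the same as weighting each frame by its own 1/N_i (numbers are illustrative):

```python
import numpy as np

# Toy mixed batch: a 2-atom frame and a 100-atom frame with different
# per-frame squared energy errors.
natoms_per_frame = np.array([2.0, 100.0])
sq_err = np.array([4.0, 400.0])

# Reviewed code: reduce over frames first, then scale by mean(1 / N_i).
biased = sq_err.mean() * np.mean(1.0 / natoms_per_frame)

# Per-frame weighting: scale each frame by its own 1 / N_i, then reduce.
correct = np.mean(sq_err / natoms_per_frame)

# The two disagree badly: 51.51 vs 3.0 for this batch.
```

The gap grows with the spread in atom counts, so small frames end up either over- or under-weighted whenever a single averaged scalar is used.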

Comment on lines +378 to +390
if self.do_grad_r() or self.do_grad_c():
    extended_coord.requires_grad_(True)

# Descriptor and fitting both consume the flat atom layout.
descriptor_out = self.descriptor.forward_flat(
    extended_coord,
    extended_atype,
    extended_batch,
    nlist,
    mapping,
    batch,
    ptr,
    fparam=fparam if self.add_chg_spin_ebd else None,

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mirror the default-fparam fallback in the flat path.

forward_atomic() synthesizes get_default_fparam() when the fitting net expects frame parameters and the caller omits fparam. This flat path passes None straight into descriptor.forward_flat(), so mixed-batch runs lose that fallback and will diverge from the non-flat path for models using default charge/spin frame params.

Suggested fix
         if self.do_grad_r() or self.do_grad_c():
             extended_coord.requires_grad_(True)
 
+        if (
+            hasattr(self.fitting_net, "get_dim_fparam")
+            and self.fitting_net.get_dim_fparam() > 0
+            and fparam is None
+        ):
+            default_fparam_tensor = self.fitting_net.get_default_fparam()
+            assert default_fparam_tensor is not None
+            fparam_input_for_des = torch.tile(
+                default_fparam_tensor.unsqueeze(0), [ptr.numel() - 1, 1]
+            )
+        else:
+            fparam_input_for_des = fparam
+
         # Descriptor and fitting both consume the flat atom layout.
         descriptor_out = self.descriptor.forward_flat(
             extended_coord,
@@
-            fparam=fparam if self.add_chg_spin_ebd else None,
+            fparam=fparam_input_for_des if self.add_chg_spin_ebd else None,
             central_ext_index=central_ext_index,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/model/atomic_model/dp_atomic_model.py` around lines 378 - 390, The
flat-path call to descriptor.forward_flat currently passes fparam unmodified and
thus skips the fallback logic used in forward_atomic; update the flat-path
before calling descriptor.forward_flat (the block around
descriptor.forward_flat) to mirror forward_atomic's behavior: if
self.add_chg_spin_ebd (or the equivalent flag) is True and fparam is None,
compute the default via self.get_default_fparam(...) using the same inputs/shape
logic used in forward_atomic and assign it to fparam, then pass that fparam into
descriptor.forward_flat so mixed-batch runs use the same default frame
parameters.

Comment on lines +90 to +98
def _set_default_descriptor_init_seed(
    params: dict[str, Any], seed: int | list[int]
) -> None:
    if params.get("type") == "hybrid":
        for idx, descriptor_params in enumerate(params.get("list", [])):
            if isinstance(descriptor_params, dict):
                _set_default_descriptor_init_seed(
                    descriptor_params, child_seed(seed, idx)
                )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hybrid descriptor child seeds ignore an explicit parent seed.

In hybrid mode, Line 97 derives child seeds from the passed default (DEFAULT_DESCRIPTOR_INIT_SEED) rather than the effective parent config seed. If users set descriptor.seed, it won’t drive child seed derivation.

Suggested fix
 def _set_default_descriptor_init_seed(
     params: dict[str, Any], seed: int | list[int]
 ) -> None:
+    _set_default_init_seed(params, seed)
+    parent_seed = params["seed"]
     if params.get("type") == "hybrid":
         for idx, descriptor_params in enumerate(params.get("list", [])):
             if isinstance(descriptor_params, dict):
                 _set_default_descriptor_init_seed(
-                    descriptor_params, child_seed(seed, idx)
+                    descriptor_params, child_seed(parent_seed, idx)
                 )
         return
-    _set_default_init_seed(params, seed)

Also applies to: 111-113

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/model/model/__init__.py` around lines 90 - 98, The
hybrid-descriptor logic in _set_default_descriptor_init_seed (and the analogous
block later) always derives child seeds from the passed-in default `seed`
instead of the effective parent config seed; update the logic to first determine
the parent_seed = params.get("seed", seed) (i.e., use params["seed"] when
present, otherwise fall back to the incoming seed) and then call
child_seed(parent_seed, idx) when recursing into each hybrid child (preserving
the existing isinstance(descriptor_params, dict) check and recursive call to
_set_default_descriptor_init_seed).

Comment on lines +217 to +352
def forward_common_flat_native(
    self,
    coord: torch.Tensor,
    atype: torch.Tensor,
    batch: torch.Tensor,
    ptr: torch.Tensor,
    box: torch.Tensor | None = None,
    fparam: torch.Tensor | None = None,
    aparam: torch.Tensor | None = None,
    do_atomic_virial: bool = False,
    extended_atype: torch.Tensor | None = None,
    extended_batch: torch.Tensor | None = None,
    extended_image: torch.Tensor | None = None,
    extended_ptr: torch.Tensor | None = None,
    mapping: torch.Tensor | None = None,
    central_ext_index: torch.Tensor | None = None,
    nlist: torch.Tensor | None = None,
    nlist_ext: torch.Tensor | None = None,
    a_nlist: torch.Tensor | None = None,
    a_nlist_ext: torch.Tensor | None = None,
    nlist_mask: torch.Tensor | None = None,
    a_nlist_mask: torch.Tensor | None = None,
    edge_index: torch.Tensor | None = None,
    angle_index: torch.Tensor | None = None,
) -> dict[str, torch.Tensor]:
    """Forward pass for mixed-nloc batches with a precomputed flat graph.

    This path consumes graph tensors prepared by the LMDB collate function
    and keeps atom-wise values flattened across frames.

    Parameters
    ----------
    coord
        Flattened atomic coordinates with shape [total_atoms, 3].
    atype
        Flattened atomic types with shape [total_atoms].
    batch
        Atom-to-frame assignment with shape [total_atoms].
    ptr
        Frame boundaries with shape [nframes + 1].
    box
        Simulation boxes with shape [nframes, 9].
    fparam
        Frame parameters with shape [nframes, ndf].
    aparam
        Flattened atomic parameters with shape [total_atoms, nda].
    do_atomic_virial
        Whether to calculate atomic virial.

    Returns
    -------
    model_predict : dict[str, torch.Tensor]
        Model predictions with flat format:
        - atomwise outputs: [total_atoms, ...]
        - frame-wise outputs: [nframes, ...]

    Notes
    -----
    The precomputed graph fields are required for this path; missing
    fields are treated as a data pipeline error.
    """
    # Enable gradient tracking for coord and box if needed
    if self.do_grad_r("energy"):
        coord = coord.clone().detach().requires_grad_(True)
    if self.do_grad_c("energy") and box is not None:
        box = box.clone().detach().requires_grad_(True)
    if (
        extended_atype is not None
        and extended_batch is not None
        and extended_image is not None
        and mapping is not None
        and nlist is not None
        and nlist_ext is not None
        and a_nlist is not None
        and a_nlist_ext is not None
        and nlist_mask is not None
        and a_nlist_mask is not None
        and central_ext_index is not None
    ):
        from deepmd.pt.utils.nlist import rebuild_extended_coord_from_flat_graph

        extended_coord = rebuild_extended_coord_from_flat_graph(
            coord,
            box,
            mapping,
            extended_batch,
            extended_image,
        )
    else:
        raise RuntimeError(
            "Flat mixed-batch forward requires precomputed graph fields from "
            "the LMDB collate_fn."
        )
    # Pass flat extended coordinates directly to the atomic model.
    assert extended_atype is not None
    assert extended_batch is not None
    assert mapping is not None
    assert nlist is not None
model_predict_lower = self.forward_common_lower_flat(
extended_coord,
extended_atype,
extended_batch,
nlist,
mapping,
batch,
ptr,
do_atomic_virial=do_atomic_virial,
fparam=fparam,
aparam=aparam,
extended_ptr=extended_ptr,
central_ext_index=central_ext_index,
nlist_ext=nlist_ext,
a_nlist=a_nlist,
a_nlist_ext=a_nlist_ext,
nlist_mask=nlist_mask,
a_nlist_mask=a_nlist_mask,
edge_index=edge_index,
angle_index=angle_index,
)

# Compute derivatives if needed
if self.do_grad_r("energy") or self.do_grad_c("energy"):
model_predict_lower = self._compute_derivatives_flat(
model_predict_lower,
extended_coord,
extended_atype,
extended_batch,
coord,
atype,
batch,
ptr,
box,
do_atomic_virial,
)

return model_predict_lower

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether _input_type_cast / _output_type_cast are referenced anywhere
# in the new flat path or its callees.
rg -nP --type=py -C2 '_(input|output)_type_cast' deepmd/pt/model/

Repository: deepmodeling/deepmd-kit

Length of output: 2157


🏁 Script executed:

#!/bin/bash
# Find forward_common_lower_flat and _compute_derivatives_flat implementations
cd deepmd/pt/model/model && cat -n make_model.py | sed -n '353,700p' | head -100

Repository: deepmodeling/deepmd-kit

Length of output: 4742


🏁 Script executed:

#!/bin/bash
# Read _compute_derivatives_flat implementation
cd deepmd/pt/model/model && cat -n make_model.py | sed -n '438,550p'

Repository: deepmodeling/deepmd-kit

Length of output: 5611


🏁 Script executed:

#!/bin/bash
# Read forward_common_flat and forward_common_lower implementations for comparison
cd deepmd/pt/model/model && cat -n make_model.py | sed -n '520,600p' && echo "---" && cat -n make_model.py | sed -n '150,220p'

Repository: deepmodeling/deepmd-kit

Length of output: 6816


🏁 Script executed:

#!/bin/bash
# Read _input_type_cast and _output_type_cast implementations
cd deepmd/pt/model/model && cat -n make_model.py | sed -n '711,800p'

Repository: deepmodeling/deepmd-kit

Length of output: 4361


Add input/output precision casting to flat path for numeric consistency with forward_common.

forward_common normalizes input precision via _input_type_cast (line 171) and casts outputs back via _output_type_cast (line 214). forward_common_flat_native skips both, so autograd operates at the input's native precision instead of global_pt_float_precision, and outputs (notably energy_redu) are never promoted to global_pt_ener_float_precision. This breaks numeric consistency: if coord/box arrive in a non-canonical precision, box.clone().detach().requires_grad_(True) at line 282 can create a dtype mismatch between the cloned box and the extended_coord rebuilt at the original precision. Add precision casting at the flat path's entry and exit to match the regular path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/model/model/make_model.py` around lines 217 - 352, The flat forward
path forward_common_flat_native is missing the same input/output precision
normalization used in forward_common; call the model's _input_type_cast on
incoming tensors (coord, box, fparam, aparam, atype, batch, mapping, extended_*
inputs as appropriate) at the start so coord/box are promoted to
global_pt_float_precision (and energy-related inputs to
global_pt_ener_float_precision if required) before doing
clone().detach().requires_grad_(True) or rebuilding extended_coord, and after
computing model_predict_lower run _output_type_cast to promote outputs
(especially energy_redu) back to global_pt_ener_float_precision; ensure you
reference the existing helpers _input_type_cast and _output_type_cast and keep
their casting behavior identical to forward_common.
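`_input_type_cast` and `_output_type_cast` are internal helpers whose exact signatures are not shown here; the sketch below only illustrates the cast-in/compute/cast-out pattern the review asks for, with NumPy dtypes standing in for the torch precision globals:

```python
import numpy as np

GLOBAL_FLOAT = np.float64  # stand-in for global_pt_float_precision
ENER_FLOAT = np.float64    # stand-in for global_pt_ener_float_precision

def forward_with_casts(coord, box):
    # Cast-in: normalize inputs before any clone/requires_grad work,
    # so all downstream math runs at the canonical precision.
    in_dtype = coord.dtype
    coord = coord.astype(GLOBAL_FLOAT)
    box = None if box is None else box.astype(GLOBAL_FLOAT)
    # ... model body would run here at canonical precision ...
    energy_redu = coord.sum()
    # Cast-out: energies go to the energy precision; other outputs
    # return to the caller's original dtype.
    return {
        "energy_redu": np.asarray(energy_redu, dtype=ENER_FLOAT),
        "coord": coord.astype(in_dtype),
    }

out = forward_with_casts(np.ones((4, 3), dtype=np.float32), None)
assert out["energy_redu"].dtype == ENER_FLOAT
assert out["coord"].dtype == np.float32
```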

Comment on lines +247 to +255
if mixed_batch:
with torch.device("cpu"):
self._inner_dataloader = DataLoader(
self,
batch_size=self._reader.batch_size,
shuffle=True,
num_workers=0,
collate_fn=_collate_lmdb_mixed_batch,
)

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

auto_prob stops affecting sampling when mixed_batch=True.

compute_block_targets() still runs above, but this branch switches to a plain DataLoader(..., shuffle=True) and never consumes _block_targets. Any LMDB config relying on auto_prob_style/block weighting will silently change its training distribution once mixed batching is enabled.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/utils/lmdb_dataset.py` around lines 247 - 255, Mixed batching
currently replaces the shuffle-based DataLoader and ignores computed block
weights, so when mixed_batch is True you must construct _inner_dataloader to
honor _block_targets computed by compute_block_targets() rather than using
DataLoader(..., shuffle=True). Replace the plain shuffle=True approach in the
mixed_batch branch by creating and passing a sampler that uses _block_targets
(e.g., torch.utils.data.WeightedRandomSampler or an equivalent custom sampler)
to DataLoader, keep num_workers=0 and collate_fn=_collate_lmdb_mixed_batch, and
ensure _inner_dataloader is assigned the DataLoader configured with that sampler
so auto_prob_style/block weighting still affects sampling.
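torch.utils.data.WeightedRandomSampler draws indices with probability proportional to the supplied weights. The pure-Python analogue below (block weights are made up) shows the sampling behavior the fix would restore in place of uniform shuffling:

```python
import random
from collections import Counter

# Hypothetical per-block weights, standing in for _block_targets. The real
# fix would pass these to torch.utils.data.WeightedRandomSampler and hand
# that sampler to the DataLoader instead of using shuffle=True.
block_weights = [0.7, 0.2, 0.1]

random.seed(0)
draws = random.choices(range(len(block_weights)), weights=block_weights, k=10000)
counts = Counter(draws)

# With 10k draws, the empirical frequencies follow the weight ordering.
assert counts[0] > counts[1] > counts[2]
```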

Comment thread deepmd/pt/utils/nlist.py
Comment on lines +246 to +259
if box is None:
return coord[mapping]
cell = box.reshape(-1, 3, 3)
atom_cell = cell[extended_batch]
rec_cell, _ = torch.linalg.inv_ex(atom_cell)
coord_inter = torch.einsum("ni,nij->nj", coord[mapping], rec_cell)
coord_wrapped = torch.einsum(
"ni,nij->nj",
torch.remainder(coord_inter, 1.0),
atom_cell,
)
image = extended_image.to(dtype=box.dtype, device=box.device)
shift_vec = torch.einsum("ni,nij->nj", image, atom_cell)
return coord_wrapped + shift_vec

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invert the periodic cell once per frame, not once per extended atom.

atom_cell = cell[extended_batch] produces an [n_ext, 3, 3] tensor of repeated per-frame matrices, and torch.linalg.inv_ex(atom_cell) then inverts each row independently. This is called on every forward pass for periodic systems and scales with the number of extended atoms (tens of thousands). Inverting per-frame and indexing into the result preserves the same output and avoids the redundant work.

⚡ Suggested fix
     if box is None:
         return coord[mapping]
     cell = box.reshape(-1, 3, 3)
-    atom_cell = cell[extended_batch]
-    rec_cell, _ = torch.linalg.inv_ex(atom_cell)
+    rec_cell_per_frame, _ = torch.linalg.inv_ex(cell)
+    atom_cell = cell[extended_batch]
+    rec_cell = rec_cell_per_frame[extended_batch]
     coord_inter = torch.einsum("ni,nij->nj", coord[mapping], rec_cell)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/utils/nlist.py` around lines 246 - 259, The code currently inverts
atom_cell (atom_cell = cell[extended_batch]) per extended atom which is
wasteful; instead compute the inverse once per frame by calling
torch.linalg.inv_ex on cell (after cell = box.reshape(-1,3,3)) and then index
the resulting rec_cell with extended_batch (i.e., rec_cell_full, _ =
torch.linalg.inv_ex(cell); rec_cell = rec_cell_full[extended_batch]) so
subsequent einsum uses the indexed per-atom inverses without repeating the
matrix inversion.
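The equivalence (and the saving) is easy to check with a NumPy analogue of the torch code; shapes and names mirror the snippet above but the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
nframes, n_ext = 3, 8
# Well-conditioned synthetic cells: random perturbation of 3*I per frame.
cell = rng.normal(size=(nframes, 3, 3)) + 3.0 * np.eye(3)
extended_batch = rng.integers(0, nframes, size=n_ext)

# Original: invert a repeated [n_ext, 3, 3] stack, one inverse per atom.
per_atom = np.linalg.inv(cell[extended_batch])
# Fix: invert once per frame, then index the result per atom.
per_frame = np.linalg.inv(cell)[extended_batch]

# Same output, nframes inversions instead of n_ext.
assert np.allclose(per_atom, per_frame)
```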

Comment thread test_mixed_batch.sh
Comment on lines +15 to +19
dp --pt train test_mptraj/lmdb_mixed_batch.json --skip-neighbor-stat

echo ""
echo "=== Training completed ==="
echo "Check mixed_batch_train.log for details"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The script never creates mixed_batch_train.log.

The final message points users to a log file that this script does not write. Either redirect the training output there or update the message.

Suggested fix
-dp --pt train test_mptraj/lmdb_mixed_batch.json --skip-neighbor-stat
+dp --pt train test_mptraj/lmdb_mixed_batch.json --skip-neighbor-stat \
+  > mixed_batch_train.log 2>&1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test_mixed_batch.sh` around lines 15 - 19, The script prints "Check
mixed_batch_train.log for details" but never writes that file; fix by
redirecting the training command's stdout/stderr to mixed_batch_train.log (e.g.,
change the dp --pt train test_mptraj/lmdb_mixed_batch.json --skip-neighbor-stat
invocation to append or write both stdout and stderr into mixed_batch_train.log)
or, if you prefer not to create a log, update the echo lines to not reference
mixed_batch_train.log; modify the dp invocation or the echo accordingly to keep
the message accurate.

Comment on lines +195 to +205
"stat_file": "./MP_traj_v024_alldata_mixu.hdf5",
"training_data": {
"systems": "/mnt/data_nas/zhangd/data/lmdb_data/mptraj_v024.lmdb",
"batch_size": "auto:128",
"_comment": "that's all"
},
"validation_data": {
"systems": "/mnt/data_nas/zhangd/data/lmdb_data/wbm.lmdb",
"batch_size": 1,
"_comment": "that's all"
},

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Baseline config contains developer-specific absolute paths.

stat_file, training_data.systems, and validation_data.systems point to a specific machine (/mnt/data_nas/zhangd/...) and a local ./MP_traj_v024_alldata_mixu.hdf5. This makes the baseline non-reproducible for any other user/CI. Consider parameterizing via env vars / placeholders, or pointing to a small example dataset that ships with the repo (similar to other example configs), and documenting how test_mixed_batch.sh resolves these.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test_mptraj/lmdb_baseline.json` around lines 195 - 205, The baseline config
currently hardcodes developer-specific absolute paths in the keys stat_file,
training_data.systems, and validation_data.systems which breaks portability;
change these values to reusable placeholders or environment-variable expansions
(e.g. ${DATASET_PATH}) or point them to a small example dataset shipped in the
repo (mirror other example configs), update any code that loads the config to
resolve those env vars, and add a short note in the config or README explaining
how test_mixed_batch.sh sets or substitutes those variables before running.
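One way to make the paths injectable, sketched with the standard library only; the placeholder name DATASET_PATH and the in-memory config string are illustrative, not the project's actual mechanism:

```python
import json
import os

# A config fragment using an environment-variable placeholder instead of a
# machine-specific absolute path.
raw = '{"training_data": {"systems": "${DATASET_PATH}/mptraj_v024.lmdb"}}'

# The test harness (or test_mixed_batch.sh) would export this before running.
os.environ["DATASET_PATH"] = "/tmp/example_data"

# Expand placeholders before parsing, so the loaded config holds real paths.
cfg = json.loads(os.path.expandvars(raw))
assert cfg["training_data"]["systems"] == "/tmp/example_data/mptraj_v024.lmdb"
```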

Comment on lines +74 to +84
"stat_file": "/aisi/mnt/data_nas/liwentao/devel_workspace/deepmd-kit-lmdb/MP_traj_v024_alldata_mixu.hdf5",
"training_data": {
"systems": "/mnt/data_nas/zhangd/data/lmdb_data/mptraj_v024.lmdb",
"batch_size": 8,
"mixed_batch": true,
"_comment": "Enable mixed_batch mode for testing, disable shuffle for reproducibility"
},
"validation_data": {
"systems": "/mnt/data_nas/zhangd/data/lmdb_data/wbm.lmdb",
"batch_size": 8,
"mixed_batch": true

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This test fixture is tied to one machine’s filesystem.

stat_file and both LMDB systems entries are hard-coded absolute paths, so the new script/config cannot run in CI or on another checkout. Please switch this fixture to repo-managed test data or make the paths injectable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test_mptraj/lmdb_mixed_batch.json` around lines 74 - 84, The fixture uses
hard-coded absolute paths in stat_file and LMDB system entries (stat_file,
training_data.systems, validation_data.systems) which prevents CI portability;
update the test_mptraj/lmdb_mixed_batch.json fixture to reference repo-managed
test data or injected paths by replacing absolute paths with relative paths
inside the repo (e.g., a test-data directory) or with placeholders/env variables
read by the test runner, and ensure the test harness supplies those fixtures
before running so the config uses repo-controlled files rather than
machine-specific absolute paths.
