feat(pt_expt): add dp compress support for pt_expt backend #5323

Open
wanghan-iapcm wants to merge 7 commits into deepmodeling:master from wanghan-iapcm:feat-pt-expt-compress

Conversation

Collaborator

@wanghan-iapcm wanghan-iapcm commented Mar 17, 2026

Important

Breaking change: This PR bumps the serialization version for all compressible descriptors (se_e2_a, se_r, se_t, se_t_tebd, dpa1, se_atten_v2, dpa2). Models serialized with the new version cannot be loaded by older code. Uncompressed models are unaffected — they continue to use the previous version.

Summary

  • Add model compression (embedding net tabulation) for the pt_expt backend, matching the existing pt backend capability
  • Compressed models replace embedding net forward passes with polynomial lookup tables via C++ custom ops (tabulate_fusion_se_*), significantly speeding up inference
  • Support all compressible descriptors: se_e2_a, se_r, se_t, se_t_tebd, dpa1, se_atten_v2, dpa2 (hybrid delegates automatically)

Key changes

Infrastructure:

  • deepmd/pt_expt/utils/tabulate_ops.py — Register fake implementations via torch.library.register_fake for all 5 custom ops, enabling torch.export/make_fx tracing through compressed forward paths
  • deepmd/pt_expt/utils/tabulate.py — DPTabulate subclass that detects the descriptor type via serialized data (avoiding isinstance checks against pt-specific classes)
  • deepmd/pt_expt/entrypoints/compress.py — Entry point: load .pte → deserialize → enable_compression() → re-export .pte

Descriptors: Each gets enable_compression() + @cast_precision call() override with compressed branch using the appropriate custom op.
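The compressed-branch dispatch described above can be illustrated with a minimal, self-contained sketch. Every name here is hypothetical, and np.interp stands in for the tabulate_fusion_se_* custom ops:

```python
import numpy as np

# Minimal sketch of the compress dispatch pattern; all names are hypothetical.
class ToyCompressible:
    def __init__(self) -> None:
        self.compress = False
        self.table_x = np.linspace(0.0, 1.0, 1001)
        self.table_y = None

    def _net(self, x: np.ndarray) -> np.ndarray:
        # stand-in for the embedding net forward pass
        return np.tanh(x)

    def enable_compression(self) -> None:
        # tabulate the net once over the input range
        self.table_y = self._net(self.table_x)
        self.compress = True

    def call(self, x: np.ndarray) -> np.ndarray:
        if self.compress:
            # table lookup replaces the per-call net forward
            return np.interp(x, self.table_x, self.table_y)
        return self._net(x)
```

The compressed and uncompressed branches agree up to interpolation error, which is the parity property the per-descriptor tests below check.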

dpmodel — compression state serialization (breaking version bumps):

The pt_expt backend persists models via serialize() → model.json → deserialize() (the .pte format), unlike pt/tf, which use native framework save mechanisms (torch.jit.save / tf.saved_model) that capture the full runtime state. This means compression state (tabulated polynomial coefficients, precomputed type embeddings) must survive the serialize/deserialize round-trip for compressed .pte models to work.
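A toy round-trip, under assumed names, showing why the tabulation must live in the serialized payload: only what serialize() emits is available after deserialize().

```python
import numpy as np

# Toy round-trip under assumed names: compression state is rebuilt solely from
# the serialized dict, mirroring the .pte serialize/deserialize path.
class ToyDescriptor:
    def __init__(self, weights: np.ndarray) -> None:
        self.weights = weights
        self.compress = False
        self.table = None

    def enable_compression(self) -> None:
        # stand-in for tabulating the embedding net
        self.table = np.cumsum(self.weights)
        self.compress = True

    def serialize(self) -> dict:
        data = {"weights": self.weights.tolist()}
        if self.compress:
            # compression state must be part of the payload to survive
            data["compress"] = {"table": self.table.tolist()}
        return data

    @classmethod
    def deserialize(cls, data: dict) -> "ToyDescriptor":
        obj = cls(np.asarray(data["weights"]))
        if "compress" in data:
            obj.table = np.asarray(data["compress"]["table"])
            obj.compress = True
        return obj
```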

Each compressible descriptor's serialization version is bumped when the model is compressed. Uncompressed models continue to use the old version, so there is no breakage for existing uncompressed model files. All backends (pt, pd, tf) accept the new version in deserialize() and simply ignore the "compress" key.
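The tolerate-and-ignore pattern can be sketched generically. The function name, key names, and version bounds below are illustrative, not the deepmd API:

```python
# Generic sketch of the tolerate-and-ignore pattern; the function name, key
# names, and version bounds are illustrative, not the deepmd API.
def deserialize_descriptor(data: dict) -> dict:
    data = dict(data)  # avoid mutating the caller's dict
    version = data.pop("@version", 1)
    if not 1 <= version <= 3:  # accept both the old (2) and bumped (3) versions
        raise ValueError(f"unsupported serialization version {version}")
    # Backends without compression support simply drop the new keys.
    data.pop("compress_data", None)
    data.pop("compress_info", None)
    return data
```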

| Descriptor | Version bump | Added fields |
| --- | --- | --- |
| se_e2_a | 2 → 3 | compress_data, compress_info |
| se_r | 2 → 3 | compress_data, compress_info |
| se_t | 2 → 3 | compress_data, compress_info |
| se_t_tebd | 1 → 2 | compress_data, compress_info, type_embd_data |
| dpa1 | 2 → 3 | type_embd_data, geo_compress, compress_data/info (if geo) |
| se_atten_v2 | 2 → 3 | type_embd_data, geo_compress, compress_data/info (if geo) |
| dpa2 | 3 → 4 | compress dict inside repinit_variable |

dpmodel: Initialize self.compress = False in all descriptor __init__ methods.

Test plan

  • source/tests/pt_expt/model/test_model_compression.py — end-to-end compress → serialize → deserialize → eval
  • source/tests/pt_expt/descriptor/ — compressed forward, consistency, exportable, make_fx tests for all descriptors
  • source/tests/consistent/descriptor/ — cross-backend consistency tests pass with bumped versions

Summary by CodeRabbit

  • New Features

    • Added descriptor compression functionality to reduce model size and optimize memory usage during inference.
    • Introduced compress CLI command to enable tabulated embedding optimization on frozen trained models.
    • Enhanced descriptor serialization with improved version compatibility across multiple backends.
  • Tests

    • Added comprehensive test coverage for compressed descriptor forward passes and model compression workflows.

Implement model compression (embedding net tabulation) for the pt_expt
backend, replacing embedding net forward passes with polynomial lookup
tables via C++ custom ops (tabulate_fusion_se_*).

- Register fake tensor implementations for custom ops to enable
  torch.export/make_fx tracing through compressed forward paths
- Create pt_expt DPTabulate subclass using serialized type detection
  instead of isinstance checks against pt-specific classes
- Add enable_compression() and compressed call() to all descriptors:
  se_e2_a, se_r, se_t, se_t_tebd, dpa1, se_atten_v2, dpa2
  (hybrid delegates to sub-descriptors automatically)
- Add compress entry point and command dispatch in main.py
- Add test_compressed_forward tests for all descriptor types
- Initialize self.compress = False in dpmodel descriptor __init__
@wanghan-iapcm wanghan-iapcm requested a review from njzjz March 17, 2026 13:39
@dosubot dosubot bot added the new feature label Mar 17, 2026
@wanghan-iapcm wanghan-iapcm marked this pull request as draft March 17, 2026 13:42
Comment on lines +154 to +161
def call(
self,
coord_ext: torch.Tensor,
atype_ext: torch.Tensor,
nlist: torch.Tensor,
mapping: torch.Tensor | None = None,
fparam: torch.Tensor | None = None,
) -> Any:

Check warning

Code scanning / CodeQL — Signature mismatch in overriding method (Warning)

  • This method requires at least 4 positional arguments, whereas overridden NativeOP.call may be called with 1.
  • This method does not accept arbitrary keyword arguments, which overridden NativeOP.call does.
  • This method requires at most 6 positional arguments, whereas overridden NativeOP.call may be called with arbitrarily many.
  • In each case the flagged call correctly calls the base method but does not match the signature of the overriding method.
Comment on lines +188 to +195

The same def call(...) override and the same CodeQL signature-mismatch warning (NativeOP.call) as above.
Comment on lines +111 to +118

The same def call(...) override and the same CodeQL signature-mismatch warning (NativeOP.call) as above.
Comment on lines +100 to +107

The same def call(...) override and the same CodeQL signature-mismatch warning (NativeOP.call) as above.
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review — automated review suggestions for this pull request. Reviewed commit: adae1e005e

@coderabbitai
Contributor

coderabbitai bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds descriptor compression: descriptors gain a compress flag and serialization changes; pt_expt gains enable_compression APIs that build DPTabulate tables, store tabulation buffers and precomputed type-embeddings, and route forward to compressed execution; fake-op registration, a CLI entrypoint, model min_nbor_dist persistence, and tests were added.

Changes

Cohort / File(s) / Summary
dpmodel descriptor flag
deepmd/dpmodel/descriptor/dpa1.py, deepmd/dpmodel/descriptor/dpa2.py, deepmd/dpmodel/descriptor/se_atten_v2.py, deepmd/dpmodel/descriptor/se_e2_a.py, deepmd/dpmodel/descriptor/se_r.py, deepmd/dpmodel/descriptor/se_t.py, deepmd/dpmodel/descriptor/se_t_tebd.py
Initialize self.compress = False; update serialize()/deserialize() to include/accept optional compress blocks and bump accepted @version where applicable.
pt_expt DPA1 compression
deepmd/pt_expt/descriptor/dpa1.py
Add enable_compression, _store_compress_data, _store_type_embd_data, compressed call dispatch and _call_compressed using precomputed buffers and tabulation ops.
pt_expt DPA2 compression
deepmd/pt_expt/descriptor/dpa2.py
Add enable_compression, _store_compress_data, _store_type_embd_data, _call_compressed, and _compressed_repinit_forward enabling compressed repinit/repformer flows with optional geometric tabulation.
pt_expt SeAttenV2 delegation
deepmd/pt_expt/descriptor/se_atten_v2.py
Add wrapper methods delegating compression APIs and compressed call paths to DPA1 implementations.
pt_expt SeA / SeR / SeT / SeT_TEBD compression
deepmd/pt_expt/descriptor/se_e2_a.py, deepmd/pt_expt/descriptor/se_r.py, deepmd/pt_expt/descriptor/se_t.py, deepmd/pt_expt/descriptor/se_t_tebd.py
Add enable_compression, _store_compress_data, (where needed) _store_type_embd_data, _call_compressed, and dispatch logic to use compressed inference when enabled.
pt_expt tabulation & fake ops
deepmd/pt_expt/utils/tabulate.py, deepmd/pt_expt/utils/tabulate_ops.py
New DPTabulate that inspects serialized descriptor metadata; register fake tabulate_fusion_* ops for tracing/export when real C++ ops are absent.
pt_expt entrypoint & CLI
deepmd/pt_expt/entrypoints/compress.py, deepmd/pt_expt/entrypoints/main.py
New enable_compression entrypoint and compress CLI command: resolve min_nbor_dist, call model.enable_compression, and serialize compressed .pte (supports model_json_override).
pt_expt runtime & registry
deepmd/pt_expt/utils/__init__.py, deepmd/pt_expt/common.py
Import-side-effect to register fake tabulate ops; auto-register dpmodel→pt_expt converters when TorchModule defines deserialize.
model persistence
deepmd/pt_expt/model/make_model.py
Add min_nbor_dist property/getter/setter and get_min_nbor_dist() backed by a persistent tensor buffer for serialization.
serialization adjustments (PD/TF/PT backends)
deepmd/{pd,pt,tf}/model/descriptor/*
Bump accepted serialized @version minima and pop optional compress key during deserialize to tolerate compression metadata across backends.
utils serialization
deepmd/pt_expt/utils/serialization.py
deserialize_to_file gains model_json_override parameter to write a custom model.json into exported .pte.
tests: parity & E2E
source/tests/pt_expt/descriptor/*, source/tests/pt_expt/model/test_model_compression.py
Add compressed-vs-uncompressed parity tests per descriptor and an end-to-end model compression/freeze/compress/eval test suite including CLI coverage and min_nbor_dist round-trip.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (main.py)
    participant Entrypoint as compress.enable_compression
    participant File as .pte file
    participant Model as Deserialized Model
    participant Descriptor as Descriptor.enable_compression
    participant Tabulate as DPTabulate
    participant Buffers as compress_data/compress_info

    CLI->>Entrypoint: invoke compress command
    Entrypoint->>File: read / deserialize .pte
    Entrypoint->>Model: get_min_nbor_dist / compute if needed
    Entrypoint->>Descriptor: Descriptor.enable_compression(min_nbor_dist,...)
    Descriptor->>Tabulate: build table(s) from serialized descrpt
    Tabulate->>Descriptor: return table buffers
    Descriptor->>Buffers: _store_compress_data() / _store_type_embd_data()
    Descriptor->>Descriptor: set self.compress = True
    Entrypoint->>File: write / serialize compressed .pte
sequenceDiagram
    participant Input as Input Tensors
    participant Descriptor as Descriptor.call
    participant Flag as self.compress
    participant Compressed as _call_compressed
    participant Op as tabulate_fusion_* op
    participant Buffers as compress_data/compress_info
    participant Output as Result

    Input->>Descriptor: coord_ext, atype_ext, nlist
    Descriptor->>Flag: check compress
    alt compress enabled
        Descriptor->>Compressed: run compressed path
        Compressed->>Buffers: load compress_data/compress_info
        Compressed->>Op: invoke tabulate_fusion_* with buffers
        Op->>Compressed: return tabulated outputs
        Compressed->>Output: assemble final tensor (+aux)
    else uncompressed
        Descriptor->>Descriptor: run original call path
        Descriptor->>Output: return original outputs
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

breaking change, OP, C++

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.16%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding model compression support to the pt_expt backend, the primary objective of this substantial PR touching descriptor serialization, compression enablement, and custom operator integration. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (14)
deepmd/dpmodel/descriptor/se_atten_v2.py (1)

199-199: Redundant initialization, but harmless.

DescrptSeAttenV2 explicitly calls DescrptDPA1.__init__() which already sets self.compress = False (line 347 in dpa1.py). Re-setting it here is redundant but ensures the child class explicitly controls its compression state for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/descriptor/se_atten_v2.py` at line 199, Remove the redundant
assignment to self.compress in DescrptSeAttenV2.__init__; DescrptDPA1.__init__
already initializes self.compress = False, so delete the line "self.compress =
False" in the DescrptSeAttenV2 constructor to avoid duplication while preserving
behavior.
source/tests/pt_expt/descriptor/test_se_e2_a.py (1)

135-163: Also trace/export the compressed path.

This only validates eager execution after enable_compression(). The new fake-op registrations are there specifically to make torch.export/make_fx work on compressed models, so please add one compressed export/tracing assertion here or in a shared helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/descriptor/test_se_e2_a.py` around lines 135 - 163, The
test_compressed_forward currently only checks eager outputs after
dd0.enable_compression(0.5); add a compressed export/tracing check by exporting
or tracing the compressed DescrptSeA instance (e.g., using torch.export or
torch.fx.make_fx on dd0 after enable_compression) and run the traced/exported
graph on the same coord_ext/atype_ext/nlist inputs, then assert the
traced/export outputs match result_ref/result_cmp (use
torch.testing.assert_close with the same atol/rtol); locate the test function
test_compressed_forward and the DescrptSeA instance/enable_compression call to
insert this additional export/trace + assertion.
source/tests/pt_expt/descriptor/test_dpa1.py (1)

151-160: Keep an attention-enabled case in this compression test.

attn_layer=0 only exercises the no-attention configuration, while the rest of this file uses attn_layer=2. Reusing a nonzero layer count here would make the new compression coverage much closer to the DPA1 path this feature is targeting. If compression is intentionally limited to attn_layer=0, please assert that explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/descriptor/test_dpa1.py` around lines 151 - 160, The
test instantiates DescrptDPA1 with attn_layer=0 which only covers the
no-attention path; update the instantiation to use a nonzero attention layer
(e.g., attn_layer=2) to exercise the attention-enabled compression path
consistent with other tests, or if compression is intentionally limited to
no-attention, add an explicit assertion or comment near the DescrptDPA1 creation
(referencing DescrptDPA1 and attn_layer) that verifies/declares attn_layer==0
and documents the intentional limitation.
deepmd/pt_expt/entrypoints/main.py (1)

198-210: Add a smoke test for the new compress dispatch.

This branch depends on parser-provided fields (step, frequency, training_script) that are easy to mis-wire and aren't exercised anywhere in this diff. A tiny main() dispatch test would catch CLI regressions early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/entrypoints/main.py` around lines 198 - 210, Add a small smoke
test that exercises the "compress" dispatch in main: call the main entrypoint
(or the function that reads FLAGS/FLAGS.command) with FLAGS.command set to
"compress" and set the parser-provided fields FLAGS.step, FLAGS.frequency,
FLAGS.training_script (and FLAGS.INPUT/FLAGS.output) to safe dummy values, then
assert that enable_compression (imported in the snippet) is invoked (by
patching/mocking it) or that main returns/executes without error; this ensures
the dispatch that calls enable_compression(input_file=..., stride=...,
extrapolate=..., check_frequency=..., training_script=...) is wired correctly
and prevents CLI regressions.
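A self-contained sketch of such a smoke test, using a toy dispatcher in place of the real main(). The FLAGS fields and keyword names are assumptions taken from the snippet above, not the verified deepmd CLI:

```python
from types import SimpleNamespace
from unittest import mock

# Toy dispatcher standing in for main(); the FLAGS fields and keyword names
# are assumptions based on the review comment, not the verified deepmd CLI.
def toy_main(FLAGS, enable_compression):
    if FLAGS.command == "compress":
        enable_compression(
            input_file=FLAGS.INPUT,
            stride=FLAGS.step,
            check_frequency=FLAGS.frequency,
            training_script=FLAGS.training_script,
        )

def test_compress_dispatch():
    fn = mock.Mock()
    flags = SimpleNamespace(
        command="compress",
        INPUT="model.pte",
        step=0.01,
        frequency=-1,
        training_script=None,
    )
    toy_main(flags, fn)
    # the dispatch forwarded every parser-provided field exactly once
    fn.assert_called_once_with(
        input_file="model.pte",
        stride=0.01,
        check_frequency=-1,
        training_script=None,
    )
```

The real test would import the actual main() and patch its enable_compression with mock.patch; the assertion structure stays the same.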
deepmd/pt_expt/descriptor/se_e2_a.py (3)

172-192: Add strict=True to zip for safety.

Same recommendation for the non-type_one_side branch.

♻️ Suggested fix
         else:
             atype_loc = atype_ext[:, :nloc].reshape(nfnl)
             for embedding_idx, (compress_data_ii, compress_info_ii) in enumerate(
-                zip(self.compress_data, self.compress_info)
+                zip(self.compress_data, self.compress_info, strict=True)
             ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/se_e2_a.py` around lines 172 - 192, The zip over
self.compress_data and self.compress_info used in the loop
(zip(self.compress_data, self.compress_info) in the block that iterates
embedding_idx and calls torch.ops.deepmd.tabulate_fusion_se_a) should be made
safe by adding strict=True so mismatched lengths raise immediately; update that
zip call and the analogous zip call in the non-type_one_side branch to
zip(self.compress_data, self.compress_info, strict=True) to ensure length
mismatches fail fast and prevent silent truncation.

133-139: Prefix unused variable with underscore.

The diff variable is unpacked but never used.

♻️ Suggested fix
-        rr, diff, ww = self.env_mat.call(
+        rr, _diff, ww = self.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.davg[...],
             self.dstd[...],
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/se_e2_a.py` around lines 133 - 139, The unpacked
return from self.env_mat.call assigns rr, diff, ww but diff is unused; update
the tuple unpacking at the env_mat.call invocation to prefix the unused value
(e.g., rr, _diff, ww or rr, _, ww) so the unused variable is clearly marked, and
verify no other references to diff remain in the surrounding code (look for rr,
diff, ww in the call site and any subsequent uses).

154-171: Add strict=True to zip for safety.

Since compress_data and compress_info are created together with matching lengths, adding strict=True serves as a runtime assertion that catches potential bugs if the invariant is ever broken.

♻️ Suggested fix
         if self.type_one_side:
             for embedding_idx, (compress_data_ii, compress_info_ii) in enumerate(
-                zip(self.compress_data, self.compress_info)
+                zip(self.compress_data, self.compress_info, strict=True)
             ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/se_e2_a.py` around lines 154 - 171, In the loop
gated by self.type_one_side that currently does "for embedding_idx,
(compress_data_ii, compress_info_ii) in enumerate(zip(self.compress_data,
self.compress_info)):", change the zip call to use strict=True so the runtime
asserts the two sequences have identical lengths (i.e., zip(self.compress_data,
self.compress_info, strict=True)); update the loop header where compress_data
and compress_info are enumerated (referenced by embedding_idx, compress_data_ii,
compress_info_ii) so any mismatch will raise immediately while leaving the rest
of the logic (including calls to torch.ops.deepmd.tabulate_fusion_se_a and
accumulation into xyz_scatter) unchanged.
deepmd/pt_expt/descriptor/dpa1.py (2)

176-182: Prefix unused variable with underscore.

The diff variable is unpacked but never used.

♻️ Suggested fix
-        rr, diff, sw = self.se_atten.env_mat.call(
+        rr, _diff, sw = self.se_atten.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.se_atten.mean[...],
             self.se_atten.stddev[...],
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/dpa1.py` around lines 176 - 182, The unpacked
variable diff returned by self.se_atten.env_mat.call(...) is unused; change the
unpacking to prefix that element with an underscore (e.g., _diff or _) so it's
clear it is intentionally ignored — update the tuple assignment rr, diff, sw =
self.se_atten.env_mat.call(...) to rr, _diff, sw =
self.se_atten.env_mat.call(...) (references: rr, diff/_diff, sw,
self.se_atten.env_mat.call, coord_ext, atype_ext, nlist, self.se_atten.mean,
self.se_atten.stddev).

63-65: Consider using explicit exception instead of assert for validation.

Using assert for input validation is problematic because assertions can be disabled with Python's -O flag. For production code validation that should always run, prefer explicit if/raise.

♻️ Suggested fix
-        assert not self.se_atten.resnet_dt, (
-            "Model compression error: descriptor resnet_dt must be false!"
-        )
+        if self.se_atten.resnet_dt:
+            raise ValueError("Model compression error: descriptor resnet_dt must be false!")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/dpa1.py` around lines 63 - 65, Replace the
runtime-only assert with an explicit validation that always runs: check the
boolean flag self.se_atten.resnet_dt in the constructor or init path in dpa1.py
and if it is True raise a clear exception (e.g., ValueError or RuntimeError)
with the same message "Model compression error: descriptor resnet_dt must be
false!" so the validation is enforced even when Python optimizations are
enabled.
deepmd/pt_expt/descriptor/se_r.py (2)

122-129: Prefix unused variable with underscore.

The diff variable is unpacked but never used.

♻️ Suggested fix
-        rr, diff, ww = self.env_mat.call(
+        rr, _diff, ww = self.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.davg[...],
             self.dstd[...],
             True,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/se_r.py` around lines 122 - 129, The unpacked
variable named diff from the call to self.env_mat.call (assigned as rr, diff,
ww) is unused; change the unpacking to underscore-prefixed name(s) (e.g., rr,
_diff, ww or rr, _, ww) so the unused value is clearly marked. Update the
assignment site where self.env_mat.call is invoked (in the function/method
containing rr, diff, ww and using coord_ext, atype_ext, nlist, self.davg,
self.dstd) to use the underscore-prefixed variable and leave the rest of the
logic unchanged.

139-153: Add strict=True to zip for safety.

♻️ Suggested fix
         for ii, (compress_data_ii, compress_info_ii) in enumerate(
-            zip(self.compress_data, self.compress_info)
+            zip(self.compress_data, self.compress_info, strict=True)
         ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/se_r.py` around lines 139 - 153, The loop zips
self.compress_data and self.compress_info without strict checking; change the
zip call in the for loop inside the method (the for ii, (compress_data_ii,
compress_info_ii) in enumerate(zip(self.compress_data, self.compress_info))) to
use zip(..., strict=True) so mismatched lengths raise immediately and avoid
silent misalignment when producing mm/ss/xyz_scatter in that block (refer to
variables compress_data_ii, compress_info_ii, sec, ss, and the
torch.ops.deepmd.tabulate_fusion_se_r call to locate the code).
deepmd/pt_expt/utils/tabulate.py (1)

40-47: Avoid mutable default argument and function call in defaults.

Using a mutable list [] as a default argument and calling ActivationFn("tanh") in the signature can cause subtle bugs. The mutable default is shared across all calls, and the function call default is evaluated once at definition time.

♻️ Suggested fix
     def __init__(
         self,
         descrpt: Any,
         neuron: list[int],
         type_one_side: bool = False,
-        exclude_types: list[list[int]] = [],
-        activation_fn: ActivationFn = ActivationFn("tanh"),
+        exclude_types: list[list[int]] | None = None,
+        activation_fn: ActivationFn | None = None,
     ) -> None:
+        if exclude_types is None:
+            exclude_types = []
+        if activation_fn is None:
+            activation_fn = ActivationFn("tanh")
         # Call BaseTabulate.__init__ directly (skip DPTabulatePT.__init__)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/tabulate.py` around lines 40 - 47, The __init__
signature for the class uses a mutable default for exclude_types and calls
ActivationFn("tanh") at definition time; change the parameters to use
exclude_types: Optional[list[list[int]]] = None and activation_fn:
Optional[ActivationFn] = None, then inside __init__ set self.exclude_types = []
if exclude_types is None else exclude_types (or a shallow copy) and set
self.activation_fn = activation_fn if activation_fn is not None else
ActivationFn("tanh"); update any references to use these instance attributes
(referencing the __init__ method, parameter names exclude_types and
activation_fn, and ActivationFn) to avoid shared mutable state and premature
evaluation.
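A quick demonstration of the pitfall being flagged:

```python
# Demonstration of the shared-mutable-default pitfall flagged above.
def append_one(xs=[]):  # default list is created once, at function definition
    xs.append(1)
    return xs

first = append_one()
second = append_one()
assert first is second        # both calls returned the very same list
assert second == [1, 1]       # the second call saw the first call's mutation

def append_one_fixed(xs=None):
    if xs is None:
        xs = []               # fresh list on every call
    xs.append(1)
    return xs

assert append_one_fixed() == [1]
assert append_one_fixed() == [1]
```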
deepmd/pt_expt/descriptor/se_t_tebd.py (1)

159-165: Prefix unused variable with underscore.

The diff variable is unpacked but never used. Prefix it with an underscore to indicate it's intentionally ignored and satisfy the linter.

♻️ Suggested fix
-        rr, diff, sw = self.se_ttebd.env_mat.call(
+        rr, _diff, sw = self.se_ttebd.env_mat.call(
             coord_ext,
             atype_ext,
             nlist,
             self.se_ttebd.mean[...],
             self.se_ttebd.stddev[...],
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/se_t_tebd.py` around lines 159 - 165, The unpacking
from self.se_ttebd.env_mat.call currently assigns rr, diff, sw but diff is never
used; update the unpacking in the caller of self.se_ttebd.env_mat.call (where
rr, diff, sw are assigned) to prefix the unused variable with an underscore
(e.g., rr, _diff, sw or rr, _, sw) so the intent to ignore it is clear to
linters and readers; leave rr and sw as-is and ensure any references to the old
name aren’t used elsewhere.
deepmd/pt_expt/utils/tabulate_ops.py (1)

26-29: Consider adding a comment explaining the side-effect access.

The static analysis tool flags this as a useless expression (B018), but it intentionally triggers lazy loading of the custom op library. Adding a brief comment would clarify intent and silence the warning.

💡 Suggested improvement
 # Ensure the custom op library is loaded (if available)
 try:
-    torch.ops.deepmd
+    torch.ops.deepmd  # Access triggers lazy library loading
 except AttributeError:
     pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/tabulate_ops.py` around lines 26 - 29, The try/except
that accesses torch.ops.deepmd is intentionally used to trigger lazy loading of
the custom op library, so add a short comment immediately above that block
clarifying this side-effect (e.g., "Access torch.ops.deepmd to force lazy-load
the custom op library; keep except to avoid import error") and optionally append
a linter suppression (e.g., # noqa: B018) to the line to silence the
static-analysis warning while leaving the existing try/except (torch.ops.deepmd
and the AttributeError handler) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pt_expt/descriptor/dpa2.py`:
- Line 215: Unpackings in dpa2.py are creating unused variables causing RUF059;
change the unpack of nlist.shape in the scope where nframes, nloc, nnei =
nlist.shape to use a prefixed unused name for the third value (e.g., nframes,
nloc, _nnei = nlist.shape) and likewise rename the unused variable diff at the
other occurrence to _diff so intent is explicit; update both occurrences
referenced in the file and then run ruff check . and ruff format . before
committing.

In `@deepmd/pt_expt/descriptor/se_t.py`:
- Line 133: Rename the unused unpacked variable diff from the call rr, diff, ww
= self.env_mat.call(...) to rr, _diff, ww = self.env_mat.call(...) in the
compressed forward path to satisfy RUF059, and update the subsequent loop that
currently uses zip(param_list_a, param_list_b) to zip(param_list_a,
param_list_b, strict=True) (use the exact parameter list variable names present
around line 155) to enforce equal-length iteration and satisfy B905; make these
two edits inside the same compressed forward function/method where
self.env_mat.call and the zip loop occur (preserving all other logic).

In `@deepmd/pt_expt/entrypoints/compress.py`:
- Around line 91-98: The call to model.enable_compression currently passes
arguments in the wrong order so extrapolate is being used as min_nbor_dist;
replace the call to model.enable_compression(...) so that min_nbor_dist_val is
supplied as the first positional argument (per the enable_compression signature
in descriptor/se_t.py), followed by extrapolate, stride, stride * 10, and
check_frequency; update the invocation at the model.enable_compression(...) site
to pass min_nbor_dist_val first to ensure correct table bounds.

In `@source/tests/pt_expt/descriptor/test_dpa2.py`:
- Around line 243-244: The tuple unpacking of self.nlist.shape assigns unused
variables nf and nloc causing a RUF059 lint error; change the unpack to use
placeholders (e.g., "_, _, nnei = self.nlist.shape" or "_, _, nnei = ..." ) so
only nnei is kept and rng = np.random.default_rng(GLOBAL_SEED) remains
unchanged—update the line that currently reads "nf, nloc, nnei =
self.nlist.shape" to use underscores for the unused members.

---

Nitpick comments:
In `@deepmd/dpmodel/descriptor/se_atten_v2.py`:
- Line 199: Remove the redundant assignment to self.compress in
DescrptSeAttenV2.__init__; DescrptDPA1.__init__ already initializes
self.compress = False, so delete the line "self.compress = False" in the
DescrptSeAttenV2 constructor to avoid duplication while preserving behavior.

In `@deepmd/pt_expt/descriptor/dpa1.py`:
- Around line 176-182: The unpacked variable diff returned by
self.se_atten.env_mat.call(...) is unused; change the unpacking to prefix that
element with an underscore (e.g., _diff or _) so it's clear it is intentionally
ignored — update the tuple assignment rr, diff, sw =
self.se_atten.env_mat.call(...) to rr, _diff, sw =
self.se_atten.env_mat.call(...) (references: rr, diff/_diff, sw,
self.se_atten.env_mat.call, coord_ext, atype_ext, nlist, self.se_atten.mean,
self.se_atten.stddev).
- Around line 63-65: Replace the runtime-only assert with an explicit validation
that always runs: check the boolean flag self.se_atten.resnet_dt in the
constructor or init path in dpa1.py and if it is True raise a clear exception
(e.g., ValueError or RuntimeError) with the same message "Model compression
error: descriptor resnet_dt must be false!" so the validation is enforced even
when Python optimizations are enabled.
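The assert-to-raise change can be illustrated with a bare function (the real check lives in the DPA1 descriptor's init path; this sketch only demonstrates the control flow):

```python
def check_resnet_dt(resnet_dt: bool) -> None:
    # Unlike `assert`, an explicit raise survives `python -O`, which strips
    # assert statements from the bytecode entirely.
    if resnet_dt:
        raise RuntimeError(
            "Model compression error: descriptor resnet_dt must be false!"
        )


check_resnet_dt(False)  # passes silently
```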

In `@deepmd/pt_expt/descriptor/se_e2_a.py`:
- Around line 172-192: The zip over self.compress_data and self.compress_info
used in the loop (zip(self.compress_data, self.compress_info) in the block that
iterates embedding_idx and calls torch.ops.deepmd.tabulate_fusion_se_a) should
be made safe by adding strict=True so mismatched lengths raise immediately;
update that zip call and the analogous zip call in the non-type_one_side branch
to zip(self.compress_data, self.compress_info, strict=True) to ensure length
mismatches fail fast and prevent silent truncation.
- Around line 133-139: The unpacked return from self.env_mat.call assigns rr,
diff, ww but diff is unused; update the tuple unpacking at the env_mat.call
invocation to prefix the unused value (e.g., rr, _diff, ww or rr, _, ww) so the
unused variable is clearly marked, and verify no other references to diff remain
in the surrounding code (look for rr, diff, ww in the call site and any
subsequent uses).
- Around line 154-171: In the loop gated by self.type_one_side that currently
does "for embedding_idx, (compress_data_ii, compress_info_ii) in
enumerate(zip(self.compress_data, self.compress_info)):", change the zip call to
use strict=True so the runtime asserts the two sequences have identical lengths
(i.e., zip(self.compress_data, self.compress_info, strict=True)); update the
loop header where compress_data and compress_info are enumerated (referenced by
embedding_idx, compress_data_ii, compress_info_ii) so any mismatch will raise
immediately while leaving the rest of the logic (including calls to
torch.ops.deepmd.tabulate_fusion_se_a and accumulation into xyz_scatter)
unchanged.

In `@deepmd/pt_expt/descriptor/se_r.py`:
- Around line 122-129: The unpacked variable named diff from the call to
self.env_mat.call (assigned as rr, diff, ww) is unused; change the unpacking to
underscore-prefixed name(s) (e.g., rr, _diff, ww or rr, _, ww) so the unused
value is clearly marked. Update the assignment site where self.env_mat.call is
invoked (in the function/method containing rr, diff, ww and using coord_ext,
atype_ext, nlist, self.davg, self.dstd) to use the underscore-prefixed variable
and leave the rest of the logic unchanged.
- Around line 139-153: The loop zips self.compress_data and self.compress_info
without strict checking; change the zip call in the for loop inside the method
(the for ii, (compress_data_ii, compress_info_ii) in
enumerate(zip(self.compress_data, self.compress_info))) to use zip(...,
strict=True) so mismatched lengths raise immediately and avoid silent
misalignment when producing mm/ss/xyz_scatter in that block (refer to variables
compress_data_ii, compress_info_ii, sec, ss, and the
torch.ops.deepmd.tabulate_fusion_se_r call to locate the code).


In `@deepmd/pt_expt/entrypoints/main.py`:
- Around line 198-210: Add a small smoke test that exercises the "compress"
dispatch in main: call the main entrypoint (or the function that reads
FLAGS/FLAGS.command) with FLAGS.command set to "compress" and set the
parser-provided fields FLAGS.step, FLAGS.frequency, FLAGS.training_script (and
FLAGS.INPUT/FLAGS.output) to safe dummy values, then assert that
enable_compression (imported in the snippet) is invoked (by patching/mocking it)
or that main returns/executes without error; this ensures the dispatch that
calls enable_compression(input_file=..., stride=..., extrapolate=...,
check_frequency=..., training_script=...) is wired correctly and prevents CLI
regressions.


In `@deepmd/pt_expt/utils/tabulate.py`:
- Around line 40-47: The __init__ signature for the class uses a mutable default
for exclude_types and calls ActivationFn("tanh") at definition time; change the
parameters to use exclude_types: Optional[list[list[int]]] = None and
activation_fn: Optional[ActivationFn] = None, then inside __init__ set
self.exclude_types = [] if exclude_types is None else exclude_types (or a
shallow copy) and set self.activation_fn = activation_fn if activation_fn is not
None else ActivationFn("tanh"); update any references to use these instance
attributes (referencing the __init__ method, parameter names exclude_types and
activation_fn, and ActivationFn) to avoid shared mutable state and premature
evaluation.

In `@source/tests/pt_expt/descriptor/test_dpa1.py`:
- Around line 151-160: The test instantiates DescrptDPA1 with attn_layer=0 which
only covers the no-attention path; update the instantiation to use a nonzero
attention layer (e.g., attn_layer=2) to exercise the attention-enabled
compression path consistent with other tests, or if compression is intentionally
limited to no-attention, add an explicit assertion or comment near the
DescrptDPA1 creation (referencing DescrptDPA1 and attn_layer) that
verifies/declares attn_layer==0 and documents the intentional limitation.

In `@source/tests/pt_expt/descriptor/test_se_e2_a.py`:
- Around line 135-163: The test_compressed_forward currently only checks eager
outputs after dd0.enable_compression(0.5); add a compressed export/tracing check
by exporting or tracing the compressed DescrptSeA instance (e.g., using
torch.export or torch.fx.make_fx on dd0 after enable_compression) and run the
traced/exported graph on the same coord_ext/atype_ext/nlist inputs, then assert
the traced/export outputs match result_ref/result_cmp (use
torch.testing.assert_close with the same atol/rtol); locate the test function
test_compressed_forward and the DescrptSeA instance/enable_compression call to
insert this additional export/trace + assertion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16ab0304-492e-4fb6-af81-6e1a61b3394a

📥 Commits

Reviewing files that changed from the base of the PR and between 4ce5c73 and adae1e0.

📒 Files selected for processing (26)
  • deepmd/dpmodel/descriptor/dpa1.py
  • deepmd/dpmodel/descriptor/dpa2.py
  • deepmd/dpmodel/descriptor/se_atten_v2.py
  • deepmd/dpmodel/descriptor/se_e2_a.py
  • deepmd/dpmodel/descriptor/se_r.py
  • deepmd/dpmodel/descriptor/se_t.py
  • deepmd/dpmodel/descriptor/se_t_tebd.py
  • deepmd/pt_expt/descriptor/dpa1.py
  • deepmd/pt_expt/descriptor/dpa2.py
  • deepmd/pt_expt/descriptor/se_atten_v2.py
  • deepmd/pt_expt/descriptor/se_e2_a.py
  • deepmd/pt_expt/descriptor/se_r.py
  • deepmd/pt_expt/descriptor/se_t.py
  • deepmd/pt_expt/descriptor/se_t_tebd.py
  • deepmd/pt_expt/entrypoints/compress.py
  • deepmd/pt_expt/entrypoints/main.py
  • deepmd/pt_expt/utils/__init__.py
  • deepmd/pt_expt/utils/tabulate.py
  • deepmd/pt_expt/utils/tabulate_ops.py
  • source/tests/pt_expt/descriptor/test_dpa1.py
  • source/tests/pt_expt/descriptor/test_dpa2.py
  • source/tests/pt_expt/descriptor/test_se_atten_v2.py
  • source/tests/pt_expt/descriptor/test_se_e2_a.py
  • source/tests/pt_expt/descriptor/test_se_r.py
  • source/tests/pt_expt/descriptor/test_se_t.py
  • source/tests/pt_expt/descriptor/test_se_t_tebd.py

@codecov

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 91.36164% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.41%. Comparing base (b2c8511) to head (7f7a34d).

Files with missing lines Patch % Lines
deepmd/pt_expt/descriptor/dpa2.py 84.92% 19 Missing ⚠️
deepmd/pt_expt/entrypoints/compress.py 58.06% 13 Missing ⚠️
deepmd/pt_expt/descriptor/dpa1.py 90.09% 10 Missing ⚠️
deepmd/pt_expt/utils/tabulate_ops.py 79.41% 7 Missing ⚠️
deepmd/pt_expt/utils/tabulate.py 92.30% 3 Missing ⚠️
deepmd/pt_expt/descriptor/se_t_tebd.py 97.67% 2 Missing ⚠️
deepmd/pt_expt/entrypoints/main.py 71.42% 2 Missing ⚠️
deepmd/pt_expt/descriptor/se_e2_a.py 98.80% 1 Missing ⚠️
deepmd/pt_expt/descriptor/se_r.py 98.24% 1 Missing ⚠️
deepmd/pt_expt/descriptor/se_t.py 98.61% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5323      +/-   ##
==========================================
+ Coverage   82.33%   82.41%   +0.07%     
==========================================
  Files         777      780       +3     
  Lines       77814    78497     +683     
  Branches     3680     3675       -5     
==========================================
+ Hits        64068    64693     +625     
- Misses      12574    12630      +56     
- Partials     1172     1174       +2     

☔ View full report in Codecov by Sentry.

… compression pipeline

- torch_module decorator auto-registers dpmodel base classes in the
  converter registry, so auto-wrapped parent objects (e.g. atomic models)
  get pt_expt sub-components with enable_compression support
- Store min_nbor_dist as a buffer on the model (property with -1.0
  sentinel), matching the pt backend behavior for serialization
- Clean up compress entry point: remove dead fallback, set min_nbor_dist
  on model before calling enable_compression
- Add end-to-end model compression tests: model.enable_compression,
  freeze→compress→eval pipeline, min_nbor_dist round-trip
@wanghan-iapcm wanghan-iapcm marked this pull request as ready for review March 17, 2026 16:08
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
deepmd/pt_expt/entrypoints/compress.py (1)

91-97: ⚠️ Potential issue | 🔴 Critical

Critical bug: min_nbor_dist is not passed to enable_compression.

The enable_compression method on descriptors (see deepmd/pt_expt/descriptor/se_e2_a.py:29-36) expects min_nbor_dist as the first positional argument:

def enable_compression(
    self,
    min_nbor_dist: float,
    table_extrapolate: float = 5,
    table_stride_1: float = 0.01,
    table_stride_2: float = 0.1,
    check_frequency: int = -1,
) -> None:

Currently, extrapolate is passed as the first argument, which will be interpreted as min_nbor_dist, causing incorrect table bounds during compression.

🐛 Proposed fix
     # 3. Enable compression
     log.info("Enabling compression...")
     model.enable_compression(
+        min_nbor_dist,
         extrapolate,
         stride,
         stride * 10,
         check_frequency,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/entrypoints/compress.py` around lines 91 - 97, The call to
model.enable_compression is passing arguments in the wrong order — the
descriptor method enable_compression expects min_nbor_dist as the first
parameter. Update the call to include the min_nbor_dist variable as the first
argument (so the call becomes model.enable_compression(min_nbor_dist,
extrapolate, stride, stride * 10, check_frequency)), ensuring the parameters
align with the enable_compression signature in descriptor/se_e2_a.py.
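Calling with keyword arguments makes this class of positional-ordering bug impossible; a sketch using the signature quoted in the review, with the body stubbed out for illustration:

```python
def enable_compression(
    min_nbor_dist: float,
    table_extrapolate: float = 5,
    table_stride_1: float = 0.01,
    table_stride_2: float = 0.1,
    check_frequency: int = -1,
) -> tuple:
    # Stub body: echo the bound arguments so the mapping is visible.
    return (min_nbor_dist, table_extrapolate, table_stride_1,
            table_stride_2, check_frequency)


stride = 0.01
# Keywords pin each value to its parameter; dropping min_nbor_dist would
# fail loudly instead of shifting every argument by one position.
args = enable_compression(
    min_nbor_dist=0.5,
    table_extrapolate=5.0,
    table_stride_1=stride,
    table_stride_2=stride * 10,
    check_frequency=-1,
)
```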

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d0856493-8293-4f4e-81a0-8f835b074124

📥 Commits

Reviewing files that changed from the base of the PR and between adae1e0 and c343a10.

📒 Files selected for processing (4)
  • deepmd/pt_expt/common.py
  • deepmd/pt_expt/entrypoints/compress.py
  • deepmd/pt_expt/model/make_model.py
  • source/tests/pt_expt/model/test_model_compression.py


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c343a10521


meta/fake tensor implementations for custom ops.
"""
# --- tabulate_fusion_se_a ---
if hasattr(torch.ops.deepmd, "tabulate_fusion_se_a"):


P1 Badge Register fake tabulate ops after custom ops are loaded

_register_fake_if_available() only installs fake kernels when hasattr(torch.ops.deepmd, ...) is already true, but in pt_expt the module is imported early via deepmd.pt_expt.common._ensure_registrations() before deepmd.pt loads the custom library (deepmd/pt/__init__.py). In that import order every check is skipped and registration is never retried, so later symbolic tracing in deserialize_to_file(..., tracing_mode="symbolic") can hit torch.ops.deepmd.tabulate_fusion_* without fake/meta implementations and fail during compression export.
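The retry behavior Codex suggests can be simulated without torch (only `_register_fake_if_available` and `torch.ops.deepmd` are real names from the PR; the namespace class, set, and op tuple below are illustrative):

```python
class OpsNamespace:
    """Stand-in for torch.ops.deepmd: attributes appear once the C++ library loads."""


registered: set[str] = set()
FAKE_OPS = ("tabulate_fusion_se_a", "tabulate_fusion_se_r")


def register_fake_if_available(ops: OpsNamespace) -> None:
    # Idempotent: safe to call again after the op library finally loads,
    # instead of skipping registration forever on the first miss.
    for name in FAKE_OPS:
        if hasattr(ops, name) and name not in registered:
            registered.add(name)  # real code would call torch.library.register_fake


ops = OpsNamespace()
register_fake_if_available(ops)  # too early: library not loaded, nothing registered
ops.tabulate_fusion_se_a = object()  # simulate the custom op library loading later
register_fake_if_available(ops)  # retry now picks up the present op
```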


Comment on lines +101 to +104
data = {
"model": model.serialize(),
"model_def_script": model_dict.get("model_def_script"),
}


P2 Badge Preserve min_nbor_dist when writing compressed .pte metadata

The output payload omits min_nbor_dist even though this command reads that field on input (model_dict.get("min_nbor_dist")), and BaseModel.serialize() does not include model-level neighbor stats. As a result, a once-compressed .pte can lose neighbor-distance metadata in model.json, so re-running dp --pt_expt compress on that artifact may incorrectly require --training-script again.


@njzjz
Copy link
Member

njzjz commented Mar 18, 2026

pre-commit.ci autofix

# Conflicts:
#	deepmd/pt_expt/entrypoints/main.py
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pt_expt/entrypoints/main.py`:
- Around line 283-285: The call to enable_compression is using the wrong
argparse attribute name (FLAGS.INPUT) which will raise AttributeError because
the compress parser defines -i/--input stored as FLAGS.input; update the call to
use FLAGS.input (i.e., replace FLAGS.INPUT with FLAGS.input) when invoking
enable_compression and ensure other callers use the correct FLAGS.<name> casing;
the function referenced is enable_compression and the argparse namespace is
FLAGS.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 09f64a66-637b-4560-9429-ee2d2e757f3d

📥 Commits

Reviewing files that changed from the base of the PR and between c343a10 and 83a245d.

📒 Files selected for processing (1)
  • deepmd/pt_expt/entrypoints/main.py

Han Wang added 3 commits March 19, 2026 22:30
# Conflicts:
#	deepmd/pt_expt/entrypoints/main.py
- Fix FLAGS.INPUT → FLAGS.input in compress command dispatch (critical bug)
- Add .pte suffix handling for compress input/output paths
- Preserve min_nbor_dist in compressed .pte output metadata
- Add explanatory comment to empty except in tabulate_ops.py
- Fix unused variables: diff → _diff, remove unused ng, _ placeholders
- Add strict=True to zip(compress_data, compress_info) calls
…-pt-expt-compress

# Conflicts:
#	deepmd/pt_expt/entrypoints/main.py
)


class DPTabulate(DPTabulatePT):

Check failure

Code scanning / CodeQL

Missing call to superclass `__init__` during object initialization (Error)

This class does not call DPTabulate.__init__ during initialization
(DPTabulate.__init__ may be missing a call to a base class __init__).
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pt_expt/descriptor/dpa1.py`:
- Around line 63-65: Replace the runtime guard that currently uses assert on
self.se_atten.resnet_dt with an explicit if check and raise a RuntimeError when
resnet_dt is True; locate the check in the constructor or init path where
self.se_atten.resnet_dt is inspected (the current assert line) and change it to:
if self.se_atten.resnet_dt: raise RuntimeError("Model compression error:
descriptor resnet_dt must be false!"), ensuring the message matches the original
assertion text.
- Line 175: The unpacked variable "diff" in the tuple returned by
self.se_atten.env_mat.call(...) is unused and triggers Ruff RUF059; update the
unpacking in the method where rr, diff, sw = self.se_atten.env_mat.call(...) to
use a deliberately ignored name (e.g., rr, _, sw or rr, _diff, sw) so the unused
binding is explicit and the linter warning is suppressed, leaving all other
logic unchanged.

In `@deepmd/pt_expt/descriptor/se_t_tebd.py`:
- Around line 178-182: The returned tensor `sw` was flattened to shape (nfnl,
nnei, 1) earlier — restore it to the original 4D shape before returning so
compressed and uncompressed paths match; reshape `sw` using the known dimensions
(e.g., split `nfnl` into `nf` and `nloc` or use `nloc`/`nf` variables available
in this scope) back to (nf, nloc, nnei, 1) (or whatever original 4D ordering is
used elsewhere) right before the function return so the fifth return value has
the same shape in both execution modes, and ensure the device/dtype are
preserved.

In `@deepmd/pt_expt/utils/tabulate_ops.py`:
- Around line 25-29: Remove the redundant preamble that accesses
torch.ops.deepmd inside a try/except: delete the try/except block that simply
evaluates torch.ops.deepmd (the useless expression flagged by Ruff B018). Rely
on the existing guarded checks using hasattr(torch.ops.deepmd, ...) in the
functions (the later checks at lines where hasattr is used) to probe for op
availability; no additional namespace pre-access or exception handling is needed
around torch.ops.deepmd.

In `@deepmd/pt_expt/utils/tabulate.py`:
- Around line 51-54: The __init__ signature of DPTabulate uses a mutable default
(exclude_types: list[list[int]] = []) and a call-expression default
(activation_fn: ActivationFn = ActivationFn("tanh")) which are evaluated at
import time; change the signature to use None sentinels (exclude_types:
Optional[List[List[int]]] = None, activation_fn: Optional[ActivationFn] = None)
and inside DPTabulate.__init__ set exclude_types = [] if None and activation_fn
= ActivationFn("tanh") if None so each instance gets its own defaults and the
ActivationFn call is not evaluated at import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: aaaf5b18-f53c-4b65-9cf4-43a15e8088fe

📥 Commits

Reviewing files that changed from the base of the PR and between 83a245d and 7f7a34d.

📒 Files selected for processing (12)
  • deepmd/pt_expt/descriptor/dpa1.py
  • deepmd/pt_expt/descriptor/dpa2.py
  • deepmd/pt_expt/descriptor/se_e2_a.py
  • deepmd/pt_expt/descriptor/se_r.py
  • deepmd/pt_expt/descriptor/se_t.py
  • deepmd/pt_expt/descriptor/se_t_tebd.py
  • deepmd/pt_expt/entrypoints/compress.py
  • deepmd/pt_expt/entrypoints/main.py
  • deepmd/pt_expt/utils/tabulate.py
  • deepmd/pt_expt/utils/tabulate_ops.py
  • source/tests/pt_expt/descriptor/test_dpa2.py
  • source/tests/pt_expt/model/test_model_compression.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • source/tests/pt_expt/descriptor/test_dpa2.py
  • deepmd/pt_expt/entrypoints/compress.py
  • deepmd/pt_expt/descriptor/dpa2.py
  • source/tests/pt_expt/model/test_model_compression.py
  • deepmd/pt_expt/descriptor/se_e2_a.py

Comment on lines +63 to +65
assert not self.se_atten.resnet_dt, (
"Model compression error: descriptor resnet_dt must be false!"
)
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '61,65p' deepmd/pt_expt/descriptor/dpa1.py
python - <<'PY'
from pathlib import Path
import ast

tree = ast.parse(Path("deepmd/pt_expt/descriptor/dpa1.py").read_text())
for node in ast.walk(tree):
    if isinstance(node, ast.Assert):
        print(f"assert at line {node.lineno}")
PY

Repository: deepmodeling/deepmd-kit

Length of output: 354


Use explicit exception instead of assert for runtime guard.

Using assert to guard unsupported compression mode is unsafe—assertions can be disabled when Python runs with optimization (-O or -OO flags), allowing resnet_dt=True to bypass this check. Replace with an explicit if statement and raise RuntimeError().

Proposed fix
-        assert not self.se_atten.resnet_dt, (
-            "Model compression error: descriptor resnet_dt must be false!"
-        )
+        if self.se_atten.resnet_dt:
+            raise RuntimeError(
+                "Model compression error: descriptor resnet_dt must be false!"
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert not self.se_atten.resnet_dt, (
"Model compression error: descriptor resnet_dt must be false!"
)
if self.se_atten.resnet_dt:
raise RuntimeError(
"Model compression error: descriptor resnet_dt must be false!"
)

) -> Any:
"""Compressed forward for DPA1 descriptor."""
# env_mat: nf x nloc x nnei x 4
rr, diff, sw = self.se_atten.env_mat.call(
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '173,176p' deepmd/pt_expt/descriptor/dpa1.py
rg -n 'rr,\s*diff,\s*sw' deepmd/pt_expt/descriptor/dpa1.py

Repository: deepmodeling/deepmd-kit

Length of output: 289


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# First, let's find the method containing line 175 and check if `diff` is used
# Get a larger context around line 175 to understand the method scope
sed -n '160,250p' deepmd/pt_expt/descriptor/dpa1.py | cat -n

# Search for usage of `diff` in the file after line 175
rg -n '\bdiff\b' deepmd/pt_expt/descriptor/dpa1.py | head -20

Repository: deepmodeling/deepmd-kit

Length of output: 4178


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Run ruff check on the file to see if RUF059 is actually reported
ruff check deepmd/pt_expt/descriptor/dpa1.py --select RUF059

Repository: deepmodeling/deepmd-kit

Length of output: 758


Rename the unused diff binding to suppress RUF059.

The variable diff is unpacked on line 175 but never used in the method. Ruff flags this as RUF059.

♻️ Proposed fix
-        rr, diff, sw = self.se_atten.env_mat.call(
+        rr, _diff, sw = self.se_atten.env_mat.call(

Suggested change
rr, diff, sw = self.se_atten.env_mat.call(
rr, _diff, sw = self.se_atten.env_mat.call(
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 175-175: Unpacked variable diff is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


Comment on lines +178 to +182
sw = torch.where(
nlist_mask[:, :, None],
sw.view(nfnl, nnei, 1),
torch.zeros(nfnl, nnei, 1, dtype=sw.dtype, device=sw.device),
)
Contributor

⚠️ Potential issue | 🟠 Major

Restore sw to the original 4D shape before returning.

Line 178 flattens sw to (nf * nloc, nnei, 1), and Line 246 returns that flattened tensor unchanged. That makes the fifth return value shape differ between compressed and uncompressed execution.

🐛 Proposed fix
-        return result, None, None, None, sw
+        return result, None, None, None, sw.view(nf, nloc, nnei, 1)

Also applies to: 246-246

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/descriptor/se_t_tebd.py` around lines 178 - 182, The returned
tensor `sw` was flattened to shape (nfnl, nnei, 1) earlier — restore it to the
original 4D shape before returning so compressed and uncompressed paths match;
reshape `sw` using the known dimensions (e.g., split `nfnl` into `nf` and `nloc`
or use `nloc`/`nf` variables available in this scope) back to (nf, nloc, nnei,
1) (or whatever original 4D ordering is used elsewhere) right before the
function return so the fifth return value has the same shape in both execution
modes, and ensure the device/dtype are preserved.
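The proposed `view(nf, nloc, nnei, 1)` restoration is lossless, since flattening `(nf, nloc, ...)` into `(nf * nloc, ...)` and reshaping back is a pure re-interpretation of the same buffer. A minimal numpy sketch (dimensions invented for illustration):

```python
import numpy as np

# Hypothetical dimensions standing in for nf/nloc/nnei in the descriptor.
nf, nloc, nnei = 2, 3, 4
nfnl = nf * nloc

# sw starts as (nf, nloc, nnei) and is flattened to (nfnl, nnei, 1)
# inside the compressed branch...
sw = np.random.default_rng(0).random((nf, nloc, nnei))
sw_flat = sw.reshape(nfnl, nnei, 1)

# ...and can be restored to the 4D shape the caller expects with a plain
# reshape; no data is copied or reordered.
sw_back = sw_flat.reshape(nf, nloc, nnei, 1)
assert sw_back.shape == (2, 3, 4, 1)
assert np.array_equal(sw_back[..., 0], sw)
```

The same equivalence holds for `torch.Tensor.view` on a contiguous tensor, which is why returning `sw.view(nf, nloc, nnei, 1)` keeps the compressed and uncompressed paths shape-compatible.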

Comment on lines +25 to +29
# Ensure the custom op library is loaded (if available)
try:
    torch.ops.deepmd
except AttributeError:
    pass
⚠️ Potential issue | 🟡 Minor

Remove the redundant namespace access.

Line 27 is flagged by Ruff as B018 (useless expression). The preamble does not add value since the later hasattr(torch.ops.deepmd, ...) checks at lines 49, 63, 76, 90, and 106 already safely access the same namespace.

♻️ Proposed fix
-# Ensure the custom op library is loaded (if available)
-try:
-    torch.ops.deepmd
-except AttributeError:
-    pass
-
-
📝 Committable suggestion


Suggested change
-# Ensure the custom op library is loaded (if available)
-try:
-    torch.ops.deepmd
-except AttributeError:
-    pass
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 27-27: Found useless expression. Either assign it to a variable or remove it.

(B018)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/tabulate_ops.py` around lines 25 - 29, Remove the
redundant preamble that accesses torch.ops.deepmd inside a try/except: delete
the try/except block that simply evaluates torch.ops.deepmd (the useless
expression flagged by Ruff B018). Rely on the existing guarded checks using
hasattr(torch.ops.deepmd, ...) in the functions (the later checks at lines where
hasattr is used) to probe for op availability; no additional namespace
pre-access or exception handling is needed around torch.ops.deepmd.
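The pattern the review recommends — probe the op namespace with `hasattr()` at call time rather than pre-touching it at import time — can be sketched without torch. Here `OpNamespace` is an illustrative stand-in for `torch.ops.deepmd`:

```python
# Torch-free sketch of guarded op dispatch. `OpNamespace` stands in for
# torch.ops.deepmd, where only some custom ops may be registered.
class OpNamespace:
    def tabulate_fusion_se_a(self, x):
        return x * 2  # pretend this is the compiled custom op

ops = OpNamespace()

def call_op(name, x, fallback):
    # hasattr() probes the namespace safely at call time; no try/except
    # preamble at import time is needed.
    if hasattr(ops, name):
        return getattr(ops, name)(x)
    return fallback(x)

assert call_op("tabulate_fusion_se_a", 3, fallback=lambda x: x) == 6  # op found
assert call_op("tabulate_fusion_se_r", 3, fallback=lambda x: x) == 3  # fallback
```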

Comment on lines +51 to +54
        neuron: list[int],
        type_one_side: bool = False,
        exclude_types: list[list[int]] = [],
        activation_fn: ActivationFn = ActivationFn("tanh"),
⚠️ Potential issue | 🟡 Minor

Avoid shared defaults in DPTabulate.__init__.

Lines 53–54 have mutable/call-expression defaults that are evaluated once at import time and shared across all instances. Ruff flags B006 and B008 on this signature, and CI will fail per the coding guidelines.

Use None sentinels instead:

Proposed fix
     def __init__(
         self,
         descrpt: Any,
         neuron: list[int],
         type_one_side: bool = False,
-        exclude_types: list[list[int]] = [],
-        activation_fn: ActivationFn = ActivationFn("tanh"),
+        exclude_types: list[list[int]] | None = None,
+        activation_fn: ActivationFn | None = None,
     ) -> None:
+        if exclude_types is None:
+            exclude_types = []
+        if activation_fn is None:
+            activation_fn = ActivationFn("tanh")
         # Skip DPTabulatePT.__init__ to avoid its isinstance check against
🧰 Tools
🪛 Ruff (0.15.6)

[warning] 53-53: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


[warning] 54-54: Do not perform function call ActivationFn in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/tabulate.py` around lines 51 - 54, the __init__
signature of DPTabulate uses a mutable default (exclude_types: list[list[int]] =
[]) and a call-expression default (activation_fn: ActivationFn =
ActivationFn("tanh")), both of which are evaluated once at import time; change
the signature to use None sentinels (exclude_types: Optional[List[List[int]]] =
None, activation_fn: Optional[ActivationFn] = None) and, inside
DPTabulate.__init__, assign exclude_types = [] when it is None and activation_fn
= ActivationFn("tanh") when it is None, so each instance gets its own defaults
and the ActivationFn call is not evaluated at import.
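The hazard and the None-sentinel fix can be shown in a few lines; `bad`/`good` below are illustrative names, not the actual DPTabulate signature:

```python
# Sketch of the B006 hazard: a mutable default is created once at function
# definition time and shared across every call.
def bad(exclude_types=[]):          # one shared list for all calls
    exclude_types.append(1)
    return exclude_types

def good(exclude_types=None):
    if exclude_types is None:       # fresh list per call
        exclude_types = []
    exclude_types.append(1)
    return exclude_types

first, second = bad(), bad()
assert first is second              # same shared object across calls
assert second == [1, 1]             # the shared default keeps growing
assert good() == [1] and good() == [1]
```

The B008 case (`ActivationFn("tanh")` as a default) is the same problem for call expressions: the call runs once at import and the single resulting object is shared, which the None sentinel likewise avoids.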

…iptors

Extend compression serialization (v3 format from se_e2_a) to all other
compressible descriptors so that `dp compress` output persists tabulated
data through the .pte serialize/deserialize round-trip.

Version bumps: se_r 2→3, se_t 2→3, se_t_tebd 1→2, dpa1 2→3,
se_atten_v2 2→3, dpa2 3→4 (only when compressed; uncompressed models
keep the old version for backward compatibility).

All non-dpmodel backends (pt, pd, tf) accept the new versions and
pop the "compress" key they don't use.
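The versioning scheme the commit describes — bump the serialized version only for compressed payloads, and have non-consuming backends accept the new version while dropping the `compress` key — can be sketched as follows (all names here are illustrative, not the actual deepmd-kit API):

```python
# Hedged sketch of version gating for compressed descriptor payloads.
OLD_VERSION, COMPRESSED_VERSION = 2, 3

def serialize(compressed: bool) -> dict:
    # Uncompressed models keep the old version for backward compatibility.
    data = {"@version": COMPRESSED_VERSION if compressed else OLD_VERSION}
    if compressed:
        data["compress"] = {"tables": "..."}  # placeholder for tabulated data
    return data

def deserialize(data: dict, max_supported: int) -> dict:
    version = data["@version"]
    if version > max_supported:
        # Old readers reject only payloads they genuinely cannot interpret.
        raise ValueError(f"unsupported version {version}")
    data.pop("compress", None)  # backends without compression drop the key
    return data

assert deserialize(serialize(False), max_supported=2)["@version"] == 2
assert deserialize(serialize(True), max_supported=3)["@version"] == 3
```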

# Compressed forward should match
dtype = torch.float64
coord_ext = torch.tensor(

Check notice

Code scanning / CodeQL

Unused local variable (Note, test)

Variable coord_ext is not used.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
deepmd/tf/descriptor/se_atten_v2.py (1)

139-150: deserialize() compatibility updates are unreachable in the current control flow.

Lines 139–150 never execute because deserialize() unconditionally raises at Line 133. Either remove this dead block or gate the raise so the version/compress path is actually usable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/tf/descriptor/se_atten_v2.py` around lines 139 - 150, The code block
performing check_version_compatibility(...) and popping/using fields
(deserialize_network(...), deserialize_attention_layers(...),
data.pop("env_mat"), variables = data.pop("@variables"), etc.) is unreachable
because deserialize() currently always raises earlier; either remove this dead
block or make the earlier raise conditional so the compatibility/legacy path
runs when applicable. Modify the control flow in deserialize() so the
unconditional raise is gated (e.g., only raised for unsupported versions) and
allow the existing compatibility branch to execute by checking the version or a
flag before invoking check_version_compatibility and the subsequent
deserialize_network/deserialize_attention_layers logic.
deepmd/pt_expt/utils/serialization.py (1)

294-296: Avoid full deepcopy on model_json_override to reduce peak memory during compression export.

In the compression path (deepmd/pt_expt/entrypoints/compress.py, Lines 114–118), model_json_override contains the compressed model tables. Deep-copying that payload here can cause a large temporary memory spike.

Possible adjustment
-    json_source = model_json_override if model_json_override is not None else data
-    data_for_json = deepcopy(json_source)
-    data_for_json = _numpy_to_json_serializable(data_for_json)
+    json_source = model_json_override if model_json_override is not None else data
+    data_for_json = _numpy_to_json_serializable(json_source)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/serialization.py` around lines 294 - 296, The code
currently does deepcopy(json_source) unconditionally which causes a large memory
spike when model_json_override holds compressed tables; change the logic so you
only deepcopy the original data branch and avoid copying model_json_override:
set data_for_json = deepcopy(data) if model_json_override is None else
json_source (or a lightweight shallow copy) and then call
_numpy_to_json_serializable(data_for_json). Update the block that assigns
json_source/data_for_json (variables json_source, model_json_override,
data_for_json) so deepcopy is only used for the un-overridden data path to
prevent duplicating the large compressed payload.
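The adjustment amounts to copying only on the path that mutates shared state; a small sketch (function and variable names mirror the review prose, not the exact file contents):

```python
from copy import deepcopy

# Deepcopy only the shared `data` branch; an explicit override payload is
# caller-owned, so it can be serialized in place without duplicating a
# potentially large compressed table.
def prepare_json(data, model_json_override=None):
    if model_json_override is None:
        return deepcopy(data)          # protect the live model state
    return model_json_override         # zero-copy path for the override

data = {"model": {"w": [1, 2, 3]}}
override = {"model": {"w": [9, 9, 9]}}

copied = prepare_json(data)
assert copied == data and copied is not data      # independent copy
assert prepare_json(data, override) is override   # no duplication
```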
source/tests/pt_expt/model/test_model_compression.py (1)

190-255: Consider moving the import to module level.

The DescrptSeA import on line 204-206 is redundant since it's already imported at module level (line 15-17). This can be removed.

However, the test logic is sound and provides valuable coverage for compression state persistence through serialize/deserialize round-trip, including descriptor-level forward equivalence verification.

♻️ Remove redundant import
         # Serialize and deserialize
         data = desc.serialize()
-        from deepmd.pt_expt.descriptor.se_e2_a import (
-            DescrptSeA,
-        )
-
         desc2 = DescrptSeA.deserialize(data)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@source/tests/pt_expt/model/test_model_compression.py` around lines 190 - 255,
In test_compress_state_serialized remove the redundant local import of
DescrptSeA (the DescrptSeA import block inside the test) since DescrptSeA is
already imported at module level; delete those lines so the test uses the
module-level DescrptSeA symbol and run tests to confirm nothing else depends on
the local import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/dpmodel/descriptor/dpa1.py`:
- Around line 661-669: Override _load_compress_data in
deepmd/pt_expt/descriptor/dpa1.py to mirror the PT save logic: take the
deserialized numpy arrays from the base _load_compress_data
(variables["type_embd_data"], variables["compress_data"],
variables["compress_info"]) and convert them into torch tensors and
ParameterList objects the runtime expects — e.g., replace list-of-ndarrays in
self.type_embd_data with list/tuple of torch.tensor(..., dtype=torch.long)
(matching how _store_type_embd_data creates them) and convert self.compress_data
and self.compress_info into torch.nn.ParameterList of
torch.nn.Parameter(tensor.contiguous()) (matching _store_compress_data);
preserve self.geo_compress and set self.compress = True. Ensure conversions use
the same device/dtype semantics used elsewhere in the class so indexing and
.contiguous() calls in call() work.

In `@deepmd/dpmodel/descriptor/se_e2_a.py`:
- Around line 588-601: The dpmodel's _load_compress_data currently assigns raw
arrays to compress_data/compress_info, but the PT subclass's deserialize and
forward expect torch.nn.ParameterList of tensors (used with .contiguous() and
.cpu()); to fix, override _load_compress_data in the PT class (the PT file
deepmd/pt_expt/descriptor/se_e2_a.py) to convert the deserialized
variables["@variables"]["compress_data"] and ["compress_info"] into torch
tensors and wrap them in torch.nn.ParameterList (matching the structure created
by _store_compress_data) so subsequent calls to deserialize()/forward() can call
.contiguous()/.cpu() safely. Ensure you preserve dtype/device semantics when
rebuilding the ParameterList.

---

Nitpick comments:
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 294-296: The code currently does deepcopy(json_source)
unconditionally which causes a large memory spike when model_json_override holds
compressed tables; change the logic so you only deepcopy the original data
branch and avoid copying model_json_override: set data_for_json = deepcopy(data)
if model_json_override is None else json_source (or a lightweight shallow copy)
and then call _numpy_to_json_serializable(data_for_json). Update the block that
assigns json_source/data_for_json (variables json_source, model_json_override,
data_for_json) so deepcopy is only used for the un-overridden data path to
prevent duplicating the large compressed payload.

In `@deepmd/tf/descriptor/se_atten_v2.py`:
- Around line 139-150: The code block performing
check_version_compatibility(...) and popping/using fields
(deserialize_network(...), deserialize_attention_layers(...),
data.pop("env_mat"), variables = data.pop("@variables"), etc.) is unreachable
because deserialize() currently always raises earlier; either remove this dead
block or make the earlier raise conditional so the compatibility/legacy path
runs when applicable. Modify the control flow in deserialize() so the
unconditional raise is gated (e.g., only raised for unsupported versions) and
allow the existing compatibility branch to execute by checking the version or a
flag before invoking check_version_compatibility and the subsequent
deserialize_network/deserialize_attention_layers logic.

In `@source/tests/pt_expt/model/test_model_compression.py`:
- Around line 190-255: In test_compress_state_serialized remove the redundant
local import of DescrptSeA (the DescrptSeA import block inside the test) since
DescrptSeA is already imported at module level; delete those lines so the test
uses the module-level DescrptSeA symbol and run tests to confirm nothing else
depends on the local import.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0cccd57f-82cc-4352-8c2b-d8113b250dd9

📥 Commits

Reviewing files that changed from the base of the PR and between 7f7a34d and 433a4c6.

📒 Files selected for processing (26)
  • deepmd/dpmodel/descriptor/dpa1.py
  • deepmd/dpmodel/descriptor/dpa2.py
  • deepmd/dpmodel/descriptor/se_atten_v2.py
  • deepmd/dpmodel/descriptor/se_e2_a.py
  • deepmd/dpmodel/descriptor/se_r.py
  • deepmd/dpmodel/descriptor/se_t.py
  • deepmd/dpmodel/descriptor/se_t_tebd.py
  • deepmd/pd/model/descriptor/dpa1.py
  • deepmd/pd/model/descriptor/dpa2.py
  • deepmd/pd/model/descriptor/se_a.py
  • deepmd/pd/model/descriptor/se_atten_v2.py
  • deepmd/pd/model/descriptor/se_t_tebd.py
  • deepmd/pt/model/descriptor/dpa1.py
  • deepmd/pt/model/descriptor/dpa2.py
  • deepmd/pt/model/descriptor/se_a.py
  • deepmd/pt/model/descriptor/se_atten_v2.py
  • deepmd/pt/model/descriptor/se_r.py
  • deepmd/pt/model/descriptor/se_t.py
  • deepmd/pt/model/descriptor/se_t_tebd.py
  • deepmd/pt_expt/entrypoints/compress.py
  • deepmd/pt_expt/utils/serialization.py
  • deepmd/tf/descriptor/se_a.py
  • deepmd/tf/descriptor/se_atten_v2.py
  • deepmd/tf/descriptor/se_r.py
  • deepmd/tf/descriptor/se_t.py
  • source/tests/pt_expt/model/test_model_compression.py
✅ Files skipped from review due to trivial changes (1)
  • deepmd/pt/model/descriptor/se_t_tebd.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/pt_expt/entrypoints/compress.py

Comment on lines +661 to +669
    def _load_compress_data(self, compress: dict) -> None:
        """Load compression state from serialized data."""
        variables = compress["@variables"]
        self.type_embd_data = variables["type_embd_data"]
        self.geo_compress = compress.get("geo_compress", False)
        if self.geo_compress:
            self.compress_data = variables["compress_data"]
            self.compress_info = variables["compress_info"]
        self.compress = True
⚠️ Potential issue | 🟠 Major

PT implementation must override _load_compress_data() to reconstruct torch tensors.

When a compressed pt_expt.DescrptDPA1 model is deserialized, it loads raw numpy arrays via the inherited dpmodel method. However, the call() path expects torch tensors and ParameterList objects:

  • self.type_embd_data[index] requires a tensor (not a numpy array) for indexing with torch.long
  • self.compress_data[0].contiguous() and self.compress_info[0].contiguous() require ParameterList objects that have the .contiguous() method

Add an override in deepmd/pt_expt/descriptor/dpa1.py that converts the deserialized numpy arrays back to torch tensors and torch.nn.ParameterList, mirroring the structure created by _store_type_embd_data() and _store_compress_data().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/descriptor/dpa1.py` around lines 661 - 669, Override
_load_compress_data in deepmd/pt_expt/descriptor/dpa1.py to mirror the PT save
logic: take the deserialized numpy arrays from the base _load_compress_data
(variables["type_embd_data"], variables["compress_data"],
variables["compress_info"]) and convert them into torch tensors and
ParameterList objects the runtime expects — e.g., replace list-of-ndarrays in
self.type_embd_data with list/tuple of torch.tensor(..., dtype=torch.long)
(matching how _store_type_embd_data creates them) and convert self.compress_data
and self.compress_info into torch.nn.ParameterList of
torch.nn.Parameter(tensor.contiguous()) (matching _store_compress_data);
preserve self.geo_compress and set self.compress = True. Ensure conversions use
the same device/dtype semantics used elsewhere in the class so indexing and
.contiguous() calls in call() work.
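A hedged sketch of the requested override, rebuilding the torch-side containers from the numpy arrays that dpmodel deserialization produces. `_load_compress_data_pt` and the dict standing in for `self` are illustrative; the real method would live on the pt_expt `DescrptDPA1` class, and dtype/device must match whatever the store path actually writes:

```python
import numpy as np
import torch

def _load_compress_data_pt(self_like, compress):
    """Rebuild torch containers from deserialized numpy compression state."""
    variables = compress["@variables"]
    # type embedding tables are indexed from the forward path, so they must
    # be tensors rather than raw ndarrays
    self_like["type_embd_data"] = [
        torch.as_tensor(a) for a in variables["type_embd_data"]
    ]
    self_like["geo_compress"] = compress.get("geo_compress", False)
    if self_like["geo_compress"]:
        # ParameterList yields tensors supporting .contiguous()/.cpu(),
        # mirroring the structure the PT store path creates
        self_like["compress_data"] = torch.nn.ParameterList(
            torch.nn.Parameter(torch.as_tensor(a).contiguous(), requires_grad=False)
            for a in variables["compress_data"]
        )
        self_like["compress_info"] = torch.nn.ParameterList(
            torch.nn.Parameter(torch.as_tensor(a).contiguous(), requires_grad=False)
            for a in variables["compress_info"]
        )
    self_like["compress"] = True

state = {}
compress = {
    "@variables": {
        "type_embd_data": [np.zeros((4, 8))],
        "compress_data": [np.ones((16, 6))],
        "compress_info": [np.zeros(6)],
    },
    "geo_compress": True,
}
_load_compress_data_pt(state, compress)
assert isinstance(state["compress_data"], torch.nn.ParameterList)
assert state["compress_data"][0].contiguous().shape == (16, 6)
```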

Comment on lines +588 to +601
    def _load_compress_data(self, compress: dict) -> None:
        """Load compression state from serialized data.

        Parameters
        ----------
        compress : dict
            Must contain "@variables" with "compress_data" (list of arrays,
            one per embedding network) and "compress_info" (list of arrays
            with table bounds).
        """
        variables = compress["@variables"]
        self.compress_data = variables["compress_data"]
        self.compress_info = variables["compress_info"]
        self.compress = True
⚠️ Potential issue | 🔴 Critical


Fix loader to restore torch.nn.ParameterList structure on deserialization.

The PT version (deepmd/pt_expt/descriptor/se_e2_a.py) inherits deserialize() from its dpmodel parent, which calls _load_compress_data() to restore compressed state. However, _load_compress_data() in dpmodel assigns raw arrays directly, while the PT forward pass expects torch.nn.ParameterList objects and calls .contiguous() and .cpu() on them (lines 165–166, 186–187 in PT version). This breaks compressed model serialization/deserialization round-trips at runtime.

Override _load_compress_data() in the PT version to rebuild compress_data and compress_info as torch.nn.ParameterList containing tensors, mirroring the structure created by _store_compress_data().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/dpmodel/descriptor/se_e2_a.py` around lines 588 - 601, The dpmodel's
_load_compress_data currently assigns raw arrays to compress_data/compress_info,
but the PT subclass's deserialize and forward expect torch.nn.ParameterList of
tensors (used with .contiguous() and .cpu()); to fix, override
_load_compress_data in the PT class (the PT file
deepmd/pt_expt/descriptor/se_e2_a.py) to convert the deserialized
variables["@variables"]["compress_data"] and ["compress_info"] into torch
tensors and wrap them in torch.nn.ParameterList (matching the structure created
by _store_compress_data) so subsequent calls to deserialize()/forward() can call
.contiguous()/.cpu() safely. Ensure you preserve dtype/device semantics when
rebuilding the ParameterList.
