Skip to content

Refactor: simplify dsv4 ratio4 decode compressor#427

Merged
zhangqi-chen merged 2 commits into
hw-native-sys:mainfrom
zhangqi-chen:debug-cmp
Jun 1, 2026
Merged

Refactor: simplify dsv4 ratio4 decode compressor#427
zhangqi-chen merged 2 commits into
hw-native-sys:mainfrom
zhangqi-chen:debug-cmp

Conversation

@zhangqi-chen
Copy link
Copy Markdown
Collaborator

Summary

  • Drop the hadamard rotate path (dead since feat(dsv4): add prefill attention compressor for ratio-4 (#424) #424 pinned ROTATE=False): remove the rotate/hadamard params, the kv_hadamard/kv_write branch, the kv_final intermediate, and the matching golden/spec plumbing; kv_and_cache_write now reads normed_kv directly.
  • rmsnorm_rope: normalize only the NOPE span in the loop and re-derive the RoPE input slice locally from pooled_kv (bit-identical), instead of reading normed_kv back — drops a same-tensor GM round-trip.
  • Add a --golden-data CLI option for fixed-input validation runs.

rmsnorm_rope stays on pl.range + pl.at (verified 10/10 on a2a3); pl.spmd / pl.parallel are non-deterministic here.

Related Issues

Refs #419

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79ef5f35-d2a4-471e-8626-97784d85eb62

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR removes the optional Hadamard/rotate post-processing transformation from the DeepSeek v4 decode compressor kernel. The compressor now writes normalized KV directly to cache without intermediate finalization, eliminating the conditional rotate path and simplifying the normalization sequence. Function signatures, golden reference, and test infrastructure are updated accordingly.

Changes

Hadamard/Rotate Post-Processing Removal

Layer / File(s) Summary
Function signature and constant cleanup
models/deepseek/v4/decode_compressor_ratio4.py
Removed ROTATE module constant and hadamard/rotate parameters from compressor() and compressor_test() signatures, keeping only cache/table inputs. Updated compressor_test call to compressor to drop removed arguments.
Compressor kernel logic refactoring
models/deepseek/v4/decode_compressor_ratio4.py
Removed conditional Hadamard/rotate finalization by eliminating kv_final buffer and rotate branch. Reorganized RMSNorm/RoPE normalization by splitting NOPE and RoPE ranges separately. During cache write, now directly derives kv_row_fp32 from normed_kv and computes cache_col from start_pos_b // COMPRESS_RATIO.
Golden reference implementation update
models/deepseek/v4/decode_compressor_ratio4.py
Updated golden_compressor to extract cmp_kv_cache, cmp_block_table, and start_pos_t. Changed RoPE dimension derivation from Hadamard shape to directly using COMPRESS_RATIO and ROPE_HEAD_DIM. Removed conditional rotate-controlled Hadamard transform.
Test harness and CLI updates
models/deepseek/v4/decode_compressor_ratio4.py
Removed ScalarSpec import and init_hadamard() helper. Updated build_tensor_specs to remove hadamard and rotate specs while retaining cache-related TensorSpec entries. Added --golden-data CLI argument forwarded to run_jit.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • hw-native-sys/pypto-lib#424: Sequentially modified rotate/Hadamard handling in the same file—first changing the ROTATE default, then this PR removes the entire path and updates signatures.
  • hw-native-sys/pypto-lib#402: Refactored the compressor cache contract around the new cmp_kv_cache/block-table pipeline; this PR continues by removing the rotate/Hadamard stage from that pipeline.

Poem

A rabbit hops through Hadamard gates no more,
Rotate and spin have left the compressor floor,
Now KV flows direct to cache so clean,
The simplest path that ever could be seen! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main objective of the PR: simplifying the dsv4 ratio4 decode compressor by removing unnecessary code paths.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context about removing the hadamard/rotate path, modifying rmsnorm_rope, and adding CLI options that align with the file changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request simplifies the ratio-4 overlapping compressor model by removing unused rotation logic, including the rotate parameter, the hadamard tensor, and the associated Hadamard transform block. It also refactors the normalization loop to process only the non-RoPE (NOPE) span, while separately re-deriving the RoPE input slice. Additionally, a --golden-data command-line argument has been added to the test runner. No review comments were provided, so there is no feedback to address.

- Drop the hadamard rotate path (dead since hw-native-sys#424 pinned ROTATE=False):
  remove the rotate/hadamard params, the kv_hadamard/kv_write branch,
  the kv_final intermediate, and the matching golden/spec plumbing.
  kv_and_cache_write now reads normed_kv directly.
- rmsnorm_rope: normalize only the NOPE span in the loop and re-derive
  the RoPE input slice locally from pooled_kv (bit-identical) instead of
  reading normed_kv back, dropping a same-tensor GM round-trip.
- Add a --golden-data CLI option for fixed-input validation runs.
- Adapt the decode_attention_csa caller to the new compressor signature
  (it called the ratio4 compressor with rotate=False); verified PASS.

rmsnorm_rope stays on pl.range + pl.at (verified 10/10 on a2a3);
pl.spmd / pl.parallel are non-deterministic here, see hw-native-sys#419.

Refs hw-native-sys#419
Mirror the ratio4 refactor in decode_compressor_ratio128's rmsnorm_rope
spmd body:

- Narrow the normalization pipeline to the NOPE span
  (NOPE_HEAD_DIM // HEAD_TILE) instead of the full HEAD_DIM; the rope
  span is written once below by the RoPE'd rope_buf.
- Re-derive the RoPE input slice locally from pooled_kv with the same
  inv_rms/gamma instead of reading normed_kv back, removing a
  same-tensor GM write-then-read round-trip that hits the issue hw-native-sys#419
  runtime race.

Verified by 5 serial a2a3 NPU runs: kv / compress_state / cmp_kv_cache
all PASS.
@zhangqi-chen zhangqi-chen merged commit 09add41 into hw-native-sys:main Jun 1, 2026
4 of 7 checks passed
@zhangqi-chen zhangqi-chen deleted the debug-cmp branch June 1, 2026 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant