Refactor: simplify dsv4 ratio4 decode compressor by zhangqi-chen · Pull Request #427 · hw-native-sys/pypto-lib

zhangqi-chen · 2026-05-30T09:36:07Z

Summary

Drop the hadamard rotate path (dead since feat(dsv4): add prefill attention compressor for ratio-4 (#424) #424 pinned ROTATE=False): remove the rotate/hadamard params, the kv_hadamard/kv_write branch, the kv_final intermediate, and the matching golden/spec plumbing; kv_and_cache_write now reads normed_kv directly.
rmsnorm_rope: normalize only the NOPE span in the loop and re-derive the RoPE input slice locally from pooled_kv (bit-identical), instead of reading normed_kv back — drops a same-tensor GM round-trip.
Add a --golden-data CLI option for fixed-input validation runs.

rmsnorm_rope stays on pl.range + pl.at (verified 10/10 on a2a3); pl.spmd / pl.parallel are non-deterministic here.

Related Issues

Refs #419

coderabbitai · 2026-05-30T09:36:18Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79ef5f35-d2a4-471e-8626-97784d85eb62

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR removes the optional Hadamard/rotate post-processing transformation from the DeepSeek v4 decode compressor kernel. The compressor now writes normalized KV directly to cache without intermediate finalization, eliminating the conditional rotate path and simplifying the normalization sequence. Function signatures, golden reference, and test infrastructure are updated accordingly.

Changes

Hadamard/Rotate Post-Processing Removal

Layer / File(s)	Summary
Function signature and constant cleanup `models/deepseek/v4/decode_compressor_ratio4.py`	Removed `ROTATE` module constant and `hadamard`/`rotate` parameters from `compressor()` and `compressor_test()` signatures, keeping only cache/table inputs. Updated `compressor_test` call to `compressor` to drop removed arguments.
Compressor kernel logic refactoring `models/deepseek/v4/decode_compressor_ratio4.py`	Removed conditional Hadamard/rotate finalization by eliminating `kv_final` buffer and rotate branch. Reorganized RMSNorm/RoPE normalization by splitting NOPE and RoPE ranges separately. During cache write, now directly derives `kv_row_fp32` from `normed_kv` and computes `cache_col` from `start_pos_b // COMPRESS_RATIO`.
Golden reference implementation update `models/deepseek/v4/decode_compressor_ratio4.py`	Updated `golden_compressor` to extract `cmp_kv_cache`, `cmp_block_table`, and `start_pos_t`. Changed RoPE dimension derivation from Hadamard shape to directly using `COMPRESS_RATIO` and `ROPE_HEAD_DIM`. Removed conditional rotate-controlled Hadamard transform.
Test harness and CLI updates `models/deepseek/v4/decode_compressor_ratio4.py`	Removed `ScalarSpec` import and `init_hadamard()` helper. Updated `build_tensor_specs` to remove `hadamard` and `rotate` specs while retaining cache-related `TensorSpec` entries. Added `--golden-data` CLI argument forwarded to `run_jit`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

hw-native-sys/pypto-lib#424: Sequentially modified rotate/Hadamard handling in the same file—first changing the ROTATE default, then this PR removes the entire path and updates signatures.
hw-native-sys/pypto-lib#402: Refactored the compressor cache contract around the new cmp_kv_cache/block-table pipeline; this PR continues by removing the rotate/Hadamard stage from that pipeline.

Poem

A rabbit hops through Hadamard gates no more,
Rotate and spin have left the compressor floor,
Now KV flows direct to cache so clean,
The simplest path that ever could be seen! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main objective of the PR: simplifying the dsv4 ratio4 decode compressor by removing unnecessary code paths.
Description check	✅ Passed	The description is directly related to the changeset, providing detailed context about removing the hadamard/rotate path, modifying rmsnorm_rope, and adding CLI options that align with the file changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request simplifies the ratio-4 overlapping compressor model by removing unused rotation logic, including the rotate parameter, the hadamard tensor, and the associated Hadamard transform block. It also refactors the normalization loop to process only the non-RoPE (NOPE) span, while separately re-deriving the RoPE input slice. Additionally, a --golden-data command-line argument has been added to the test runner. No review comments were provided, so there is no feedback to address.

- Drop the hadamard rotate path (dead since hw-native-sys#424 pinned ROTATE=False): remove the rotate/hadamard params, the kv_hadamard/kv_write branch, the kv_final intermediate, and the matching golden/spec plumbing. kv_and_cache_write now reads normed_kv directly. - rmsnorm_rope: normalize only the NOPE span in the loop and re-derive the RoPE input slice locally from pooled_kv (bit-identical) instead of reading normed_kv back, dropping a same-tensor GM round-trip. - Add a --golden-data CLI option for fixed-input validation runs. - Adapt the decode_attention_csa caller to the new compressor signature (it called the ratio4 compressor with rotate=False); verified PASS. rmsnorm_rope stays on pl.range + pl.at (verified 10/10 on a2a3); pl.spmd / pl.parallel are non-deterministic here, see hw-native-sys#419. Refs hw-native-sys#419

Mirror the ratio4 refactor in decode_compressor_ratio128's rmsnorm_rope spmd body: - Narrow the normalization pipeline to the NOPE span (NOPE_HEAD_DIM // HEAD_TILE) instead of the full HEAD_DIM; the rope span is written once below by the RoPE'd rope_buf. - Re-derive the RoPE input slice locally from pooled_kv with the same inv_rms/gamma instead of reading normed_kv back, removing a same-tensor GM write-then-read round-trip that hits the issue hw-native-sys#419 runtime race. Verified by 5 serial a2a3 NPU runs: kv / compress_state / cmp_kv_cache all PASS.

gemini-code-assist Bot reviewed May 30, 2026

View reviewed changes

zhangqi-chen force-pushed the debug-cmp branch from adb2c37 to 0e26930 Compare May 30, 2026 09:44

zhangqi-chen merged commit 09add41 into hw-native-sys:main Jun 1, 2026
4 of 7 checks passed

zhangqi-chen deleted the debug-cmp branch June 1, 2026 02:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor: simplify dsv4 ratio4 decode compressor#427

Refactor: simplify dsv4 ratio4 decode compressor#427
zhangqi-chen merged 2 commits into
hw-native-sys:mainfrom
zhangqi-chen:debug-cmp

zhangqi-chen commented May 30, 2026

Uh oh!

coderabbitai Bot commented May 30, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

zhangqi-chen commented May 30, 2026

Summary

Related Issues

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 30, 2026 •

edited

Loading