[NPU A3] Fix benchmark issues for fused_linear_jsd and dyt. #1231

Open
sunyi0505 wants to merge 1 commit into linkedin:main from sunyi0505:main

Contributor

sunyi0505 commented May 22, 2026

Summary

Fix benchmark issues for fused_linear_jsd and dyt.

1. dyt raises errors when run with torch.compile on NPU. The benchmark now disables the torch.compile baseline on NPU devices.
2. fused_linear_jsd fails on NPU with an out-of-limit grid error when the grid size exceeds 65536. The cause is that the number of rows was used as the grid size. The grid size is now min(num_cores, n_rows).
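
The two changes above can be sketched in plain Python. This is an illustration only, not the code in this PR: the function names and provider labels (`select_providers`, `capped_grid`, `rows_for_program`, `"liger"`, `"torch"`, `"torch_compile"`) are hypothetical, and the sketch assumes that each program strides over rows once the grid is smaller than the row count.

```python
# Illustrative sketch only: every name here is hypothetical, not from the PR diff.


def select_providers(device_type: str) -> list[str]:
    """Return benchmark providers, skipping the torch.compile baseline on NPU."""
    providers = ["liger", "torch"]
    if device_type != "npu":
        # torch.compile is only benchmarked where it is known to work.
        providers.append("torch_compile")
    return providers


def capped_grid(n_rows: int, num_cores: int) -> int:
    """Grid size bounded by the core count rather than the row count."""
    return min(num_cores, n_rows)


def rows_for_program(pid: int, grid: int, n_rows: int) -> range:
    """Rows handled by program `pid` when programs stride over the rows.

    Assumption: with fewer programs than rows, each program must loop,
    visiting rows pid, pid + grid, pid + 2 * grid, and so on.
    """
    return range(pid, n_rows, grid)


def simulate(n_rows: int, num_cores: int) -> tuple[int, list[int]]:
    """Run every program sequentially and collect the rows it touches."""
    grid = capped_grid(n_rows, num_cores)
    visited: list[int] = []
    for pid in range(grid):
        visited.extend(rows_for_program(pid, grid, n_rows))
    return grid, sorted(visited)
```

With `n_rows = 100000` and `num_cores = 48`, the grid stays at 48 programs, well under the limit, while every row is still visited exactly once.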

Testing Done

dyt:
image

fused_linear_jsd:
image

Atlas 800T-A3 x86


  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@sunyi0505 sunyi0505 changed the title [NPU] Fix benchmark issues for fused_linear_jsd and dyt. [NPU A3] Fix benchmark issues for fused_linear_jsd and dyt. May 22, 2026
@sunyi0505
Contributor Author

@Tcc0403 This PR is ready for review. Thanks!
