RMS Norm Optimization #583

Open
aris134 wants to merge 9 commits into dev from amartin/rmsnorm

aris134 commented May 12, 2026

Description

Fixes # (16527)

RMSNorm falls back to the general kernel implementation on several DeepSeek and Qwen shapes, which causes poor performance. This PR registers these shapes with the tuned kernel cache and adds a performance benchmark for RMSNorm.

Additionally, a fallback warning is printed the first time a tuned config is not found for a requested kernel. For example:

in function getKernel: Falling back to general normalization kernel because no tuned kernel is available for this config. hidden_size=128, wtype=bf16, itype=bf16, otype=bf16, ctype=fp32
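The fallback behaviour described above can be sketched as a cache lookup with a warn-once guard. This is only an illustration in Python, not the code in this PR: `NormConfig`, `TUNED_KERNELS`, `get_kernel`, the kernel names, and the registered `hidden_size=4096` entry are all hypothetical, and the sketch assumes the warning is deduplicated per distinct config, which the description does not state.

```python
import warnings
from typing import Dict, NamedTuple, Set


class NormConfig(NamedTuple):
    """Hypothetical cache key mirroring the fields in the warning message."""
    hidden_size: int
    wtype: str
    itype: str
    otype: str
    ctype: str


# Hypothetical tuned-kernel cache: config -> kernel name (illustrative entry).
TUNED_KERNELS: Dict[NormConfig, str] = {
    NormConfig(4096, "bf16", "bf16", "bf16", "fp32"): "tuned_rmsnorm_4096",
}

# Configs that have already triggered a fallback warning.
_warned: Set[NormConfig] = set()


def get_kernel(cfg: NormConfig) -> str:
    """Return the tuned kernel for cfg, or the general kernel on a cache miss."""
    kernel = TUNED_KERNELS.get(cfg)
    if kernel is not None:
        return kernel
    # Assumption: warn only on the first miss for each distinct config.
    if cfg not in _warned:
        _warned.add(cfg)
        warnings.warn(
            "Falling back to general normalization kernel because no tuned "
            "kernel is available for this config. "
            f"hidden_size={cfg.hidden_size}, wtype={cfg.wtype}, "
            f"itype={cfg.itype}, otype={cfg.otype}, ctype={cfg.ctype}"
        )
    return "general_rmsnorm"
```

Registering a shape then amounts to adding an entry to the cache, after which lookups for that config stop hitting the fallback path.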

E2E TFLOPS/s/GPU for proxy models (previous -> current, with RMSNorm tuning):

Qwen:
bf16: 369.4 -> 374.7
fp8: 352.1 -> 358.2

DeepSeek:
bf16: 501.4 -> 529.4
fp8: 463.9 -> 511.4
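For readers who want the relative change rather than absolute throughput, the numbers above can be converted to percentage gains. The snippet below is a small convenience sketch; the `results` dictionary and `relative_gain_percent` helper are illustrative names, and the values are copied from the figures above.

```python
# (model, dtype) -> (previous, current) E2E TFLOPS/s/GPU, copied from the list above.
results = {
    ("Qwen", "bf16"): (369.4, 374.7),
    ("Qwen", "fp8"): (352.1, 358.2),
    ("DeepSeek", "bf16"): (501.4, 529.4),
    ("DeepSeek", "fp8"): (463.9, 511.4),
}


def relative_gain_percent(previous: float, current: float) -> float:
    """Percentage improvement of current over previous."""
    return (current - previous) / previous * 100.0


# Round to one decimal place for reporting.
gains = {key: round(relative_gain_percent(*pair), 1) for key, pair in results.items()}

for (model, dtype), gain in gains.items():
    print(f"{model} {dtype}: +{gain}%")
```

This works out to roughly +1.4% (bf16) and +1.7% (fp8) for Qwen, and +5.6% (bf16) and +10.2% (fp8) for DeepSeek.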

This PR also adds matching tuned configs for LayerNorm.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

aris134 requested a review from alextmagro May 12, 2026 12:13
aris134 self-assigned this May 12, 2026
aris134 marked this pull request as ready for review May 12, 2026 19:15