Skip to content

Conversation

@cehongwang
Copy link
Collaborator

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR in so that relevant reviewers are notified

@meta-cla meta-cla bot added the cla signed label Jan 16, 2026
@github-actions github-actions bot added component: lowering Issues re: The lowering / preprocessing passes component: core Issues re: The core compiler component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Jan 16, 2026
@github-actions github-actions bot requested a review from zewenli98 January 16, 2026 21:28
@cehongwang cehongwang force-pushed the fused_rms_norm_lowering branch from 2bd6654 to 55010b6 Compare January 17, 2026 03:09
@github-actions github-actions bot added the component: tests Issues re: Tests label Jan 20, 2026

gm.graph.erase_node(node)

return x_normalized
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the replacement with the normalized, in line 80, should we also check for the second return value of the op? The rstd as we see in the test cases. Though yeah I think application wise, its mainly used for the gradient, but to be consistent with the op signature

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@zewenli98 zewenli98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good overall. But what I'm thinking is that, comparing with the converter-style implementation, the ops are almost same except impl.slice.expand and a few tensor casting. Do you know 1) which op/layer causes the perf discrepancy (60% as you mentioned in the last meeting)? 2) do these two approaches build the same size engines? If we can identify the issue, we probably can optimize other ops as well.

# If the getitem is extracting the first element (the output tensor)
if not x_normalized.meta:
x_normalized.meta = copy.copy(node.meta)
user.replace_all_uses_with(x_normalized)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Naren previously mentioned, can you add a log here when each node is changed?


gm.graph.erase_node(node)

return x_normalized
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cehongwang cehongwang linked an issue Jan 27, 2026 that may be closed by this pull request
@cehongwang cehongwang force-pushed the fused_rms_norm_lowering branch from 595aa5e to 6d99cd6 Compare January 27, 2026 20:17
@cehongwang cehongwang force-pushed the fused_rms_norm_lowering branch from 121d636 to a3ed926 Compare January 28, 2026 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed component: api [Python] Issues re: Python API component: core Issues re: The core compiler component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: lowering Issues re: The lowering / preprocessing passes component: tests Issues re: Tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

✨[Feature] Fused_Rms_Norm Lowering

5 participants