Add RoPE to ConformerEncoder by MahmoudAshraf97 · Pull Request #15714 · NVIDIA-NeMo/NeMo

MahmoudAshraf97 · 2026-05-20T16:22:09Z

Important

The Update branch button must only be pressed in very rare occassions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Adds RoPE as a positional encoding method to ConformerEncoder, it enables the use of torch SDPA while maintaining accuracy
This is a least-friction implementation, there is still path for more improvements (such as flex attention and fused qkv proj) but I preferred to not diverge from other conformer submodules
Collection: asr

Motivation

Since #9590, SDPA caused loss explosion in training when Relative Positional Embeddings were used, investigation showed that SDPA had issues with gradients when attention additive biases were used

as per this paper, RoPe has the same baseline or better performance than RelPos and trains faster because of its ability to use SDPA

Results

I tested two Fastconformer medium models, everything was constant except the positional encoding method (RoPE vs RelPos) and attention implementation (SDPA vs NeMo), both models were randomly initialized, although I forgot to fix the seed.

One takeaway is that when the two models were initialized from a pretrained checkpoint except for the positional encoding parameters, RelPos converged faster, implying that RoPE attention geometry is different than RelPos and not necessarily transferable

Improvements

The Rope was around 6% faster E2E, and used less memory which allowed for higher fused batch size for the RNNT loss and better compute utilization. The training time was dominated by RNNT loss calculation, so it diminishes any benefits from SDPA or torch.compile

RNNT loss compute time is data-dependent, that's why the step time is much lower at the beginning of the run when the model hasn't learned anything yet, once it starts to generate tokens, the time goes up

Changelog

Add RoPE as Positional Encoding Method for ConformerEncoder

Usage

pass self_attention_model: rope to encoder config

model:
  encoder:
    self_attention_model: rope

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

cc @pzelasko

Additional Information

Related to # (issue)

copy-pr-bot · 2026-05-20T16:22:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

pzelasko · 2026-05-29T19:15:00Z

/ok to test 44348a9

pzelasko

Thanks for the PR. The overall direction looks reasonable to me, and I'm leaving this as a non-blocking review comment. I don't see the global defaults changing for existing configs (self_attention_model remains rel_pos, use_pytorch_sdpa remains False), but I found a few compatibility issues that should be addressed before merge.

RoPE skips the existing pre-encoder scale/dropout contract

nemo/collections/asr/modules/conformer_encoder.py, around the new RoPE branch in forward_internal.

Existing rel_pos/abs_pos paths call self.pos_enc(x=audio_signal, ...), which applies xscale and dropout_pre_encoder. The new RoPE path sets pos_emb = None and leaves audio_signal untouched, while RotaryPositionalEncoding only rotates Q/K later in attention. As a result, xscaling=True and dropout_pre_encoder silently become no-ops for self_attention_model='rope'.

I would keep the Q/K rotation inside RoPEMultiHeadAttention, but preserve the existing encoder input transform semantics by applying the same input scale/dropout once before the layer stack.
The public change_attention_model() wrapper can corrupt the saved config

nemo/collections/asr/parts/mixins/mixins.py, in the update_config block.

The encoder resolves None values to the current attention model/context, but the mixin writes the raw arguments back into self.cfg.encoder. For example, calling change_attention_model(rotary_fraction=0.5) on an existing RoPE model can leave cfg.encoder.self_attention_model = None and cfg.encoder.att_context_size = None, and it skips copying the updated RoPE params because self_attention_model != 'rope'.

The top-level config should mirror the resolved values from self.encoder after the call, e.g. self.encoder.self_attention_model, self.encoder.att_context_size, and the resolved encoder RoPE fields.
change_attention_model('rope') does not preserve use_bias=False

nemo/collections/asr/modules/conformer_encoder.py, in the RoPE new_attn = RoPEMultiHeadAttention(...) conversion path.

The constructor path forwards use_bias into ConformerLayer/RoPEMultiHeadAttention, but the conversion helper creates RoPE attention without use_bias, so it defaults to True. A bias-free model converted to RoPE silently gains randomly initialized bias parameters and no longer matches its config.

Please pass through getattr(self._cfg, 'use_bias', True) or an equivalent stored value, and add a regression test for converting a use_bias=False encoder to RoPE.

nithinraok · 2026-05-29T19:27:52Z

/claude review

claude

Good addition — the implementation is clean, the RoPE math is correct (split-half rotation with duplicated cos/sin), and the test suite is thorough (translation invariance, norm preservation, reference equivalence, streaming parity).

One issue flagged inline:

dropout_pre_encoder silently ignored for RoPE: For rel_pos and abs_pos, self.pos_enc(x=audio_signal, ...) applies dropout_pre_encoder to the audio signal. For the rope path, this call is skipped entirely, so any configured dropout_pre_encoder > 0 has no effect. This could surprise users switching between attention models or using change_attention_model.

Co-authored-by: Copilot <copilot@github.com> Signed-off-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>

MahmoudAshraf97 · 2026-05-30T13:04:37Z

@pzelasko I implemented fixes for all three issues mentioned

pre-encoder dropout is now applied separately along with xscaling
updated the wrapper to use the current model parameters after updating them instead of using the function args
updated all attention modules because this bug existed in all of them, not just RoPE

as for changing the defaults, I can do that, do I have to also change ConformerEncoder init args or just the configs? although I'd prefer to do that in a followup PR

note, the time variable was removed because it causes the CI tests to fail, but it's unrelated to the PR

github-actions Bot added ASR community-request labels May 20, 2026

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label May 22, 2026

pzelasko requested a review from nithinraok May 29, 2026 19:13

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:15 Inactive

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:19 Inactive

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:20 Inactive

copy-pr-bot Bot temporarily deployed to public May 29, 2026 19:23 Inactive

pzelasko reviewed May 29, 2026

View reviewed changes

claude Bot reviewed May 29, 2026

View reviewed changes

Comment thread nemo/collections/asr/modules/conformer_encoder.py

claude Bot reviewed May 29, 2026

View reviewed changes

svcnvidia-nemo-ci added waiting-on-customer Waiting on the original author to respond and removed waiting-on-maintainers Waiting on maintainers to respond labels May 29, 2026

MahmoudAshraf97 and others added 2 commits May 30, 2026 08:56

initial commit

97e7c4f

Co-authored-by: Copilot <copilot@github.com> Signed-off-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>

implement review suggestions

ba98801

Co-authored-by: Copilot <copilot@github.com> Signed-off-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>

MahmoudAshraf97 force-pushed the conformer-rope branch from f8edc20 to ba98801 Compare May 30, 2026 12:56

MahmoudAshraf97 requested a review from pzelasko May 30, 2026 12:57

svcnvidia-nemo-ci removed the waiting-on-customer Waiting on the original author to respond label May 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add RoPE to ConformerEncoder#15714

Add RoPE to ConformerEncoder#15714
MahmoudAshraf97 wants to merge 2 commits into
NVIDIA-NeMo:mainfrom
MahmoudAshraf97:conformer-rope

MahmoudAshraf97 commented May 20, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented May 20, 2026

Uh oh!

pzelasko commented May 29, 2026

Uh oh!

pzelasko left a comment

Uh oh!

nithinraok commented May 29, 2026

Uh oh!

Uh oh!

claude Bot left a comment

Uh oh!

MahmoudAshraf97 commented May 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

MahmoudAshraf97 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Motivation

Results

Improvements

Changelog

Usage

GitHub Actions CI

Before your PR is "Ready for review"

Who can review?

Additional Information

Uh oh!

copy-pr-bot Bot commented May 20, 2026

Uh oh!

pzelasko commented May 29, 2026

Uh oh!

pzelasko left a comment

Choose a reason for hiding this comment

Uh oh!

nithinraok commented May 29, 2026

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

MahmoudAshraf97 commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MahmoudAshraf97 commented May 20, 2026 •

edited

Loading

MahmoudAshraf97 commented May 30, 2026 •

edited

Loading