Use correct values to update encoder KV Cache for streaming models#15323

Open
MahmoudAshraf97 wants to merge 3 commits into NVIDIA-NeMo:main from MahmoudAshraf97:fix_cache

Conversation

@MahmoudAshraf97
Contributor

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

The cache update includes invalid parts of the input that `streaming_post_process` later discards, yet those frames are still written into the cache. This PR uses only the correct (valid) values to update the cache, which resulted in a much lower WER on our internal dataset.
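To illustrate the fix, here is a minimal sketch (not the actual NeMo code; `update_cache`, the tensor shapes, and `valid_len` are hypothetical names for illustration): the rolling cache should be advanced with only the frames that survive post-processing, never with the discarded tail.

```python
import torch


def update_cache(cache: torch.Tensor, keys: torch.Tensor, valid_len: int) -> torch.Tensor:
    """Append only the valid part of the new keys to the rolling cache.

    cache:     [B, T_cache, D] previously cached keys
    keys:      [B, T_new, D]   keys computed for the current chunk; frames
                               beyond `valid_len` are padding/lookahead that
                               post-processing discards
    valid_len: number of frames in `keys` that are actually valid
    """
    cache_size = cache.shape[1]
    # Buggy variant: torch.cat([cache, keys], dim=1)[:, -cache_size:]
    # would pollute the cache with the discarded tail frames.
    updated = torch.cat([cache, keys[:, :valid_len]], dim=1)
    return updated[:, -cache_size:]
```

With the buggy variant, each chunk leaves a few invalid frames in the cache, and the error compounds chunk after chunk — consistent with the accumulating degradation described in this PR.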

Collection: ASR

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Who can review?

@nithinraok

@github-actions github-actions bot added the ASR label Jan 27, 2026
Signed-off-by: MahmoudAshraf97 <hassouna97.ma@gmail.com>
@MahmoudAshraf97
Contributor Author

linting failure is unrelated to this PR

@chtruong814 chtruong814 added the needs-follow-up Issue needs follow-up label Jan 30, 2026
Signed-off-by: KunalDhawan <KunalDhawan@users.noreply.github.com>
@KunalDhawan
Collaborator

Thanks for opening this PR, @MahmoudAshraf97, great catch! The changes look good to me. I’ve scheduled CI tests to make sure the updates don’t break any existing pipelines, and I’m also running internal WER evaluations to assess the impact on accuracy and performance.

It would be great if you could also share any benchmarks you have on WER and latency before vs. after these changes to help validate the improvements.

@chtruong814 chtruong814 removed the needs-follow-up Issue needs follow-up label Feb 7, 2026
@MahmoudAshraf97
Contributor Author

Hi @KunalDhawan, the impact of this PR is felt most with CTC models, or RNNT models on long files, where the wrong-cache effect accumulates. The symptoms are an increase in deletion errors and missing chunks in the transcript.

I also suggest adding tests that verify the encoder output computed with the cache is identical to the encoder output when the actual audio is passed as context.
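The suggested equivalence test can be sketched with a toy causal encoder (a causal 1-D convolution standing in for the real model; `offline_encode`, `streaming_encode`, and the chunking scheme are illustrative, not NeMo APIs): streaming with a correctly updated cache must reproduce the offline output exactly.

```python
import torch
import torch.nn.functional as F


def offline_encode(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Causal 1-D conv over the full signal. x: [B, C, T]."""
    k = weight.shape[-1]
    x_pad = F.pad(x, (k - 1, 0))  # left-pad so the conv is causal
    return F.conv1d(x_pad, weight)


def streaming_encode(x: torch.Tensor, weight: torch.Tensor, chunk: int) -> torch.Tensor:
    """Process x in chunks, carrying the last k-1 *valid* input frames as cache."""
    k = weight.shape[-1]
    cache = torch.zeros(x.shape[0], x.shape[1], k - 1)
    outs = []
    for start in range(0, x.shape[-1], chunk):
        piece = x[..., start:start + chunk]
        inp = torch.cat([cache, piece], dim=-1)
        outs.append(F.conv1d(inp, weight))
        # Update the cache from valid frames only; using padded/invalid
        # frames here is exactly the bug this PR fixes.
        cache = inp[..., -(k - 1):]
    return torch.cat(outs, dim=-1)
```

A test would then assert `torch.allclose(offline_encode(x, w), streaming_encode(x, w, chunk))` for several chunk sizes; with a polluted cache the two outputs diverge after the first chunk.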

