Parallel chunking feature for RNNT and TDT models #15186
nune-tadevosyan wants to merge 63 commits into NVIDIA-NeMo:main from
Conversation
Signed-off-by: Nune <ntadevosyan@nvidia.com>
/claude review
lang_id = 'en' if isinstance(self.tokenizer, tokenizers.AggregateTokenizer) else None
else:
    source_id = f'audio_{uuid.uuid4().int}'
    chunk_start = 0
    lang_id = 'en' if isinstance(self.tokenizer, tokenizers.AggregateTokenizer) else None
Bug: lang_id is hardcoded to 'en' for AggregateTokenizer in both the pre-chunked tensor path and the fallback path. If the user is transcribing non-English audio (e.g. via source_lang='de'), this will cause the merge logic in merge_chunked_hypotheses to call tokenizer.text_to_ids(text, lang_id='en'), producing incorrect token IDs and potentially garbled merge results.
Consider propagating the actual language from the prompt/config (e.g. trcfg.prompt.get('source_lang', 'en')) instead of hardcoding 'en'.
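As a hedged sketch of the suggested fix, the language could be resolved from the user's prompt config rather than hardcoded. The helper name `resolve_lang_id` and reading the language from a plain dict via `'source_lang'` are illustrative assumptions; the real fix would read from the PR's `trcfg.prompt`.

```python
from typing import Optional

# Hypothetical helper: pick the language ID for an aggregate tokenizer from
# the user's prompt config, falling back to 'en' only when none was given.
# The function name and dict-based prompt are assumptions for illustration.
def resolve_lang_id(prompt: Optional[dict], is_aggregate_tokenizer: bool) -> Optional[str]:
    """Return a language ID for aggregate tokenizers, else None."""
    if not is_aggregate_tokenizer:
        return None
    if prompt and 'source_lang' in prompt:
        return prompt['source_lang']
    return 'en'  # fallback only when the user supplied no source_lang
```

With this shape, `source_lang='de'` would flow through to `merge_chunked_hypotheses` instead of being overwritten by `'en'`.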
nithinraok left a comment
There are two flows here: using a filepath and using a tensor as input. Please make sure to robustly test both flows.
Some test cases need to be removed, based on the functions removed from chunking_utils.py.
return best_chunk_size

def chunk_waveform(
With lhotse cut windows, I think we are no longer using this?
chunk_waveform is used in transcription.py when audio is provided as a tensor.
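As a minimal sketch of what a tensor-input chunking helper does, assuming a fixed chunk length with a fixed overlap (the actual PR operates on torch tensors; NumPy is used here only to keep the sketch self-contained):

```python
import numpy as np

# Hedged sketch, not the PR's actual implementation: split a 1-D waveform
# into fixed-size chunks that overlap by `overlap` samples, returning the
# same triple the diff above suggests (chunks, chunk_lens, chunk_starts).
def chunk_waveform(wave: np.ndarray, chunk_len: int, overlap: int):
    """Split `wave` into overlapping chunks of up to `chunk_len` samples."""
    step = chunk_len - overlap
    chunks, chunk_lens, chunk_starts = [], [], []
    for start in range(0, max(len(wave) - overlap, 1), step):
        piece = wave[start : start + chunk_len]
        chunks.append(piece)
        chunk_lens.append(len(piece))
        chunk_starts.append(start)
    return chunks, chunk_lens, chunk_starts
```

The overlap region is what the downstream LCS merge uses to deduplicate tokens at chunk boundaries.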
return chunks, chunk_lens, chunk_starts

def chunk_audio_sample(
Same question: are we still using this function?
return char_timestamps

def merge_flat_chunk_hypotheses(
Used in transcription.py
merged_hypotheses: Target hypothesis to update with merged timestamps
hypotheses: List of hypotheses from different chunks
chunk_offsets: Frame offsets for each chunk
subsampling_factor: Subsampling factor of the encoder
window_stride: Time stride per frame in seconds
tokenizer: Tokenizer for text operations
merged_tokens: Token sequence after LCS merge
timestamps_type: Types of timestamps to include ('word', 'segment', 'all')
lang_id: Language ID for multilingual models
similarity_threshold: Threshold for word similarity matching (0.0-1.0)
The docstring documents params tokenizer, merged_tokens, and lang_id that don't exist in the actual signature. The real params are merged_text, timestamps_type, and similarity_threshold.
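A hedged sketch of a docstring aligned with the signature the review describes; the parameter order, defaults, and stub name are assumptions, not the PR's actual code.

```python
# Illustrative stub only: shows a docstring whose Args match the reviewed
# signature (merged_text, timestamps_type, similarity_threshold) instead of
# the nonexistent tokenizer/merged_tokens/lang_id params.
def merge_flat_chunk_hypotheses_stub(
    merged_hypotheses,
    hypotheses,
    chunk_offsets,
    subsampling_factor,
    window_stride,
    merged_text,
    timestamps_type='all',
    similarity_threshold=0.8,
):
    """Merge per-chunk timestamps into `merged_hypotheses`.

    Args:
        merged_hypotheses: Target hypothesis to update with merged timestamps
        hypotheses: List of hypotheses from different chunks
        chunk_offsets: Frame offsets for each chunk
        subsampling_factor: Subsampling factor of the encoder
        window_stride: Time stride per frame in seconds
        merged_text: Text after the LCS merge
        timestamps_type: Types of timestamps to include ('word', 'segment', 'all')
        similarity_threshold: Threshold for word similarity matching (0.0-1.0)
    """
```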
merged_tokens = lcs_alignment_merge_buffer(
    buffer=merged_tokens,
-   data=data[: int(delay * 0.6)],  # only approximately 60% of the tokens are non blank
+   data=data[: int(delay * 0.6)],  # only approximately 60% of the frames have corresponding tokens
delay is the number of frames in the overlapping part here; we want to check the tokens that fall within the overlapping segment. Approximately 60% of the frames output tokens.
| chunk_word_idx += 1 | ||
| continue | ||
| break | ||
| else: |
There was a problem hiding this comment.
Does this silently produce misleading timestamps?
There was a problem hiding this comment.
This can happen in rare cases and for few words.
| return hypotheses | ||
|
|
||
|
|
||
| def join_alignments( |
There was a problem hiding this comment.
In this alignments are we considering overlap frames or simply concatenated?
There was a problem hiding this comment.
Simply concatenating
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Signed-off-by: nune-tadevosyan <152167970+nune-tadevosyan@users.noreply.github.com>
What does this PR do ?
Adds support for parallel chunking for all types of ASR models
Collection: [Note which collection this PR will affect]
Changelog
LhotseSpeechToTextBpeDatasetUsage
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information