Fix mel spectrogram preprocessor allocating gigabytes of planned memory#18229

Merged
mergennachin merged 1 commit into main from fix-streaming-preprocessor-memory
Mar 17, 2026
Conversation

mergennachin (Contributor) commented Mar 17, 2026

The dynamic dimension max was computed as max_audio_len * n_samples
(samples per 30s chunk), not max_audio_len * sampling_rate. With
max_audio_len=300, this produced 144M samples (150 minutes) instead of
4.8M (5 minutes), causing a ~3.3 GB planned buffer for STFT
intermediates.

For streaming mode, the max was even worse: 600 * 480K = 288M samples,
producing a 6.6 GB planned buffer — even though streaming processes
~1640 samples per step.

Fix both paths:

  • Offline: use max_audio_len * sampling_rate (300s → 4.8M samples, ~110 MB)
  • Streaming: cap at 2 seconds (32K samples, ~0.7 MB)

Peak RSS for the voxtral runner: 9,556 MB before, 4,712 MB after.
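The sample-count arithmetic above can be checked with a quick sketch. The constants are assumptions matching standard Whisper preprocessing (16 kHz sampling rate, 30 s chunks of 480,000 samples), not values read from this repo:

```python
# Sanity-check of the sample-count arithmetic from the description.
# sampling_rate and n_samples are assumed Whisper constants
# (16 kHz audio, 30 s chunks), not values taken from the codebase.
sampling_rate = 16_000              # samples per second
n_samples = 30 * sampling_rate      # 480,000 samples per 30 s chunk
max_audio_len = 300                 # intended cap, in seconds

buggy_max = max_audio_len * n_samples      # seconds * samples-per-chunk
fixed_max = max_audio_len * sampling_rate  # seconds * samples-per-second

print(buggy_max, buggy_max / sampling_rate / 60)  # 144000000 150.0 (minutes)
print(fixed_max, fixed_max / sampling_rate / 60)  # 4800000 5.0 (minutes)
```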

Copilot AI review requested due to automatic review settings March 17, 2026 13:09

pytorch-bot bot commented Mar 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18229

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 9 Pending

As of commit d6c31f6 with merge base 1e17e28:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 17, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes excessive planned-memory allocation during torch.export of the Whisper mel-spectrogram preprocessor by correcting the bound used for the waveform’s dynamic length dimension, with a tighter cap for streaming mode to keep STFT intermediate buffers small.

Changes:

  • Fix offline export dynamic max to max_audio_len * sampling_rate (seconds → samples) instead of mistakenly multiplying by n_samples.
  • Add a streaming-specific dynamic max cap at 2 * sampling_rate to prevent multi-GB memory plans.
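The two corrected paths can be sketched as one bound-selection helper. The function name and signature are hypothetical (the real change edits the export script in place), and the 16 kHz default is an assumed Whisper constant:

```python
def export_max_samples(streaming: bool, max_audio_len: int,
                       sampling_rate: int = 16_000) -> int:
    """Upper bound (in samples) for the waveform's dynamic dimension.

    Hypothetical helper mirroring the two bullets above; not the
    actual function in the repo.
    """
    if streaming:
        # Streaming processes small windows per step; a 2 s cap keeps
        # the memory plan tight.
        return 2 * sampling_rate
    # Offline: seconds -> samples, not seconds -> 30 s chunks.
    return max_audio_len * sampling_rate

print(export_max_samples(streaming=False, max_audio_len=300))  # 4800000
print(export_max_samples(streaming=True, max_audio_len=600))   # 32000
```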


```python
if model.streaming:
    # Streaming processes small windows per step. 2 seconds gives
    # comfortable headroom while keeping the memory plan tight.
    max_samples = 2 * model.sampling_rate
```
Contributor


Any performance issues with this? In streaming mode, does each inference take 2 s worth of samples and then start over for the next two seconds?

Contributor Author


No performance issue. The max_samples = 2 * sampling_rate is only the dynamic shape upper bound at export time. It tells the memory planner the maximum buffer size to allocate. It doesn't affect how inference runs.

At runtime, the streaming preprocessor is called with ~1,640 samples per step (~0.1s). The exported graph handles any input size from 1 up to the declared max.

The 2-second cap just means if someone somehow passed more than 32,000 samples in a single call, it would fail. In practice the streaming window is fixed at 1,640 samples.
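The export-time bound versus the runtime window can be sketched numerically. The 1,640-sample window is taken from the comment above; the 16 kHz rate is an assumed constant:

```python
sampling_rate = 16_000
max_samples = 2 * sampling_rate   # export-time upper bound: 32,000 samples
step_samples = 1_640              # actual streaming window per step

# Runtime inputs only need to fit under the declared bound; the bound
# itself does not change how much data each inference step consumes.
assert 1 <= step_samples <= max_samples
print(round(step_samples / sampling_rate, 4))  # 0.1025 (seconds per step)
```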

@mergennachin mergennachin merged commit 776979f into main Mar 17, 2026
233 checks passed
@mergennachin mergennachin deleted the fix-streaming-preprocessor-memory branch March 17, 2026 15:56
@mergennachin
Contributor Author

@pytorchbot cherry-pick --onto release/1.2 -c critical

pytorchbot pushed a commit that referenced this pull request Mar 17, 2026
…ry (#18229)


(cherry picked from commit 776979f)
@pytorchbot
Collaborator

Cherry picking #18229

The cherry-pick PR is at #18238; it is recommended to link a critical cherry-pick PR to an issue.

Details for Dev Infra team: raised by workflow job.
