Merged
11 changes: 8 additions & 3 deletions extension/audio/mel_spectrogram.py
Expand Up @@ -192,10 +192,15 @@ def export_processor(model=None, output_file="whisper_preprocess.pte"):
     if model is None:
         model = WhisperAudioProcessor()

-    audio_tensor = torch.randn(93680)
+    if model.streaming:
+        # Streaming processes small windows per step. 2 seconds gives
+        # comfortable headroom while keeping the memory plan tight.
+        max_samples = 2 * model.sampling_rate
Contributor
Any performance issues with this? In streaming mode, does each inference take 2 s worth of samples and then start over for the next two seconds?

Contributor Author

No performance issue. `max_samples = 2 * sampling_rate` is only the dynamic-shape upper bound at export time: it tells the memory planner the maximum buffer size to allocate and doesn't affect how inference runs.

At runtime, the streaming preprocessor is called with ~1,640 samples per step (~0.1s). The exported graph handles any input size from 1 up to the declared max.

The 2-second cap just means if someone somehow passed more than 32,000 samples in a single call, it would fail. In practice the streaming window is fixed at 1,640 samples.

+    else:
+        max_samples = model.max_audio_len * model.sampling_rate
+    audio_tensor = torch.randn(min(93680, max_samples))
     shapes_collection = torch.export.ShapesCollection()
-    max_n_chunks = int(model.max_audio_len * model.n_samples)
-    shapes_collection[audio_tensor] = {0: Dim.DYNAMIC(max=max_n_chunks)}
+    shapes_collection[audio_tensor] = {0: Dim.DYNAMIC(max=max_samples)}
     with torch.no_grad(), torch.fx.experimental._config.patch(
         backed_size_oblivious=True
     ):