
Optimize WAV decoding with buffered read#1320

Open
Dan-Flores wants to merge 37 commits into
meta-pytorch:mainfrom
Dan-Flores:wav_decode_optimization

Conversation

@Dan-Flores (Contributor) commented Apr 2, 2026

This PR enables the optimized file reading from this TODO:

TODO WavDecoder: Optimize decoding and conversion by processing samples in
fixed size buffer. For now, we use a simpler implementation: create a
tensor from raw bytes and convert tensor to float32

The optimization: read samples and convert them through a fixed size buffer, rather than reading all values at once.

  • In WavDecoder::getSamplesInRange, we create a buffer close to DEFAULT_CHUNK_BUFFER_SIZE bytes.
    • The actual buffer size is adjusted based on the numBytesPerSample, to ensure we do not read partial samples.
  • In WavDecoder::getSamplesInRange, we process samples according to the buffer size and create the output samples tensor.
  • WavDecoder::convertSamplesToFloat does the actual conversion to float32.
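The loop these bullets describe might look roughly like the following minimal sketch. It uses a PCM16 path for brevity (the benchmark file is s32), and `kBufferBytes`, `decodePcm16`, and all other names are illustrative, not the PR's actual API:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative sketch: read fixed-size, sample-aligned slices of the data
// chunk and convert each slice to float32 before moving on, instead of
// materializing all raw bytes at once.
constexpr size_t kBufferBytes = 8192; // mirrors soundfile's default

std::vector<float> decodePcm16(const std::vector<uint8_t>& data) {
  const size_t numBytesPerSample = sizeof(int16_t);
  // Round the buffer down to a whole number of samples so no read
  // ever splits a sample across two passes.
  const size_t bufSize =
      (kBufferBytes / numBytesPerSample) * numBytesPerSample;
  constexpr float scale = 1.0f / 32768.0f; // int16 full-scale

  std::vector<float> out(data.size() / numBytesPerSample);
  size_t samplesProcessed = 0;
  for (size_t offset = 0; offset < data.size(); offset += bufSize) {
    const size_t bytesThisPass = std::min(bufSize, data.size() - offset);
    const size_t samplesThisPass = bytesThisPass / numBytesPerSample;
    for (size_t i = 0; i < samplesThisPass; ++i) {
      int16_t s;
      std::memcpy(&s, data.data() + offset + i * numBytesPerSample, sizeof(s));
      out[samplesProcessed + i] = static_cast<float>(s) * scale;
    }
    samplesProcessed += samplesThisPass;
  }
  return out;
}
```

The key point is the alignment step: the working buffer is close to, but at most, the target size, so every pass handles complete samples.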

Additional changes:
readValue is split into safeReadValue and readValue. This is necessary for performance; see inline comments for more details.

Benchmarks:


Audio Decoding Benchmark

File: test/resources/sine_stereo_s32_10min.wav
Warmup: 3, Trials: 10

AudioDecoder  0.249s
WavDecoder    0.148s
SoundFile     0.146s

@pytorch-bot Bot commented Apr 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1320

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 8480eec with merge base 7e8c399:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla Bot added the CLA Signed label Apr 2, 2026
@Dan-Flores marked this pull request as draft April 16, 2026 05:42
Comment thread src/torchcodec/_core/WavDecoder.cpp
Comment thread src/torchcodec/_core/WavDecoder.cpp Outdated
Comment thread src/torchcodec/_core/WavDecoder.cpp Outdated
@Dan-Flores marked this pull request as ready for review April 21, 2026 02:51
Comment thread src/torchcodec/_core/WavDecoder.cpp Outdated
constexpr float scale =
1.0f / static_cast<float>(std::numeric_limits<int32_t>::max());
for (int64_t i = 0; i < totalSamples; ++i) {
int32_t sample = readValue<int32_t>(
@Dan-Flores (Contributor Author) commented:

Here we use readValue() without the additional TORCH_CHECKs in safeReadValue() for performance.

It is safe to read values without the bounds checks because the TORCH_CHECK above ensures that bufferData is large enough for totalSamples.
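The split described here could be sketched as follows. The signatures and the `std::out_of_range` stand-in for TORCH_CHECK are hypothetical, not the PR's actual code; it also assumes a little-endian host, which matches WAV's byte order:

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hot path: trusts the caller to have validated bounds already,
// so no per-read check is paid inside tight conversion loops.
template <typename T>
T readValue(const uint8_t* data, size_t offset) {
  T value;
  std::memcpy(&value, data + offset, sizeof(T));
  return value;
}

// Safe path: bounds-check once, then delegate to the unchecked read.
template <typename T>
T safeReadValue(const std::vector<uint8_t>& data, size_t offset) {
  if (offset + sizeof(T) > data.size()) {
    throw std::out_of_range("read past end of buffer");
  }
  return readValue<T>(data.data(), offset);
}
```

The win is that a single up-front check on the whole buffer replaces one check per sample in the conversion loop.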

Comment thread src/torchcodec/_core/WavDecoder.cpp Outdated
std::vector<uint8_t> buffer(alignedBufferSize);

int64_t samplesProcessed = 0;
auto samples =
A contributor commented:

From my understanding, numSamples = dataSize / numBytesPerSample. Would we want to validate dataSize in case the header value is some arbitrarily large number due to corruption?

@Dan-Flores (Contributor Author) commented Apr 28, 2026:

For integer overflow, we do not need to validate before doing that division, because int32_t / uint16_t will always fit inside int64_t.

For general overallocation concerns, this is a bit trickier to validate. Since dataSize is int32_t, the maximum valid value it can have is substantial but not always incorrect. I'll add a clamp to ensure the tensor allocated is not larger than the actual file, to prevent an incorrect header from causing us to overallocate.
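The clamp described here might look like this minimal sketch (names are hypothetical, not the PR's code):

```cpp
#include <algorithm>
#include <cstdint>

// Bound the sample count by what the file can actually contain, so a
// corrupt dataSize in the header cannot trigger a huge allocation.
int64_t clampedNumSamples(
    int32_t dataSize,
    int64_t fileSize,
    uint16_t numBytesPerSample) {
  // int32_t / uint16_t always fits in int64_t, so the division itself
  // cannot overflow; the clamp only guards against overallocation.
  const int64_t clampedDataSize = std::min<int64_t>(dataSize, fileSize);
  return clampedDataSize / numBytesPerSample;
}
```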

@Dan-Flores (Contributor Author) commented:

To address the other dimension of our allocated tensor, header_.numChannels: there is no equivalent ground truth for it the way the filesystem's fileSize serves as one for dataSize.

Instead, I've added a validation in validateHeader() for numBytesPerSample that the WAV specification already requires (labelled here as BytePerBloc):

numBytesPerSample == numChannels * bitsPerSample / 8

We can substitute the values used to calculate numSamples to see that the number of elements in the tensor is bounded by the file size:

samples tensor elements = numSamples * header_.numChannels
	= (std::min(header_.dataSize, fileSize_) / header_.numBytesPerSample) * header_.numChannels
	<= (fileSize_ / header_.numBytesPerSample) * header_.numChannels
	= (fileSize_ / (numChannels * bitsPerSample / 8)) * header_.numChannels
	= fileSize_ / (bitsPerSample / 8)

As an aside: If the header is corrupted such that numSamples is calculated incorrectly, we were already going to fail in WavDecoder::getSamplesInRange() when we call safeReadFile() to read data beyond what exists. Clamping dataSize to fileSize doesn't prevent that failure either, but it does prevent us from overallocating before failing.
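As a quick numeric check of that bound, plugging in made-up values (all names and numbers illustrative):

```cpp
#include <cstdint>

// Evaluates the first line of the derivation above for given header values,
// assuming the WAV constraint numBytesPerSample == numChannels * bitsPerSample / 8.
int64_t tensorElements(
    int64_t fileSize,
    int64_t numChannels,
    int64_t bitsPerSample) {
  const int64_t numBytesPerSample = numChannels * bitsPerSample / 8;
  const int64_t numSamples = fileSize / numBytesPerSample;
  return numSamples * numChannels;
}
```

For a 1,000,000-byte file with 2 channels of 32-bit samples, this gives 250,000 elements, exactly the bound fileSize_ / (bitsPerSample / 8) = 1,000,000 / 4.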

Comment on lines +31 to +33
// Soundfile's default chunk size. See
// https://github.com/libsndfile/libsndfile/blob/master/src/common.h#L77
constexpr size_t DEFAULT_CHUNK_BUFFER_SIZE = 8192;
A contributor commented:

I think we should avoid naming this a "chunk".

A "chunk" in the context of WAV files refers to the "format chunk" or "data chunk". This thing is just a temporary buffer that 'views' the data chunk, but it's not a chunk of its own (i.e. it's not part of the WAV specs).

int64_t totalSamples = samplesInBuffer * header_.numChannels;

// Normalize 32-bit PCM samples to [-1.0, 1.0] range.
// We use readValue because the buffer size is already validated above.
A contributor commented:

Can you show me where this is validated?

By 'above' I think you mean 'by the caller', but I don't see where. There should also be a comment about this expectation at the top of the function, similar to the one that exists for outputPtr.

@Dan-Flores (Contributor Author) commented:

The chain of intermediate variables passed to readValue() from getSamplesInRange() is complex, so I am less confident in this claim. Since we create the buffer and determine the number of samples we read ourselves, reading from the buffer without checking the length each time should be safe, but this could use some refactoring to make it clearer.
