@Joy-Majumder

Description

This PR adds a new command-line option to zstd to allow users to configure the input buffer size during decompression.

Problem

When decompressing large files (1TB+) on mechanical hard drives, both reading and writing occur simultaneously, resulting in:

  • Concurrent read and write operations (~80MB/s + ~50MB/s)
  • Excessive disk seeking between operations
  • Sub-optimal total throughput (130MB/s vs 250MB/s+ possible)
  • CPU usage only at ~20%

Solution

This feature enables users to specify a larger input buffer size (1-5GB), allowing sequential disk reads to fill the buffer before decompression begins. This reduces disk seek operations and improves overall throughput.

Usage

# Decompress with 1GB input buffer
zstd -d largefile.zst --ibuf-size=1024M

# Decompress with 5GB input buffer for very large files
zstd -d largefile.zst --ibuf-size=5120M
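
One detail worth noting about the values above: 5120M is 5,368,709,120 bytes, which does not fit in a 32-bit unsigned integer, so the option's argument needs a 64-bit (or `size_t` on 64-bit platforms) parse path. The PR's actual parser is not shown here; the following is only a minimal sketch, assuming `M` means MiB as with zstd's other size options. `parse_size_arg` and its accepted suffix set are hypothetical, not part of zstd.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not zstd's parser): converts strings such as
 * "256M" or "5120M" into a byte count held in 64 bits.
 * Returns 0 on success, -1 on malformed input or overflow. */
static int parse_size_arg(const char* s, uint64_t* out)
{
    uint64_t value = 0;
    unsigned shift = 0;
    int digits = 0;

    if (s == NULL || out == NULL) return -1;

    /* Accumulate decimal digits, rejecting values that would overflow. */
    while (*s >= '0' && *s <= '9') {
        uint64_t const d = (uint64_t)(*s - '0');
        if (value > (UINT64_MAX - d) / 10) return -1;
        value = value * 10 + d;
        s++;
        digits++;
    }
    if (digits == 0) return -1;

    /* Optional binary suffix: K = 2^10, M = 2^20, G = 2^30 (assumed set). */
    switch (*s) {
        case 'K': shift = 10; s++; break;
        case 'M': shift = 20; s++; break;
        case 'G': shift = 30; s++; break;
        default: break;
    }
    if (*s != '\0') return -1;                      /* trailing garbage */
    if (value > (UINT64_MAX >> shift)) return -1;   /* shift would overflow */

    *out = value << shift;
    return 0;
}
```

With this sketch, `parse_size_arg("5120M", &v)` yields a value above `UINT32_MAX`, which is the case a 32-bit parser would have to reject or truncate.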

Changes

  • Add 'inputBufferSize' field to FIO_prefs_t structure
  • Implement FIO_setInputBufferSize() setter function
  • Use configured buffer size in FIO_createDResources()
  • Add --ibuf-size command-line option with help documentation
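
For reviewers skimming without the diff open, the shape of the first three changes can be sketched roughly as below. This is an illustration only, not the code in this PR: the `sketch_` names are hypothetical stand-ins for `FIO_prefs_t`, `FIO_setInputBufferSize()`, and the selection logic in `FIO_createDResources()`, and `SKETCH_DEFAULT_IN_SIZE` is a placeholder for whatever `ZSTD_DStreamInSize()` returns.

```c
#include <stddef.h>

/* Placeholder for the library's recommended input size; the real code
 * would call ZSTD_DStreamInSize() instead of using a constant. */
#define SKETCH_DEFAULT_IN_SIZE ((size_t)128 * 1024)

/* Hypothetical stand-in for the new field in FIO_prefs_t. */
typedef struct {
    size_t inputBufferSize;   /* 0 = automatic (use the default) */
} sketch_prefs_t;

/* Hypothetical stand-in for the setter. */
static void sketch_setInputBufferSize(sketch_prefs_t* prefs, size_t size)
{
    prefs->inputBufferSize = size;
}

/* Buffer size chosen when decompression resources are created:
 * the user's value if one was set, otherwise the default. */
static size_t sketch_effectiveInputBufferSize(const sketch_prefs_t* prefs)
{
    return prefs->inputBufferSize != 0 ? prefs->inputBufferSize
                                       : SKETCH_DEFAULT_IN_SIZE;
}
```

Treating 0 as "automatic" is what keeps the default path unchanged: a zero-initialized preferences struct selects the same buffer size as before.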

Testing

  • Compilation succeeds
  • Default behavior unchanged (backward compatible)
  • Works with various buffer sizes (256M, 512M, 1024M)
  • Output files are byte-for-byte identical
  • Works with stdin/stdout
  • Help message displays correctly

Performance Impact

Expected improvements for the scenario described (12TB file on mechanical drive):

  • Before: ~130MB/s (concurrent read+write with seeking)
  • After: ~250MB/s+ (sequential reads with larger buffer)

Actual improvements depend on disk characteristics, available RAM, compression ratio, and CPU speed.
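
To put the two figures in wall-clock terms, here is a back-of-the-envelope helper. It assumes decimal units (1 MB = 10^6 bytes) and treats 12 TB as the total number of bytes moved, which the description above does not specify; `transfer_hours` is a hypothetical name used only for this estimate.

```c
/* Hours needed to move `bytes` at a sustained rate of `mb_per_s`,
 * using decimal megabytes (10^6 bytes). Illustrative estimate only. */
static double transfer_hours(double bytes, double mb_per_s)
{
    double const seconds = bytes / (mb_per_s * 1e6);
    return seconds / 3600.0;
}
```

Under those assumptions, `transfer_hours(12e12, 130.0)` comes to roughly 25.6 hours and `transfer_hours(12e12, 250.0)` to roughly 13.3 hours.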

This feature addresses performance issues when decompressing large files
on mechanical hard drives with excessive disk seek operations.

Changes:
- Add 'inputBufferSize' field to FIO_prefs_t in fileio_types.h
- Implement FIO_setInputBufferSize() setter function
- Use configured buffer size in FIO_createDResources()
- Add --ibuf-size command-line option with help text

The option allows users to specify input buffer size (default: 0 for
automatic). For large files on slow drives, using --ibuf-size=1024M
to --ibuf-size=5120M can significantly improve throughput by reducing
disk seek operations.

Backward compatible: defaults to ZSTD_DStreamInSize() when not specified.

@meta-cla
meta-cla bot commented Dec 7, 2025

Hi @Joy-Majumder!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
