GH-41863: [Python][Parquet] Support lz4_raw as a compression name alias #49135

Open
nwoolmer wants to merge 2 commits into apache:main from nwoolmer:GH-41863

nwoolmer commented Feb 4, 2026

Closes #41863

Rationale for this change

Other tools in the Parquet ecosystem distinguish between LZ4 and LZ4_RAW, matching the specification: https://parquet.apache.org/docs/file-format/data-pages/compression/

LZ4 (framing) is deprecated. PyArrow does not support it; instead, it simplifies the user-facing API by treating LZ4 as an alias for the LZ4_RAW codec.

However, PyArrow does not accept LZ4_RAW as a valid alias for the LZ4_RAW codec:

ArrowException: Unsupported compression: lz4_raw

This causes friction and is confusing for users who are aware of the difference between the two codecs.

What changes are included in this PR?

  • Adding LZ4_RAW to the acceptable codec names list.
  • Modifying the LZ4->LZ4_RAW mapping to also accept LZ4_RAW->LZ4_RAW.
  • Adding a test.
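
To illustrate the shape of the change listed above, here is a minimal pure-Python sketch of an alias table in which both `LZ4` and `LZ4_RAW` resolve to the same codec. It is not PyArrow's actual implementation: the names `ACCEPTED_CODECS`, `CODEC_ALIASES`, and `normalize_codec` are hypothetical, the codec list is only illustrative, and the error type is a plain `ValueError` rather than the `ArrowException` PyArrow raises.

```python
# Hypothetical sketch only: these names are illustrative, not PyArrow internals.

# Illustrative set of accepted user-facing codec names (LZ4_RAW newly added).
ACCEPTED_CODECS = {"NONE", "SNAPPY", "GZIP", "BROTLI", "ZSTD", "LZ4", "LZ4_RAW"}

# Both spellings map to the same underlying codec.
CODEC_ALIASES = {"LZ4": "LZ4_RAW", "LZ4_RAW": "LZ4_RAW"}


def normalize_codec(name: str) -> str:
    """Return the canonical codec name for a user-supplied compression name."""
    key = name.upper()
    if key not in ACCEPTED_CODECS:
        raise ValueError(f"Unsupported compression: {name}")
    # Names without an alias entry are already canonical.
    return CODEC_ALIASES.get(key, key)
```

With a table like this, `normalize_codec("lz4")` and `normalize_codec("lz4_raw")` return the same value, which is the user-facing behavior this PR aims for.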

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, an additive change to the accepted codec names.

Member

AlenkaF commented Feb 4, 2026

Thank you for contributing @nwoolmer!
I think the changes look good. Could we add some text about the new alias to the User Guide: https://arrow.apache.org/docs/python/parquet.html#compression-encoding-and-file-compatibility? And maybe also mention the alias in the updated Parquet writer docstrings?

…should cover both ParquetWriter and write_table. Other paths use kwargs, so no change expected.

Author

nwoolmer commented Feb 4, 2026

@AlenkaF Just checking the docs build failure. Some of the failures look unrelated to my change, but I am unfamiliar with any dependencies you might have between the pages.

Member

rok commented Feb 4, 2026

Yes, the doctest failures are unrelated.

Development

Successfully merging this pull request may close these issues.

[Python][Parquet] Support LZ4_RAW for parquet writing

3 participants