Core: Implement LZ4 frame compression for Puffin#16348
Open
wombatu-kun wants to merge 1 commit into
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Implements the LZ4 codec for Puffin, replacing the long-standing TODOs in PuffinFormat.compress / PuffinFormat.decompress that pointed at airlift/aircompressor#142.
Motivation
Puffin declared lz4 as a valid codec (used unconditionally for footer compression via Puffin.write(...).compressFooter()), but the implementation threw UnsupportedOperationException("Unsupported codec: LZ4"). The referenced aircompressor PR #142 was never merged into the version Iceberg ships (io.airlift:aircompressor:2.0.3), which provides only raw LZ4 and Hadoop streams, not the standard LZ4 frame format the Puffin spec requires. As a result, footer compression was unusable and lz4 blob compression was unreachable.
Implementation
LZ4 frame support is provided by net.jpountz.lz4 (shipped as at.yawk.lz4:lz4-java, already pinned in this repo via a CVE resolutionStrategy substitution). It is promoted from a transitive-only dependency to a direct implementation dependency of iceberg-core.
This conforms to the Puffin spec: "Single LZ4 compression frame, with content size present". Content size is encoded in the frame descriptor. BLOCK_INDEPENDENCE is required by lz4-java (it only supports independent blocks) and is orthogonal to the spec — it is also the reference lz4 CLI default. aircompressor is retained for ZSTD.
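For reviewers who want to check the "content size present" requirement on a produced frame by hand, here is a minimal sketch using only the JDK. It is not code from this PR: the class name Lz4FrameHeaderSketch and its methods are hypothetical, and the layout it assumes (4-byte little-endian magic 0x184D2204, then the FLG byte, the BD byte, and an optional 8-byte little-endian content size) is taken from the public LZ4 frame format description. It reads the header fields only and does not validate the header checksum byte.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.OptionalLong;

/** Hypothetical helper (not part of Iceberg or lz4-java): inspects an LZ4 frame descriptor. */
final class Lz4FrameHeaderSketch {
  // LZ4 frame magic number; on the wire it is little-endian (bytes 04 22 4D 18).
  static final int MAGIC = 0x184D2204;
  // FLG byte bit masks from the LZ4 frame format.
  static final int FLG_BLOCK_INDEPENDENCE = 1 << 5;
  static final int FLG_CONTENT_SIZE = 1 << 3;
  // Smallest possible header: magic (4) + FLG (1) + BD (1) + header checksum (1).
  static final int MIN_HEADER_LENGTH = 7;

  private Lz4FrameHeaderSketch() {}

  /** Validates magic and version, then returns a buffer positioned just after the FLG byte. */
  private static ByteBuffer afterFlg(byte[] frame, int[] flgOut) {
    if (frame.length < MIN_HEADER_LENGTH) {
      throw new IllegalArgumentException("Too short for an LZ4 frame header");
    }
    ByteBuffer buf = ByteBuffer.wrap(frame).order(ByteOrder.LITTLE_ENDIAN);
    if (buf.getInt() != MAGIC) {
      throw new IllegalArgumentException("Not an LZ4 frame: bad magic number");
    }
    int flg = buf.get() & 0xFF;
    // The two highest FLG bits carry the format version, which must be 01.
    if ((flg >>> 6) != 1) {
      throw new IllegalArgumentException("Unsupported LZ4 frame version");
    }
    flgOut[0] = flg;
    return buf;
  }

  /** Returns the content size declared in the frame descriptor, or empty if none is present. */
  static OptionalLong contentSize(byte[] frame) {
    int[] flg = new int[1];
    ByteBuffer buf = afterFlg(frame, flg);
    buf.get(); // skip the BD byte (block maximum size)
    if ((flg[0] & FLG_CONTENT_SIZE) == 0) {
      return OptionalLong.empty();
    }
    if (buf.remaining() < Long.BYTES) {
      throw new IllegalArgumentException("Truncated content size field");
    }
    return OptionalLong.of(buf.getLong());
  }

  /** Returns true if the frame descriptor declares independent blocks. */
  static boolean hasIndependentBlocks(byte[] frame) {
    int[] flg = new int[1];
    afterFlg(frame, flg);
    return (flg[0] & FLG_BLOCK_INDEPENDENCE) != 0;
  }
}
```

A test could call Lz4FrameHeaderSketch.contentSize(compressedBytes) on a compressed footer or blob and assert that the result is present and equals the uncompressed length. The header checksum and the block data are deliberately out of scope here; decoding them is the job of lz4-java.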
Tests
Verified locally: :iceberg-core:build -x integrationTest green; checkRuntimeDeps green for the spark-4.1 / flink-2.1 / kafka-connect bundles.
Runtime deps & LICENSE
Making lz4-java a direct dependency of iceberg-core propagates it onto the runtime classpath of every shaded runtime bundle that ships iceberg-core. Accordingly:
Open item for maintainers: please sanity-check the LICENSE attribution wording / project URL for the at.yawk.lz4 fork against ASF policy — this is the documented manual step in runtime-deps.gradle.