[GLUTEN-10215][VL] Delta write: Fix native partitioned layout accounting#12016
Open
malinjawi wants to merge 2 commits into apache:main from
This was referenced May 3, 2026
What changes were proposed in this pull request?
This patch fixes the native Delta write path for partitioned optimized writes in the Velox backend.
The change:
- Enforces `maxRecordsPerFile` within native partition stripes by slicing columnar batches when needed.
- Advances `recordsInFile` by the actual written chunk row count instead of the original input batch row count.
- Computes native Delta AddFile stats over `dataColumns` only.

The same writer/stat fixes are applied to Delta 3.3 and Delta 4.0 sources.
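The slicing and accounting behavior described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the Scala/C++ code in this PR: `PartitionWriterState`, `write_stripe`, and the `files` list are hypothetical names, and a stripe is modeled only by its row count. It assumes the usual Spark convention that a zero or negative `maxRecordsPerFile` means no limit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionWriterState:
    """Hypothetical per-partition writer state (illustration only, not a Gluten class)."""
    max_records_per_file: int
    records_in_file: int = 0
    files: List[int] = field(default_factory=list)  # rows written to each file so far

    def write_stripe(self, stripe_rows: int) -> None:
        """Write one partition stripe, slicing it so no file exceeds the limit."""
        limit = self.max_records_per_file
        offset = 0
        while offset < stripe_rows:
            # Roll over to a new file when none is open or the current one is full.
            if not self.files or (limit > 0 and self.records_in_file >= limit):
                self.files.append(0)
                self.records_in_file = 0
            remaining = stripe_rows - offset
            # Slice the stripe to the room left in the current file.
            chunk = remaining if limit <= 0 else min(remaining, limit - self.records_in_file)
            self.files[-1] += chunk
            # Advance by the rows actually written, not by the size of the input stripe.
            self.records_in_file += chunk
            offset += chunk


writer = PartitionWriterState(max_records_per_file=100_000)
writer.write_stripe(250_000)
print(writer.files)  # [100000, 100000, 50000]
```

The key point is that `records_in_file` moves by `chunk`, so a later stripe for the same partition fills the remaining room in the open file before rolling over.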
Why are the changes needed?
The existing native partitioned writer split incoming Velox batches by partition, but it accounted for file layout at the level of the original batch. As a result, partitioned optimized writes can violate `maxRecordsPerFile` when a single native partition stripe is larger than the file limit.
There is also a conditional stats-schema issue when Delta keeps partition columns in the writer batch: the native stats tracker must still compute Delta AddFile stats over the data columns only.
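To illustrate the stats-schema point, here is a minimal Python sketch of computing per-file stats over the data columns while the batch still carries a partition column. The function name `add_file_stats` and the dict-of-lists batch are assumptions made for this example; only the stats field names (`numRecords`, `minValues`, `maxValues`, `nullCount`) follow the Delta per-file statistics layout.

```python
from typing import Any, Dict, List


def add_file_stats(batch: Dict[str, List[Any]], data_columns: List[str]) -> Dict[str, Any]:
    """Compute Delta-style file stats over data columns only (illustrative sketch)."""
    num_records = len(next(iter(batch.values()))) if batch else 0
    stats: Dict[str, Any] = {
        "numRecords": num_records,
        "minValues": {},
        "maxValues": {},
        "nullCount": {},
    }
    # Iterate over the data columns, not over every column present in the batch,
    # so partition columns kept in the writer batch never leak into the stats.
    for name in data_columns:
        values = batch[name]
        non_null = [v for v in values if v is not None]
        stats["nullCount"][name] = len(values) - len(non_null)
        if non_null:
            stats["minValues"][name] = min(non_null)
            stats["maxValues"][name] = max(non_null)
    return stats


# The batch keeps the partition column "region", but stats cover only "id".
batch = {"id": [3, 1, None], "region": ["eu", "eu", "eu"]}
print(add_file_stats(batch, data_columns=["id"]))
```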
Does this PR introduce any user-facing change?
No public API change. This improves correctness/layout behavior for native Delta writes.
How was this patch tested?
Built locally and ran:
Result: 10 tests passed, 0 failures.
I also ran a targeted local benchmark for partitioned optimized Delta writes with stats enabled, comparing native Delta write enabled vs native write disabled on the same branch.
Environment: Apple M3 Max, 14 logical CPUs, 36 GiB RAM, macOS arm64. Rows: 2,000,000, 2 warmups, 5 measured runs, 8 input partitions, 16 partition values, `maxRecordsPerFile=100000`.
Important comparison caveat: the pre-patch native path can be functionally invalid for partitioned `maxRecordsPerFile` cases, so this benchmark compares the fixed native path against native-write-disabled execution. Invalid pre-patch layout cases should be treated as correctness failures, not as a fair performance baseline.
Note: the targeted benchmark shows native output files are larger in this local setup; this PR focuses on partitioned layout correctness and native write accounting, not file-size/compression tuning.
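One way to check for an invalid layout after a write, sketched in Python under stated assumptions: it reads the lines of a Delta commit file (one JSON action per line), relies on stats collection being enabled so each `add` action carries a JSON-encoded `stats` string with `numRecords`, and reports files above the limit. The helper name `files_over_limit` is hypothetical and is not part of this PR's test suite.

```python
import json
from typing import Iterable, List, Tuple


def files_over_limit(commit_lines: Iterable[str], max_records_per_file: int) -> List[Tuple[str, int]]:
    """Return (path, numRecords) for every added file that exceeds the limit."""
    offenders: List[Tuple[str, int]] = []
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        add = json.loads(line).get("add")
        # Skip non-add actions and add actions written without stats.
        if add is None or not add.get("stats"):
            continue
        # The stats field is itself a JSON document stored as a string.
        num_records = json.loads(add["stats"]).get("numRecords")
        if num_records is not None and num_records > max_records_per_file:
            offenders.append((add["path"], num_records))
    return offenders


# Made-up commit lines for illustration; real ones live under _delta_log/.
sample = [
    json.dumps({"add": {"path": "p=1/a.parquet", "stats": json.dumps({"numRecords": 100000})}}),
    json.dumps({"add": {"path": "p=1/b.parquet", "stats": json.dumps({"numRecords": 125000})}}),
]
print(files_over_limit(sample, 100000))  # [('p=1/b.parquet', 125000)]
```

An empty result for a partitioned optimized write is the layout property this PR aims to guarantee; a non-empty result would indicate a correctness failure rather than a performance difference.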
Related issue: #10215