
[GLUTEN-10215][VL] Delta write: Fix native partitioned layout accounting#12016

Open
malinjawi wants to merge 2 commits into apache:main from malinjawi:feature/delta-native-write-layout-accounting

Conversation

Contributor

@malinjawi malinjawi commented Apr 30, 2026

What changes were proposed in this pull request?

This patch fixes the native Delta write path for partitioned optimized writes in the Velox backend.

The change:

  • Writes each native partition stripe as its own accounting unit.
  • Enforces maxRecordsPerFile within native partition stripes by slicing columnar batches when needed.
  • Updates recordsInFile by the actual written chunk row count instead of the original input batch row count.
  • Preserves partition columns in split output only when Delta's write contract includes partition columns in dataColumns.
  • Ensures native Delta stats aggregation receives only Delta data columns when the written batch contains extra partition columns.
  • Adds Delta 4.0 tests for optimized partitioned native writes and Delta Iceberg-compatible partitioned writes with stats enabled.

The same writer/stat fixes are applied to Delta 3.3 and Delta 4.0 sources.
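The per-stripe slicing described above can be sketched as follows. This is a minimal illustration only, assuming a stripe of `stripeRows` rows and a running `recordsInFile` counter; `StripeSlicer` and `sliceOffsets` are hypothetical names, not the actual Gluten APIs:

```scala
// Hypothetical sketch of enforcing maxRecordsPerFile within a native
// partition stripe: compute (offset, length) slices so that no output
// file receives more than maxRecordsPerFile rows, and recordsInFile is
// advanced by the actual written chunk size rather than the input batch size.
object StripeSlicer {
  def sliceOffsets(
      stripeRows: Long,
      recordsInFile: Long,
      maxRecordsPerFile: Long): Seq[(Long, Long)] = {
    if (maxRecordsPerFile <= 0) return Seq((0L, stripeRows)) // limit disabled
    var offset = 0L
    var inFile = recordsInFile
    val out = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
    while (offset < stripeRows) {
      val room = maxRecordsPerFile - inFile      // capacity left in current file
      val len = math.min(room, stripeRows - offset)
      out += ((offset, len))
      offset += len
      inFile += len
      if (inFile >= maxRecordsPerFile) inFile = 0 // roll over to a new file
    }
    out.toSeq
  }
}
```

For example, a 250,000-row stripe with maxRecordsPerFile=100000 and an empty current file yields three chunks: 100,000, 100,000, and 50,000 rows, instead of one oversized file.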

Why are the changes needed?

The existing native partitioned writer split incoming Velox batches by partition, but it accounted for file layout at the original batch level rather than per stripe. As a result, partitioned optimized writes could violate maxRecordsPerFile whenever a single native partition stripe exceeded the file limit.

There is also a conditional stats-schema issue when Delta keeps partition columns in the writer batch: the native stats tracker must still compute Delta AddFile stats over the data columns only.
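The stats-schema fix amounts to projecting the written batch down to Delta's data columns before it reaches the stats tracker. A minimal sketch, assuming the batch schema is represented by column names and `StatsProjection`/`dataColumnIndices` are illustrative names rather than the real Gluten entry points:

```scala
// Hypothetical sketch: when Delta keeps partition columns in the writer
// batch, select only the data-column indices so AddFile stats are computed
// over Delta data columns, never over partition columns.
object StatsProjection {
  def dataColumnIndices(
      batchColumns: Seq[String],
      partitionColumns: Set[String]): Seq[Int] =
    batchColumns.zipWithIndex.collect {
      case (name, idx) if !partitionColumns.contains(name) => idx
    }
}
```

With a batch of `(id, value, p_date)` partitioned by `p_date`, only indices 0 and 1 would be handed to stats aggregation.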

Does this PR introduce any user-facing change?

No public API change. This improves correctness/layout behavior for native Delta writes.

How was this patch tested?

Built locally and ran:

JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-17.jdk/Contents/Home \
./dev/run-scala-test.sh \
  -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta \
  -pl backends-velox \
  -s org.apache.spark.sql.delta.DeltaNativeWriteSuite

Result: 10 tests passed, 0 failures.

I also ran a targeted local benchmark for partitioned optimized Delta writes with stats enabled, comparing native Delta write enabled vs native write disabled on the same branch.

Environment: Apple M3 Max, 14 logical CPUs, 36 GiB RAM, macOS arm64. Rows: 2,000,000, 2 warmups, 5 measured runs, 8 input partitions, 16 partition values, maxRecordsPerFile=100000.

| workload | mode | median ms | avg ms | rows/sec | files | bytes | speedup |
|---|---|---|---|---|---|---|---|
| partitioned | native | 1557.4 | 1617.6 | 1,284,217 | 32 | 28,856,342 | 1.66x |
| partitioned | native disabled | 2588.1 | 2588.5 | 772,758 | 32 | 21,744,978 | baseline |
| Delta Iceberg-compatible | native | 1545.8 | 1525.8 | 1,293,817 | 32 | 28,878,678 | 1.64x |
| Delta Iceberg-compatible | native disabled | 2538.0 | 2558.3 | 788,012 | 32 | 21,790,000 | baseline |

Important comparison caveat: the pre-patch native path can be functionally invalid for partitioned maxRecordsPerFile cases, so this benchmark compares the fixed native path against native-write-disabled execution. Invalid pre-patch layout cases should be treated as correctness failures, not as a fair performance baseline.

Note: the targeted benchmark shows native output files are larger in this local setup; this PR focuses on partitioned layout correctness and native write accounting, not file-size/compression tuning.

Related issue: #10215

@github-actions github-actions Bot added the VELOX label Apr 30, 2026