[VL][Delta] Add UniForm Iceberg support wiring#12047

Open
malinjawi wants to merge 2 commits into apache:main from malinjawi:vl-delta-uniform-iceberg

@malinjawi (Contributor) commented May 6, 2026

What changes are proposed in this pull request?

This PR adds a first slice of write-side UniForm Iceberg support to the Velox Delta path.

The change keeps Delta as the source of truth for the transaction and UniForm metadata-generation flow. Gluten now passes Delta column-mapping field IDs through the Velox native Parquet writer path when IcebergCompatV2 is enabled, so newly written Delta files can satisfy the Parquet field-id part of the UniForm Iceberg contract.

This PR also:

  • adds delta-iceberg to the Velox Delta build/test profile
  • serializes Delta column-mapping field IDs into native Parquet writer options
  • parses that option on the Velox side and populates WriterOptions.parquetFieldIds
  • preserves the existing Delta async UniForm metadata generation path after commit
  • adds a focused end-to-end UniForm Iceberg suite with Hive Metastore-backed Iceberg readback
  • adds negative coverage for active deletion vectors
  • updates docs/get-started/VeloxDelta.md from Not tested to Partial
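
The field-ID plumbing in the list above can be illustrated with a minimal Python sketch. This is not the PR's code. It assumes the Delta schema JSON layout in which each field's metadata carries `delta.columnMapping.id`; the function names and the JSON-string encoding of the writer option are hypothetical, chosen only to show the serialize-then-parse round trip.

```python
import json

# Metadata key Delta uses for column-mapping IDs in its schema JSON.
COLUMN_MAPPING_ID_KEY = "delta.columnMapping.id"


def collect_field_ids(struct_type):
    """Return a nested list of {"id", "children"} entries for a Delta struct type."""
    result = []
    for field in struct_type["fields"]:
        field_id = field.get("metadata", {}).get(COLUMN_MAPPING_ID_KEY)
        if field_id is None:
            raise ValueError(f"field {field['name']!r} has no column-mapping id")
        result.append({"id": int(field_id), "children": _children(field["type"])})
    return result


def _children(data_type):
    """Descend into nested structs; arrays and maps are skipped in this sketch."""
    if isinstance(data_type, dict) and data_type.get("type") == "struct":
        return collect_field_ids(data_type)
    return []


def encode_option(field_ids):
    """Hypothetical encoding: compact JSON carried as one writer-option string."""
    return json.dumps(field_ids, separators=(",", ":"))


def decode_option(value):
    """Inverse of encode_option, standing in for the native-side parser."""
    return json.loads(value)


# Hypothetical two-column schema with one nested struct.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True,
         "metadata": {COLUMN_MAPPING_ID_KEY: 1}},
        {"name": "addr", "nullable": True,
         "metadata": {COLUMN_MAPPING_ID_KEY: 2},
         "type": {"type": "struct", "fields": [
             {"name": "zip", "type": "string", "nullable": True,
              "metadata": {COLUMN_MAPPING_ID_KEY: 3}},
         ]}},
    ],
}

example_option = encode_option(collect_field_ids(example_schema))
```

The nested shape mirrors the schema tree so that the receiving side can assign an ID to every Parquet column, including struct members, without re-reading the Delta schema.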

Current scope / behavior:

  • Spark 3.5 Velox native Delta write is covered for new UniForm Iceberg tables using IcebergCompatV2
  • the test verifies generated Iceberg metadata and reads the table back through Iceberg
  • active deletion vectors are rejected by Delta for UniForm Iceberg
  • native Velox Iceberg scan for UniForm column-mapping/name-mapping reads is still a follow-up gap, so the docs do not claim full native read support yet
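
For readers unfamiliar with the table setup this scope refers to, the following Python sketch builds the DDL for a new UniForm Iceberg table and shows the kind of gating described above. The three table properties are the ones Delta documents for UniForm with IcebergCompatV2; the helper names are hypothetical, and the gating function is only an approximation of the check Gluten performs.

```python
def uniform_iceberg_properties():
    """Table properties that opt a new Delta table into UniForm Iceberg."""
    return {
        "delta.columnMapping.mode": "name",
        "delta.enableIcebergCompatV2": "true",
        "delta.universalFormat.enabledFormats": "iceberg",
    }


def writer_needs_field_ids(props):
    """Approximate gating: field IDs are passed only when IcebergCompatV2 is on."""
    return str(props.get("delta.enableIcebergCompatV2", "false")).lower() == "true"


def create_table_sql(table_name, columns_ddl, props):
    """Render a CREATE TABLE statement with the given table properties."""
    rendered = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return (
        f"CREATE TABLE {table_name} ({columns_ddl}) USING DELTA\n"
        f"TBLPROPERTIES (\n  {rendered}\n)"
    )


# Hypothetical table name and columns, for illustration only.
example_sql = create_table_sql(
    "uniform_demo", "id BIGINT, name STRING", uniform_iceberg_properties()
)
```

The resulting string could be passed to `spark.sql(...)` in a session that has the Delta and delta-iceberg dependencies on the classpath.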

How was this patch tested?

Build / validation:

  • ./build/mvn -Pbackends-velox,delta,spark-3.5 -pl backends-velox -am -DskipTests test-compile
  • ninja -j 6 gluten
  • ninja -j 6 libvelox.dylib
  • git diff --check

Focused local validation:

  • Spark 3.5 org.apache.spark.sql.delta.DeltaUniFormIcebergSuite with -Pbackends-velox,delta,iceberg,spark-3.5
  • Spark 3.5 org.apache.spark.sql.delta.DeltaUniFormIcebergSuite with -Pbackends-velox,delta,spark-3.5 to verify that the suite cancels cleanly when the Iceberg test classes are absent
  • Spark 3.5 org.apache.spark.sql.delta.DeltaInsertIntoSQLNameColumnMappingSuite

Functional validation:

  • verified a UniForm Iceberg-enabled Delta table can be written through the Velox native Delta write path
  • verified Parquet files are written with IcebergCompatV2-compatible field IDs
  • verified Iceberg metadata JSON is generated after commit
  • verified generated metadata contains Delta version/timestamp information
  • verified the table can be read back through Iceberg with expected rows
  • verified active deletion vectors are rejected for UniForm Iceberg
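
The metadata checks listed above could be approximated outside the test suite with a short Python sketch over an already-parsed Iceberg metadata JSON document. It relies only on the top-level `schemas`, `current-schema-id`, `schema`, and `properties` fields from the Iceberg table spec. The PR does not state which keys hold the Delta version and timestamp, so the sketch simply collects properties whose key mentions "delta"; the `delta-version` key in the example data is an assumption, and both function names are hypothetical.

```python
def current_schema_field_ids(metadata):
    """Map top-level column names to Iceberg field IDs for the current schema."""
    schemas = metadata.get("schemas")
    if schemas:
        current_id = metadata.get("current-schema-id", 0)
        schema = next(s for s in schemas if s.get("schema-id", 0) == current_id)
    else:
        # Format-version 1 metadata keeps a single top-level "schema".
        schema = metadata["schema"]
    return {field["name"]: field["id"] for field in schema["fields"]}


def delta_properties(metadata):
    """Collect table properties whose key mentions Delta (key names assumed)."""
    return {
        key: value
        for key, value in metadata.get("properties", {}).items()
        if "delta" in key.lower()
    }


# Hypothetical, hand-written metadata fragment for illustration.
example_metadata = {
    "format-version": 2,
    "current-schema-id": 0,
    "schemas": [
        {"schema-id": 0, "type": "struct", "fields": [
            {"id": 1, "name": "id", "required": False, "type": "long"},
            {"id": 2, "name": "name", "required": False, "type": "string"},
        ]},
    ],
    "properties": {"delta-version": "1", "write.format.default": "parquet"},
}
```

Comparing the result of `current_schema_field_ids` with the Delta column-mapping IDs is one way to confirm that the IDs written into Parquet line up with what Iceberg readers expect.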

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

issue: #12039

Follow-up work

This PR intentionally keeps the scope limited to the supported Delta write and Iceberg readback path. Reasonable follow-ups are:

  • add native Velox Iceberg read support for UniForm column-mapping/name-mapping tables by plumbing Iceberg/Parquet field IDs through the scan path
  • add Spark 4.0 UniForm coverage if the same Delta/Iceberg dependency set is enabled there
  • broaden negative coverage for upgrade/rewrite paths that require REORG TABLE ... APPLY (UPGRADE UNIFORM(...))
  • tighten docs once native Iceberg scan support is proven

@github-actions Bot added the CORE, VELOX, and DOCS labels May 6, 2026
@github-actions Bot commented May 6, 2026

Run Gluten Clickhouse CI on x86

@malinjawi force-pushed the vl-delta-uniform-iceberg branch from 983fba6 to 5cc239b May 6, 2026 14:12
@malinjawi force-pushed the vl-delta-uniform-iceberg branch from 5cc239b to 434c553 May 7, 2026 09:30
@malinjawi marked this pull request as ready for review May 7, 2026 10:28