[VL][Delta] Support native OPTIMIZE ZORDER expressions #12062

Draft
malinjawi wants to merge 1 commit into apache:main from malinjawi:vl-delta-optimize-zorder-native-expressions

@malinjawi
Contributor

What changes are proposed in this pull request?
This PR follows #12024 and adds native support for Delta OPTIMIZE ZORDER expression execution in the Velox backend.

The change:

  • adds Velox native functions for Delta ZORDER expressions:
    • interleave_bits for InterleaveBits
    • range_partition_id for RangePartitionId / PartitionerExpr
  • converts supported Delta ZORDER expressions in ExpressionConverter
  • allows supported OPTIMIZE ... ZORDER BY commands through GlutenOptimisticTransaction
  • keeps unsupported OPTIMIZE variants on the existing Delta command path
  • adds Delta 3.3 and Delta 4.0 coverage for path-based ZORDER and partition-predicate ZORDER
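
To make the first bullet concrete, here is a minimal Python sketch of what a bit-interleaving function computes. It assumes 32-bit integer inputs, most-significant-bit-first round-robin order, and nulls treated as 0; these are assumptions about the InterleaveBits semantics rather than a description of the Velox `interleave_bits` implementation in this PR.

```python
def interleave_bits(values, width=32):
    """Interleave the bits of fixed-width integers round-robin, MSB first.

    Illustrative only: input width, bit order and null handling are
    assumptions, not the Velox implementation.
    """
    mask = (1 << width) - 1
    # Treat None as 0 and reduce every value to its unsigned `width`-bit form.
    words = [(0 if v is None else v) & mask for v in values]
    out = 0
    for bit in range(width - 1, -1, -1):      # most significant bit first
        for word in words:                    # one bit from each input in turn
            out = (out << 1) | ((word >> bit) & 1)
    return out.to_bytes(width * len(words) // 8, "big")


# Rows sorted by the interleaved key follow a Z-order curve over (z1, z2).
points = [(3, 0), (0, 1), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: interleave_bits(list(p)))
```

Sorting by the resulting byte string keeps rows that are close in both columns close together on disk, which is the locality property ZORDER relies on.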

Why are the changes needed?
#12024 enabled plain OPTIMIZE compaction command offload. OPTIMIZE ZORDER still needed native expression coverage for InterleaveBits and RangePartitionId to keep the supported command execution native.
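
For readers unfamiliar with RangePartitionId, the following Python sketch illustrates the general idea of mapping a column value to a range partition id using sorted boundaries. The helper names `range_boundaries` and `range_partition_id`, and the evenly spaced boundary selection, are hypothetical; the real expression derives its boundaries from a sample of the data, and the native function in this PR may differ.

```python
from bisect import bisect_left


def range_boundaries(sample, num_partitions):
    """Pick up to num_partitions - 1 increasing cut points from a sample."""
    ordered = sorted(sample)
    if not ordered or num_partitions <= 1:
        return []
    bounds = []
    for i in range(1, num_partitions):
        idx = min(len(ordered) - 1, i * len(ordered) // num_partitions)
        candidate = ordered[idx]
        if not bounds or candidate > bounds[-1]:   # keep bounds strictly increasing
            bounds.append(candidate)
    return bounds


def range_partition_id(value, bounds):
    """Return the number of boundaries strictly below value."""
    return bisect_left(bounds, value)


bounds = range_boundaries(range(100), 4)
ids = [range_partition_id(v, bounds) for v in (0, 25, 26, 99)]
```

Replacing each ZORDER column with a small partition id like this gives the interleaving step inputs of comparable scale, regardless of the original column types.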

Does this PR introduce any user-facing change?
No public API change. It extends native Delta OPTIMIZE ZORDER support in the Velox backend.

How was this patch tested?
Built and ran locally:

  • Spark 3.5 test-compile with delta profile
  • Spark 4.0 test-compile with delta profile
  • C++ Velox backend build
  • focused Spark 3.5 DeltaNativeWriteSuite ZORDER tests
  • focused Spark 4.0 DeltaNativeWriteSuite ZORDER tests

Performance
I ran a targeted local benchmark for OPTIMIZE ZORDER on Spark 3.5. Workload: 2,000,000 rows, 14 columns, 128 input files, 1 warmup, 3 measured runs. The benchmark measures only:

OPTIMIZE delta.path ZORDER BY (z1, z2)

Table setup time is excluded.
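
The measurement protocol described above (1 warmup, 3 measured runs, setup excluded) could be sketched roughly as follows. This is not the benchmark script used for the numbers in this PR; the `benchmark` helper, its parameters, and the assumption that the table is rebuilt before every run are illustrative.

```python
import statistics
import time


def benchmark(setup, action, warmups=1, runs=3, clock=time.perf_counter):
    """Time `action` only; `setup` runs before each iteration, untimed."""
    timings_ms = []
    for i in range(warmups + runs):
        state = setup()                      # e.g. recreate the input table
        start = clock()
        action(state)                        # e.g. run the OPTIMIZE statement
        elapsed_ms = (clock() - start) * 1000.0
        if i >= warmups:                     # discard warmup iterations
            timings_ms.append(elapsed_ms)
    return {
        "avg_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "runs_ms": timings_ms,
    }
```

Passing the clock as a parameter keeps the harness testable with a fake timer.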

Compared with native Delta write disabled:

| mode | avg ms | median ms | files removed | files added |
| --- | --- | --- | --- | --- |
| native ZORDER | 5519.0 | 5361.2 | 128 | 1 |
| fallback ZORDER | 7459.8 | 7116.3 | 128 | 1 |

Compared with local vanilla Spark/Delta using spark.gluten.enabled=false:

| mode | avg ms | median ms | files removed | files added |
| --- | --- | --- | --- | --- |
| native ZORDER | 5762.7 | 5756.6 | 128 | 1 |
| vanilla Spark/Delta | 5293.5 | 5320.6 | 128 | 1 |

The patch extends Gluten's native ZORDER coverage and is about 26% faster on average than the existing Gluten fallback path on this workload (5519.0 ms vs 7459.8 ms). It does not yet beat local vanilla Spark/Delta, where it is about 9% slower on average (5762.7 ms vs 5293.5 ms). The remaining overhead is likely in Delta command planning, log, listing, and commit work, plus Gluten planning/listener overhead and small terminal job overhead.
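
As a quick check of the comparison, the relative differences can be recomputed from the reported averages and medians. The `percent_change` helper is only an illustration; the input numbers are the ones from the two tables above.

```python
def percent_change(baseline_ms, candidate_ms):
    """Relative change of candidate vs baseline, in percent (negative = faster)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# native ZORDER vs Gluten fallback ZORDER
native_vs_fallback_avg = percent_change(7459.8, 5519.0)
native_vs_fallback_median = percent_change(7116.3, 5361.2)

# native ZORDER vs vanilla Spark/Delta
native_vs_vanilla_avg = percent_change(5293.5, 5762.7)
native_vs_vanilla_median = percent_change(5320.6, 5756.6)

for label, value in [
    ("native vs fallback (avg)", native_vs_fallback_avg),
    ("native vs fallback (median)", native_vs_fallback_median),
    ("native vs vanilla (avg)", native_vs_vanilla_avg),
    ("native vs vanilla (median)", native_vs_vanilla_median),
]:
    print(f"{label}: {value:+.1f}%")
```

With only three measured runs per mode, these percentages are indicative rather than statistically robust.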

Related issue: #10215
Tracked by #12025

Was this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB

github-actions bot added the CORE (works for Gluten Core) and VELOX labels on May 9, 2026
github-actions Bot commented May 9, 2026

Run Gluten Clickhouse CI on x86
