[VL][Delta] Add native Delta DV reader support #12040

Open
malinjawi wants to merge 3 commits into apache:main from malinjawi:split/delta-dv-native-reader-pr

@malinjawi
Contributor

What changes are proposed in this pull request?

This PR is the second step in the split Delta deletion-vector (DV) stack, following #12001.

It adds the native Velox-side Delta DV reader layer that consumes the roaring bitmap payload facilities introduced by #12001, without adding the JVM-side Delta scan metadata handoff yet.

Main changes:

  • add a native Delta connector and data source backed by the Hive connector/data source infrastructure
  • register a scoped Delta connector alongside the existing scoped Hive connector for each Velox runtime
  • add Delta split metadata types for:
    • deletion-vector descriptors
    • protocol metadata
    • file statistics used for DV validation
    • serialized split payload buffer views
  • add DeltaDeletionVectorReader to load materialized Delta DV payloads using RoaringBitmapArray
  • add DeltaSplitReader to validate DV protocol/statistics metadata and apply row-index filtering semantics
  • add focused native unit coverage for connector setup, split metadata, and deletion-vector reader behavior
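The row-index filtering described for DeltaSplitReader can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `SketchDeletionVector` and `keptPositions` are hypothetical names, and a sorted vector stands in for RoaringBitmapArray, whose interface is not shown here. The only idea it demonstrates is that a deletion vector holds file-level row indexes, so each batch has to be checked against its starting offset within the file.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a materialized deletion vector: a sorted,
// de-duplicated list of deleted row indexes within one data file.
// The PR uses RoaringBitmapArray instead; this sketch does not model it.
class SketchDeletionVector {
 public:
  explicit SketchDeletionVector(std::vector<uint64_t> deletedRows)
      : deleted_(std::move(deletedRows)) {
    std::sort(deleted_.begin(), deleted_.end());
    deleted_.erase(
        std::unique(deleted_.begin(), deleted_.end()), deleted_.end());
  }

  // True if the file-level row index is marked as deleted.
  bool isDeleted(uint64_t rowIndex) const {
    return std::binary_search(deleted_.begin(), deleted_.end(), rowIndex);
  }

  // Number of distinct deleted rows.
  uint64_t cardinality() const {
    return deleted_.size();
  }

 private:
  std::vector<uint64_t> deleted_;
};

// Returns the batch-relative positions that survive the deletion vector.
// batchStartRow is the file-level index of the first row in the batch.
std::vector<uint32_t> keptPositions(
    const SketchDeletionVector& dv,
    uint64_t batchStartRow,
    uint32_t batchSize) {
  std::vector<uint32_t> kept;
  kept.reserve(batchSize);
  for (uint32_t i = 0; i < batchSize; ++i) {
    if (!dv.isDeleted(batchStartRow + i)) {
      kept.push_back(i);
    }
  }
  return kept;
}
```

For example, with deleted rows {2, 5, 7} and a four-row batch starting at file row 4, only batch positions 0 and 2 (file rows 4 and 6) are kept.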

This PR is intentionally native-reader only:

  • no JVM-side Delta scan metadata handoff yet
  • no end-to-end Delta scan offload behavior change yet

Those pieces will be added in follow-up split PRs.

Related issue: #11901.

How was this patch tested?

Added focused native test coverage in:

  • cpp/velox/compute/delta/tests/DeltaConnectorTest.cpp
  • cpp/velox/compute/delta/tests/DeltaSplitTest.cpp
  • cpp/velox/compute/delta/tests/DeltaDeletionVectorReaderTest.cpp

Covered cases:

  • Delta connector configuration and connector properties
  • split-carried deletion-vector descriptors and logical row-count accounting
  • loading materialized DV payloads from RoaringBitmapArray
  • row deletion checks and keep/drop filter decisions
  • empty payload handling and invalid payload rejection
  • protocol/statistics validation for DV-bearing splits
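As a rough illustration of the protocol/statistics validation and logical row-count accounting listed above, here is a small sketch. The struct and function names are hypothetical and are not taken from this PR. The specific checks are assumptions based on my reading of the Delta protocol (reader version 3 with the `deletionVectors` reader feature, and `numRecords` present in file statistics for DV-bearing files); the checks the PR actually performs may differ.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical, simplified views of the split-carried metadata.
struct SketchProtocol {
  int minReaderVersion = 1;
  std::set<std::string> readerFeatures;
};

struct SketchFileStats {
  std::optional<int64_t> numRecords; // physical rows in the data file
};

struct SketchDvDescriptor {
  int64_t cardinality = 0; // number of rows the deletion vector removes
};

// Validates a DV-bearing split and returns its logical row count
// (physical rows minus deleted rows). Throws std::invalid_argument when
// the metadata is inconsistent. The rules below are assumptions.
int64_t logicalRowCountWithDv(
    const SketchProtocol& protocol,
    const SketchFileStats& stats,
    const SketchDvDescriptor& dv) {
  if (protocol.minReaderVersion < 3 ||
      protocol.readerFeatures.count("deletionVectors") == 0) {
    throw std::invalid_argument("protocol does not enable deletion vectors");
  }
  if (!stats.numRecords.has_value()) {
    throw std::invalid_argument("numRecords is required for a DV split");
  }
  if (dv.cardinality < 0 || dv.cardinality > *stats.numRecords) {
    throw std::invalid_argument("DV cardinality is out of range");
  }
  return *stats.numRecords - dv.cardinality;
}
```

Under these assumptions, a file with 100 physical rows and a deletion vector of cardinality 30 reports 70 logical rows, while a split whose protocol metadata lacks the `deletionVectors` feature is rejected before any payload is read.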

Validation run:

  • fork preview CI against malinjawi/incubator-gluten:main on the combined PR2 branch: all checks passed after rerunning two infra-flaky jobs
  • local git diff --check upstream/main...HEAD
  • local clang-format pass with /opt/homebrew/opt/llvm@15/bin/clang-format over changed C++ files

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

@github-actions github-actions Bot added the VELOX and CORE (works for Gluten Core) labels on May 5, 2026
@github-actions

github-actions Bot commented May 6, 2026

Run Gluten Clickhouse CI on x86

@malinjawi malinjawi force-pushed the split/delta-dv-native-reader-pr branch from 1a16894 to 66ea460 on May 7, 2026 09:29
@github-actions github-actions Bot removed the CORE (works for Gluten Core) label on May 7, 2026