[GLUTEN-10215][VL] Delta write: Native top-level NOT NULL checks #12030

Open
malinjawi wants to merge 1 commit into apache:main from malinjawi:vl-delta-native-write-invariant-checks
@malinjawi malinjawi commented May 3, 2026

What changes were proposed in this pull request?
This patch keeps the checks for simple Delta write invariants native in the Velox backend.

The change:

  • Adds GlutenDeltaInvariantChecker for Delta 3.3 and Delta 4.0 sources.
  • Validates top-level NOT NULL constraints directly on Velox columnar batches before write.
  • Avoids DeltaInvariantCheckerExec and ColumnarToRow for supported NOT NULL-only writes.
  • Keeps unsupported constraints conservative: CHECK constraints and nested NOT NULL constraints still use Delta's row checker.
  • Adds a Velox JNI helper to detect nulls in selected columns, using cached nullCount when available.
  • Adds Spark 3.5 and Spark 4.0 tests for native NOT NULL checks, fallback behavior, and violation reporting.
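
As a rough illustration of the null-detection helper in the list above, the sketch below models the idea in plain Scala rather than through the actual JNI call. `ColumnVector`, `cachedNullCount`, `hasNull`, and `firstViolation` are hypothetical names used only for this example, not the Velox or Gluten API. The assumption is that a cached null count, when present, can answer the question without scanning, and that the validity flags are scanned otherwise.

```scala
object NullCheckSketch {
  // Hypothetical stand-in for a columnar vector: one validity flag per row,
  // plus an optional null count that may already have been computed.
  final case class ColumnVector(
      isNull: Array[Boolean],
      cachedNullCount: Option[Int] = None)

  // True if the column contains at least one null.
  def hasNull(col: ColumnVector): Boolean = col.cachedNullCount match {
    case Some(n) => n > 0                      // fast path: no scan needed
    case None => col.isNull.exists(identity)   // slow path: scan validity flags
  }

  // Returns the ordinal of the first selected column that contains a null,
  // or None when every selected column is null-free.
  def firstViolation(batch: IndexedSeq[ColumnVector], ordinals: Seq[Int]): Option[Int] =
    ordinals.find(i => hasNull(batch(i)))
}
```

Only the columns named by a NOT NULL constraint are passed in `ordinals`, so unconstrained columns are never inspected.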

This is independent from #12024 and follows the Delta write C2R reduction work in #11419 and #12016.

Why are the changes needed?
Delta writes with table invariants currently go through DeltaInvariantCheckerExec, which is row-based. Even simple top-level NOT NULL constraints can introduce a C2R transition in an otherwise native Delta write path.

This patch handles the safe common case natively while preserving Delta's existing fallback behavior for unsupported constraints.
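
The "native when safe, otherwise fall back" decision can be sketched roughly as follows. This is a simplified model, not the code in this PR: `Constraint`, `NotNull`, and `Check` here are hypothetical stand-ins for Delta's constraint types, `nativeNotNullOrdinals` is an invented helper name, and column names are matched exactly (real column resolution may be case-insensitive).

```scala
object NativeInvariantSketch {
  // Hypothetical stand-ins for Delta's constraint classes.
  sealed trait Constraint
  final case class NotNull(columnPath: Seq[String]) extends Constraint
  final case class Check(name: String, sqlExpr: String) extends Constraint

  // Returns Some(column ordinals) when every constraint is a top-level
  // NOT NULL on a column present in the schema. Returns None as soon as any
  // constraint is unsupported, which signals the row-based fallback.
  def nativeNotNullOrdinals(
      schema: Seq[String],
      constraints: Seq[Constraint]): Option[Seq[Int]] = {
    val ordinals: Seq[Option[Int]] = constraints.map {
      case NotNull(Seq(name)) =>
        val idx = schema.indexOf(name)
        if (idx >= 0) Some(idx) else None
      case _ => None // CHECK constraint or nested NOT NULL path
    }
    if (ordinals.forall(_.isDefined)) Some(ordinals.flatten) else None
  }
}
```

With a schema of `Seq("id", "name")`, a lone `NotNull(Seq("name"))` yields `Some(Seq(1))`, while adding any `Check` or a nested path such as `NotNull(Seq("s", "f"))` yields `None`, so a single unsupported constraint sends the whole write back to Delta's row checker.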

Does this PR introduce any user-facing change?
No public API change. Delta constraint behavior is preserved; supported NOT NULL-only writes can stay native.

How was this patch tested?
Built locally and ran:

  • Spark 3.5 DeltaNativeWriteInvariantSuite: passed
  • Spark 4.0 DeltaNativeWriteSuite: passed
  • Spark 4.0 focused NOT NULL tests: passed
  • C++ Velox build: passed
  • JNI symbol verified in libvelox.dylib
  • Spark 3.5 and Spark 4.0 spotless/checkstyle/scalastyle: passed
  • git diff --check: passed

Covered cases:

  • native Delta write checks top-level NOT NULL without DeltaInvariantCheckerExec
  • native NOT NULL path avoids ColumnarToRow
  • CHECK constraints keep DeltaInvariantCheckerExec
  • NOT NULL violations report InvariantViolationException

I also ran a targeted local benchmark for append workloads comparing the native top-level NOT NULL path with an equivalent unsupported CHECK fallback path.

Workload: 2,000,000 rows, 14 columns, 3 append iterations.

| path | avg ms |
| --- | --- |
| native top-level NOT NULL checker | 1713 |
| row CHECK fallback path | 1664 |

Excluding the first warm-up append:

| path | avg ms |
| --- | --- |
| native top-level NOT NULL checker | 1605 |
| row CHECK fallback path | 1647 |

The benchmark is effectively neutral because Delta write setup, Parquet output, and commit/log work dominate this microbenchmark. This PR is therefore mainly native write correctness/posture work rather than a speedup: it removes a row-based invariant operator and a C2R transition from a common constrained Delta write path.
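
To put the averages above in relative terms, the small helper below (a hypothetical name, written only for this note) computes how far the native path is from the fallback path as a percentage. With the reported numbers, the native path is about 2.9% slower across all three appends and about 2.6% faster once the warm-up append is excluded, which is consistent with calling the result neutral for a three-iteration run.

```scala
object BenchmarkDelta {
  // Relative difference of the native path versus the fallback path, in percent.
  // Positive means the native path was slower; negative means it was faster.
  def relativeDiffPct(nativeMs: Double, fallbackMs: Double): Double =
    (nativeMs - fallbackMs) / fallbackMs * 100.0

  val allAppends: Double = relativeDiffPct(1713, 1664)      // all three appends
  val withoutWarmup: Double = relativeDiffPct(1605, 1647)   // first append excluded
}
```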

Related issue: #10215

Tracked by #12025

@github-actions github-actions Bot added the VELOX label May 3, 2026
@malinjawi malinjawi force-pushed the vl-delta-native-write-invariant-checks branch 3 times, most recently from 025cebc to 6ce1a2b Compare May 3, 2026 21:55
@malinjawi malinjawi force-pushed the vl-delta-native-write-invariant-checks branch from 6ce1a2b to e702792 Compare May 3, 2026 23:27
@github-actions github-actions Bot added the CORE works for Gluten Core label May 3, 2026

@zhztheplayer zhztheplayer left a comment


Just an initial design question atm.

Comment on lines +32 to +34
private[delta] case class GlutenDeltaInvariantChecker private (
    notNullConstraints: Seq[(Int, NotNull)])
  extends Serializable {