[VL] Add cross-config / cross-build-cycle invariant tests for ColumnarCachedBatchSerializer#12124
Open
yaooqinn wants to merge 1 commit into
Open
Conversation
liuneng1994
approved these changes
May 21, 2026
…rCachedBatchSerializer
Extend ColumnarCachedBatchE2ESuite with three lifecycle invariant tests that
exercise the cached-batch wire format across SQLConf transitions:
1. cross-config: build with stats=true, read with stats=false
-- wire format is build-time-decided; v2-with-stats payload must
survive a reader-time downgrade and prune must still engage.
2. cross-config (reverse): build with stats=false, read with stats=true
-- legacy v1 payload (stats=null) at build time; reader must NOT
fabricate stats and must fall back to full scan.
3. cross-build-cycle: same logical query rebuilt twice with different
stats settings (round 1 stats=true, round 2 stats=false). Round 2
must re-honor stats=false; the serializer must not reuse stale gate
state from round 1.
Each test asserts (R-correct) result row count, (R-path) the cached
batches are served by ColumnarCachedBatchSerializer (not vanilla
DefaultCachedBatchSerializer), and (R-prune, when expectPrune=true)
that InMemoryTableScanExec.numOutputRows reflects partition pruning.
The shared assertion logic is factored into a private helper
`assertGlutenCachedPlanAndPrune(df, expectPrune)`. Its scaladoc
documents an intentional asymmetry: the reverse "no prune" direction
is not observable via numOutputRows on the Gluten native path (the same
baseline test "numOutputRows reflects post-filter row count" already
notes that outRows may legitimately be 0 even with full pruning, because
the surviving row is delivered through the native scan metrics path).
The expectPrune=false branch therefore intentionally performs path-only
verification.
All 15 suite tests pass locally (Spark 3.3, Velox backend).
Generated-by: Claude claude-opus-4.7
021d402 to
cfe1058
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes are proposed in this pull request?
Extend
ColumnarCachedBatchE2ESuitewith three lifecycle invariant tests forColumnarCachedBatchSerializer, exercising the cached-batch wire format across SQLConf transitions:spark.gluten.sql.columnar.maxBatchSize-stats path enabled, read with it disabled. The wire format is build-time-decided; a v2-with-stats payload must survive a reader-time downgrade and partition pruning must still engage.stats=null) at build time must NOT be retro-fitted by the reader; the query must fall back to a full scan.Each test asserts (a) result row-count correctness, (b) the cached batches are served by
ColumnarCachedBatchSerializer(R-path), and (c) whenexpectPrune=true, thatInMemoryTableScanExec.numOutputRowsreflects partition pruning (R-prune).The shared assertion logic is factored into a private helper
assertGlutenCachedPlanAndPrune(df, expectPrune). Its scaladoc documents an intentional asymmetry: the reverse "no prune" direction is not observable throughnumOutputRowson the Gluten native scan path — the existing baseline test "numOutputRows reflects post-filter row count" already notes thatoutRowsmay legitimately be 0 even under full pruning, because the surviving row is delivered via the native scan metrics path. TheexpectPrune=falsebranch therefore intentionally performs path-only verification.How was this patch tested?
ColumnarCachedBatchE2ESuite— 15 succeeded / 0 failed / 2 canceled (pre-existing baseline) locally on Spark 3.5, Velox backend.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7