[VL] Add cross-config / cross-build-cycle invariant tests for ColumnarCachedBatchSerializer by yaooqinn · Pull Request #12124 · apache/gluten

yaooqinn · 2026-05-21T10:42:13Z

What changes are proposed in this pull request?

Extend ColumnarCachedBatchE2ESuite with three lifecycle invariant tests for ColumnarCachedBatchSerializer, exercising the cached-batch wire format across SQLConf transitions:

cross-config (forward): build with spark.gluten.sql.columnar.maxBatchSize-stats path enabled, read with it disabled. The wire format is build-time-decided; a v2-with-stats payload must survive a reader-time downgrade and partition pruning must still engage.
cross-config (reverse): build with stats disabled, read with stats enabled. The legacy v1 payload (stats=null) at build time must NOT be retro-fitted by the reader; the query must fall back to a full scan.
cross-build-cycle: same logical query rebuilt twice with different stats settings. Round 2 must re-honor its own SQLConf rather than reuse stale gate state from round 1.

Each test asserts (a) result row-count correctness, (b) the cached batches are served by ColumnarCachedBatchSerializer (R-path), and (c) when expectPrune=true, that InMemoryTableScanExec.numOutputRows reflects partition pruning (R-prune).

The shared assertion logic is factored into a private helper assertGlutenCachedPlanAndPrune(df, expectPrune). Its scaladoc documents an intentional asymmetry: the reverse "no prune" direction is not observable through numOutputRows on the Gluten native scan path — the existing baseline test "numOutputRows reflects post-filter row count" already notes that outRows may legitimately be 0 even under full pruning, because the surviving row is delivered via the native scan metrics path. The expectPrune=false branch therefore intentionally performs path-only verification.

How was this patch tested?

ColumnarCachedBatchE2ESuite — 15 succeeded / 0 failed / 2 canceled (pre-existing baseline) locally on Spark 3.5, Velox backend.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

…rCachedBatchSerializer Extend ColumnarCachedBatchE2ESuite with three lifecycle invariant tests that exercise the cached-batch wire format across SQLConf transitions: 1. cross-config: build with stats=true, read with stats=false -- wire format is build-time-decided; v2-with-stats payload must survive a reader-time downgrade and prune must still engage. 2. cross-config (reverse): build with stats=false, read with stats=true -- legacy v1 payload (stats=null) at build time; reader must NOT fabricate stats and must fall back to full scan. 3. cross-build-cycle: same logical query rebuilt twice with different stats settings (round 1 stats=true, round 2 stats=false). Round 2 must re-honor stats=false; the serializer must not reuse stale gate state from round 1. Each test asserts (R-correct) result row count, (R-path) the cached batches are served by ColumnarCachedBatchSerializer (not vanilla DefaultCachedBatchSerializer), and (R-prune, when expectPrune=true) that InMemoryTableScanExec.numOutputRows reflects partition pruning. The shared assertion logic is factored into a private helper `assertGlutenCachedPlanAndPrune(df, expectPrune)`. Its scaladoc documents an intentional asymmetry: the reverse "no prune" direction is not observable via numOutputRows on the Gluten native path (the same baseline test "numOutputRows reflects post-filter row count" already notes that outRows may legitimately be 0 even with full pruning, because the surviving row is delivered through the native scan metrics path). The expectPrune=false branch therefore intentionally performs path-only verification. All 15 suite tests pass locally (Spark 3.3, Velox backend). Generated-by: Claude claude-opus-4.7

github-actions Bot added the VELOX label May 21, 2026

liuneng1994 approved these changes May 21, 2026

View reviewed changes

yaooqinn force-pushed the users/kentyao/cache-stats-fu-b4-test-coverage-v2 branch from 021d402 to cfe1058 Compare May 21, 2026 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[VL] Add cross-config / cross-build-cycle invariant tests for ColumnarCachedBatchSerializer#12124

[VL] Add cross-config / cross-build-cycle invariant tests for ColumnarCachedBatchSerializer#12124
yaooqinn wants to merge 1 commit into
apache:mainfrom
yaooqinn:users/kentyao/cache-stats-fu-b4-test-coverage-v2

yaooqinn commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yaooqinn commented May 21, 2026

What changes are proposed in this pull request?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants