[SPARK-56643][CONNECT][TESTS] Add DSv2 temp view with stored plan tests using Spark Connect mode#55571
[SPARK-56643][CONNECT][TESTS] Add DSv2 temp view with stored plan tests using Spark Connect mode#55571longvu-db wants to merge 29 commits into
Conversation
…-only view to drop+re-add tests Co-authored-by: Isaac
…tions, and S5 assertion fix - Fix alterTableWithData to physically remove dropped column values so they do not survive through ALTER chains (e.g. DROP COLUMN then ADD COLUMN same name) - Add scenario number comments (1.1, 1.2, 2.1, ..., 7) to each test - Update Scenario 5 tests to expect null salary after drop+re-add, add v_no_filter view to demonstrate null values Ported from the last 3 commits of PR apache#55571 (Connect counterpart). Co-authored-by: Isaac
c9345a9 to
a177cec
Compare
…tions, and S5 assertion fix - Fix alterTableWithData to physically remove dropped column values so they do not survive through ALTER chains (e.g. DROP COLUMN then ADD COLUMN same name) - Add scenario number comments (1.1, 1.2, 2.1, ..., 7) to each test - Update Scenario 5 tests to expect null salary after drop+re-add, add v_no_filter view to demonstrate null values Ported from the last 3 commits of PR apache#55571 (Connect counterpart). Co-authored-by: Isaac
…tions, and S5 assertion fix - Fix alterTableWithData to physically remove dropped column values so they do not survive through ALTER chains (e.g. DROP COLUMN then ADD COLUMN same name) - Add scenario number comments (1.1, 1.2, 2.1, ..., 7) to each test - Update Scenario 5 tests to expect null salary after drop+re-add, add v_no_filter view to demonstrate null values Ported from the last 3 commits of PR apache#55571 (Connect counterpart). Co-authored-by: Isaac
613e639 to
854c11f
Compare
…tions, and S5 assertion fix - Fix alterTableWithData to physically remove dropped column values so they do not survive through ALTER chains (e.g. DROP COLUMN then ADD COLUMN same name) - Add scenario number comments (1.1, 1.2, 2.1, ..., 7) to each test - Update Scenario 5 tests to expect null salary after drop+re-add, add v_no_filter view to demonstrate null values Ported from the last 3 commits of PR apache#55571 (Connect counterpart). Co-authored-by: Isaac
0dd8d1d to
5ffcae5
Compare
andreaschat-db
left a comment
There was a problem hiding this comment.
Thanks @longvu-db. I left a few comments.
8bf75d5 to
5813598
Compare
Auto-review results1 critical issue found and fixed:
Fix (commit 0b76175): Get the server-side catalog instance and call if (clearCachingCatalog) {
val serverSession = getServerSession(session)
serverCatalog[CachingInMemoryTableCatalog](serverSession, "cachingcat").clearCache()
}Also removed No other issues found. Test logic, assertions, error conditions, cleanup patterns, imports, and style all look good. ✅ |
cloud-fan
left a comment
There was a problem hiding this comment.
Test-only PR that adds DataSourceV2TempViewConnectSuite with 21 Connect-mode parity tests mirroring the classic DataSourceV2DataFrameSuite "temp view with stored plan" scenarios. Architecture is reasonable for this kind of parity coverage: in-process Connect server via SparkConnectServerTest, client-driven reads/writes through withSession, server-side catalog peeks via getServerSession + serverCatalog[T] for external mutations. Most of @andreaschat-db's earlier nits are addressed; a few helper-design questions remain.
PR description vs. diff
Two pieces of the PR description no longer match what's in the diff:
- "Add 11 new tests" — the file actually adds 21 tests (each scenario has session, external, and cached-connector variants). The opportunistic extra coverage is welcome, just worth updating the count.
- The "Shared test infrastructure changes" block describes a new
CachingInMemoryTableCatalogand aBasicInMemoryTableCatalog.alterTablefix, but per commits5c64c989f8e("Remove ... already in master") and58135982455("Restore catalog files to master version") neither is in this PR anymore. Trim that section before merge.
Duplication with the analogue
All 21 tests are line-for-line parity copies of tests already in DataSourceV2DataFrameSuite (lines 3009–3568) — no shared trait, just sql(...)/withTable/checkAnswer/catalog(name) swapped for session.sql(...).collect()/withTableAndViews/assertRows/serverCatalog[T]. Was a shared trait considered? The analogue extending InsertIntoTests makes factoring non-trivial so I understand if the answer is "duplication is the lesser evil," but a one-line acknowledgement in the PR description would help future scenario additions remember to update both files.
Process note
@andreaschat-db's approval predates five commits (the latest being the 2026-05-20 auto-review fix). A re-confirm seems appropriate before merge.
aef7da9 to
401a3e6
Compare
Move the 9 repeated-sql tests from DataSourceV2DataFrameSuite into a DSv2RepeatedSqlTests trait backed by DSv2ExternalMutationTestBase, following the same pattern as DSv2TempViewWithStoredPlanTests (PR apache#55571). This allows a future Connect suite to reuse the same tests by mixing in the trait and providing Connect-specific implementations. Co-authored-by: Isaac
3f48931 to
94a65be
Compare
|
CI failed with unrelated with unrelated flaky |
cloud-fan
left a comment
There was a problem hiding this comment.
The shared-trait refactor since the last round (extracting 21 tests into DSv2TempViewWithStoredPlanTests over a DSv2ExternalMutationTestBase hook surface) addresses the duplication concern from the prior round, and the Connect-side justifications for QueryTest.sameRows (DataSourceV2TempViewConnectSuite.scala:46-53) and dropping clearCache() (:69-72) are accurate — I verified both. Correction to my prior round: I claimed checkAnswer is usable from Connect server tests; that's true only for classic server-side DataFrames, not for the Connect-client DataFrames this suite uses (QueryTest.checkAnswer accesses queryExecution/logicalPlan/materializedRdd, which throw on Connect client DataFrames). Your sameRows switch is correct; ignore the earlier suggestion.
Status against the prior AI review: 3 addressed, 0 remaining, 2 new. The prior commit (0b76175) is no longer an ancestor of HEAD (rebased), so I'm treating new findings uniformly.
Bigger design question — coordination with #55356
@fwc's PR #55356 ("SharedSparkSession provides sql.SparkSession, add SharedClassicSparkSession, ClassicQueryTest") is pursuing the same goal of sharing tests between classic and Connect, but at the foundation layer: it lifts SharedSparkSession.spark to sql.SparkSession, moves classic-only helpers into classic.SharedSparkSession/classic.QueryTest subtraits, and makes QueryTest itself extendable in a Connect-providing suite.
If that direction lands, three of the four hooks in DSv2ExternalMutationTestBase become unnecessary:
withTestSession(fn)— becausesparkitself is the abstractsql.SparkSession.checkRows— ifcheckAnsweris made Connect-aware in shared infra, instead of every shared trait re-implementing it.withTestTableAndViews— oncewithTableoperates onsql.SparkSession.
Only getTableCatalog would remain, because the catalog-mutation bypass is genuinely Connect-specific.
Merge order doesn't matter, but the direction needs to be one direction, not two. Could you sync with @fwc on whether the right path is:
(a) wait for #55356 to land and rebase this PR onto the lifted SharedSparkSession, dropping the redundant hooks; or
(b) merge this PR first but contribute the generic Connect primitives (checkAnswer dispatch, withTable lift) into shared infrastructure as part of #55356, so future shared traits don't reinvent them.
What would be unfortunate is leaving a private 4-hook abstraction in this trait and a parallel foundational abstraction in #55356 — that's two test-sharing frameworks to reconcile later. The shared-trait pattern in StaticProcedureSuiteBase / sql.SparkSessionBuilderImplementationBindingSuite (plain extends AnyFunSuite, no per-trait routing) is what this should look like once the foundation is in place.
Process note
@andreaschat-db's LGTM (2026-05-08, commit ae21481) predates the entire shared-trait refactor (introduced in b879d2c). A re-confirm seems appropriate before merge.
Two smaller inline findings below.
…ts using Spark Connect mode Rebased on latest master.
Assert schema field names before every checkAnswer/assertRows with empty results to verify schema is preserved correctly.
Add missing test for Section 1 Scenario 3.1 where a session ALTER TABLE DROP COLUMN should trigger an analysis exception when querying a temp view that references the removed column.
Co-authored-by: Isaac
- Fix assertRows to use sortBy instead of toSet (preserves duplicates) - Replace serverCatalog/serverCachingCatalog with generic serverCatalog[T] - Add withTableAndView helper with try/finally for cleanup Co-authored-by: Isaac
…ngCatalog Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
…ps design Co-authored-by: Isaac
…la from this PR (already in master) Co-authored-by: Isaac
Co-authored-by: Isaac
…panion object CachingInMemoryTableCatalog has no companion object, so the static call CachingInMemoryTableCatalog.clearCache() does not compile. Fixed by getting the server-side catalog instance and calling clearCache() on it, matching the pattern used by the classic DataSourceV2DataFrameSuite. Co-authored-by: Isaac
- Replace assertRows toString comparison with QueryTest.sameRows for value-based comparison - Add ClassTag bound to serverCatalog for runtime type checking at helper boundary - Remove schema param from externalAppend, derive it from table columns via CatalogV2Util - Remove clearCachingCatalog param from withTableAndViews since sessions are isolated per-test Co-authored-by: Isaac
Addresses review feedback to share test cases instead of duplicating. New shared traits in sql/core (accessible from both sql/core and sql/connect): - DSv2ExternalMutationTestBase: base trait with 5 abstract methods (withTestSession, checkRows, getTableCatalog, withTestTableAndViews, testPrefix) and shared externalAppend helper - DSv2TempViewWithStoredPlanTests: 21 temp view tests (7 scenarios x 3 variants) Classic DataSourceV2DataFrameSuite: - Mixes in DSv2TempViewWithStoredPlanTests - Implements abstract methods using classic session, checkAnswer, catalog() - Removes 578 lines of duplicated inline tests Connect DataSourceV2TempViewConnectSuite: - 72 lines (was 739) thin wrapper mixing in the shared trait - Implements abstract methods for Connect: withSession, QueryTest.sameRows, getServerSession for catalog access Also addresses cloud-fan's earlier review comments: - Uses QueryTest.sameRows for value-based comparison (not toString) - Adds ClassTag bound to getTableCatalog for runtime type checking - Derives schema from table columns via CatalogV2Util (no schema param) - Removes clearCachingCatalog since sessions are isolated per-test Co-authored-by: Isaac
…roken Scaladoc - Remove `with DSv2RepeatedSQLTests` mixin (trait not in this PR) - Remove unused imports: BufferedRows, CatalogV2Util, InMemoryBaseTable, InternalRow - Fix Scaladoc to only reference DSv2TempViewWithStoredPlanTests - Use session.sql() in classic withTestTableAndViews to honor param contract Co-authored-by: Isaac
Co-authored-by: Isaac
…r Connect QueryTest.checkAnswer cannot be used with Connect client DataFrames because it accesses logicalPlan, queryExecution, and materializedRdd which throw ConnectClientUnsupportedErrors. sameRows provides the same value-based, order-agnostic comparison that checkAnswer uses internally. Co-authored-by: Isaac
…ents - DSv2ExternalMutationTestBase: extend QueryTest instead of SharedSparkSession so the trait does not pull in a SparkSession of its own; concrete suites provide sessions via their own inheritance. - DSv2TempViewWithStoredPlanTests: document why .collect() is appended to every session.sql() call (needed for Connect client DataFrames) and why testIdent is catalog-agnostic. - DataSourceV2TempViewConnectSuite: clarify that checkRows uses sameRows because the DataFrame is a Connect *client* DataFrame (not a server-side one), and explain why clearCache() is unnecessary (Connect sessions are isolated per-test). Co-authored-by: Isaac
- Rename `cat` variable to `catalog` throughout DSv2TempViewWithStoredPlanTests - Rename `cat` parameter to `catalog` in DSv2ExternalMutationTestBase.externalAppend - Simplify fully-qualified [[]] references to short class names - Remove identifier-is-catalog-agnostic comment - Change "assertion wiring" to "result comparison" in Connect suite - Change testPrefix default from "" to "[classic] " - Simplify verbose method Scaladoc in DSv2ExternalMutationTestBase Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
…nectSuite Co-authored-by: Isaac
…adoc Co-authored-by: Isaac
- Set classic testPrefix to "" to preserve existing test name history in CI dashboards (only Connect uses "[connect] " prefix) - Add require() check in classic getTableCatalog to match Connect impl, giving a clear error instead of a confusing ClassCastException Co-authored-by: Isaac
94a65be to
8fd6b01
Compare
Co-authored-by: Isaac
What changes were proposed in this pull request?
Extract the 21 DSv2 temp view with stored plan tests from
DataSourceV2DataFrameSuiteinto a shared traitDSv2TempViewWithStoredPlanTests, and add a Connect-mode runnerDataSourceV2TempViewConnectSuiteso the same tests run in both classic and Connect modes.The shared trait covers these scenarios:
Each test creates a DSv2 table, inserts initial data, builds a temp view with a filter (
salary < 999) to demonstrate stored plan behavior, and verifies the view after table modifications. External writes use the direct catalog API (loadTable+withData) to simulate writes from outside the session. Caching connector variants useCachingInMemoryTableCatalogto verify stale-cache behavior.This is the Spark Connect counterpart of #55540.
Refactoring
DSv2ExternalMutationTestBase(new): Abstract base trait with session-agnostic hooks (withTestSession,checkRows,getTableCatalog,withTestTableAndViews,externalAppend) so the same test logic runs in both classic and Connect modes.DSv2TempViewWithStoredPlanTests(new): Shared trait containing the 21 test methods, extracted fromDataSourceV2DataFrameSuite.DataSourceV2TempViewConnectSuite(new): Connect-mode runner that provides Connect-specific session, catalog access, and result comparison.DataSourceV2DataFrameSuite(modified): Mixes inDSv2TempViewWithStoredPlanTestsinstead of inlining the tests; adds classic-mode implementations of the abstract hooks.Why are the changes needed?
The existing temp view tests in
DataSourceV2DataFrameSuiteuse classic-only APIs (checkAnswer,catalog()helper) and cannot run in Connect mode. Extracting them into a shared trait enables Connect parity testing.Does this PR introduce any user-facing change?
No. This PR only adds tests and refactors test infrastructure.
How was this patch tested?
All 21 shared tests pass in both modes:
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-6)