
[SPARK-56853] Improve PATH Tests #55866

Open
srielau wants to merge 1 commit into apache:master from srielau:SPARK-56853-path-qa-gaps

Conversation

srielau (Contributor) commented May 13, 2026

What changes were proposed in this pull request?

This is a test-only PR that closes coverage gaps identified by a follow-up QA audit of the SQL Standard PATH feature delivered in SPARK-56489, SPARK-56501, SPARK-56518, SPARK-56520, SPARK-56605, SPARK-56639, SPARK-56681, and SPARK-56750. No product code is changed.

New test surfaces:

  • Spark Connect E2E — sql/connect/.../SqlPathE2ETestSuite.scala (new): SET PATH and current_path() round-trip over the gRPC client; a persisted view created under one path resolves its body under the frozen path even when the invoker switches PATH; SET PATH is rejected with UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED over Connect when the feature flag is off. Previously, ProtoToParsedPlanTestSuite pinned PATH_ENABLED=false for analyzer isolation and nothing exercised the feature over the wire.
  • PySpark API — python/pyspark/sql/tests/test_catalog.py: three new methods added to CatalogTestsMixin (test_path_current_path_disabled, test_path_set_path_and_current_path, test_path_set_path_rejected_when_disabled). Because the mixin is shared with test_parity_catalog.py, the same tests run under classic PySpark and under Connect parity. Previously pyspark.sql.functions.current_path had no test calling it, and SET PATH was never exercised from Python.
  • Golden SQL — sql-tests/inputs/sql-path.sql + matching results/ and analyzer-results/ outputs. Covers SET PATH grammar (literal, DEFAULT_PATH, SYSTEM_PATH, PATH append, current_schema / current_database), CURRENT_PATH() reflection and the ANSI keyword form, path-driven routine and relation resolution, and the static error conditions (DUPLICATE_SQL_PATH_ENTRY, INVALID_SQL_PATH_SCHEMA_REFERENCE, WRONG_NUM_ARGS). The disabled-feature branch (SET PATH → UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED) is exercised by flipping spark.sql.path.enabled=false inline. Previously zero .sql input files referenced PATH.
  • COUNT(*) → COUNT(1) rewrite gate — two new tests in SetPathSuite exercise the path-driven gate in Analyzer.matchesFunctionName / FunctionResolution.isSessionBeforeBuiltinInPath: with a temp count registered, flipping SET PATH so system.session precedes system.builtin must suppress the rewrite; with an unrelated temp on path the rewrite still fires.
  • ALTER VIEW ... WITH SCHEMA preserves the frozen path — new test in v1.AlterViewSchemaBindingSuite asserts that generateViewProperties(captureNewPath=false) keeps the persisted VIEW_RESOLUTION_PATH intact even when the caller's session PATH differs from the create-time path.
  • Catalyst unit tests — new PathElementSuite (case sensitivity, repeated CurrentSchemaEntry, literal-vs-CurrentSchemaEntry toleration, multi-part and dot-containing identifier error formatting), new SqlPathFormatSuite (toDescribeJson valid / malformed payloads, formatForDisplay quoting behavior, multi-level namespaces), and additional CatalogManagerSuite cases covering serializePathEntries round-tripping (including multi-level / quoted / space-containing identifiers) and deserializePathEntriesOrFail error messaging.
  • PATH-disabled compat for persisted views — two new tests in SQLViewSuite pin the documented behavior of CatalogManager.resolutionPathEntriesForAnalysis when spark.sql.path.enabled=false: the pinned frozen path is dropped and analysis falls back to the recorded viewCatalogAndNamespace. The fully-qualified body path keeps working; the unqualified body resolves through the view's recorded namespace (or raises TABLE_OR_VIEW_NOT_FOUND when nothing in scope matches).
  • Concurrency smoke test — new SetPathSuite case runs SET PATH (alternating between two paths) and unqualified count(*) lookups concurrently for 200 iterations each, validating the in-source claim that SessionCatalog.lookupBuiltinOrTempFunction is intentionally non-synchronized to avoid lock-order inversion with CatalogManager.synchronized.
  • cloneSession() propagation matrix — replaces the in-source TODO in SetPathSuite ("audit and pin down clone semantics in a follow-up") with six tests pinning what does and does not survive spark.cloneSession(). Confirmed: stored SET PATH, USE SCHEMA, temp views, and temp functions (because functionRegistry.clone() deep-copies them in BaseSessionStateBuilder) propagate; temp variables do not; a child's SET PATH does not leak back to the parent.
  • V2 catalogs in SET PATH — new SqlPathV2CatalogSuite registers two InMemoryCatalog instances and verifies first-match resolution for unqualified tables and unqualified V2 functions across the path, plus the negative case where the unqualified name only lives in a catalog that is not on the path. Previously ProcedureSuite had a single V2-catalog test for the procedure side; tables and functions through real V2 catalogs were uncovered.
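Several of the bullets above hinge on the frozen-path rule: a persisted view resolves its body under the PATH captured at creation time, and when spark.sql.path.enabled=false the frozen path is dropped in favor of the view's recorded namespace. A minimal pure-Python sketch of that rule; all class and function names here are illustrative, not Spark APIs:

```python
# Illustrative model of frozen-path view resolution; not Spark code.
# A view records the session PATH at creation ("frozen path").
# Resolution uses the frozen path, not the invoker's current PATH.
# With the PATH feature disabled, the frozen path is dropped and
# resolution falls back to the view's recorded namespace.

class Catalogs:
    def __init__(self, relations):
        self.relations = relations  # namespace -> set of relation names

    def contains(self, namespace, name):
        return name in self.relations.get(namespace, set())

def resolve_view_body(catalogs, view, session_path, path_enabled):
    """Resolve an unqualified name appearing in a view body."""
    if path_enabled:
        search = view["frozen_path"]  # invoker's session_path is ignored
    else:
        search = [view["recorded_namespace"]]
    for namespace in search:
        if catalogs.contains(namespace, view["body_ref"]):
            return f"{namespace}.{view['body_ref']}"
    raise LookupError("TABLE_OR_VIEW_NOT_FOUND")

cats = Catalogs({"app": {"t"}, "other": {"t"}})
v = {"frozen_path": ["app"], "recorded_namespace": "app", "body_ref": "t"}

# Invoker switches PATH to ["other"]; the frozen path still wins.
assert resolve_view_body(cats, v, ["other"], path_enabled=True) == "app.t"
# Feature disabled: fall back to the recorded namespace.
assert resolve_view_body(cats, v, ["other"], path_enabled=False) == "app.t"
```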
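The COUNT(*) → COUNT(1) gate tested above reduces to a small decision: suppress the rewrite only when a temp count exists and system.session precedes system.builtin on the PATH. An illustrative model, where the function name and argument shapes are hypothetical rather than the Analyzer's actual API:

```python
# Illustrative model of the path-driven COUNT(*) -> COUNT(1) gate;
# not Spark code. The rewrite is suppressed only when a temp function
# named "count" exists AND system.session precedes system.builtin on
# the PATH, because the temp "count" would shadow the builtin.

def count_star_rewrite_allowed(path, temp_functions):
    if "count" not in temp_functions:
        return True  # nothing can shadow the builtin count
    for entry in path:
        if entry == "system.session":
            return False  # temp count shadows builtin: suppress rewrite
        if entry == "system.builtin":
            return True   # builtin wins: rewrite still fires
    return True

# Temp count registered, session before builtin: rewrite suppressed.
assert not count_star_rewrite_allowed(
    ["system.session", "system.builtin"], {"count"})
# Builtin first: rewrite fires despite the temp count.
assert count_star_rewrite_allowed(
    ["system.builtin", "system.session"], {"count"})
# Unrelated temp on path: rewrite fires.
assert count_star_rewrite_allowed(
    ["system.session", "system.builtin"], {"my_udf"})
```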
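The serializePathEntries round-tripping covered above can be sketched with a toy encoder/decoder for backtick-quoted identifier parts. The format below is an assumption for illustration only, not the actual CatalogManager encoding:

```python
# Toy round-trip of PATH entry serialization with backtick quoting for
# identifier parts containing dots, spaces, or backticks. Hypothetical
# format, not Spark's CatalogManager.serializePathEntries encoding.

def quote_part(part):
    if any(c in part for c in ".` "):
        return "`" + part.replace("`", "``") + "`"
    return part

def serialize(entries):
    # entries: list of identifier-part lists, e.g. [["cat", "my schema"]]
    return ",".join(".".join(quote_part(p) for p in e) for e in entries)

def deserialize(text):
    entries, parts, part, i, quoted = [], [], [], 0, False
    while i < len(text):
        c = text[i]
        if quoted:
            if c == "`":
                if i + 1 < len(text) and text[i + 1] == "`":
                    part.append("`"); i += 1  # escaped backtick
                else:
                    quoted = False            # closing quote
            else:
                part.append(c)
        elif c == "`":
            quoted = True
        elif c == ".":
            parts.append("".join(part)); part = []
        elif c == ",":
            parts.append("".join(part)); entries.append(parts)
            part, parts = [], []
        else:
            part.append(c)
        i += 1
    parts.append("".join(part)); entries.append(parts)
    return entries

# Multi-level, quoted, and space-containing identifiers survive.
paths = [["cat", "my schema"], ["a.b", "c"], ["plain"]]
assert deserialize(serialize(paths)) == paths
```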
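The concurrency smoke test above can be approximated outside Spark with two threads: one flipping a shared path, one reading snapshots without taking the writer's lock. A sketch; the Session class is invented for the model and does not mirror SessionCatalog internals:

```python
# Model of the SET PATH / lookup concurrency smoke test; not Spark code.
# A writer alternates the session PATH between two values while a reader
# takes lock-free snapshots, mirroring the claim that lookups need not
# synchronize on the writer's lock to stay consistent.
import threading

class Session:
    def __init__(self):
        self._lock = threading.Lock()
        self._path = ("system.builtin", "system.session")

    def set_path(self, path):
        with self._lock:
            self._path = tuple(path)

    def lookup_path_snapshot(self):
        # Deliberately reads without holding the session lock.
        return self._path

paths = [("system.builtin", "system.session"),
         ("system.session", "system.builtin")]
session = Session()
errors = []

def writer():
    for i in range(200):
        session.set_path(paths[i % 2])

def reader():
    for _ in range(200):
        snapshot = session.lookup_path_snapshot()
        if snapshot not in paths:  # a torn read would show up here
            errors.append(snapshot)

threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert not errors  # every snapshot was one of the two valid paths
```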
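The cloneSession() propagation matrix above (PATH, current schema, temp views, and temp functions survive; temp variables do not; child changes never leak back to the parent) can be restated as a toy model; the Session class here is hypothetical, not Spark's:

```python
# Toy model of the cloneSession() propagation matrix; the survives /
# does-not-survive split restates the PR's findings, but the Session
# class itself is invented for illustration.
import copy

class Session:
    def __init__(self):
        self.path = ["system.builtin"]
        self.current_schema = "default"
        self.temp_views = {}
        self.temp_functions = {}
        self.temp_variables = {}

    def clone(self):
        child = Session()
        # Propagates: PATH, USE SCHEMA, temp views, temp functions.
        child.path = list(self.path)
        child.current_schema = self.current_schema
        child.temp_views = dict(self.temp_views)
        child.temp_functions = copy.deepcopy(self.temp_functions)
        # Does not propagate: temp variables.
        child.temp_variables = {}
        return child

parent = Session()
parent.path = ["system.session", "system.builtin"]
parent.temp_views["v"] = "SELECT 1"
parent.temp_variables["x"] = 42

child = parent.clone()
assert child.path == parent.path
assert "v" in child.temp_views
assert "x" not in child.temp_variables  # temp variables do not survive

child.path = ["system.builtin"]         # child mutation...
assert parent.path == ["system.session", "system.builtin"]  # ...no leak back
```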
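Finally, the first-match resolution exercised by SqlPathV2CatalogSuite is a linear scan over the PATH entries. A model with plain dictionaries standing in for the two in-memory catalogs; the entry names below follow the suite's description but the code is not Spark's resolution logic:

```python
# First-match resolution of an unqualified name across the PATH;
# dictionaries stand in for V2 catalogs. Hypothetical model only.

def resolve_unqualified(name, path, catalogs):
    """Return the first PATH entry whose catalog defines `name`."""
    for entry in path:
        if name in catalogs.get(entry, set()):
            return entry
    raise LookupError(f"TABLE_OR_VIEW_NOT_FOUND: {name}")

catalogs = {
    "testcat.ns":  {"shared", "only_in_first"},
    "testcat2.ns": {"shared", "only_in_second"},
    "offpath.ns":  {"hidden"},
}
path = ["testcat.ns", "testcat2.ns"]

# Earlier entry wins when both catalogs define the name.
assert resolve_unqualified("shared", path, catalogs) == "testcat.ns"
# Later entry is reached when the first has no match.
assert resolve_unqualified("only_in_second", path, catalogs) == "testcat2.ns"
# Negative case: the name exists only in a catalog not on the path.
try:
    resolve_unqualified("hidden", path, catalogs)
    raise AssertionError("expected lookup failure")
except LookupError:
    pass
```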

Explicitly deferred:

  • DESCRIBE FUNCTION rendering of the frozen path (DescribeFunctionCommandUtils.storedResolutionPathString). DESCRIBE FUNCTION needs broader improvements first; will land in a follow-up.
  • The single-pass resolver counterpart (FunctionResolverUtils.isUnqualifiedCountShadowedByTemp). The single-pass resolver wiring for PATH was reverted in the SPARK-56750 follow-ups, so this is not actionable yet.

Why are the changes needed?

A QA audit of the PATH feature found several substantive gaps:

  • Spark Connect serialization of AnalysisContext.resolutionPathEntries was entirely untested; a regression dropping pinned entries from the proto plan would not have been caught.
  • The public current_path() PySpark function was documented (with a +SKIP doctest) but never called from any test.
  • The PATH feature had no SQLQueryTestSuite golden file, despite golden coverage being the project convention for SQL-level features of similar surface (CURRENT_SCHEMA, variables, identifier clauses).
  • The path-driven gate for COUNT(*) → COUNT(1) was a behavior change called out in the SPARK-56750 description but had no test that flipped PATH and verified the suppression.
  • ALTER VIEW ... WITH SCHEMA preserving the frozen path was undocumented in tests.
  • PathElement, SqlPathFormat, and CatalogManager.serializePathEntries were only exercised end-to-end via SQL.
  • The PATH-disabled read path through a view that already persisted a frozen path was untested (a forward-compat scenario likely to occur during rolling upgrades).
  • The deadlock-safety claim in SessionCatalog about non-synchronized lookups was unvalidated.
  • The cloneSession() propagation matrix was flagged as "incidental" with an in-source TODO.
  • Real V2 catalogs in SET PATH were not exercised end-to-end (only stub cat/cat2 for procedures).

These tests pin the documented intent so future changes have to update the assertions deliberately.

Does this PR introduce any user-facing change?

No. Test-only changes.

How was this patch tested?

The new tests are themselves the patch. They all pass locally:

  • build/sbt "connect-client-jvm/testOnly *SqlPathE2ETestSuite"
  • build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z sql-path.sql" (and regenerated via SPARK_GENERATE_GOLDEN_FILES=1)
  • build/sbt "sql/testOnly org.apache.spark.sql.SetPathSuite"
  • build/sbt "sql/testOnly org.apache.spark.sql.execution.command.v1.AlterViewSchemaBindingSuite"
  • build/sbt "sql/testOnly *SimpleSQLViewSuite -- -z SPARK-56853"
  • build/sbt "sql/testOnly *SqlPathV2CatalogSuite"
  • build/sbt "catalyst/testOnly *PathElementSuite *SqlPathFormatSuite *CatalogManagerSuite"
  • ./dev/lint-scala passes (Scalastyle and Scalafmt).

PySpark tests (test_path_* in CatalogTestsMixin and the Connect parity variant) are syntactically validated; they are picked up automatically by the existing pyspark.sql.tests.test_catalog and pyspark.sql.tests.connect.test_parity_catalog modules registered in dev/sparktestsupport/modules.py and will run in CI.

Was this patch authored or co-authored using generative AI tooling?

Yes. Test design and implementation were iterated with coding-assistant tooling; the author reviewed and owns the final patch.

Generated-by: Cursor with Claude Opus 4.7
