[SPARK-56853] Improve PATH Tests #55866
srielau wants to merge 1 commit into
### What changes were proposed in this pull request?
This is a test-only PR that closes coverage gaps identified by a follow-up QA audit of the SQL Standard PATH feature delivered in SPARK-56489, SPARK-56501, SPARK-56518, SPARK-56520, SPARK-56605, SPARK-56639, SPARK-56681, and SPARK-56750. No product code is changed.
New test surfaces:
- `sql/connect/.../SqlPathE2ETestSuite.scala` (new): `SET PATH` and `current_path()` round-trip over the gRPC client; a persisted view created under one path resolves its body under the frozen path even when the invoker switches PATH; `SET PATH` is rejected with `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED` over Connect when the feature flag is off. Previously, `ProtoToParsedPlanTestSuite` pinned `PATH_ENABLED=false` for analyzer isolation and nothing exercised the feature over the wire.
- `python/pyspark/sql/tests/test_catalog.py`: three new methods added to `CatalogTestsMixin` (`test_path_current_path_disabled`, `test_path_set_path_and_current_path`, `test_path_set_path_rejected_when_disabled`). Because the mixin is shared with `test_parity_catalog.py`, the same tests run under classic PySpark and under Connect parity. Previously `pyspark.sql.functions.current_path` had no test calling it, and `SET PATH` was never exercised from Python.
- `sql-tests/inputs/sql-path.sql` plus matching `results/` and `analyzer-results/` outputs. Covers `SET PATH` grammar (literal, `DEFAULT_PATH`, `SYSTEM_PATH`, `PATH` append, `current_schema`/`current_database`), `CURRENT_PATH()` reflection and the ANSI keyword form, path-driven routine and relation resolution, and the static error conditions (`DUPLICATE_SQL_PATH_ENTRY`, `INVALID_SQL_PATH_SCHEMA_REFERENCE`, `WRONG_NUM_ARGS`). The disabled-feature branch (`SET PATH` → `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`) is exercised by flipping `spark.sql.path.enabled=false` inline. Previously zero `.sql` input files referenced PATH.
- `COUNT(*)` → `COUNT(1)` rewrite gate: two new tests in `SetPathSuite` exercise the path-driven gate in `Analyzer.matchesFunctionName` / `FunctionResolution.isSessionBeforeBuiltinInPath`. With a temp `count` registered, flipping `SET PATH` so `system.session` precedes `system.builtin` must suppress the rewrite; with an unrelated temp on the path, the rewrite still fires.
- `ALTER VIEW ... WITH SCHEMA` preserves the frozen path: a new test in `v1.AlterViewSchemaBindingSuite` asserts that `generateViewProperties(captureNewPath=false)` keeps the persisted `VIEW_RESOLUTION_PATH` intact even when the caller's session PATH differs from the create-time path.
- Unit-level coverage: additional `PathElementSuite` cases (case sensitivity, repeated `CurrentSchemaEntry`, literal-vs-`CurrentSchemaEntry` toleration, multi-part and dot-containing identifier error formatting), a new `SqlPathFormatSuite` (`toDescribeJson` valid and malformed payloads, `formatForDisplay` quoting behavior, multi-level namespaces), and additional `CatalogManagerSuite` cases covering `serializePathEntries` round-tripping (including multi-level, quoted, and space-containing identifiers) and `deserializePathEntriesOrFail` error messaging.
- New `SQLViewSuite` tests pin the documented behavior of `CatalogManager.resolutionPathEntriesForAnalysis` when `spark.sql.path.enabled=false`: the pinned frozen path is dropped and analysis falls back to the recorded `viewCatalogAndNamespace`. The fully-qualified body path keeps working; the unqualified body resolves through the view's recorded namespace (or raises `TABLE_OR_VIEW_NOT_FOUND` when nothing in scope matches).
- Concurrency: a new `SetPathSuite` case runs `SET PATH` (alternating between two paths) and unqualified `count(*)` lookups concurrently for 200 iterations each, validating the in-source claim that `SessionCatalog.lookupBuiltinOrTempFunction` is intentionally non-synchronized to avoid lock-order inversion with `CatalogManager.synchronized`.
- `cloneSession()` propagation matrix: replaces the in-source TODO in `SetPathSuite` ("audit and pin down clone semantics in a follow-up") with six tests pinning what does and does not survive `spark.cloneSession()`. Confirmed: stored `SET PATH`, `USE SCHEMA`, temp views, and temp functions (because `functionRegistry.clone()` deep-copies them in `BaseSessionStateBuilder`) propagate; temp variables do not; and a child's `SET PATH` does not leak back to the parent.
- V2 catalogs under `SET PATH`: a new `SqlPathV2CatalogSuite` registers two `InMemoryCatalog` instances and verifies first-match resolution for unqualified tables and unqualified V2 functions across the path, plus the negative case where the unqualified name only lives in a catalog that is not on the path. Previously `ProcedureSuite` had a single V2-catalog test for the procedure side; tables and functions through real V2 catalogs were uncovered.
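The first-match semantics these suites exercise can be illustrated with a toy model (plain Python, not Spark code; the schema and object names are purely illustrative):

```python
# Toy model of SQL PATH first-match resolution (illustrative only; not Spark code).
# A "path" is an ordered list of schema names; an unqualified name resolves to
# the first schema on the path that defines it.

def resolve(name, path, schemas):
    """Return (schema, definition) for the first schema on `path` defining `name`."""
    for schema in path:
        if name in schemas.get(schema, {}):
            return schema, schemas[schema][name]
    raise LookupError(f"unresolved: {name}")

schemas = {
    "system.builtin": {"count": "builtin count"},
    "system.session": {},
    "prod":           {"revenue": "prod.revenue"},
    "staging":        {"revenue": "staging.revenue"},
}

# Unqualified relation lookup: the first match on the path wins.
assert resolve("revenue", ["prod", "staging"], schemas)[0] == "prod"
assert resolve("revenue", ["staging", "prod"], schemas)[0] == "staging"

# Analog of the COUNT(*) rewrite gate: once a temp `count` exists and
# system.session precedes system.builtin on the path, the session definition
# shadows the builtin one, so a rewrite keyed to the builtin must not fire.
schemas["system.session"]["count"] = "temp count"
schema, _ = resolve("count", ["system.session", "system.builtin"], schemas)
assert schema == "system.session"
```

With the path order reversed (`system.builtin` first), the builtin definition wins again, which is the case where the rewrite is still allowed to fire.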
Explicitly deferred:

- Coverage of `DescribeFunctionCommandUtils.storedResolutionPathString`: DESCRIBE FUNCTION needs broader improvements first; will land in a follow-up.
- Coverage of `FunctionResolverUtils.isUnqualifiedCountShadowedByTemp`: the single-pass resolver wiring for PATH was reverted in the SPARK-56750 follow-ups, so this is not actionable yet.

### Why are the changes needed?
A QA audit of the PATH feature found several substantive gaps:
- `AnalysisContext.resolutionPathEntries` was entirely untested; a regression dropping pinned entries from the proto plan would not have been caught.
- The `current_path()` PySpark function was documented (with a `+SKIP` doctest) but never called from any test.
- There was no `SQLQueryTestSuite` golden file, despite golden coverage being the project convention for SQL-level features of similar surface (`CURRENT_SCHEMA`, variables, identifier clauses).
- The `COUNT(*)` → `COUNT(1)` rewrite gate was a behavior change called out in the SPARK-56750 description but had no test that flipped PATH and verified the suppression.
- `ALTER VIEW ... WITH SCHEMA` preserving the frozen path was undocumented in tests.
- `PathElement`, `SqlPathFormat`, and `CatalogManager.serializePathEntries` were only exercised end-to-end via SQL.
- The in-source claim in `SessionCatalog` about non-synchronized lookups was unvalidated.
- The `cloneSession()` propagation matrix was flagged as "incidental" with an in-source TODO.
- Real V2 catalogs under `SET PATH` were not exercised end-to-end (only stub `cat`/`cat2` for procedures).

These tests pin the documented intent so future changes have to update the assertions deliberately.
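The frozen-path behavior several of these gaps concern can be sketched in the same toy terms (plain Python with illustrative names, not Spark code): a view records the session path at creation time and resolves its body against that recorded path, ignoring the invoker's current path.

```python
# Toy model of view "frozen path" capture (illustrative only; not Spark code).

def resolve(name, path, schemas):
    """Resolve an unqualified name to its first match on the path."""
    for schema in path:
        if name in schemas.get(schema, {}):
            return f"{schema}.{name}"
    raise LookupError(f"unresolved: {name}")

class View:
    def __init__(self, body_name, session_path):
        self.body_name = body_name
        # Analogous to persisting a resolution path at CREATE VIEW time.
        self.frozen_path = list(session_path)

    def read(self, schemas, current_session_path):
        # The invoker's current path is deliberately ignored here;
        # the body resolves against the path frozen at creation.
        return resolve(self.body_name, self.frozen_path, schemas)

schemas = {"prod": {"revenue": 1}, "staging": {"revenue": 2}}

v = View("revenue", session_path=["prod"])             # created while PATH = prod
assert v.read(schemas, ["staging"]) == "prod.revenue"  # invoker switched PATH
```

Disabling the feature corresponds to dropping `frozen_path` and falling back to a recorded default namespace, which is the branch the new `SQLViewSuite` tests pin.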
### Does this PR introduce any user-facing change?
No. Test-only changes.
### How was this patch tested?
The new tests are themselves the patch. They all pass locally:
- `build/sbt "connect-client-jvm/testOnly *SqlPathE2ETestSuite"`
- `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z sql-path.sql"` (and regenerated via `SPARK_GENERATE_GOLDEN_FILES=1`)
- `build/sbt "sql/testOnly org.apache.spark.sql.SetPathSuite"`
- `build/sbt "sql/testOnly org.apache.spark.sql.execution.command.v1.AlterViewSchemaBindingSuite"`
- `build/sbt "sql/testOnly *SimpleSQLViewSuite -- -z SPARK-56853"`
- `build/sbt "sql/testOnly *SqlPathV2CatalogSuite"`
- `build/sbt "catalyst/testOnly *PathElementSuite *SqlPathFormatSuite *CatalogManagerSuite"`

`./dev/lint-scala` passes (Scalastyle and Scalafmt). The PySpark tests (`test_path_*` in `CatalogTestsMixin` and the Connect parity variant) are syntactically validated; they are picked up automatically by the existing `pyspark.sql.tests.test_catalog` and `pyspark.sql.tests.connect.test_parity_catalog` modules registered in `dev/sparktestsupport/modules.py` and will run in CI.

### Was this patch authored or co-authored using generative AI tooling?
Yes. Test design and implementation were iterated with coding-assistant tooling; the author reviewed and owns the final patch.
Generated-by: Cursor with Claude Opus 4.7