[SPARK-56750] default path config#55717
Open
srielau wants to merge 2 commits intoapache:masterfrom
Open
Conversation
8d33747 to
83dcf21
Compare
srielau
commented
May 6, 2026
srielau
commented
May 6, 2026
srielau
commented
May 6, 2026
srielau
commented
May 6, 2026
srielau
commented
May 6, 2026
srielau
commented
May 6, 2026
Contributor
Author
There was a problem hiding this comment.
AI-assisted self-review. The high-level direction (mirror spark.sql.defaultCatalog for PATH; align routine-resolution shortcuts with the live PATH) is sound. Findings below are mostly correctness/perf concerns and a few naming/docs nits — I've flagged the deadlock + test-default + last-mode behavior change as the higher-priority items.
Summary
- Prior state:
SET PATHexists, but with no admin-side default — every session mustSET PATHitself, andSET PATH = DEFAULT_PATHonly resolves to a fixeddefaultPathOrder. Function-resolution shortcuts (1-part lookup, the security gate that blocks temp-shadowing-builtin, thecount(*)→count(1)gate) readspark.sql.legacy.sessionFunctionResolutionOrderdirectly rather than the live PATH, so aSET PATH = system.session, system.builtin, ...doesn't actually flip those gates. - Design approach: new
spark.sql.defaultPathmirrorsspark.sql.defaultCatalog— session-level conf, accepts the fullSET PATHgrammar via a reused entry rule (singlePathElementList+parseSqlPathElements), with anisWorkspaceDefaultExpansioncycle break for innerDEFAULT_PATHtokens. Routine-resolution shortcuts now readCatalogManager.leadingSystemFunctionKinds/isSessionBeforeBuiltinInPath, both derived from the effective PATH. - Key decisions: (1) parse the conf on every read, no caching (see A3); (2) path-derive the system-kinds shortcut order rather than keep the legacy switch (see A4 for the "last" mode consequence); (3) wire kinds via a
varcallback set from theCatalogManagerconstructor body (see A1/A2); (4) validateSchemaInPatharity at expansion time, not parse time (see A8). - Implementation sketch: parser adds
singlePathElementList+parseSqlPathElements; ASTPathElementmoves into catalyst (connector.catalog) soCatalogManagercan callPathElement.expandfrom its conf-fallback path whileSetPathCommandstill calls it fromrun();CatalogManagergrowsworkspaceDefaultPathEntries,leadingSystemFunctionKinds,isSessionBeforeBuiltinInPath;SessionCatalogexposessessionFunctionKindsProvider(var) wired post-construction.
Inline comments are split across this review and the next (was hitting an API issue with a single 12-comment payload).
srielau
commented
May 6, 2026
Contributor
Author
Review follow-up (SPARK-56750 / #55717)Thanks for the detailed review — here is how the latest commits address it. Correctness / threading
PATH → builtin/session “shortcut” semantics
Performance / naming / validation
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR implements SPARK-56750 by adding a session-level
spark.sql.defaultPathconfiguration (SQLConf.DEFAULT_PATH) and wiring it into SQL PATH handling whenspark.sql.path.enabledis true.High-level changes:
spark.sql.defaultPath: String session conf (default empty). When PATH is enabled and the user has not issuedSET PATH, the session uses this value as the effective PATH (same syntax asSET PATH).SET PATH = DEFAULT_PATHresets to this configured default. An innerDEFAULT_PATHtoken inside the conf value expands to Spark’s built-in default ordering (cycle break), as documented on the conf.singlePathElementListandAbstractSqlParser.parseSqlPathElementsso path fragments are parsed with the normal grammar (no ad-hoc tokenization).CatalogManager: Parses and expands the default path from conf; integrates storedSET PATHvs default; exposes helpers used for resolution andCURRENT_PATH(). Wires a path-derived builtin/session function kind order intoSessionCatalogviasessionFunctionKindsProviderso unqualified routine resolution and the “temp must not shadow builtin” checks follow the effective PATH instead of relying only onspark.sql.functionResolution.sessionOrder(SESSION_FUNCTION_RESOLUTION_ORDER).SessionCatalog/Analyzer: Uses the catalog-manager-provided order for builtin vs temp resolution semantics where needed (e.g.COUNT(*)rewrite gating when session shadows builtin).sessionFunctionKindsProvider(), which may synchronize onCatalogManager. Other code paths (for example namespace changes) can holdCatalogManager’s lock and call back intoSessionCatalog. HoldingSessionCatalog’s intrinsic lock (synchronizedonthis) in the same stack would invert lock order and can deadlock.lookupBuiltinOrTempTableFunctionis intentionally notsynchronizedonSessionCatalogfor that reason (with an inline comment referencing SPARK-56750).lookupFunctionInfois likewise not wrapped insynchronizedwhen resolving unqualified names, because its builtin/temp branch uses the same provider-backed path.Design note: The default PATH does not need to track every detail of
SESSION_FUNCTION_RESOLUTION_ORDER; that knob may be narrowed or removed in follow-up work. The deadlock fix stands on its own rule: avoidSessionCatalog.synchronizedon any path that invokessessionFunctionKindsProvider()while PATH-driven order is consulted fromCatalogManager.SetPathCommand/SparkSqlParser: AlignsSET PATHparsing with the shared path-element visitor path.SetPathSuitefor default path,SET PATH = DEFAULT_PATH, composition, parse errors, PATH-disabled behavior, and path-driven security around temp vs builtincount.Why are the changes needed?
Downstream deployments need an OSS way to set a default SQL PATH before any
SET PATHruns, similar in role tospark.sql.defaultCatalogfor catalogs. Without it, every session must explicitlySET PATHfor consistent routine resolution andCURRENT_PATH()behavior.Aligning builtin vs temporary function resolution order with the effective PATH keeps PATH, routine resolution, and security checks consistent.
Does this PR introduce any user-facing change?
Yes.
spark.sql.defaultPath.spark.sql.path.enabled=true, sessions with no explicitSET PATHmay observe a non-empty effective PATH and different unqualified routine resolution if this conf is set.SET PATH = DEFAULT_PATHandCURRENT_PATH()behavior can change whenspark.sql.defaultPathis non-empty vs empty.How was this patch tested?
build/sbt "sql/testOnly org.apache.spark.sql.SetPathSuite"build/sbt "catalyst/Compile/compile"/sql/compile(or your usualsql/catalystcompile workflow)Was this patch authored or co-authored using generative AI tooling?
Yes. Some implementation and tests were iterated with coding-assistant tooling; the author reviewed and owns the final patch.