
[SPARK-56750] default path config#55717

Open
srielau wants to merge 2 commits into apache:master from srielau:SPARK-56750-default-path

Conversation

@srielau
Contributor

@srielau srielau commented May 6, 2026

What changes were proposed in this pull request?

This PR implements SPARK-56750 by adding a session-level spark.sql.defaultPath configuration (SQLConf.DEFAULT_PATH) and wiring it into SQL PATH handling when spark.sql.path.enabled is true.

High-level changes:

  • New config spark.sql.defaultPath: String session conf (default empty). When PATH is enabled and the user has not issued SET PATH, the session uses this value as the effective PATH (same syntax as SET PATH). SET PATH = DEFAULT_PATH resets to this configured default. An inner DEFAULT_PATH token inside the conf value expands to Spark’s built-in default ordering (cycle break), as documented on the conf.
  • Parser: Adds an ANTLR entry rule singlePathElementList and AbstractSqlParser.parseSqlPathElements so path fragments are parsed with the normal grammar (no ad-hoc tokenization).
  • CatalogManager: Parses and expands the default path from the conf; integrates a stored SET PATH with the configured default; exposes helpers used for resolution and CURRENT_PATH(). It also wires a path-derived builtin/session function-kind order into SessionCatalog via sessionFunctionKindsProvider, so unqualified routine resolution and the “temp must not shadow builtin” checks follow the effective PATH instead of relying only on spark.sql.functionResolution.sessionOrder (SESSION_FUNCTION_RESOLUTION_ORDER).
  • SessionCatalog / Analyzer: Uses the catalog-manager-provided order for builtin vs temp resolution semantics where needed (e.g. COUNT(*) rewrite gating when session shadows builtin).
  • Concurrency / deadlock (SPARK-56750): Builtin and temporary name resolution asks sessionFunctionKindsProvider(), which may synchronize on CatalogManager. Other code paths (for example namespace changes) can hold CatalogManager’s lock and call back into SessionCatalog. Holding SessionCatalog’s intrinsic lock (synchronized on this) in the same stack would invert lock order and can deadlock.
    • lookupBuiltinOrTempTableFunction is intentionally not synchronized on SessionCatalog for that reason (with an inline comment referencing SPARK-56750).
    • lookupFunctionInfo is likewise not wrapped in synchronized when resolving unqualified names, because its builtin/temp branch uses the same provider-backed path.

Design note: The default PATH does not need to track every detail of SESSION_FUNCTION_RESOLUTION_ORDER; that knob may be narrowed or removed in follow-up work. The deadlock fix stands on its own rule: avoid SessionCatalog.synchronized on any path that invokes sessionFunctionKindsProvider() while PATH-driven order is consulted from CatalogManager.

  • SetPathCommand / SparkSqlParser: Aligns SET PATH parsing with the shared path-element visitor path.
  • Tests: Extends SetPathSuite for default path, SET PATH = DEFAULT_PATH, composition, parse errors, PATH-disabled behavior, and path-driven security around temp vs builtin count.
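
The lock-ordering rule behind the deadlock fix can be sketched outside Spark with two plain Java monitors. All class and method names below are illustrative stand-ins for the PR's description, not Spark's actual API:

```java
import java.util.List;

// Sketch of the lock-ordering rule (illustrative names, not Spark's classes):
// some code paths take CatalogManager's lock and then call into SessionCatalog
// (CM -> SC), so function lookup must not take SC's lock before calling back
// into CM -- that inversion (SC -> CM) is the AB-BA order that could deadlock.
public class LockOrderSketch {

    static class CatalogManager {
        final SessionCatalog sessionCatalog = new SessionCatalog(this);

        // Derived from the effective PATH; synchronizes on the CatalogManager.
        synchronized List<String> effectiveFunctionKinds() {
            return List.of("builtin", "temp");
        }

        // Holds CM's lock, then calls into SessionCatalog: lock order CM -> SC.
        synchronized void setCurrentNamespace(String ns) {
            sessionCatalog.invalidateCache();
        }
    }

    static class SessionCatalog {
        private final CatalogManager cm;

        SessionCatalog(CatalogManager cm) { this.cm = cm; }

        synchronized void invalidateCache() { /* runs under SC's lock */ }

        // Deliberately NOT a synchronized method: ask the provider first,
        // without holding SC's lock, then lock only for registry access.
        List<String> lookupBuiltinOrTempFunction(String name) {
            List<String> kinds = cm.effectiveFunctionKinds(); // may take CM's lock
            synchronized (this) {
                return kinds; // registry work would happen here under SC's lock
            }
        }
    }

    public static void main(String[] args) {
        CatalogManager cm = new CatalogManager();
        cm.setCurrentNamespace("myschema");
        System.out.println(cm.sessionCatalog.lookupBuiltinOrTempFunction("abs"));
    }
}
```

Snapshotting the provider result before entering the `synchronized` block is the whole fix: the SC lock is never held across a call that may take the CM lock.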

Why are the changes needed?

Downstream deployments need an OSS way to set a default SQL PATH before any SET PATH runs, similar in role to spark.sql.defaultCatalog for catalogs. Without it, every session must explicitly SET PATH for consistent routine resolution and CURRENT_PATH() behavior.

Aligning builtin vs temporary function resolution order with the effective PATH keeps PATH, routine resolution, and security checks consistent.

Does this PR introduce any user-facing change?

Yes.

  • New configuration: spark.sql.defaultPath.
  • When spark.sql.path.enabled=true, sessions with no explicit SET PATH may observe a non-empty effective PATH and different unqualified routine resolution if this conf is set.
  • SET PATH = DEFAULT_PATH and CURRENT_PATH() behavior can change when spark.sql.defaultPath is non-empty vs empty.
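
As an illustration of the interaction (hedged: the conf value below is made up; the exact grammar is whatever this PR defines for SET PATH):

```sql
-- Assume the administrator set spark.sql.defaultPath = "system.session, system.builtin"
-- and spark.sql.path.enabled = true.

SELECT CURRENT_PATH();     -- reflects the configured default; no SET PATH was issued
SET PATH = myschema, system.builtin;
SELECT CURRENT_PATH();     -- reflects the explicit SET PATH
SET PATH = DEFAULT_PATH;   -- resets the session back to spark.sql.defaultPath
```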

How was this patch tested?

  • build/sbt "sql/testOnly org.apache.spark.sql.SetPathSuite"
  • build/sbt "catalyst/Compile/compile" and build/sbt "sql/compile" (or your usual sql / catalyst compile workflow)

Was this patch authored or co-authored using generative AI tooling?

Yes. Some implementation and tests were iterated with coding-assistant tooling; the author reviewed and owns the final patch.

@srielau srielau force-pushed the SPARK-56750-default-path branch from 8d33747 to 83dcf21 on May 6, 2026 at 17:59
Contributor Author

@srielau srielau left a comment


(stale debug review — see the AI-assisted self-review further down for the actual content)


Contributor Author

@srielau srielau left a comment


AI-assisted self-review. The high-level direction (mirror spark.sql.defaultCatalog for PATH; align routine-resolution shortcuts with the live PATH) is sound. Findings below are mostly correctness/perf concerns and a few naming/docs nits — I've flagged the deadlock + test-default + last-mode behavior change as the higher-priority items.

Summary

  • Prior state: SET PATH exists, but with no admin-side default — every session must SET PATH itself, and SET PATH = DEFAULT_PATH only resolves to a fixed defaultPathOrder. Function-resolution shortcuts (1-part lookup, the security gate that blocks temp-shadowing-builtin, the count(*)→count(1) gate) read spark.sql.legacy.sessionFunctionResolutionOrder directly rather than the live PATH, so a SET PATH = system.session, system.builtin, ... doesn't actually flip those gates.
  • Design approach: new spark.sql.defaultPath mirrors spark.sql.defaultCatalog — session-level conf, accepts the full SET PATH grammar via a reused entry rule (singlePathElementList + parseSqlPathElements), with an isWorkspaceDefaultExpansion cycle break for inner DEFAULT_PATH tokens. Routine-resolution shortcuts now read CatalogManager.leadingSystemFunctionKinds / isSessionBeforeBuiltinInPath, both derived from the effective PATH.
  • Key decisions: (1) parse the conf on every read, no caching (see A3); (2) path-derive the system-kinds shortcut order rather than keep the legacy switch (see A4 for the "last" mode consequence); (3) wire kinds via a var callback set from the CatalogManager constructor body (see A1/A2); (4) validate SchemaInPath arity at expansion time, not parse time (see A8).
  • Implementation sketch: parser adds singlePathElementList + parseSqlPathElements; AST PathElement moves into catalyst (connector.catalog) so CatalogManager can call PathElement.expand from its conf-fallback path while SetPathCommand still calls it from run(); CatalogManager grows workspaceDefaultPathEntries, leadingSystemFunctionKinds, isSessionBeforeBuiltinInPath; SessionCatalog exposes sessionFunctionKindsProvider (var) wired post-construction.

Inline comments are split across this review and the next (was hitting an API issue with a single 12-comment payload).

Contributor Author

@srielau srielau left a comment


(continuation of the AI-assisted self-review — see the parent review above for the full summary; this entry hosts inline comments 7-12 because the GitHub API rejected the 12-comment payload as a single review)

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala Outdated
@srielau
Contributor Author

srielau commented May 6, 2026

Review follow-up (SPARK-56750 / #55717)

Thanks for the detailed review — here is how the latest commits address it.

Correctness / threading

  • Deadlock (SC ⇄ CM): Removed synchronized from SessionCatalog.lookupBuiltinOrTempTableFunction so the temp-metadata path does not hold the SessionCatalog lock while invoking sessionFunctionKindsProvider, which may synchronize on CatalogManager (re-entrant synchronized on CM is still possible; the AB-BA lock order with setCurrentNamespace is avoided).
  • sessionFunctionKindsProvider default: Restored the SESSION_FUNCTION_RESOLUTION_ORDER-based default when no CatalogManager wiring is present (direct SessionCatalog in tests).
  • @volatile on the provider var for safe publication after CatalogManager ctor wiring.

PATH → builtin/session “shortcut” semantics

  • "last" mode / DESCRIBE / quick-check: System kinds are taken from the prefix before the first user-catalog entry when that prefix contains any system.* entries (legacy builtin-only shortcut preserved).
  • If that prefix has no system entries but the full path does (e.g. user schema before system.builtin), kinds come from the full path (no magic default list).
  • Removed the implicit Seq(Builtin, Temp) fallback when no system segments match; added a test that a user-only PATH does not resolve unqualified builtins (UNRESOLVED_ROUTINE for abs).
  • isSystemBuiltinPathEntry / isSystemSessionPathEntry: Now case-insensitive so stored paths like System.Builtin still classify as system entries (fixes empty kinds + current_path() for case-preserved literals).
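
The prefix rule above can be sketched in plain Java. Names are illustrative (a PATH is modeled as a list of already-resolved entry strings), not the PR's actual signatures:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch of how system function kinds could be derived from the effective PATH.
public class PathKinds {

    // Case-insensitive, so stored literals like "System.Builtin" still classify.
    static boolean isSystemEntry(String entry) {
        String lower = entry.toLowerCase(Locale.ROOT);
        return lower.equals("system.builtin") || lower.equals("system.session");
    }

    static List<String> leadingSystemKinds(List<String> path) {
        List<String> prefix = new ArrayList<>();
        for (String entry : path) {
            if (!isSystemEntry(entry)) break; // stop at the first user-catalog entry
            prefix.add(entry.toLowerCase(Locale.ROOT));
        }
        if (!prefix.isEmpty()) return prefix; // system entries lead the path

        // No leading system entries: take system entries from the full path;
        // if there are none, return an empty list (no implicit default kinds).
        List<String> fromFullPath = new ArrayList<>();
        for (String entry : path) {
            if (isSystemEntry(entry)) fromFullPath.add(entry.toLowerCase(Locale.ROOT));
        }
        return fromFullPath;
    }

    public static void main(String[] args) {
        System.out.println(leadingSystemKinds(List.of("System.Builtin", "myschema")));
        System.out.println(leadingSystemKinds(List.of("myschema"))); // empty: no builtins resolve
    }
}
```

A user-only PATH yields an empty kind list, which matches the UNRESOLVED_ROUTINE test mentioned above for unqualified abs.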

Performance / naming / validation

  • spark.sql.defaultPath: Renamed workspaceDefaultPathEntries → confDefaultPathEntries and isWorkspaceDefaultExpansion → isConfDefaultExpansion; parseSqlPathElements → parsePathElements on AbstractSqlParser.
  • Parse cache: Cached parsed PathElement list by conf string (ANTLR only); expand + duplicate validation still run on each materialization so CURRENT_SCHEMA / live catalog stay correct.
  • Duplicates: Shared PathElement.validateNoDuplicateResolvedPath for SET PATH and conf default materialization (same rules as command run).
  • Single-part schema in PATH: Validated in AstBuilder.visitPathElement (fail at parse/analysis with path position), not on every expand.
  • DEFAULT_PATH: .checkValue parses via parsePathElements; doc string fixed (no broken [[...]] in user-facing text; references spark.sql.functionResolution.sessionOrder by key).
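
The split between the cached parse and the always-fresh expansion could be sketched like this (illustrative names; the real code lives in CatalogManager and the parser):

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the caching decision: the ANTLR parse is a pure function of the
// conf string and is safe to memoize, but expansion depends on live session
// state (CURRENT_SCHEMA, catalogs), so it runs on every materialization.
public class DefaultPathCache {
    private final ConcurrentHashMap<String, List<String>> parsedByConf = new ConcurrentHashMap<>();
    private final Function<String, List<String>> parse;

    DefaultPathCache(Function<String, List<String>> parse) {
        this.parse = parse;
    }

    List<String> materialize(String confValue,
                             Function<List<String>, List<String>> expandAndValidate) {
        List<String> elements = parsedByConf.computeIfAbsent(confValue, parse); // cached
        return expandAndValidate.apply(elements);                               // fresh each time
    }

    public static void main(String[] args) {
        DefaultPathCache cache =
            new DefaultPathCache(s -> List.of(s.split("\\s*,\\s*")));
        System.out.println(cache.materialize("system.session, system.builtin", x -> x));
    }
}
```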

DROP TEMPORARY FUNCTION IF EXISTS session.count (cleanup in tests)

  • This was not the analyzer binding session.count to the builtin. DropFunctionCommand stripped the name to count and threw cannotDropBuiltinFuncError when no temp function existed but a builtin collided, before ever calling dropTempFunction; that made IF EXISTS cleanup fail incorrectly.
  • Fix: Removed that guard; dropTempFunction only touches the temp registry and already honors ignoreIfNotExists. Tests use SQL DROP TEMPORARY FUNCTION IF EXISTS session.count again.
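
A hedged sketch of the cleanup statement in question (illustrative; the error name is the one cited above):

```sql
-- Before the fix: this threw cannotDropBuiltinFuncError when no temp function
-- existed but the builtin `count` collided with the stripped name.
-- After the fix: it only touches the temp registry and honors IF EXISTS.
DROP TEMPORARY FUNCTION IF EXISTS session.count;
```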

Non-action / follow-up

  • “Coordination vs strategy” note on where leadingSystemFunctionKinds lives: acknowledged; left as-is for this PR to avoid a larger refactor (happy to move toward FunctionResolution in a follow-up if reviewers want that split explicitly).

If anything above should be tweaked before merge, call it out inline and I will adjust.
