Skip to content

[SPARK-56697][SQL][DML] Refactor Catalog Manager#55646

Open
andreaschat-db wants to merge 7 commits into
apache:masterfrom
andreaschat-db:dsv2TransactionCatalogManagerRefactor
Open

[SPARK-56697][SQL][DML] Refactor Catalog Manager#55646
andreaschat-db wants to merge 7 commits into
apache:masterfrom
andreaschat-db:dsv2TransactionCatalogManagerRefactor

Conversation

@andreaschat-db
Copy link
Copy Markdown
Contributor

@andreaschat-db andreaschat-db commented May 1, 2026

What changes were proposed in this pull request?

Currently, the TransactionAwareCatalogManager is a decorator of the CatalogManager. This introduces some risk because newly introduced methods in the CatalogManager might not be implemented in TransactionAwareCatalogManager. To resolve this issue we create a new CatalogManager trait that all CatalogManager implementations can implement.

Furthermore, the PR fixes an issue where previously every withTransaction call constructed a TACM whose super-constructor re-ran v1SessionCatalog.bindCatalogManagerForSessionFunctionKinds(this), rebinding the v1 SessionCatalog's session-function-kinds source to the transient TACM. Because old TACM never overrode sessionFunctionKindsForUnqualifiedResolution, subsequent unqualified function resolution that took the path-kinds route observed TACM's own (always-empty) _currentCatalogName / _currentNamespace / _sessionPath fields - i.e., a "no USE / no SET PATH" view — even after the transaction had ended. The new design binds only in DefaultCatalogManager's constructor, so the binding stays on the long-lived DefaultCM.

Why are the changes needed?

Ensures all newly introduced methods in the CatalogManager are going to be implemented in both the DefaultCatalogManager and the TransactionAwareCatalogManager.

Does this PR introduce any user-facing change?

Yes. Fixes an issue described in the description above.

How was this patch tested?

Refactor PR. Used existing tests.

Was this patch authored or co-authored using generative AI tooling?

Claude Sonnet 4.6.

@andreaschat-db andreaschat-db force-pushed the dsv2TransactionCatalogManagerRefactor branch from a7a29ac to b4de792 Compare May 19, 2026 10:22
@andreaschat-db andreaschat-db marked this pull request as ready for review May 19, 2026 10:23
// ---- Transactions ----
def transaction: Option[Transaction] = None

def withTransaction(transaction: Transaction): CatalogManager =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to do by default? Why not throw an exception to be on a safer side?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not implement at all

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes better. This is now implemented in the DefaultCatalogManager.

def setCurrentNamespace(namespace: Array[String]): Unit

// ---- Session path ----
def sessionPathEntries: Option[Seq[CatalogManager.SessionPathEntry]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about importing SessionPathEntry directly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned it up.

def currentNamespace: Array[String]
def setCurrentNamespace(namespace: Array[String]): Unit

// ---- Session path ----
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan, could you help review this, please?

Copy link
Copy Markdown
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me, but I would want @cloud-fan and @gengliangwang to take a look too.

@andreaschat-db andreaschat-db force-pushed the dsv2TransactionCatalogManagerRefactor branch from 46320d4 to b6035b9 Compare May 20, 2026 17:06
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a clean refactor of the previous TODO ("extract a CatalogManager trait so TACM doesn't carry inherited mutable state it never uses"). The old class CatalogManager becomes a trait, with two concrete implementations: DefaultCatalogManager (owns the session state) and TransactionAwareCatalogManager (decorator). Construction sites move from new CatalogManager(...) to new DefaultCatalogManager(...); all other call sites continue to use CatalogManager as the abstract type. The intended benefit is realized: the compiler now enforces that any new method on the trait is implemented in both classes.

One thing worth surfacing — the PR description says "No user-facing change" but the refactor silently fixes a real bug. In the old code, every withTransaction call constructed a TACM whose super-constructor re-ran v1SessionCatalog.bindCatalogManagerForSessionFunctionKinds(this), rebinding the v1 SessionCatalog's session-function-kinds source to the transient TACM. Because old TACM never overrode sessionFunctionKindsForUnqualifiedResolution, subsequent unqualified function resolution that took the path-kinds route observed TACM's own (always-empty) _currentCatalogName / _currentNamespace / _sessionPath fields — i.e., a "no USE / no SET PATH" view — even after the transaction had ended. The new design binds only in DefaultCatalogManager's constructor, so the binding stays on the long-lived DefaultCM. Worth updating the PR description to call this out; the bug fix is more interesting than the refactor itself.

Also: as a side-effect of the trait extraction, two Scaladoc references in sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (lines 124 and 138) now point to [[CatalogManager]]'s "constructor"; the binding now lives in DefaultCatalogManager's constructor. Worth a one-line touch-up in this PR or a follow-up.

private[sql] class DefaultCatalogManager(
val defaultSessionCatalog: CatalogPlugin,
val v1SessionCatalog: SessionCatalog) extends SQLConfHelper with Logging {
val v1SessionCatalog: SessionCatalog) extends CatalogManager {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constructor-vals here (and val tempVariableManager at line 122) implement abstract def members of the trait without the override modifier. Scala allows omitting override for abstract overrides, but every other method in this class uses override def, and adding it on these three makes the implements-from-trait relationship explicit (and prevents accidental shadow-definition if the abstract is later removed from the trait).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

override def sessionFunctionKindsForUnqualifiedResolution(): Seq[SessionFunctionKind] =
delegate.sessionFunctionKindsForUnqualifiedResolution()

override private[sql] def reset(): Unit = delegate.reset()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a one-line comment: this is a behavior change versus the pre-refactor code. The old reset() was inherited from CatalogManager and cleared TACM's own (separately initialized) super-state without touching the delegate. The new override clobbers the underlying DefaultCatalogManager — clears its catalogs map, session path, current catalog/namespace, etc. reset() is test-only and not currently called on a TACM, so this is harmless today, but a future test reaching for it would silently affect session-level state outside the transaction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would omit the comment because the base behaviour was only introduced a few days ago. This PR is a direct follow up to fix issues of the base PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants