-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-55024][SQL][FOLLOWUP] Delay namespace length check to v1 identifier creation #54247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Could you double-check the following two failures, @cloud-fan ?
|
szehon-ho
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments, mostly lgtm.
i can look at adding some test to mimic CatalogExtension and multi part ns like Iceberg for this behavior if it helps
| def asMultipartIdentifier: Seq[String] = (ident.namespace :+ ident.name).toImmutableArraySeq | ||
|
|
||
| def asTableIdentifier: TableIdentifier = ident.namespace match { | ||
| case ns if ns.isEmpty => TableIdentifier(ident.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious here, is it necessary to remove these if ns is not empty? I see that asTableIdentifierOpt still handles the empty ns case.
I assume in V1SessionCatalog case, the namespace is never empty, but just checking if its necessary to remove, in case some v2 connector is using the method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should never be empty, we have stricter check in other places before: https://github.com/apache/spark/pull/54247/changes#diff-415c3b78d7558933c759da01218075e011c45c4d0fb5ff3fe522f24d6e4b8e20L730
| withIdentClause(ctx.identifierReference(), Seq(qPlan), (ident, otherPlans) => { | ||
| val tableIdentifier = ident.asTableIdentifier | ||
| if (tableIdentifier.database.isDefined) { | ||
| if (ident.length > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this case covered by some test? (its easy to understand though, so not blocking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| exception = intercept[AnalysisException](testIdent.asTableIdentifier), | ||
| condition = "IDENTIFIER_TOO_MANY_NAME_PARTS", | ||
| parameters = Map("identifier" -> "`a`.`b`.`c`", "limit" -> "2") | ||
| condition = "REQUIRES_SINGLE_PART_NAMESPACE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just opinion that previously having the identifier was a bit easier to debug. I wonder if we can extend the error message to have a version with identifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, working on it.
…ifier creation This is a followup to apache#53788 which moved the namespace length check from individual command handlers to the `CatalogAndIdentifier` extractor. That approach is too aggressive: users can extend the session catalog via `CatalogExtension` and support multi-part namespaces through v2 APIs. The check should only happen when we actually create v1 identifiers like `TableIdentifier`, not at the shared name-resolution layer. Changes: - Remove the namespace check from `CatalogAndIdentifier.unapply` so it remains a pure name-resolution mechanism. - Tighten `CatalogV2Implicits.IdentifierHelper.asTableIdentifier` and `asFunctionIdentifier` to require exactly one namespace part and throw `REQUIRES_SINGLE_PART_NAMESPACE` (instead of the less precise `IDENTIFIER_TOO_MANY_NAME_PARTS`). - Remove the now-redundant `V2SessionCatalog.TableIdentifierHelper` and use the unified `CatalogV2Implicits` conversion everywhere. - Simplify `ResolveSessionCatalog` extractors (`ResolvedV1Identifier`, `ResolvedIdentifierInSessionCatalog`, `ResolvedViewIdentifier`) to delegate to `ident.asTableIdentifier`. - Fix `SparkSqlParser` temp view creation to check name length before calling `asTableIdentifier`, so the user always sees `notAllowedToAddDBPrefixForTempViewError` instead of a generic error. Co-authored-by: Cursor <cursoragent@cursor.com>
Remove redundant namespace parameter since identifier already includes the namespace. The message is now: "<sessionCatalog> requires a single-part namespace, but got identifier <identifier>." Co-authored-by: Cursor <cursoragent@cursor.com>

What changes were proposed in this pull request?
This is a followup to #53788 which moved the namespace length check from individual command handlers to the
CatalogAndIdentifierextractor. That approach is too aggressive: users can extend the session catalog viaCatalogExtensionand support multi-part namespaces through v2 APIs. The check should only happen when we actually create v1 identifiers likeTableIdentifier, not at the shared name-resolution layer.Changes
CatalogAndIdentifier.unapplyso it remains a pure name-resolution mechanism, preservingCatalogExtensionflexibility.CatalogV2Implicits.IdentifierHelper.asTableIdentifierandasFunctionIdentifierto require exactly one namespace part and throwREQUIRES_SINGLE_PART_NAMESPACE(instead of the less preciseIDENTIFIER_TOO_MANY_NAME_PARTS). This centralizes the validation.V2SessionCatalog.TableIdentifierHelperand use the unifiedCatalogV2Implicitsconversion everywhere.ResolveSessionCatalogextractors (ResolvedV1Identifier,ResolvedIdentifierInSessionCatalog,ResolvedViewIdentifier) to delegate toident.asTableIdentifier.SparkSqlParsertemp view creation to check name length before callingasTableIdentifier, so the user always seesnotAllowedToAddDBPrefixForTempViewErrorinstead of a generic error.Why are the changes needed?
CatalogExtensionallows users to extend the built-in session catalog and potentially support multi-part namespaces for v2 operations. The early check inCatalogAndIdentifierwould block such extensions. The namespace length should only be validated when we actually need to create v1 identifiers (e.g.TableIdentifier), which inherently require a single-part namespace.Additionally, this PR unifies the scattered namespace validation into a single point (
CatalogV2Implicits.IdentifierHelper), reducing code duplication and ensuring consistentREQUIRES_SINGLE_PART_NAMESPACEerrors.Does this PR introduce any user-facing change?
Yes. Multi-part namespace identifiers now flow through
CatalogAndIdentifierwithout error, allowingCatalogExtensionimplementations to handle them. The error is only thrown when the session catalog actually needs to convert to a v1 identifier.How was this patch tested?
Existing tests updated:
LookupCatalogSuite— removed the early-rejection test, added multi-part namespace cases back.V2SessionCatalogSuite— updated to expectREQUIRES_SINGLE_PART_NAMESPACE.DataSourceV2SQLSuiteV1FilterandDataSourceV2FunctionSuite— all 313 tests pass.Was this patch authored or co-authored using generative AI tooling?
Yes.
Made with Cursor