[SPARK-57039][SQL] Fold InnerJoin with single-row LocalRelation/OneRowRelation into Project#56091
Open
yaooqinn wants to merge 1 commit into
Open
Conversation
aa23299 to
b65de56
Compare
…wRelation into Project Currently ConvertToLocalRelation folds Project/Filter/Limit over LocalRelation but does not fold Inner Join where one side is a single-row LocalRelation or OneRowRelation. This PR adds two rules to do that fold: 1. ConvertToLocalRelation gains two cases (left/right symmetric) for InnerJoin(LocalRelation(out, [row], false, _), other, Inner, cond, NONE). 2. FoldInnerJoinWithOneRowRelation (new rule, INNER_LIKE_JOIN tree-pattern) handles InnerJoin(OneRowRelation(), other, Inner, cond, NONE). Two rules because OneRowRelation does not publish LOCAL_RELATION tree-pattern, so a combined rule using transformWithPruning(_.containsPattern(LOCAL_RELATION)) would silently miss the OneRowRelation case. Four guards keep both rules conservative: Inner only, JoinHint.NONE only, data.length == 1 only, !cond.exists(hasUnevaluableExpr) && !other.isStreaming. Tests: new unit cases in ConvertToLocalRelationSuite + FoldInnerJoinWithOneRowRelationSuite. Generated-by: Claude Opus 4.7
b65de56 to
efa377b
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Fold
Inner Joinwhere one side is a single-rowLocalRelationor aOneRowRelationinto aProject(or just the other side, when the single-row side has no columns).Two rules, because the two relation types do not share a tree pattern:
ConvertToLocalRelationgains two new cases forJoin(LocalRelation(out, Seq(row), false, _), other, Inner, cond, JoinHint.NONE)and the symmetric right arm. The single row materializes asAlias(Literal.create(row.get(i, attr.dataType), attr.dataType), attr.name)(attr.exprId), preservingExprIdso downstream references survive. Result isProject(literals ++ other.output, other), wrapped inFilter(cond, _)if a condition exists.FoldInnerJoinWithOneRowRelationrule (tree patternINNER_LIKE_JOIN) handlesJoin(OneRowRelation(), other, Inner, cond, JoinHint.NONE).OneRowRelationhas zero columns, so noProjectis needed — the result is justother(orFilter(cond, other)). Registered inRuleIdCollectionand added to bothLocalRelation earlyandLocalRelationbatches.A single combined rule using
transformWithPruning(_.containsPattern(LOCAL_RELATION))would miss theOneRowRelationsubtrees, becauseOneRowRelationdoes not publishLOCAL_RELATION. Hence two rules.Four guards keep both rules conservative:
JoinType == InnerJoinHint == JoinHint.NONELocalRelation.data.length == 1(0 rows are handled byPropagateEmptyRelation; >1 row would Cartesian-explode)!cond.exists(hasUnevaluableExpr) && !other.isStreamingThe streaming guard is required because state-store ordering forbids rewriting a
Joininto aProjectat plan time.This fulfills the TODO in
DecorrelateInnerQuery.scala:435:This rule converges with
OptimizeOneRowPlan: the newLocalRelationproduced byLimit 1over theOneRowRelationside is folded byEliminateOuterJoin/ConvertToLocalRelationin the next iteration.Why are the changes needed?
The optimizer currently leaves a
Joinnode in plans like the ones below, which blocks downstream column pruning and constant folding on the single-row side:Before:
After:
This pattern shows up in BI-tool-generated SQL and in views whose body collapses to a single row after other rules fold it.
Does this PR introduce any user-facing change?
No. Query results are unchanged; only the optimized logical plan shape changes.
How was this patch tested?
New unit tests:
FoldInnerJoinWithOneRowRelationSuite— 9 cases: OneRow × table (left/right/no condition),LeftOuternot folded,Array/Map/Struct/nested-struct columns preserved on the other side, condition withRand()(Unevaluable) not folded.ConvertToLocalRelationSuite— 4 new cases (T7–T10):LocalRelation× non-LocalRelationsymmetric, condition referencing both sides folded intoFilter, andExprIdpreservation withcollectFirst { case _: Join }.isEmpty && output.length == 4.The existing
PlanStability*suites pass with no plan diff for any TPC-DS / TPC-H query — none of them currently contain the targeted pattern.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7