
[SPARK-57039][SQL] Fold InnerJoin with single-row LocalRelation/OneRowRelation into Project#56091

Open
yaooqinn wants to merge 1 commit into apache:master from yaooqinn:SPARK-PR24-inner-join-single-row-local-relation

@yaooqinn (Member) commented May 24, 2026

What changes were proposed in this pull request?

Fold an Inner Join in which one side is a single-row LocalRelation or a OneRowRelation into a Project (or into just the other side, when the single-row side has no columns).

The fold takes two rules, because the two relation types do not share a tree pattern:

  • ConvertToLocalRelation gains two new cases: Join(LocalRelation(out, Seq(row), false, _), other, Inner, cond, JoinHint.NONE) and the symmetric right arm. The single row is materialized as Alias(Literal.create(row.get(i, attr.dataType), attr.dataType), attr.name)(attr.exprId), preserving each ExprId so that downstream references still resolve. The result is Project(literals ++ other.output, other), wrapped in Filter(cond, _) if a condition exists.
  • A new rule, FoldInnerJoinWithOneRowRelation (tree pattern INNER_LIKE_JOIN), handles Join(OneRowRelation(), other, Inner, cond, JoinHint.NONE). OneRowRelation has zero columns, so no Project is needed; the result is just other (or Filter(cond, other)). The rule is registered in RuleIdCollection and added to both the "LocalRelation early" and "LocalRelation" batches.
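The two folds above can be sketched on a deliberately tiny plan model. Attr, Ref, Lit, Alias, Relation, LocalRelation, OneRowRelation, Project, Filter, Join and fold below are hypothetical toy stand-ins written for illustration, not Catalyst's classes and not the PR's code. Every Join is assumed to be Inner with no hint, and placing other.output before the literals in the right arm is an assumption that keeps the Join's left-then-right column order.

```scala
// Toy stand-ins for Catalyst classes (hypothetical; not Spark's API).
final case class Attr(name: String, exprId: Long)

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Lit(value: Any) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr
// An alias that reuses the ExprId of the attribute it replaces.
final case class Alias(child: Expr, name: String, exprId: Long) extends Expr

sealed trait Plan { def output: Seq[Attr] }
final case class Relation(output: Seq[Attr]) extends Plan
final case class LocalRelation(output: Seq[Attr], data: Seq[Seq[Any]]) extends Plan
case object OneRowRelation extends Plan { val output: Seq[Attr] = Nil }
final case class Project(projectList: Seq[Expr], child: Plan) extends Plan {
  def output: Seq[Attr] = projectList.map {
    case Ref(a)             => a
    case Alias(_, name, id) => Attr(name, id)
    case e                  => sys.error(s"unnamed expression: $e")
  }
}
final case class Filter(condition: Expr, child: Plan) extends Plan {
  def output: Seq[Attr] = child.output
}
// Assumed to be an Inner join with no hint.
final case class Join(left: Plan, right: Plan, condition: Option[Expr]) extends Plan {
  def output: Seq[Attr] = left.output ++ right.output
}

// Wrap in a Filter only when the join had a condition.
def withCondition(plan: Plan, cond: Option[Expr]): Plan =
  cond.fold(plan)(c => Filter(c, plan))

// Materialize the single row as literals aliased to the original ExprIds.
def literals(out: Seq[Attr], row: Seq[Any]): Seq[Expr] =
  out.zip(row).map { case (attr, v) => Alias(Lit(v), attr.name, attr.exprId) }

def refs(plan: Plan): Seq[Expr] = plan.output.map(Ref(_))

// One bottom-up pass applying both folds; multi-row relations are left alone.
def fold(plan: Plan): Plan = plan match {
  case Join(l, r, cond) =>
    (fold(l), fold(r)) match {
      case (LocalRelation(out, Seq(row)), other) =>
        withCondition(Project(literals(out, row) ++ refs(other), other), cond)
      case (other, LocalRelation(out, Seq(row))) =>
        withCondition(Project(refs(other) ++ literals(out, row), other), cond)
      case (OneRowRelation, other) => withCondition(other, cond)
      case (other, OneRowRelation) => withCondition(other, cond)
      case (fl, fr)                => Join(fl, fr, cond)
    }
  case Project(list, child) => Project(list, fold(child))
  case Filter(c, child)     => Filter(c, fold(child))
  case leaf                 => leaf
}
```

In this sketch, folding Join(t, LocalRelation([a, b], [[1, x]]), t.k = a) yields Filter(t.k = a, Project([k, 1 AS a, x AS b], t)), and the Project's output attributes keep the ExprIds of a and b, which is what lets references above the former Join keep resolving.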

A single combined rule using transformWithPruning(_.containsPattern(LOCAL_RELATION)) would silently skip joins against a OneRowRelation, because OneRowRelation does not publish the LOCAL_RELATION pattern. Hence two rules.
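The pruning argument can be illustrated with a toy model, taking as given the PR's statement that OneRowRelation publishes no LOCAL_RELATION pattern. Pattern, Node, Table, OneRowRel, LocalRel, InnerJoin and visitedJoins are hypothetical names; a node's pattern set is assumed to be the union of its own and its descendants' patterns, and a pruned traversal never descends into a subtree that lacks the required pattern.

```scala
// Hypothetical toy model of tree-pattern pruning (not Catalyst's API).
sealed trait Pattern
case object LOCAL_RELATION extends Pattern
case object INNER_LIKE_JOIN extends Pattern

sealed trait Node {
  def children: Seq[Node]
  def ownPatterns: Set[Pattern]
  // Patterns published by this node or any descendant.
  lazy val treePatterns: Set[Pattern] =
    children.foldLeft(ownPatterns)((acc, c) => acc ++ c.treePatterns)
  def containsPattern(p: Pattern): Boolean = treePatterns.contains(p)
}

final case class Table(name: String) extends Node {
  val children: Seq[Node] = Nil
  val ownPatterns: Set[Pattern] = Set.empty
}
// Per the PR, OneRowRelation publishes no LOCAL_RELATION pattern.
case object OneRowRel extends Node {
  val children: Seq[Node] = Nil
  val ownPatterns: Set[Pattern] = Set.empty
}
final case class LocalRel(rows: Int) extends Node {
  val children: Seq[Node] = Nil
  val ownPatterns: Set[Pattern] = Set(LOCAL_RELATION)
}
final case class InnerJoin(left: Node, right: Node) extends Node {
  val children: Seq[Node] = Seq(left, right)
  val ownPatterns: Set[Pattern] = Set(INNER_LIKE_JOIN)
}

// How many joins a rule pruned on `required` would visit: subtrees that do
// not contain the pattern are skipped without being traversed.
def visitedJoins(plan: Node, required: Pattern): Int =
  if (!plan.containsPattern(required)) 0
  else {
    val below = plan.children.map(visitedJoins(_, required)).sum
    plan match {
      case _: InnerJoin => 1 + below
      case _            => below
    }
  }
```

Here visitedJoins(InnerJoin(Table("t"), OneRowRel), LOCAL_RELATION) is 0, so a rule pruned on LOCAL_RELATION never reaches that join, while pruning on INNER_LIKE_JOIN visits it once.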

Four guards keep both rules conservative:

  • JoinType == Inner
  • JoinHint == JoinHint.NONE
  • LocalRelation.data.length == 1 (0 rows are already handled by PropagateEmptyRelation; with more than one row, each row of the other side would have to be duplicated, which a Project cannot express)
  • !cond.exists(hasUnevaluableExpr) && !other.isStreaming

The streaming guard is required because state-store ordering forbids rewriting a Join into a Project at plan time.
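The four guards, including the streaming check, reduce to a single predicate. The sketch below is a hypothetical stand-in: JoinSpec, Hint, NoHint (standing in for JoinHint.NONE), UnevaluableExpr and canFold are illustrative names, and hasUnevaluableExpr here only looks for the toy UnevaluableExpr node anywhere in the condition rather than reproducing Catalyst's check.

```scala
// Hypothetical stand-ins for the guard inputs (not Catalyst's API).
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType

sealed trait Hint
case object NoHint extends Hint    // stands in for JoinHint.NONE
case object Broadcast extends Hint

sealed trait Expr {
  def children: Seq[Expr] = Nil
  // True if `p` holds for this expression or any sub-expression.
  def exists(p: Expr => Boolean): Boolean = p(this) || children.exists(_.exists(p))
}
final case class Col(name: String) extends Expr
final case class EqualTo(left: Expr, right: Expr) extends Expr {
  override def children: Seq[Expr] = Seq(left, right)
}
// Stand-in for any expression that cannot be evaluated at plan time.
final case class UnevaluableExpr(name: String) extends Expr

final case class JoinSpec(
    joinType: JoinType,
    hint: Hint,
    singleRowSideRows: Int,   // LocalRelation.data.length; 1 for OneRowRelation
    condition: Option[Expr],
    otherIsStreaming: Boolean)

def hasUnevaluableExpr(e: Expr): Boolean = e.exists(_.isInstanceOf[UnevaluableExpr])

// All four guards must hold before either fold may fire.
def canFold(j: JoinSpec): Boolean =
  j.joinType == Inner &&
    j.hint == NoHint &&
    j.singleRowSideRows == 1 &&
    !j.condition.exists(hasUnevaluableExpr) &&
    !j.otherIsStreaming
```

Keeping every guard in one conjunction means a single failing condition (an outer join, a hint, zero or several rows, an unevaluable condition, or a streaming input) leaves the Join untouched.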

This fulfills the TODO in DecorrelateInnerQuery.scala:435:

// A special optimization for OneRowRelation.
// TODO: add a more general rule to optimize join with OneRowRelation.
case _: OneRowRelation => domain

This rule is compatible with OptimizeOneRowPlan: the new LocalRelation produced by Limit 1 over the OneRowRelation side is folded by EliminateOuterJoin/ConvertToLocalRelation in the next iteration, so the batch still reaches a fixed point.

Why are the changes needed?

The optimizer currently leaves a Join node in plans like the ones below, which blocks downstream column pruning and constant folding on the single-row side:

SELECT * FROM t CROSS JOIN (SELECT 1 AS c) s
SELECT * FROM t JOIN (VALUES (1, 'x')) AS s(a, b) ON t.k = s.a

Before:

Join Inner, (t.k = s.a)
:- Relation t
+- LocalRelation [a, b], [[1, x]]

After (the Filter has also been pushed below the Project, with s.a replaced by its literal value):

Project [t.*, 1 AS a, x AS b]
+- Filter (t.k = 1)
   +- Relation t
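Why the fold preserves query results, and why it needs exactly one row, can be checked on plain rows. This sketch models rows as Map[String, Any]; innerJoin and foldedPlan are hypothetical helpers, with foldedPlan standing in for a Project that appends the single row's values as constants, followed by Filter(cond).

```scala
// Rows as column-name -> value maps (a hypothetical, simplified model).
type Row = Map[String, Any]

// Nested-loop inner join: every pair of rows, kept if the condition holds.
def innerJoin(left: Seq[Row], right: Seq[Row])(cond: Row => Boolean): Seq[Row] =
  for {
    l <- left
    r <- right
    joined = l ++ r
    if cond(joined)
  } yield joined

// The folded shape: append the single row's values as constants to every
// row of the other side (the Project), then apply the condition (the Filter).
def foldedPlan(other: Seq[Row], singleRow: Row)(cond: Row => Boolean): Seq[Row] =
  other.map(_ ++ singleRow).filter(cond)
```

With t = [k=1, k=2, k=1], the single row (a=1, b='x') and the condition k = a, both functions return the two rows with k = 1. With a two-row right side whose rows both have a = 1, the same join returns four rows from three input rows, which no Project over t can produce.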

This pattern shows up in BI-tool-generated SQL and in views whose body collapses to a single row after other rules fold it.

Does this PR introduce any user-facing change?

No. Query results are unchanged; only the optimized logical plan shape changes.

How was this patch tested?

New unit tests:

  • FoldInnerJoinWithOneRowRelationSuite — 9 cases: OneRow × table (left/right/no condition), LeftOuter not folded, Array/Map/Struct/nested-struct columns preserved on the other side, condition with Rand() (Unevaluable) not folded.
  • ConvertToLocalRelationSuite: 4 new cases (T7–T10) covering LocalRelation × non-LocalRelation on both sides, a condition referencing both sides folded into a Filter, and ExprId preservation, checked via collectFirst { case _: Join }.isEmpty && output.length == 4.

The existing PlanStability* suites pass with no plan diff for any TPC-DS / TPC-H query — none of them currently contain the targeted pattern.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

@yaooqinn marked this pull request as draft May 24, 2026 14:47
@yaooqinn marked this pull request as ready for review May 24, 2026 15:08
@yaooqinn force-pushed the SPARK-PR24-inner-join-single-row-local-relation branch 3 times, most recently from aa23299 to b65de56 on May 25, 2026 08:25
…wRelation into Project

Currently ConvertToLocalRelation folds Project/Filter/Limit over LocalRelation
but does not fold Inner Join where one side is a single-row LocalRelation or
OneRowRelation. This PR adds two rules to do that fold:

1. ConvertToLocalRelation gains two cases (left/right symmetric) for
   InnerJoin(LocalRelation(out, [row], false, _), other, Inner, cond, NONE).
2. FoldInnerJoinWithOneRowRelation (new rule, INNER_LIKE_JOIN tree-pattern)
   handles InnerJoin(OneRowRelation(), other, Inner, cond, NONE).

Two rules because OneRowRelation does not publish LOCAL_RELATION tree-pattern,
so a combined rule using transformWithPruning(_.containsPattern(LOCAL_RELATION))
would silently miss the OneRowRelation case.

Four guards keep both rules conservative: Inner only, JoinHint.NONE only,
data.length == 1 only, !cond.exists(hasUnevaluableExpr) && !other.isStreaming.

Tests: new unit cases in ConvertToLocalRelationSuite + FoldInnerJoinWithOneRowRelationSuite.

Generated-by: Claude Opus 4.7
@yaooqinn force-pushed the SPARK-PR24-inner-join-single-row-local-relation branch from b65de56 to efa377b on May 25, 2026 15:17