[CALCITE-7514] MultiJoinOptimizeBushyRule throws AssertionError when a join condition references 3 or more factors #4934

Open
sbroeder wants to merge 1 commit into apache:main from sbroeder:7514


@sbroeder
Jira Link

CALCITE-7514

Changes Proposed

MultiJoinOptimizeBushyRule crashes with an AssertionError when a
MultiJoin's join filters contain a condition that references anything
other than exactly two factors (e.g. a CASE expression spanning three
tables). Such conditions cannot be represented as binary join edges, so
passing them to createEdge produced an edge with
factors.cardinality() != 2, which violated assertions in the edge
comparator and the greedy ordering loop.
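
To make the failure mode concrete, here is a minimal standalone sketch (not Calcite code) of counting the factors a condition touches. The class `FactorCounter`, its methods, and the field counts used in the usage note are hypothetical; the real rule derives factor sets from the MultiJoin's own metadata.

```java
import java.util.BitSet;

/** Standalone illustration; not part of Calcite. All names are hypothetical. */
final class FactorCounter {
  /** Number of fields contributed by each factor, in MultiJoin order. */
  private final int[] fieldCounts;

  FactorCounter(int... fieldCounts) {
    this.fieldCounts = fieldCounts.clone();
  }

  /** Returns the index of the factor that owns the given field offset. */
  int factorOf(int field) {
    if (field < 0) {
      throw new IllegalArgumentException("negative field: " + field);
    }
    int end = 0;
    for (int i = 0; i < fieldCounts.length; i++) {
      end += fieldCounts[i];
      if (field < end) {
        return i;
      }
    }
    throw new IllegalArgumentException("field out of range: " + field);
  }

  /** Returns the set of factors touched by a condition's field references. */
  BitSet factorsOf(int... referencedFields) {
    BitSet factors = new BitSet(fieldCounts.length);
    for (int field : referencedFields) {
      factors.set(factorOf(field));
    }
    return factors;
  }

  /** A condition can serve as a binary join edge only if it touches exactly two factors. */
  boolean isBinaryEdge(int... referencedFields) {
    return factorsOf(referencedFields).cardinality() == 2;
  }
}
```

Assuming three factors with 3, 2, and 3 fields, `new FactorCounter(3, 2, 3).isBinaryEdge(2, 3)` is true for an equi-join between the first two factors, while a condition referencing fields 3, 1, 5, and 0 touches all three factors and is not a valid edge.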

Fix: conditions that do not touch exactly two factors are separated
from the edge list before the greedy loop runs. After the join tree is
built, these conditions are remapped from the original MultiJoin field
positions to the output positions of the final join tree using
RexPermuteInputsShuttle, then applied as a LogicalFilter above the
join tree (before the reordering project). For inner joins this is
semantically equivalent to applying them as join predicates.
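
The two steps of the fix can be sketched roughly as follows. This is a simplified standalone model, not the code in this PR: conditions are reduced to the field offsets they reference, `ConditionSplitter` and its methods are hypothetical names, and the `remap` method only stands in for what RexPermuteInputsShuttle does on real RexNode trees.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Standalone illustration; not Calcite code. All names are hypothetical. */
final class ConditionSplitter {
  /** Simplified condition: a label plus the field offsets it references. */
  static final class Condition {
    final String label;
    final List<Integer> fields;

    Condition(String label, List<Integer> fields) {
      this.label = label;
      this.fields = fields;
    }
  }

  /** Returns the factors touched by a condition; fieldToFactor[i] owns field i. */
  static BitSet factorsOf(Condition c, int[] fieldToFactor) {
    BitSet factors = new BitSet();
    for (int field : c.fields) {
      factors.set(fieldToFactor[field]);
    }
    return factors;
  }

  /** Returns the binary-edge conditions and appends all others to deferred. */
  static List<Condition> split(List<Condition> all, int[] fieldToFactor,
      List<Condition> deferred) {
    List<Condition> edges = new ArrayList<>();
    for (Condition c : all) {
      if (factorsOf(c, fieldToFactor).cardinality() == 2) {
        edges.add(c);
      } else {
        deferred.add(c);
      }
    }
    return edges;
  }

  /** Rewrites field offsets; oldToNew[i] is where original field i ends up. */
  static Condition remap(Condition c, int[] oldToNew) {
    List<Integer> mapped = new ArrayList<>();
    for (int field : c.fields) {
      mapped.add(oldToNew[field]);
    }
    return new Condition(c.label, mapped);
  }
}
```

In this model, only the conditions returned by `split` feed the greedy ordering loop; each deferred condition is passed through `remap` once the final field order is known and would then be applied as a filter on top of the join tree.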

This also resolves two long-standing TODOs in the class Javadoc:

  • "Join conditions that touch 3 factors." — fully handled.
  • "Join conditions that touch 1 factor." — handled defensively (no
    crash, correct result); optimal push-down to the scan is left as a
    future improvement.

Reproduction: the following query previously threw AssertionError:

SELECT e1.ename
FROM emp e1, dept d, emp e2
WHERE e1.deptno = d.deptno
  AND e2.deptno = d.deptno
  AND d.deptno = CASE WHEN e1.sal > 1000 THEN e2.empno ELSE e1.empno END

A new test testMultiJoinOptimizeBushyThreeFactorCondition in
RelOptRulesTest covers this case.

@sbroeder sbroeder marked this pull request as draft May 11, 2026 23:00
@sbroeder sbroeder marked this pull request as ready for review May 11, 2026 23:55
@xiedeyantu xiedeyantu changed the title from "[[CALCITE-7514] MultiJoinOptimizeBushyRule throws AssertionError when a join condition references 3 or more factors" to "[CALCITE-7514] MultiJoinOptimizeBushyRule throws AssertionError when a join condition references 3 or more factors" May 12, 2026
Comment thread core/src/main/java/org/apache/calcite/rel/rules/MultiJoinOptimizeBushyRule.java Outdated
Contributor

@mihaibudiu mihaibudiu left a comment

If there are no other comments, let's merge this.
Please squash the commits to a single one.

@mihaibudiu mihaibudiu added the LGTM-will-merge-soon Overall PR looks OK. Only minor things left. label May 12, 2026
…a join condition references 3 or more factors

Conditions in a MultiJoin's joinFilters that reference anything other
than exactly two factors cannot be represented as binary join edges.
Passing such a condition to createEdge produced an edge with
factors.cardinality() != 2, causing an AssertionError in the edge
comparator's rowCountDiff method, and at two further assertion sites in
the greedy loop.

The fix separates these conditions from the edge list upfront. After the
greedy join-ordering loop completes, the remaining conditions are remapped
from original MultiJoin field positions to the final join tree's output
positions via RexPermuteInputsShuttle, then applied as a LogicalFilter
above the join tree before the reordering project. For inner joins this
is semantically equivalent to applying them as join predicates.
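
The inner-join equivalence claimed above can be checked on toy data. The sketch below is a standalone illustration with hypothetical names (`InnerJoinFilterDemo` and its methods), using a naive nested-loop join over int[] rows rather than anything from Calcite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Standalone illustration; not Calcite code. All names are hypothetical. */
final class InnerJoinFilterDemo {
  /** Nested-loop inner join: keeps concatenated rows that satisfy the condition. */
  static List<int[]> innerJoin(List<int[]> left, List<int[]> right,
      Predicate<int[]> on) {
    List<int[]> out = new ArrayList<>();
    for (int[] l : left) {
      for (int[] r : right) {
        int[] row = new int[l.length + r.length];
        System.arraycopy(l, 0, row, 0, l.length);
        System.arraycopy(r, 0, row, l.length, r.length);
        if (on.test(row)) {
          out.add(row);
        }
      }
    }
    return out;
  }

  /** Keeps the rows that satisfy the predicate, preserving their order. */
  static List<int[]> filter(List<int[]> rows, Predicate<int[]> keep) {
    List<int[]> out = new ArrayList<>();
    for (int[] row : rows) {
      if (keep.test(row)) {
        out.add(row);
      }
    }
    return out;
  }

  /** Compares two row lists element by element, in order. */
  static boolean sameRows(List<int[]> a, List<int[]> b) {
    if (a.size() != b.size()) {
      return false;
    }
    for (int i = 0; i < a.size(); i++) {
      if (!Arrays.equals(a.get(i), b.get(i))) {
        return false;
      }
    }
    return true;
  }
}
```

For any two predicates p and q, `innerJoin(left, right, p.and(q))` and `filter(innerJoin(left, right, p), q)` return the same rows in this model, which is the property the fix relies on when it lifts a condition out of the join and into a filter.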

Two TODO items are resolved:
- "Join conditions that touch 3 factors" is fully handled.
- "More than 1 join conditions that touch the same pair of factors" was
  stale from the original commit; the conditions loop already collects
  all edges subsumed by newFactors at each greedy step.

A remaining TODO notes that 1-factor conditions are applied as a filter
above the join tree rather than pushed down to the individual scan.
@sonarqubecloud

4 participants