Skip to content

Add join commutativity rule for reordering fact dimension join#18551

Open
siddharthteotia wants to merge 1 commit into
apache:masterfrom
siddharthteotia:mse-join-commute-dim-tables
Open

Add join commutativity rule for reordering fact dimension join#18551
siddharthteotia wants to merge 1 commit into
apache:masterfrom
siddharthteotia:mse-join-commute-dim-tables

Conversation

@siddharthteotia
Copy link
Copy Markdown
Contributor

@siddharthteotia siddharthteotia commented May 20, 2026

Improvement Summary

  • Adds PinotJoinCommuteRule to the MSE logical planner.
  • When a user writes dim JOIN fact, the rule swaps the inputs so the dim ends up on the right — fixing the build-side memory pressure that the syntactic-default planner gets wrong today

Problem

  • In MSE, PinotJoinExchangeNodeInsertRule decides distribution purely syntactically, and the runtime always builds the hash table from the right input (BaseJoinOperator invariant).
  • So when a user writes dim JOIN fact:
    • The runtime builds a hash table from fact_tbl — the wrong side, memory-wise. For large fact tables this can exceed max_rows_in_join and fail queries that would otherwise succeed.
    • In specific cases** (non-equi joins, user-hinted broadcast, dynamic-broadcast semi-joins): the fact table is also broadcast to every worker, multiplying network cost.

Both problems share the same root: the right side is special (build side always, broadcast side sometimes), and the user's syntactic ordering puts the wrong table there.

BEFORE (user-written)                      AFTER (rule fires)

             JOIN                                       JOIN
            /    \                                     /    \
       dim_tbl  fact_tbl                          fact_tbl  dim_tbl
       (small)   (BIG)                              (BIG)   (small)

      Build side (runtime invariant):             Build side (runtime invariant):
        fact_tbl  (large hash table)  ❌            dim_tbl  (small hash table)  ✅

      Broadcast (when planner picks it,            Broadcast (when planner picks it,
      e.g. non-equi / hinted):                     e.g. non-equi / hinted):
        fact_tbl  (network blowup)    ❌            dim_tbl  (small)              ✅

Solution

  • dim INNER/LEFT/RIGHT/FULL JOIN fact — auto-commuted so dim ends up on the right.
  • dim JOIN fact with /*+ join_strategy='lookup' */ — previously a runtime error (LookupJoinOperator requires dim on the right); now auto-corrected. Lookup join no longer requires users to remember table order when writing the query.
  • ResourceBasedQueriesTest now propagates isDimTable=true to the planner, so dim-aware rules can be exercised end-to-end (was previously a silent gap).

Testing

  • Added PinotJoinCommuteRuleTest — isolated HepPlanner, every predicate branch including idempotency and the lookup auto-correct case.
  • Added Plan-shape tests in PinotJoinCommutePlanTest) — real QueryEnvironment, asserts rule fires, doesn't over-fire, and respects the off-switch.
  • Runtime query correctness via ResourceBasedQueriesTest — H2-compared INNER / LEFT / RIGHT / FULL OUTER, filters
  • Ran Full pinot-query-planner and ResourceBasedQueriesTest - pass with zero failures.

@siddharthteotia siddharthteotia added the multi-stage Related to the multi-stage query engine label May 20, 2026
@siddharthteotia siddharthteotia changed the title [MSE] Add join commutativity rule for reordering fact dimension join Add join commutativity rule for reordering fact dimension join May 20, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 21, 2026

Codecov Report

❌ Patch coverage is 87.27273% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.25%. Comparing base (2910445) to head (ba0e0a4).
⚠️ Report is 20 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/calcite/rel/rules/PinotJoinCommuteRule.java 90.69% 1 Missing and 3 partials ⚠️
...ava/org/apache/pinot/query/catalog/PinotTable.java 50.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18551      +/-   ##
============================================
+ Coverage     63.75%   64.25%   +0.50%     
+ Complexity     1932     1120     -812     
============================================
  Files          3292     3312      +20     
  Lines        201470   203871    +2401     
  Branches      31316    31736     +420     
============================================
+ Hits         128442   130996    +2554     
+ Misses        62735    62358     -377     
- Partials      10293    10517     +224     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <ø> (-100.00%) ⬇️
integration1 ?
integration2 0.00% <ø> (ø)
java-21 64.25% <87.27%> (+0.50%) ⬆️
temurin 64.25% <87.27%> (+0.50%) ⬆️
unittests 64.25% <87.27%> (+0.50%) ⬆️
unittests1 56.71% <87.27%> (+0.91%) ⬆️
unittests2 35.49% <20.00%> (+0.23%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants