Add 6 Calcite optimization rules to the multi-stage query engine (5 default-on, 1 opt-in)#18554
Add 6 Calcite optimization rules to the multi-stage query engine (5 default-on, 1 opt-in)#18554yashmayya wants to merge 2 commits into
Conversation
…efault-on, 1 opt-in) Registers stock Apache Calcite optimization rules in the MSE planner's logical phase: Default-on: - UnionMergeRule — flattens nested Union(Union(a,b),c) into a single n-ary Union, directly eliminating one PinotSetOpExchange per collapsed level. - AggregateProjectPullUpConstantsRule (Config.ANY) — drops constant columns from GROUP BY keys when the input proves constancy (e.g. WHERE col='X' GROUP BY col, ...). Reduces shuffle key width on multi-tenant queries. Config.ANY chosen because Config.DEFAULT requires a LogicalProject directly below the Aggregate, which never appears in Pinot's pipeline (filter pushdown consumes the Project before PRUNE_RULES runs). - SortRemoveConstantKeysRule — drops constant columns from ORDER BY (WHERE x='Y' ORDER BY x, ts → ORDER BY ts). Smaller sort key and exchange hash key. - ProjectAggregateMergeRule — drops unused aggregate calls when a Project above doesn't reference them. Most cases already handled by other Pinot rules; registered for safety. - SortMergeRule.Config.LIMIT_MERGE — collapses adjacent Sort/LIMIT nodes. Opt-in (default-off): - SortProjectTransposeRule — pushes Sort below Project so LIMIT applies before projection expressions are evaluated. Kept opt-in because firing in BASIC_RULES disrupts ProjectToSemiJoinRule pattern matching: partition-hinted IN (SELECT) queries lose their semi-join + broadcast plan and fall back to inner-join + hash-shuffle + extra DISTINCT aggregate. Enable per-query with SET usePlannerRules='SortProjectTranspose'. Validation: - 30 toggle tests in QueryPlannerRuleOptionsTest exercise both default and disabled paths for the 5 default-on rules and the opt-in rule, asserting plan-shape changes that distinguish rule-fired from rule-skipped. - Regenerated plan-resource fixtures (4 JSON files: GroupBy, PhysicalOptimizer, PinotHintable, SetOp) covering 561 representative queries. - Plan-quality proxy on 10 representative shapes: 2 wins (nested UNION: -4 exchanges, -4 ops; sort-constant-key: smaller sort + exchange hash key), 0 losses, 8 neutral. - Micro-benchmark on representative queries shows compile-time unchanged or faster: constant-pullup -37%, date-range -24%, union-merge -28% (rules eliminate plan nodes early, so downstream work shrinks). - Wide IN-clause regression sweep (100/500/1000 elements): no super-linear behavior introduced; the existing IN-clause polynomial behavior is unaffected. Operator note: operators with an explicit pinot.broker.mse.planner.disabled.rules setting fully replace DEFAULT_DISABLED_RULES. Add SortProjectTranspose to your explicit list if you want it disabled.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18554 +/- ##
============================================
- Coverage 64.27% 64.26% -0.02%
Complexity 1126 1126
============================================
Files 3311 3311
Lines 203784 203833 +49
Branches 31720 31721 +1
============================================
+ Hits 130974 130984 +10
- Misses 62308 62337 +29
- Partials 10502 10512 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…e names The test asserts on the exact list of Calcite rules that fire during planning of a specific query. With AggregateProjectPullUpConstants, ProjectAggregateMerge, and SortRemoveConstantKeys now default-on, they appear in the Rule Execution Times output between AggregateProjectMerge and EvaluateProjectLiteral. Same failure mode in MultiNodesOfflineClusterIntegrationTest (extends this class) — fixed by the same single edit. CI run 26203546289 confirmed the exact order.
xiangfu0
left a comment
There was a problem hiding this comment.
Found one high-signal issue; see inline comment.
| // Stock Calcite rule kept opt-in via usePlannerRules — see SORT_PROJECT_TRANSPOSE javadoc | ||
| // above for the rationale (firing in BASIC_RULES disrupts ProjectToSemiJoinRule on | ||
| // partition-hinted IN(SELECT) queries, breaking colocated broadcast semi-joins). | ||
| PlannerRuleNames.SORT_PROJECT_TRANSPOSE |
There was a problem hiding this comment.
Keeping SortProjectTranspose opt-in only via DEFAULT_DISABLED_RULES is not enough. Brokers that already set pinot.broker.mse.planner.disabled.rules bypass that default set entirely (MultiStageBrokerRequestHandler treats the configured list as a full replacement), so this rule becomes enabled immediately after upgrade for those clusters. That re-exposes the exact partition-hinted IN (SELECT) semi-join regression the PR description calls out unless the configured disables are merged with the defaults or the rule is moved out of BASIC_RULES.
There was a problem hiding this comment.
You're right on the mechanics — MultiStageBrokerRequestHandler:200-203 reads the configured list as a full replacement of DEFAULT_DISABLED_RULES, so an operator who set pinot.broker.mse.planner.disabled.rules before this PR won't have SortProjectTranspose in their list when they upgrade. I'd like to push back on this being a blocker though.
Replace-not-merge is the deliberate contract of this config.
When #17258 introduced pinot.broker.mse.planner.disabled.rules, the "if set, replace; else default" branch was a conscious design choice. Operators who explicitly set this config are signaling "I take ownership of which rules are disabled" — including the ability to opt rules back IN that we've defaulted off. That's a feature, not an oversight.
Precedent: rules have been added to DEFAULT_DISABLED_RULES in past PRs without remediating this scenario.
- [Planner] Support disabling rule by default and enabling it with a queryOption #16238 seeded the disabled list with
SORT_JOIN_TRANSPOSE,SORT_JOIN_COPY,AGGREGATE_JOIN_TRANSPOSE_EXTENDED. - Disable Calcite's AggregateUnionAggregate planner rule by default #16515 added
AGGREGATE_UNION_AGGREGATE. - Automatically rewrite MIN / MAX / SUM on long columns to MINLONG / MAXLONG / SUMLONG #17058 added
JOIN_TO_ENRICHED_JOIN. - Disable JOIN_PUSH_TRANSITIVE_PREDICATES optimization rule by default #17181 added
JOIN_PUSH_TRANSITIVE_PREDICATESandAGGREGATE_FUNCTION_REWRITE(the latter wasn't even discussed — the PR title only calls out one of the two).
Granted, all of those landed before #17258 introduced the override config. This PR is genuinely the first to add to DEFAULT_DISABLED_RULES after the override config went in. But the team's stance throughout that lineage has been: the disabled list grows as we add risky rules; the override config is a knob for operators who want exact control, and operators using it are responsible for tracking changes via release notes. Nothing about #17258's design suggested that adding to the default list would now require synchronized config-handling changes.
The proposed remediations have larger downsides than what they fix:
- "Merge the configured list with defaults" is itself a breaking contract change for current users of the override config. Today they get exactly what they configure, including the ability to opt rules back in. Switching to "your list ∪ whatever we decide later" silently re-disables rules they wanted enabled, with no opt-out — and removes the feature the config was designed to provide. If we want this semantics it should be a new config (e.g.
pinot.broker.mse.planner.additional.disabled.rules) so existing override users aren't affected. - "Move
SortProjectTransposeRuleout ofBASIC_RULES" — the rule lives inBASIC_RULESbecause that's where transpose rules structurally belong, alongsideFilterProjectTransposeRule,ProjectWindowTransposeRule, etc. Relocating it specifically to dodge a config edge case mixes concerns. The PR description already calls out "moving it toPRUNE_RULES" as a candidate follow-up to potentially recover its default-on benefit — that's worth doing on its own merits, not as a workaround for the override semantics.
Impact is narrow. The affected population requires all of:
- Operator manually set
pinot.broker.mse.planner.disabled.rules(this is an obscure config — default disabled list works for the vast majority of clusters); - Their override happens to not list
SortProjectTranspose(true by definition since the rule didn't exist until this PR); - They actually run partition-hinted
IN (SELECT)queries that hit the regression pattern (a specific shape, not a general one).
Mitigation that fits the existing contract: I'll add a release note for the next release explicitly calling this out — telling anyone overriding pinot.broker.mse.planner.disabled.rules to add SortProjectTranspose to their list if they want to preserve prior behavior on partition-hinted IN (SELECT). That's the standard channel for change-of-defaults that override users opt into tracking when they set the config.
If we want to revisit the override semantics across the board (a new additional.disabled.rules config, for example), I'm happy to do that in a focused follow-up. I just don't think it's right to land in this PR or to block it on.
gortiz
left a comment
There was a problem hiding this comment.
Maybe we can also enable JoinPushTransitivePredicates by default? WDYT?
Add 6 stock Calcite optimization rules to the multi-stage query engine
Summary
Registers 6 stock Apache Calcite optimization rules in the multi-stage query engine's logical-phase planner — 5 default-on, 1 opt-in. Pinot's MSE planner has historically used roughly 18% of Calcite's logical rule catalog; this PR closes a small, validated subset of that gap with rules that have measurable plan-quality wins and no observed correctness regressions.
UnionMergeRule(CoreRules.UNION_MERGE)Union(Union(a, b), c)into a single n-aryUnion(a, b, c). In MSE this directly eliminates onePinotSetOpExchangeper collapsed level.AggregateProjectPullUpConstantsRule(Config.ANY)WHERE col='X' GROUP BY col, …shape). Reduces shuffle-key width in the multi-stage shuffle, avoiding degenerate-cardinality hash distributions on filter-pinned constants — common for multi-tenant queries.SortRemoveConstantKeysRule(CoreRules.SORT_REMOVE_CONSTANT_KEYS)WHERE x='Y' ORDER BY x, ts→ORDER BY ts). Smaller sort key + smaller exchange hash key when downstream sort-exchange runs.ProjectAggregateMergeRule(CoreRules.PROJECT_AGGREGATE_MERGE)Projectabove the aggregate doesn't reference them. Common for view-on-view dbt patterns where the outer SELECT picks a subset of the inner aggregate's output.SortMergeRule.Config.LIMIT_MERGE(CoreRules.LIMIT_MERGE)Sort/LIMITnodes (e.g. nested CTE with outer LIMIT).SortProjectTransposeRule(CoreRules.SORT_PROJECT_TRANSPOSE)SET usePlannerRules='SortProjectTranspose')BASIC_RULESdisruptsProjectToSemiJoinRulepattern matching on partition-hintedIN (SELECT)queries — see "Why one rule is opt-in" below.Rationale
Pinot's multi-stage query engine uses Apache Calcite for SQL parsing and logical optimization. Calcite 1.40.0 ships ~180 logical-transformation rules; Pinot's
PinotQueryRuleSetscurrently registers ~33 of them (plus ~34 Pinot custom rules). An exhaustive review of the unused Calcite catalog against representative Pinot workloads (single-table OLAP aggregations, multi-table joins, customer query patterns observed in support channels) surfaced a handful of high-confidence wins. The 6 rules in this PR were selected because they:Programs.standard()rule set;PinotFilterIntoJoinRule,PinotJoinPushTransitivePredicatesRule,PinotAggregateFunctionRewriteRule, leaf+intermediate aggregate splitting inAggregatePushdownRule, etc.).Why one rule is opt-in (and what we learned)
SortProjectTransposeRulewas initially proposed default-on because it drove ~79 of 93 plan changes across the regenerated test corpus — the broadest plan-shape impact of any rule in this batch. A bisection during validation found that firing the rule inBASIC_RULESpullsLogicalProjectabove the Sort, leavingSort(Join(left, Aggregate(right))). Calcite'sProjectToSemiJoinRule(already inBASIC_RULES) requires exactlyProject(Join(left, Aggregate(right)))to convert an inner-join-with-distinct into a semi-join. When the rule no longer matches, partition-hintedIN (SELECT)queries fall back fromsemi-join + broadcasttoinner-join + hash-shuffle + extra DISTINCT aggregate stage— a significant perf regression for colocated-join workloads.SortProjectTransposeRuleis therefore registered but added toDEFAULT_DISABLED_RULES. Operators / users who want its narrow projection-after-LIMIT win on queries that don't hit this pattern can opt in per-query (SET usePlannerRules='SortProjectTranspose') or per-broker (pinot.broker.mse.planner.disabled.rules). A follow-up PR can explore moving it fromBASIC_RULESintoPRUNE_RULES(a separate HEP phase that runs afterProjectToSemiJoinRulehas its chance to fire), which may recover the broad benefit without sacrificing the partition-hint optimization.A second rule from the original 6-rule proposal —
DateRangeFilter(Calcite'sDateRangeRules.FilterDateRangeRule) — was dropped from this PR entirely. Calcite 1.40.0's implementation callsgetCluster().getPlanner().getContext().unwrapOrThrow(CalciteConnectionConfig.class).timeZone()in itsonMatch. Pinot'sPlannerContextconstructs the HepPlanner withContexts.EMPTY_CONTEXT, so the unwrap throws NPE on every match. Shipping a rule that fails when enabled — even opt-in — is poor UX. Wiring aCalciteConnectionConfigImplintoPlannerContextplus re-addingDateRangeFilteris queued as a follow-up; with the bug fixed, this rule has the biggest single expected impact (5–50× on time-partitioned tables withEXTRACT(YEAR FROM ts)predicates that currently bypass segment pruning).Safety assessment
Correctness
GroupByPlans.json,PhysicalOptimizerPlans.json,PinotHintablePlans.json,SetOpPlans.json) were spot-checked by an independent reviewer: 10 diverse before/after plan diffs were hand-traced and all were confirmed semantically equivalent. Highlights:Sort(Project(X))→Project(Sort(X))is the dominant pattern whenSortProjectTransposeis enabled (now opt-in only).Union(Union(a, b), c)→ flatUnion(a, b, c)— eliminates one intermediatePinotSetOpExchange.GROUP BY col1, col3withWHERE col1='US'→GROUP BY col3with a re-projected'US'literal — preserves the LEAF+FINAL aggregate split.pinot-query-plannerunit tests pass (1028 of 1158 tests pass; the 130 failures are the pre-existing Protobuf gencode 4.35.0 vs runtime 4.34.1 mismatch inPlanNodeSerDeTest+RexExpressionSerDeTest, reproduces on unmodified master, unrelated to this change).pinot-query-runtimeunit tests pass cleanly.QueryPlannerRuleOptionsTestexercise both default-on and disabled paths for each of the 5 default-on rules + the 1 opt-in rule, asserting plan-shape changes that distinguish rule-fired from rule-skipped.Performance
A micro-benchmark harness was run on representative queries with the new rules toggled on vs off:
WHERE col='X' GROUP BY col, col2EXTRACT(YEAR FROM ts) = 2024UNION ALLCompile time is uniformly faster when rules are enabled — the rules eliminate plan nodes early, so downstream rule applications have less work. Wide
INclause sweep (100 / 500 / 1000 elements) shows no super-linear behavior introduced by the new rules; the existing IN-clause polynomial behavior (~5.4 s/compile forIN (500)) is unaffected.Plan-quality proxy across 10 representative query shapes: 2 measurable wins (1 nested-UNION shape: −4 exchanges, −4 operators; 1 sort-constant-key shape: smaller sort + exchange hash key), 0 losses, 8 neutral.
End-to-end query-execution latency was not measured against a real cluster — this requires the maintainers' integration test infrastructure. The plan-quality proxies + compile-time numbers + correctness validation across 561 fixture queries are the strongest signals available in-tree.
Rolling-upgrade
Rollback / disable
Three escape hatches, all already wired:
SET disablePlannerRules='UnionMerge,SortRemoveConstantKeys,…'.pinot.broker.mse.planner.disabled.rules=UnionMerge,…(instance-level override).Test plan
./mvnw -pl pinot-query-planner test—ResourceBasedQueryPlansTest561/561,QueryPlannerRuleOptionsTest30/30, all other planner test classes pass. (Pre-existing 130 Protobuf SerDe failures reproduce on master.)./mvnw -pl pinot-query-runtime test— pass clean../mvnw -pl pinot-query-planner,pinot-spi spotless:apply checkstyle:check license:check— clean, 0 violations.SortProjectTransposeregression that led to keeping it opt-in.Follow-ups (not in scope here)
CalciteConnectionConfigImplintoPlannerContextand re-introduceDateRangeFilter— the highest-expected-impact rule in the original investigation.SortProjectTransposeRulefromBASIC_RULESintoPRUNE_RULESto recover its broad benefit without disruptingProjectToSemiJoinRule.CustomDataQueryClusterIntegrationTest.