Push limit to leaf stage by default for DISTINCT / no-aggregate GROUP BY#18598

Open
yashmayya wants to merge 1 commit into
apache:masterfrom
yashmayya:default-leaf-limit-pushdown-distinct

Conversation

Contributor

@yashmayya yashmayya commented May 27, 2026

For SELECT DISTINCT col ... LIMIT n and GROUP BY col ... LIMIT n without aggregate functions, the multi-stage engine currently ships every distinct group key from each server to the intermediate stage before applying the limit. This pushes the limit (and order-by-on-key) down to the leaf aggregate by default, so each server emits at most limit groups.
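As a rough illustration of the effect on data shipped between stages, here is a minimal simulation of an ordered `SELECT DISTINCT col ... ORDER BY col LIMIT 3`. This is not Pinot code: each server is modeled as a plain list of key values, and `leaf_distinct` / `intermediate_distinct` are hypothetical stand-ins for the leaf and intermediate stages.

```python
def leaf_distinct(rows, limit=None):
    """Hypothetical leaf stage: distinct keys in ascending order.

    With `limit` set, the leaf trims its own output (the pushdown case).
    """
    keys = sorted(set(rows))
    return keys if limit is None else keys[:limit]


def intermediate_distinct(leaf_outputs, limit):
    """Hypothetical intermediate stage: union leaf outputs, sort, apply the final limit."""
    merged = set().union(*leaf_outputs)
    return sorted(merged)[:limit]


# Three servers, each holding a slice of the table's `col` values (made-up data).
servers = [
    ["d", "a", "c", "a", "k", "m"],
    ["b", "e", "b", "f", "n"],
    ["a", "g", "b", "h", "p", "q"],
]
limit = 3

without_pushdown = [leaf_distinct(rows) for rows in servers]
with_pushdown = [leaf_distinct(rows, limit) for rows in servers]

shipped_before = sum(len(out) for out in without_pushdown)
shipped_after = sum(len(out) for out in with_pushdown)

result_before = intermediate_distinct(without_pushdown, limit)
result_after = intermediate_distinct(with_pushdown, limit)

print(shipped_before, shipped_after)
print(result_before, result_after)
```

In this toy data set the leaves ship fewer keys with the pushdown while the final result is unchanged; real savings depend on key cardinality per server.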

This is safe for the no-aggregate case: each leaf produces complete group keys (no partial aggregation), so leaf-level trimming is exact for ordered queries and yields a valid subset for unordered ones. Queries with aggregate functions are unchanged and remain gated behind the existing is_enable_group_trim hint/config. The pushdown is limited to a single group set, so ROLLUP / CUBE / GROUPING SETS are excluded. Opt out per query with /*+ aggOptions(is_enable_group_trim='false') */.
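To see why queries with aggregate functions are a different case, consider a sketch of `SELECT col, COUNT(*) ... GROUP BY col ORDER BY COUNT(*) DESC LIMIT 1`. Again this is a toy model and not the engine's implementation: `leaf_counts` and `intermediate_top` are hypothetical helpers, and the data is made up so that the globally largest group is not the largest on any single server.

```python
from collections import Counter


def leaf_counts(rows, limit=None):
    """Hypothetical leaf stage: partial COUNT(*) per key, ranked by count descending.

    With `limit` set, the leaf keeps only its local top groups.
    """
    ranked = sorted(Counter(rows).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if limit is None else ranked[:limit]


def intermediate_top(leaf_outputs, limit):
    """Hypothetical intermediate stage: sum partial counts, then take the top `limit`."""
    total = Counter()
    for out in leaf_outputs:
        for key, count in out:
            total[key] += count
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# "b" is the largest group overall (4 rows) but only second on each server.
servers = [
    ["a", "a", "a", "b", "b"],
    ["c", "c", "c", "b", "b"],
]
limit = 1

exact = intermediate_top([leaf_counts(rows) for rows in servers], limit)
trimmed = intermediate_top([leaf_counts(rows, limit) for rows in servers], limit)

print(exact)    # result without leaf trimming
print(trimmed)  # result when each leaf keeps only its local top group
```

With partial aggregates, a leaf cannot know which of its groups will rank highest after merging, so trimming there can change the answer. With no aggregate, the key itself is the whole row, so that problem does not arise.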

Note: for an unordered ... LIMIT (no ORDER BY), the specific rows returned may differ from before. Which rows come back is already unspecified in SQL, and the result was non-deterministic previously.
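A small sketch of the unordered case, under the same toy model (lists standing in for servers, hypothetical helper names): each leaf emits whichever `limit` distinct keys it encounters first, and the final rows depend on the order in which leaf outputs arrive. Every returned key is still a real distinct value, and the row count is still honored.

```python
def leaf_distinct_unordered(rows, limit):
    """Hypothetical leaf stage: first `limit` distinct keys in encounter order."""
    seen = []
    for key in rows:
        if key not in seen:
            seen.append(key)
            if len(seen) == limit:
                break
    return seen


def intermediate_limit(leaf_outputs, limit):
    """Hypothetical intermediate stage: de-duplicate in arrival order, then cut at `limit`."""
    merged = []
    for out in leaf_outputs:
        for key in out:
            if key not in merged:
                merged.append(key)
    return merged[:limit]


servers = [["d", "a", "c"], ["b", "e", "a"]]
limit = 2
all_keys = set().union(*servers)

# Same data, two different arrival orders of the leaf outputs.
result = intermediate_limit(
    [leaf_distinct_unordered(rows, limit) for rows in servers], limit
)
reordered = intermediate_limit(
    [leaf_distinct_unordered(rows, limit) for rows in reversed(servers)], limit
)

print(result, reordered)
```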

Covered by new planner plan tests (DISTINCT/GROUP BY + LIMIT, ORDER BY on key, HAVING, OFFSET, opt-out hint, and a multi-group-set negative case) and GroupByOptionsTest integration tests for paginated DISTINCT/GROUP BY.

@yashmayya yashmayya added backward-incompat Introduces a backward-incompatible API or behavior change multi-stage Related to the multi-stage query engine and removed backward-incompat Introduces a backward-incompatible API or behavior change labels May 27, 2026
@yashmayya yashmayya requested a review from Jackie-Jiang May 27, 2026 18:44
@yashmayya yashmayya added the enhancement Improvement to existing functionality label May 27, 2026

codecov-commenter commented May 27, 2026

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.29%. Comparing base (0e95355) to head (3048529).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...el/rules/PinotAggregateExchangeNodeInsertRule.java | 75.00% | 0 Missing and 1 partial ⚠️ |
| ...t/calcite/rel/rules/PinotLogicalAggregateRule.java | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18598      +/-   ##
============================================
+ Coverage     64.26%   64.29%   +0.02%     
  Complexity     1137     1137              
============================================
  Files          3335     3335              
  Lines        206042   206044       +2     
  Branches      32142    32143       +1     
============================================
+ Hits         132420   132483      +63     
+ Misses        62977    62913      -64     
- Partials      10645    10648       +3     

| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 64.29% <60.00%> (+0.02%) ⬆️ |
| temurin | 64.29% <60.00%> (+0.02%) ⬆️ |
| unittests | 64.29% <60.00%> (+0.02%) ⬆️ |
| unittests1 | 56.81% <60.00%> (+0.03%) ⬆️ |
| unittests2 | 36.81% <0.00%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

@yashmayya yashmayya force-pushed the default-leaf-limit-pushdown-distinct branch from 896b567 to 3048529 Compare May 27, 2026 20:01

Labels

enhancement Improvement to existing functionality multi-stage Related to the multi-stage query engine

2 participants