feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) by zhuqi-lucas · Pull Request #21182 · apache/datafusion

zhuqi-lucas · 2026-03-26T15:54:08Z

Which issue does this PR close?

Closes #17348
Closes #19329

Rationale for this change

This PR implements the core optimization described in the EPIC Sort pushdown / partially sorted scans: using file-level min/max statistics to optimize scan order and eliminate unnecessary sort operators.

Currently, when a query has ORDER BY, DataFusion always inserts a SortExec even when the data is already sorted across files. This PR enables:

- Sort elimination when files are non-overlapping and internally sorted
- Statistics-based file reordering to approximate the requested order
- Automatic ordering inference from Parquet sorting_columns metadata (no WITH ORDER needed)

What changes are included in this PR?

Architecture

Query: SELECT ... ORDER BY col ASC [LIMIT N]

  PushdownSort optimizer
        │
        ▼
  FileScanConfig::try_pushdown_sort()
        │
        ├─► FileSource::try_pushdown_sort()
        │     │
        │     ├─ natural ordering matches? ──► Exact
        │     │   (Parquet WITH ORDER or              │
        │     │    inferred from metadata)             ▼
        │     │                           rebuild_with_source(exact=true)
        │     │                             ├─ sort files by min/max stats
        │     │                             ├─ verify non-overlapping
        │     │                             ├─ redistribute across groups
        │     │                             └─► keep output_ordering
        │     │                                  → SortExec removed
        │     │
        │     ├─ reversed ordering? ──► Inexact
        │     │   (reverse_row_groups)        │
        │     │                                ▼
        │     │                    rebuild_with_source(exact=false)
        │     │                      └─► clear output_ordering
        │     │                           → SortExec kept
        │     │
        │     └─ neither ──► Unsupported
        │
        └─► try_sort_file_groups_by_statistics()
              (best-effort: reorder files by stats)
              └─► Inexact if reordered
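
The decision flow in the diagram can be summarized as a small pure function. This is an illustrative sketch only: `Pushdown`, `SourceAnswer`, `decide`, and `sort_exec_removed` are hypothetical names, not DataFusion types, and the rule that a matching but overlapping source degrades to Inexact is taken from the commit description rather than from the code itself.

```rust
/// Outcome labels from the diagram (stand-in enum, not DataFusion's type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Pushdown {
    Exact,
    Inexact,
    Unsupported,
}

/// What the file source reports before file statistics are consulted
/// (hypothetical simplification of the three branches in the diagram).
#[derive(Debug, Clone, Copy)]
enum SourceAnswer {
    NaturalOrderMatches,
    ReversedOrderMatches,
    Neither,
}

/// Combine the source's answer with what the statistics pass found.
/// `files_non_overlapping`: min/max ranges do not overlap once sorted.
/// `stats_reordered`: the best-effort fallback changed the file order.
fn decide(source: SourceAnswer, files_non_overlapping: bool, stats_reordered: bool) -> Pushdown {
    match source {
        // Ordering is guaranteed within files and across files.
        SourceAnswer::NaturalOrderMatches if files_non_overlapping => Pushdown::Exact,
        // Within-file ordering holds, but file ranges overlap.
        SourceAnswer::NaturalOrderMatches => Pushdown::Inexact,
        // The reverse path is always treated as approximate.
        SourceAnswer::ReversedOrderMatches => Pushdown::Inexact,
        // Fallback: reordering by statistics only approximates the order.
        SourceAnswer::Neither if stats_reordered => Pushdown::Inexact,
        SourceAnswer::Neither => Pushdown::Unsupported,
    }
}

/// Only an exact result lets the optimizer drop the SortExec.
fn sort_exec_removed(result: Pushdown) -> bool {
    result == Pushdown::Exact
}
```

Keeping the outcome a function of two booleans makes it easy to see that only one combination removes the sort; every other path keeps SortExec as a safety net.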

Three Optimization Paths

Path 1: Sort Elimination (Exact) — removes SortExec entirely

When the file source's natural ordering satisfies the query (e.g., Parquet files with sorting_columns metadata), and files within each group are non-overlapping, the SortExec is completely eliminated.

Before:                              After:
  SortExec [col ASC]                   DataSourceExec [files sorted]
    DataSourceExec [files]             (output_ordering=[col ASC])
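
A minimal sketch of the two checks Path 1 depends on, sorting files by their minimum and verifying that adjacent ranges do not overlap. `FileStats`, `sort_by_min`, and `is_non_overlapping` are hypothetical names for illustration; the strict `<` comparison at file boundaries is an assumed, conservative choice and may differ from what the PR does.

```rust
/// Per-file min/max of the sort column (hypothetical stand-in for
/// the statistics DataFusion keeps per file).
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    name: &'static str,
    min: i64,
    max: i64,
}

/// Order files by their minimum value, for an ascending sort request.
fn sort_by_min(files: &mut [FileStats]) {
    files.sort_by_key(|f| f.min);
}

/// True when every file ends strictly before the next one starts.
/// Expects `sorted` to already be ordered by `min`.
fn is_non_overlapping(sorted: &[FileStats]) -> bool {
    sorted.windows(2).all(|pair| pair[0].max < pair[1].min)
}
```

If each file is also sorted internally, reading the files in this order yields globally sorted output, which is the condition under which the sort can be dropped.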

Path 2: Reverse Scan (Inexact) — existing optimization, enhanced

When the requested order is the reverse of the natural ordering, reverse_row_groups=true is set. SortExec stays but benefits from approximate ordering.
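
To see why the reverse path is only approximate, consider a toy model in which a file is a list of row groups, each sorted ascending. The sketch assumes that reversing changes the order of the row groups but not the rows inside them; that is an assumption about the behavior of reverse_row_groups, and the helper names are hypothetical.

```rust
/// Read row groups last-to-first, leaving rows inside each group untouched.
fn reverse_row_groups(row_groups: &[Vec<i64>]) -> Vec<i64> {
    row_groups.iter().rev().flatten().copied().collect()
}

/// True when the values are in non-increasing order.
fn is_sorted_desc(values: &[i64]) -> bool {
    values.windows(2).all(|pair| pair[0] >= pair[1])
}
```

For row groups `[1, 2, 3]` and `[4, 5, 6]`, the reversed scan produces `4, 5, 6, 1, 2, 3`: large values arrive first, but the stream is not fully descending, so a SortExec is still needed for correctness.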

Path 3: Statistics-Based File Reordering — new fallback

When the FileSource returns Unsupported, files are reordered by their min/max statistics to approximate the requested order. This benefits TopK queries via better dynamic filter pruning.
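
The TopK benefit can be illustrated with a toy scan that skips a file once its minimum cannot beat the current k-th smallest value. This is a simplified model of dynamic filter pruning under stated assumptions, not DataFusion's implementation; `FileWithStats` and `top_k_scan` are hypothetical names.

```rust
use std::collections::BinaryHeap;

/// A file with its minimum statistic and its rows (toy model).
struct FileWithStats {
    min: i64,
    rows: Vec<i64>,
}

/// Simulates `ORDER BY col ASC LIMIT k` over files read in the given order.
/// Returns the k smallest values (ascending) and how many files were read.
fn top_k_scan(files: &[FileWithStats], k: usize) -> (Vec<i64>, usize) {
    // Max-heap holding the k smallest values seen so far.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    let mut files_read = 0;
    for file in files {
        // Dynamic filter: once k rows are held, a file whose minimum is not
        // below the current k-th smallest value cannot contribute.
        if heap.len() == k {
            if let Some(&threshold) = heap.peek() {
                if file.min >= threshold {
                    continue;
                }
            }
        }
        files_read += 1;
        for &value in &file.rows {
            if heap.len() < k {
                heap.push(value);
            } else if let Some(&threshold) = heap.peek() {
                if value < threshold {
                    heap.pop();
                    heap.push(value);
                }
            }
        }
    }
    (heap.into_sorted_vec(), files_read)
}
```

Reading the file with the smallest minimum first tightens the threshold early, so later files are skipped on statistics alone. With three files covering 20-29, 0-9, and 10-19 and `k = 3`, the original order reads two files while the statistics-sorted order reads one.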

Multi-Partition Design

For multiple execution partitions, the optimization works per-partition:

Multi-partition (each partition's SortExec eliminated):
  SortPreservingMergeExec [col ASC]        ← O(n) merge, cheap
    DataSourceExec [group 0: f1, f2]       ← no SortExec, parallel I/O
    DataSourceExec [group 1: f3, f4]       ← no SortExec, parallel I/O

When bin-packing interleaves file ranges across groups, files are redistributed using consecutive assignment to ensure groups are ordered relative to each other:

Before (bin-packed, interleaved):
  Group 0: [f1(0-9),  f3(20-29)]     groups overlap!
  Group 1: [f2(10-19), f4(30-39)]

After (consecutive assignment):
  Group 0: [f1(0-9),  f2(10-19)]     max=19
  Group 1: [f3(20-29), f4(30-39)]    min=20 > 19 ✓ ordered!
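
Consecutive assignment can be sketched as slicing the statistics-sorted file list into contiguous runs. `redistribute_consecutively` is a hypothetical helper; how the PR splits an uneven file count is not stated here, so giving the earlier groups one extra file is an assumption.

```rust
/// Split files that are already sorted by min/max statistics into
/// `num_groups` contiguous runs, so that every file in group i precedes
/// every file in group i + 1.
fn redistribute_consecutively<T: Clone>(sorted: &[T], num_groups: usize) -> Vec<Vec<T>> {
    if num_groups == 0 || sorted.is_empty() {
        return Vec::new();
    }
    // Never create more groups than there are files.
    let groups = num_groups.min(sorted.len());
    let base = sorted.len() / groups;
    let extra = sorted.len() % groups;
    let mut result = Vec::with_capacity(groups);
    let mut start = 0;
    for group in 0..groups {
        // The first `extra` groups take one additional file (assumed policy).
        let len = base + usize::from(group < extra);
        result.push(sorted[start..start + len].to_vec());
        start += len;
    }
    result
}
```

Applied to `f1(0-9), f2(10-19), f3(20-29), f4(30-39)` with two groups, this yields the "After" layout shown above: `[f1, f2]` and `[f3, f4]`.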

Automatic Ordering Inference

DataFusion already infers ordering from Parquet sorting_columns metadata (via ordering_from_parquet_metadata). With this PR, the inferred ordering flows through sort pushdown automatically — users don't need WITH ORDER for sorted Parquet files.

Files Changed

| File | Change |
|---|---|
| `datasource-parquet/src/source.rs` | ParquetSource returns `Exact` when natural ordering satisfies request |
| `datasource/src/file_scan_config.rs` | Core sort pushdown logic: statistics sorting, non-overlapping detection, multi-group redistribution |
| `physical-optimizer/src/pushdown_sort.rs` | Module documentation update |
| `core/tests/physical_optimizer/pushdown_sort.rs` | Updated prefix match test |
| `sqllogictest/test_files/sort_pushdown.slt` | 5 new test groups (A-E) + updated existing tests |
| `benchmarks/src/sort_pushdown.rs` | New benchmark for sort elimination |
| `benchmarks/{lib,bin/dfbench,bench}.{rs,sh}` | Benchmark registration |

Benchmark Results

300k rows, 8 non-overlapping sorted parquet files, single partition:

| Query | Description | Baseline (ms) | Sort Eliminated (ms) | Speedup |
|---|---|---|---|---|
| Q1 | `ORDER BY col ASC` (full scan) | 159 | 91 | 43% |
| Q2 | `ORDER BY col ASC LIMIT 100` | 36 | 12 | 67% |
| Q3 | `ORDER BY col ASC` (wide, `SELECT *`) | 487 | 333 | 31% |
| Q4 | `ORDER BY col ASC LIMIT 100` (wide) | 119 | 30 | 74% |

LIMIT queries benefit most (67-74%) because sort elimination + limit pushdown means only the first few rows are read.

Tests

Unit Tests (12 new)

- Unsupported/Inexact/Exact source × sorted/unsorted/overlapping/non-overlapping
- Multi-group consecutive redistribution (even and uneven distribution)
- Partial statistics, single-file groups, descending sort

SLT Integration Tests (5 new groups)

- Test A: Non-overlapping files + WITH ORDER → Sort eliminated (single partition)
- Test B: Overlapping files → statistics reorder, SortExec retained
- Test C: LIMIT queries (ASC sort elimination + DESC reverse scan)
- Test D: target_partitions=2 → SPM + per-partition sort elimination
- Test E: Inferred ordering from Parquet metadata (no WITH ORDER), single and multi partition

Integration Tests

- Updated prefix match test for Exact pushdown behavior
- All 919 core integration tests pass, all existing SLT tests pass

Test plan

- `cargo test -p datafusion-datasource` (111 tests pass)
- `cargo test -p datafusion-datasource-parquet` (96 tests pass)
- `cargo test -p datafusion-physical-optimizer` (27 tests pass)
- `cargo test -p datafusion --test core_integration` (919 tests pass)
- `cargo test -p datafusion`: all tests pass (1997+)
- SLT sort/order/topk tests pass
- SLT window/union/joins tests pass (no regressions)
- `cargo clippy`: 0 warnings
- Benchmark runs and shows expected speedups

🤖 Generated with Claude Code

Sort files within each file group by min/max statistics during sort pushdown to better align with the requested ordering. When files are non-overlapping and within-file ordering is guaranteed (e.g. Parquet with sorting_columns metadata), the SortExec is completely eliminated.

Key changes:
- ParquetSource::try_pushdown_sort returns Exact when natural ordering satisfies the request, enabling sort elimination
- FileScanConfig sorts files within groups by statistics and verifies non-overlapping property to determine Exact vs Inexact
- Multi-group files are redistributed consecutively to preserve both sort elimination and I/O parallelism across partitions
- Statistics-based file reordering as fallback when FileSource returns Unsupported (benefits TopK via better dynamic filter pruning)
- New sort_pushdown benchmark for measuring sort elimination speedup

Closes apache#17348

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

zhuqi-lucas · 2026-03-26T15:58:03Z

cc @alamb @adriangb — this implements the sort pushdown phase 2 from #17348. Would appreciate your review when you get a chance.

Copilot

Pull request overview

Implements statistics-driven file group ordering as part of sort pushdown, enabling sort elimination when within-file ordering matches and files are non-overlapping, plus a best-effort stats-based reorder fallback when exact pushdown isn’t possible.

Changes:

- Add file-group reordering by min/max statistics (and non-overlap validation) to enable SortExec elimination for exactly ordered, non-overlapping files.
- Extend Parquet sort pushdown to return Exact when Parquet ordering metadata satisfies the requested ordering.
- Add/adjust SLT + Rust tests and a new benchmark to validate and measure the optimization.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
|---|---|
| datafusion/sqllogictest/test_files/sort_pushdown.slt | Adds SLT coverage for stats reorder, sort elimination, LIMIT behavior, multi-partition behavior, and inferred ordering from Parquet metadata. |
| datafusion/physical-optimizer/src/pushdown_sort.rs | Updates module docs to reflect new capabilities (Exact elimination + stats-based ordering). |
| datafusion/datasource/src/file_scan_config.rs | Implements core stats-based reordering, non-overlap validation, “exact” preservation logic, and cross-group redistribution. Adds unit tests. |
| datafusion/datasource-parquet/src/source.rs | Returns `Exact` when Parquet natural ordering satisfies the requested sort. |
| datafusion/core/tests/physical_optimizer/pushdown_sort.rs | Updates a prefix-match test to reflect `Exact` pushdown / sort elimination behavior. |
| benchmarks/src/sort_pushdown.rs | Adds a benchmark to measure sort elimination and LIMIT benefits on sorted, non-overlapping parquet files. |
| benchmarks/src/lib.rs | Registers the new `sort_pushdown` benchmark module. |
| benchmarks/src/bin/dfbench.rs | Exposes `sort-pushdown` as a new `dfbench` subcommand. |
| benchmarks/bench.sh | Adds `bench.sh` targets to run the new sort pushdown benchmarks. |

… improve docs
- Remove dead stats computation in reverse_file_groups branch (reverse path is always Inexact, so all_non_overlapping is unused)
- Add reverse prefix matching documentation to pushdown_sort module

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adriangb · 2026-03-26T20:12:26Z

Very exciting! I hope I have wifi on the plane later today so I can review.

Copilot AI review requested due to automatic review settings March 26, 2026 15:54

zhuqi-lucas mentioned this pull request Mar 26, 2026

[EPIC] Sort pushdown / partially sorted scans #17348

Open

zhuqi-lucas changed the title ~~feat: sort file groups by statistics during sort pushdown~~ feat: sort file groups by statistics during sort pushdown (Sort pushdown phase 2) Mar 26, 2026

Copilot AI reviewed Mar 26, 2026

zhuqi-lucas and others added 2 commits March 27, 2026 00:18

add test: reverse path preserves file order, does not apply stats sor…

aaae591

…ting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot added the optimizer (Optimizer rules), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), and datasource (Changes to the datasource crate) labels Mar 26, 2026

Copilot started reviewing on behalf of zhuqi-lucas March 26, 2026 16:32

zhuqi-lucas wants to merge 3 commits into apache:main from zhuqi-lucas:feat/sort-file-groups-by-statistics
