-
Notifications
You must be signed in to change notification settings - Fork 2k
Open
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
When you run Q10 with predicate pushdown enabled (see #20324 for details of what that means) it goes more slowly:
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;Specifically, on my test I see it going 30% slower
Benchmark clickbench_partitioned.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃ HEAD ┃ alamb_pushdown_and_arrow_58 ┃ Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 10 │ 333.30 ms │ 447.25 ms │ 1.34x slower │
You can repro it like this:
Repro Script
set datafusion.execution.parquet.binary_as_string = true; -- needed for ClickBench data
SET datafusion.execution.target_partitions = 1; -- set to 1 to reduce variability
create external table hits stored as parquet location '/home/ec2-user/datafusion/benchmarks/data/hits_partitioned';
-- Q10 (default configuration, no pushdown)
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
-- Q10 enable with pushdown enabled
SET datafusion.execution.parquet.pushdown_filters = true;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;
SELECT "MobilePhoneModel", COUNT(DISTINCT "UserID") AS u FROM hits WHERE "MobilePhoneModel" <> '' GROUP BY "MobilePhoneModel" ORDER BY u DESC LIMIT 10;When I run this with datafusion-cli -f /tmp/q.sql | grep Elapsed you can clearly see
Elapsed 0.001 seconds.
Elapsed 0.000 seconds.
Elapsed 0.063 seconds.
Elapsed 1.419 seconds. <-- First query run
Elapsed 1.432 seconds.
Elapsed 1.387 seconds.
Elapsed 1.399 seconds.
Elapsed 1.388 seconds.
Elapsed 0.000 seconds. <---- turn on filter pushdown
Elapsed 1.690 seconds. <-- now the query runs 30% slower
Elapsed 1.671 seconds.
Elapsed 1.695 seconds.
Elapsed 1.695 seconds.
Elapsed 1.718 seconds.Describe the solution you'd like
I want there to be no slowdown when I run Q10
Describe alternatives you've considered
No response
Additional context
No response
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request