chore: Add TPC-* queries to repo #3562

Merged
andygrove merged 2 commits into apache:main from andygrove:bundle-tpc-queries
Feb 23, 2026
Conversation

andygrove (Member) commented Feb 21, 2026

Which issue does this PR close?

N/A

Rationale for this change

The benchmark scripts in benchmarks/tpc currently require the user to provide the queries. It is more convenient to add them to the repository.

What changes are included in this PR?

Add query files. These are copied from the datafusion-benchmarks repo.

How are these changes tested?

mbutrovich (Contributor) left a comment
Thanks @andygrove!

@@ -0,0 +1,26 @@
-- CometBench-DS query 1 derived from TPC-DS query 1 under the terms of the TPC Fair Use Policy.
-- TPC-DS queries are Copyright 2021 Transaction Processing Performance Council.
-- This query was generated at scale factor 1.
Contributor

How hard is it to parameterize this in the future? I wonder what values change, considering we usually run SF100 or 1000.

Member Author

We could try regenerating at different scale factors and doing a diff.
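The diff idea above could be sketched as follows. This is a hypothetical helper (not part of the PR) that compares the same query generated at two scale factors and reports only the numeric literals that differ, which is one way to see which values are scale-dependent; the two SQL strings are made-up examples, not real generator output.

```python
import re

def numeric_literals(sql: str) -> list[str]:
    """Extract numeric literals from a SQL string, in order of appearance."""
    return re.findall(r"\b\d+(?:\.\d+)?\b", sql)

def scale_dependent_literals(sql_a: str, sql_b: str) -> list[tuple[str, str]]:
    """Pair literals positionally and keep only the pairs that changed.

    Positional pairing assumes both queries came from the same template,
    so the literals line up one-to-one.
    """
    return [
        (a, b)
        for a, b in zip(numeric_literals(sql_a), numeric_literals(sql_b))
        if a != b
    ]

# Made-up example: same template, two scale factors.
sf1 = "SELECT * FROM store_sales WHERE ss_quantity BETWEEN 1 AND 20 AND d_year = 2000"
sf100 = "SELECT * FROM store_sales WHERE ss_quantity BETWEEN 1 AND 20 AND d_year = 2001"

print(scale_dependent_literals(sf1, sf100))  # [('2000', '2001')]
```

Running this across all query files for two generated sets would show whether parameterizing by scale factor is a small substitution job or a larger rewrite.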

comphead (Contributor) left a comment

Thanks @andygrove, one question though.
A while back, @mbutrovich and I investigated the efficiency of pregenerated queries, and at that time 40% of the TPC-DS queries returned no results, which might affect benchmarks. We managed to improve the set so that only 18% of the queries return no results.

For this TPC-* set, how many queries return 0 rows?

andygrove (Member Author)
Thanks @andygrove, one question though. A while back, @mbutrovich and I investigated the efficiency of pregenerated queries, and at that time 40% of the TPC-DS queries returned no results, which might affect benchmarks. We managed to improve the set so that only 18% of the queries return no results.

For this TPC-* set, how many queries return 0 rows?

I don't know. The goal of this PR is just to move them from datafusion-benchmarks to this repo so that we can include them in Docker images for docker-compose and k8s without depending on another repo.

When I do the next benchmark run, I will record how many rows are returned.
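Once row counts are recorded, the zero-row check comphead asks about is a simple tally. A minimal sketch, assuming the recorded counts are available as a query-name-to-row-count mapping (the names and numbers below are made up for illustration):

```python
def zero_row_fraction(row_counts: dict[str, int]) -> float:
    """Return the fraction of queries whose recorded result had 0 rows."""
    zero = sum(1 for n in row_counts.values() if n == 0)
    return zero / len(row_counts)

# Example data only; real counts would come from a benchmark run.
counts = {"q1": 100, "q2": 0, "q3": 57, "q4": 0}
print(f"{zero_row_fraction(counts):.0%} of queries returned no rows")  # 50%
```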

andygrove (Member Author)

Thanks for the reviews @mbutrovich @comphead. I'll have the next PR up today to add support for docker-compose.

@andygrove andygrove merged commit d2e3c26 into apache:main Feb 23, 2026
112 checks passed
@andygrove andygrove deleted the bundle-tpc-queries branch February 23, 2026 16:22
andygrove (Member Author)

@comphead I created #3582 to start recording row counts and result hashes when running benchmarks
