chore: Add TPC-* queries to repo #3562
@@ -0,0 +1,26 @@
-- CometBench-DS query 1 derived from TPC-DS query 1 under the terms of the TPC Fair Use Policy.
-- TPC-DS queries are Copyright 2021 Transaction Processing Performance Council.
-- This query was generated at scale factor 1.
How hard is it to parameterize this in the future? I wonder what values change, considering we usually run SF100 or 1000.
We could try regenerating at different scale factors and diffing the results.
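A minimal sketch of what such a diff could look like, assuming the two versions of a query are already available as strings. The function name `changed_lines` and the sample SQL are hypothetical and are not part of this PR or of any TPC tooling; the sample is not a real TPC-DS query.

```python
import difflib


def changed_lines(sql_a: str, sql_b: str) -> list[tuple[list[str], list[str]]]:
    """Return (old_lines, new_lines) pairs for every region that differs."""
    a = sql_a.splitlines()
    b = sql_b.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the regions that were replaced, inserted, or deleted.
        if tag != "equal":
            changes.append((a[i1:i2], b[j1:j2]))
    return changes


# Hypothetical stand-ins for the same query generated at two scale factors.
query_sf1 = (
    "-- generated at scale factor 1\n"
    "select * from t where x > 10\n"
    "limit 100"
)
query_sf100 = (
    "-- generated at scale factor 100\n"
    "select * from t where x > 10\n"
    "limit 100"
)

for old, new in changed_lines(query_sf1, query_sf100):
    print("old:", old)
    print("new:", new)
```

Comparing line lists directly, rather than parsing unified-diff output, avoids confusing SQL comment lines that start with `--` with diff header lines.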
comphead
left a comment
Thanks @andygrove, one question though.
@mbutrovich and I once investigated how effective the pregenerated queries were, and at that time 40% of the TPC-DS queries returned no results, which might affect benchmarks.
We managed to improve the set so that only 18% of the queries returned no results.
For this TPC-* set, how many of the queries return 0 rows?
I don't know. The goal for this PR is just to move them from […]. When I do the next benchmark run, I will record how many rows are returned.
Thanks for the reviews @mbutrovich @comphead. I'll have the next PR up today to add support for docker-compose.
@comphead I created #3582 to start recording row counts and result hashes when running benchmarks.
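For illustration only, recording a row count and a result hash could be sketched as below. This is not the approach taken in #3582, which is not shown here; the function name `summarize_result` and the sample rows are hypothetical, and the sketch assumes results arrive as an iterable of tuples.

```python
import hashlib


def summarize_result(rows) -> tuple[int, str]:
    """Return (row_count, hex_digest) for a query result.

    Each row is hashed on its own and the row digests are sorted before
    being combined, so the final digest does not depend on row order.
    """
    row_digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256()
    for digest in row_digests:
        combined.update(digest.encode("ascii"))
    return len(row_digests), combined.hexdigest()


# Hypothetical result sets: the same rows in a different order.
count_a, hash_a = summarize_result([(1, "a"), (2, "b")])
count_b, hash_b = summarize_result([(2, "b"), (1, "a")])
print(count_a, hash_a == hash_b)
```

A count of 0 flags a query that returns no rows, and a changed digest between runs flags a change in results. Sorting the row digests is a design choice for queries without a guaranteed output order; for queries with `ORDER BY`, hashing rows in sequence would also detect ordering regressions.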
Which issue does this PR close?
N/A
Rationale for this change
The benchmark scripts in benchmarks/tpc currently require the user to provide the queries. It is more convenient to add them to the repository.
What changes are included in this PR?
Add query files. These are copied from the datafusion-benchmarks repo.
How are these changes tested?