Add benchmarks for Parquet struct leaf-level projection pruning #21180

Merged
adriangb merged 2 commits into apache:main from pydantic:friendlymatthew/parquet-struct-bench
Mar 26, 2026
@friendlymatthew

Rationale for this change

This PR adds benchmarks that measure the performance of projecting individual fields from struct columns in Parquet files. #20925 introduced leaf-level projection masking, so that select s['small_int'] on a struct with large string fields reads only the small integer leaf and skips the expensive string decoding entirely.

Three dataset shapes are covered, each with ~262K rows of 8 KB string payloads: a narrow struct (2 leaves), a wide struct (5 leaves), and a nested struct. Each shape benchmarks full-struct reads against single-field projections.
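
To illustrate what "leaf-level" means in the description above, here is a minimal sketch in plain Rust. It is a toy model, not the parquet or DataFusion API: the `Field` enum, `leaf_count`, `leaf_mask`, and `example_schema` are hypothetical names, and every field name other than `s` and `small_int` is invented for the example. The idea it shows is that a Parquet schema flattens nested structs into leaf columns in depth-first order, so a field path such as s['small_int'] can be resolved to the leaf indices that actually need decoding.

```rust
/// Toy schema node (hypothetical): a leaf column or a struct with children.
#[derive(Debug)]
enum Field {
    Leaf(&'static str),
    Struct(&'static str, Vec<Field>),
}

impl Field {
    fn name(&self) -> &'static str {
        match self {
            Field::Leaf(name) | Field::Struct(name, _) => name,
        }
    }
}

/// Number of leaf columns stored under a field.
fn leaf_count(field: &Field) -> usize {
    match field {
        Field::Leaf(_) => 1,
        Field::Struct(_, children) => children.iter().map(leaf_count).sum(),
    }
}

/// Resolve a field path to the depth-first leaf indices it covers.
/// Returns `None` if the path is empty or does not exist in the schema.
fn leaf_mask(fields: &[Field], path: &[&str]) -> Option<Vec<usize>> {
    resolve(fields, path, 0)
}

fn resolve(fields: &[Field], path: &[&str], mut offset: usize) -> Option<Vec<usize>> {
    let (head, rest) = path.split_first()?;
    for field in fields {
        if field.name() == *head {
            return match field {
                // Path ends here: select every leaf under this field.
                _ if rest.is_empty() => {
                    Some((offset..offset + leaf_count(field)).collect())
                }
                // Path continues: descend into the struct's children.
                Field::Struct(_, children) => resolve(children, rest, offset),
                // Path continues but this field is a leaf: no such field.
                Field::Leaf(_) => None,
            };
        }
        // Skip over all leaves owned by a non-matching sibling.
        offset += leaf_count(field);
    }
    None
}

/// Illustrative schema: `id`, then a struct `s` with a small integer,
/// a large string, and a nested struct (names are assumptions).
fn example_schema() -> Vec<Field> {
    vec![
        Field::Leaf("id"),
        Field::Struct(
            "s",
            vec![
                Field::Leaf("small_int"),
                Field::Leaf("large_string"),
                Field::Struct("inner", vec![Field::Leaf("a"), Field::Leaf("b")]),
            ],
        ),
    ]
}
```

Under this model, projecting the whole struct `s` selects four leaves, while the path `["s", "small_int"]` selects a single leaf, which is the gap between the full-struct and single-field cases that the benchmarks measure.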

@github-actions github-actions bot added the core Core DataFusion crate label Mar 26, 2026
@adriangb adriangb left a comment

Oh how I wish we could write SQL benchmarks easily... adding to #21165

@adriangb adriangb added this pull request to the merge queue Mar 26, 2026
Merged via the queue into apache:main with commit 1416ed4 Mar 26, 2026
33 checks passed
@adriangb adriangb deleted the friendlymatthew/parquet-struct-bench branch March 26, 2026 19:19
Labels

core Core DataFusion crate

2 participants