
Optimize RowNumberReader to be 8x faster#9680

Open
Samyak2 wants to merge 1 commit into apache:main from Samyak2:row-number-reader-optimize

Conversation

Contributor

@Samyak2 Samyak2 commented Apr 9, 2026

Which issue does this PR close?

  • Closes None

Rationale for this change

We internally found RowNumberReader to be a hot path in some of our queries. Flamegraphs showed ~70% of the CPU time spent in RowNumberReader methods.

These methods can be made an order of magnitude faster (benchmarks below).

What changes are included in this PR?

- Instead of storing an iterator over individual row numbers, we now store a vec of ranges.
  - These ranges are not materialized into a full array until needed.
- `read_records` was previously linear in the number of rows read.
  - Now it is close to constant, since one batch (8192 rows) is usually satisfied by a single row range (which comes from one row group).
  - The same applies to `skip_records`.
- `consume_batch` is still linear in the number of rows, but it is faster because it can pre-allocate the output vec.
  - Previously, the `Flatten` iterator prevented pre-allocation (it is not an `ExactSizeIterator`).
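To make the approach concrete, here is a minimal, illustrative sketch of the range-based bookkeeping described above. The types and names are simplified stand-ins, not the actual parquet crate internals:

```rust
use std::ops::Range;

/// Illustrative stand-in for the reader described above: row numbers
/// are kept as ranges (roughly one per row group) instead of a
/// flattened iterator over individual row numbers.
struct RowNumberReader {
    /// Ranges still to be read, in order.
    remaining: Vec<Range<i64>>,
    /// Ranges already read but not yet consumed into an output buffer.
    buffered: Vec<Range<i64>>,
}

impl RowNumberReader {
    fn new(ranges: Vec<Range<i64>>) -> Self {
        Self { remaining: ranges, buffered: Vec::new() }
    }

    /// Move up to `num_rows` row numbers into the buffered set and
    /// return how many were read. One batch usually touches only a
    /// single range, so the cost is close to constant rather than
    /// linear in the number of rows. (A real implementation would
    /// avoid the O(n) `remove(0)` with a cursor or a deque.)
    fn read_records(&mut self, num_rows: usize) -> usize {
        let mut read = 0;
        while read < num_rows {
            let Some(range) = self.remaining.first_mut() else { break };
            let len = (range.end - range.start) as usize;
            let take = len.min(num_rows - read);
            self.buffered.push(range.start..range.start + take as i64);
            range.start += take as i64;
            if range.start == range.end {
                self.remaining.remove(0);
            }
            read += take;
        }
        read
    }

    /// Materialize the buffered ranges. The total length is known up
    /// front, so the output vec is pre-allocated exactly once.
    fn consume_batch(&mut self) -> Vec<i64> {
        let total: usize = self
            .buffered
            .iter()
            .map(|r| (r.end - r.start) as usize)
            .sum();
        let mut out = Vec::with_capacity(total);
        for r in self.buffered.drain(..) {
            out.extend(r);
        }
        out
    }
}
```

For example, `read_records(6)` over the ranges `0..5` and `100..103` splits the second range, and the subsequent `consume_batch()` yields `[0, 1, 2, 3, 4, 100]` without any reallocation.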

Are these changes tested?

Before:

Benchmarking row_number_read_consume: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.0s, enable flat sampling, or reduce sample count to 50.
row_number_read_consume time:   [1.3915 ms 1.3967 ms 1.4035 ms]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

row_number_skip_and_read
                        time:   [716.61 µs 718.14 µs 719.91 µs]
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

After:

row_number_read_consume time:   [159.00 µs 160.81 µs 162.68 µs]
                        change: [−88.900% −88.721% −88.505%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

row_number_skip_and_read
                        time:   [79.057 µs 79.924 µs 80.846 µs]
                        change: [−89.025% −88.865% −88.712%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Ranging from 8.6x to 8.9x faster!

Are there any user-facing changes?

No

I do not have micro-benchmark numbers for this change, but I have noticed a 3x improvement in execution time for an internal query.
@github-actions github-actions bot added the parquet Changes to the parquet crate label Apr 9, 2026
@Samyak2 Samyak2 changed the title perf: optimize RowNumberReader Optimize RowNumberReader to be 8x faster Apr 9, 2026