Improve take_bytes perf in the null cases between 10-25%#9625
Improve take_bytes perf in the null cases between 10-25%#9625AdamGS wants to merge 3 commits intoapache:mainfrom
take_bytes perf in the null cases between 10-25%#9625Conversation
a41e896 to
57cacc4
Compare
|
Running the |
|
run benchmark take_kernels |
arrow-select/src/take.rs
Outdated
| T::Offset::from_usize(capacity) | ||
| .ok_or_else(|| ArrowError::OffsetOverflowError(capacity))?, | ||
| ); | ||
| ranges.push((start, end)); |
There was a problem hiding this comment.
Won't for this case be faster to just keep the previous implementation?
There was a problem hiding this comment.
we can use the unsafe if it helps
There was a problem hiding this comment.
yeah it seems to be more consistently ~5% faster locally, I've used unsafe in what I think is a sound way
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/faster-take-bytes (57cacc4) to aa9432c (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
b6143ff to
089c9a9
Compare
|
run benchmark take_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/faster-take-bytes (089c9a9) to aa9432c (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
vs 089c9a9 Earlier one looks slightly better, I'll run once more |
|
run benchmark take_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/faster-take-bytes (089c9a9) to aa9432c (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
arrow-select/src/take.rs
Outdated
| capacity: usize, | ||
| values: &mut Vec<u8>, | ||
| ) { | ||
| values.reserve(capacity); |
There was a problem hiding this comment.
What is the expected behavior if the vec is non-empty? Currently its a bit inconsistent in reserving additional capacity, but then overwriting any existing content. I think appending might make sense, which means the assert and dst pointer below should be based on Vec::spare_capacity_mut.
There was a problem hiding this comment.
yeah this might be a bit confusing, this being a function is a remanent of a previous iteration but I might just inline it which I think makes it clearer.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
|
I've made both branches use |
|
run benchmark take_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing adamg/faster-take-bytes (1b3d1fa) to aa9432c (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
regressions in "take stringview" might be significant, I'll look into it. |
|
Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look |
Which issue does this PR close?
Rationale for this change
Just improves performance, I was profiling some things downstream and got curious about how it works.
What changes are included in this PR?
The main idea is to use a two-pass approach:
copy_byte_ranges)This PR also reduces the branching from 4 (one for each nullability combination) to only two.
Are these changes tested?
Existing tests
Are there any user-facing changes?
None