Refactor Perl_av_fetch, optimize path for non-negative key access #23994
base: blead
Conversation
leonerd
left a comment
Looks good aside from a code formatting adjustment
On Tue, Dec 09, 2025 at 08:09:40AM -0800, Bartosz Jarzyna wrote:
This change refactors the Perl_av_fetch function and moves the
re-calculation of the key so that it happens only if the key is not
in the range between 0 and max_index. Fetching existing non-negative
elements should be much more common than the alternatives, and this
change should reduce the number of instructions needed to achieve that.
You are undoing my change from 2017, v5.25.3-269-gf4d8be8b39, which
optimised av_fetch() on the basis that CPU instructions are cheap, while
branches are (potentially) expensive, especially if unpredictable.
My change made av_fetch() do this:
    neg  = (key < 0);
    size = AvFILLp(av) + 1;
    key += neg * size; /* handle negative index without using branch */

    /* the cast from SSize_t to Size_t allows both (key < 0) and (key >= size)
     * to be tested as a single condition */
    if ((Size_t)key >= (Size_t)size) {
        if (UNLIKELY(neg))
            return NULL;
        goto emptiness;
    }
At the cost of the two extra calculations for neg and key += ..., which
are cheap, it makes all code where the key is in bounds (regardless of
whether the key is negative or not) require only a single branch, and will
always take that branch.
So with my version of av_fetch():
1) If you have a block of code where all the accesses to the array are of
existing keys (regardless of the sign of individual keys), then only one
condition is tested, and the branch is always the same (skip the "out of
bounds" code), allowing for good branch prediction behaviour.
2) If you have a block of code where all the accesses to the array are for
out-of-bounds keys (regardless of the sign of individual keys) - maybe you
are populating the array - then two conditions are tested. The first
branch is always the same (take the "out of bounds" code), allowing for
good branch prediction behaviour. The second branch condition is "is the
key negative", which may or may not have good branch prediction depending
on whether the keys all have the same sign.
3) If the block of code has a mixture of existing and out-of-bounds keys,
then it will be a mixture of one and two conditions, with the first branch
becoming unpredictable.
With the proposed new scheme:
1) If you have a block of code where all the accesses to the array are of
existing keys, then having all +ve keys means taking one predictable
branch. Having all -ve keys involves three branch conditions, likely
predictable. With a mixture of +ve and -ve keys the main branch becomes
unpredictable.
2) If you have a block of code where all the accesses to the array are of
out-of-bounds keys, then you test two or three conditions: the first
branch being predictable, the other two depending on key sign.
3) If the block of code has a mixture of existing and out-of-bounds keys,
then it will be a mixture of between one and three conditions, with the
first branch becoming unpredictable.
In summary, the proposed change saves a couple of calculations but often
involves more branches and sometimes worse predictability, especially for
-ve keys.
I'm sorry, I should've checked git blame to see the origin of this code. The Porting/bench.pl script showed up to 7.5% fewer data writes and up to 6% fewer instructions used in some cases. The basis for my change is the assumption that negative key access is much less common, which could make this change worthwhile - especially where it really matters (looping through indexes); I've never seen a loop that does that on negative keys. However, I am not very familiar with how branch prediction works, and whether randomly accessing a negative index of any array somewhere may cause branch prediction to fail and remove all the benefit. If you believe this change gives no performance gain in real-world scenarios then we can scrap this PR, or I can remake it to only move the
@iabyn After thinking this through, I agree. Stable performance is good. I changed the code to do things differently:
I created this benchmark case which attempts to trigger all paths in av_fetch (it may actually not trigger lval). The results (new = what I committed now, hacked = what I had earlier): So indeed blead had fewer conditional branch misses, but after my changes it seems to have even fewer, while not worsening any other parameter compared to blead.
This change refactors the Perl_av_fetch function and moves the re-calculation of the key so that it happens only if the key is not in the range between 0 and max_index. Fetching existing non-negative elements should be much more common than the alternatives, and this change should reduce the number of instructions needed to achieve that.
I've run the Porting/bench.pl script against all tests that start with expr::array; here are the average results: Tests which contributed to COND being higher were expr::array::pkg_1const_m1 and expr::array::lex_1const_m1 - not unexpected. The single L1 cache miss comes from the expr::arrayhash::lex_3var test, and surprisingly, the pkg version of the same test did not miss the cache.