Refactor Perl_av_fetch, optimize path for non-negative key access #23994
base: blead
Conversation
leonerd
left a comment
Looks good aside from a code formatting adjustment
On Tue, Dec 09, 2025 at 08:09:40AM -0800, Bartosz Jarzyna wrote:
This change refactors the Perl_av_fetch function and moves the
re-calculation of the key so that it happens only if the key is not
in the range between 0 and max_index. Fetching existing non-negative
elements should be much more common than the alternatives, and this
change should reduce the number of instructions needed to achieve that.
You are undoing my change from 2017, v5.25.3-269-gf4d8be8b39, which
optimised av_fetch() on the basis that CPU instructions are cheap, while
branches are (potentially) expensive, especially if unpredictable.
My change made av_fetch() do this:
    neg  = (key < 0);
    size = AvFILLp(av) + 1;
    key += neg * size; /* handle negative index without using branch */

    /* the cast from SSize_t to Size_t allows both (key < 0) and (key >= size)
     * to be tested as a single condition */
    if ((Size_t)key >= (Size_t)size) {
        if (UNLIKELY(neg))
            return NULL;
        goto emptiness;
    }
At the cost of the two extra calculations for neg and key += ..., which
are cheap, it makes all code where the key is in bounds (regardless of
whether the key is negative or not) require only a single branch, and will
always take that branch.
So with my version of av_fetch():
1) If you have a block of code where all the accesses to the array are of
existing keys (regardless of the sign of individual keys), then only one
condition is tested, and the branch is always the same (skip the "out of
bounds" code), allowing for good branch prediction behaviour.
2) If you have a block of code where all the accesses to the array are for
out-of-bounds keys (regardless of the sign of individual keys) - maybe you
are populating the array - then two conditions are tested. The first
branch is always the same (take the "out of bounds" code), allowing for
good branch prediction behaviour. The second branch condition is "is the
key negative", which may or may not have good branch prediction depending
on whether the keys all have the same sign.
3) If the block of code has a mixture of existing and out-of-bounds keys,
then it will be a mixture of one and two conditions, with the first branch
becoming unpredictable.
With the proposed new scheme:
1) If you have a block of code where all the accesses to the array are of
existing keys, then having all +ve keys means taking one predictable
branch. Having all -ve keys involves three branch conditions, likely
predictable. With a mixture of +ve and -ve keys the main branch becomes
unpredictable.
2) If you have a block of code where all the accesses to the array are of
out-of-bounds keys, then you test two or three conditions: the first
branch being predictable, the other two depending on key sign.
3) If the block of code has a mixture of existing and out-of-bounds keys,
then it will be a mixture of between one and three conditions, with the
first branch becoming unpredictable.
In summary, the proposed change saves a couple of calculations but often
involves more branches and sometimes worse predictability, especially for
-ve keys.
I'm sorry, I should've checked git blame to see the origin of this code. The Porting/bench.pl script showed up to 7.5% fewer data writes and up to 6% fewer instructions used in some cases. The basis for my change is the assumption that negative key access is much less common, which could make this change worthwhile - especially where it really matters (looping through indexes); I've never seen a loop that does that on negative keys. However, I am not very familiar with how branch prediction works, and whether randomly accessing a negative index of any array somewhere may cause branch prediction to fail and remove all the benefit. If you believe this change gives no performance gain in real-world scenarios then we can scrap this PR, or I can remake it to only move the
@iabyn After thinking this through, I agree. Stable performance is good. I changed the code to do things differently:
I created this benchmark case which attempts to trigger all paths in av_fetch (it may actually not trigger lval). The results (new = what I committed now, hacked = what I had earlier): So indeed blead had fewer conditional branch misses, but after my changes it seems to have even fewer, while not worsening any other parameter compared to blead.
This change refactors the Perl_av_fetch function and moves the re-calculation of the key so that it happens only if the key is not in the range between 0 and max_index. Fetching existing non-negative elements should be much more common than the alternatives, and this change should reduce the number of instructions needed to achieve that.
I've run the Porting/bench.pl script against all tests that start with expr::array; here are the average results: Tests which contributed to COND being higher were expr::array::pkg_1const_m1 and expr::array::lex_1const_m1 - not unexpected. The single L1 cache miss comes from the expr::arrayhash::lex_3var test, and surprisingly, the pkg version of the same test did not miss the cache.