perf(arena3): inline size_class_index_for in allocation hot path #40

Open
mrhapile wants to merge 2 commits into boa-dev:main from mrhapile:perf/arena3-inline-size-class

mrhapile commented Mar 6, 2026

perf(arena3): inline size_class_index_for to remove allocation hot-path call

Root cause analysis showed that try_alloc() performed a function call
to size_class_index_for on every allocation.

Inlining the function allows LLVM to constant-fold the size class for
monomorphized types, eliminating the call overhead.
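To illustrate why inlining matters for generic callers, here is a minimal sketch. The size-class table, the lookup body, and the `class_for` helper are assumptions made for illustration; the actual arena3 implementation is not shown in this PR. Only the function name `size_class_index_for` and the `#[inline(always)]` attribute come from the change itself.

```rust
use std::mem::size_of;

// Hypothetical size classes in bytes; the real arena3 table may differ.
const SIZE_CLASSES: [usize; 6] = [16, 32, 64, 128, 256, 512];

/// Returns the index of the smallest class that fits `size`, or
/// `SIZE_CLASSES.len()` when nothing fits (a stand-in for an oversized path).
#[inline(always)]
fn size_class_index_for(size: usize) -> usize {
    SIZE_CLASSES
        .iter()
        .position(|&class| size <= class)
        .unwrap_or(SIZE_CLASSES.len())
}

/// Hypothetical generic caller. After monomorphization `size_of::<T>()` is a
/// compile-time constant, so once the lookup is inlined the optimizer has the
/// opportunity to reduce it to a constant index.
fn class_for<T>() -> usize {
    size_class_index_for(size_of::<T>())
}

fn main() {
    println!("u64 -> class {}", class_for::<u64>());
    println!("[u8; 100] -> class {}", class_for::<[u8; 100]>());
}
```

Whether the fold actually happens depends on the optimization level and on the real shape of the lookup, so the generated assembly or a benchmark is the place to confirm it.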

Criterion benchmarks:

Pure bump allocation:
before: ~4.33 µs
after: ~3.34 µs
improvement: ~6–10%

Bulk allocation:
no statistically significant change

Free-list reuse:
no statistically significant change
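
As a quick arithmetic check on the pure bump allocation figures, the relative change between two point estimates is (before - after) / before. The sketch below applies that formula to the rounded values quoted above; the helper name is hypothetical. Criterion derives its reported change interval from the full sample distributions, so its figure need not coincide with this point-estimate calculation.

```rust
/// Relative reduction from `before` to `after`, as a percentage of `before`.
fn relative_improvement_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // Rounded point estimates from the description, in microseconds.
    let pct = relative_improvement_pct(4.33, 3.34);
    println!("{pct:.1}%");
}
```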

Signed-off-by: mrhapile <allinonegaming3456@gmail.com>
Copilot AI review requested due to automatic review settings March 6, 2026 23:43

Copilot AI left a comment

Pull request overview

Optimizes arena3 allocation hot path by forcing inlining of the size-class lookup helper, and adds a Criterion benchmark to quantify allocation scenarios.

Changes:

  • Add #[inline(always)] to size_class_index_for to reduce per-allocation call overhead and enable better constant-folding.
  • Add a new Criterion benchmark (arena3_alloc) covering bump-style allocation, bulk allocation, and free-list reuse scenarios.
  • Register the new benchmark target in oscars/Cargo.toml.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
oscars/src/alloc/arena3/mod.rs | Forces inlining of size-class index computation used by try_alloc.
oscars/benches/arena3_alloc.rs | Introduces new Criterion benchmark scenarios for arena3 allocation behavior.
oscars/Cargo.toml | Adds a [[bench]] entry to enable running arena3_alloc with Criterion.

Comment on lines +25 to +31
let mut alloc = ArenaAllocator::default().with_arena_size(1024 * 1024 * 10);
b.iter(|| {
    for i in 0..1000 {
        let ptr = alloc.try_alloc(bb(i)).unwrap();
        bb(ptr);
    }
});

Copilot AI Mar 6, 2026

In Scenario B, the allocator is created outside b.iter, so each iteration keeps allocating additional objects into the same ArenaAllocator. Over many Criterion iterations this can continuously grow memory usage (and may eventually OOM / change behavior once new pools are allocated), which will skew results and make the benchmark unstable. Consider using iter_batched/iter_batched_ref to construct a fresh allocator per measured iteration, or otherwise reset/free between iterations so each sample measures the same fixed workload.

Suggested change
- let mut alloc = ArenaAllocator::default().with_arena_size(1024 * 1024 * 10);
- b.iter(|| {
-     for i in 0..1000 {
-         let ptr = alloc.try_alloc(bb(i)).unwrap();
-         bb(ptr);
-     }
- });
+ b.iter_batched(
+     || {
+         ArenaAllocator::default().with_arena_size(1024 * 1024 * 10)
+     },
+     |mut alloc| {
+         for i in 0..1000 {
+             let ptr = alloc.try_alloc(bb(i)).unwrap();
+             bb(ptr);
+         }
+     },
+     criterion::BatchSize::SmallInput,
+ );
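
The growth described in this comment can be illustrated without Criterion or the oscars crate. The sketch below uses a hypothetical `ToyArena` that only counts bytes handed out (it is not `ArenaAllocator`), and compares a shared arena reused across iterations with a fresh arena per iteration, which is the setup `iter_batched` provides.

```rust
/// Toy bump arena (hypothetical, not the oscars ArenaAllocator):
/// it never frees and only tracks how many bytes were handed out.
struct ToyArena {
    used: usize,
}

impl ToyArena {
    fn new() -> Self {
        ToyArena { used: 0 }
    }

    fn alloc<T>(&mut self, _value: T) {
        self.used += std::mem::size_of::<T>();
    }
}

/// The fixed workload from the benchmark body: 1000 small allocations.
fn workload(arena: &mut ToyArena) {
    for i in 0..1000i32 {
        arena.alloc(i);
    }
}

/// Shared arena: state carries over, so usage grows with the iteration count.
fn shared_arena_usage(iterations: usize) -> usize {
    let mut arena = ToyArena::new();
    for _ in 0..iterations {
        workload(&mut arena);
    }
    arena.used
}

/// Fresh arena per iteration: every iteration starts from the same empty state.
fn fresh_arena_usage(iterations: usize) -> usize {
    let mut last = 0;
    for _ in 0..iterations {
        let mut arena = ToyArena::new();
        workload(&mut arena);
        last = arena.used;
    }
    last
}

fn main() {
    println!("shared: {} bytes", shared_arena_usage(100));
    println!("fresh:  {} bytes", fresh_arena_usage(100));
}
```

With the shared arena the footprint scales with however many iterations Criterion decides to run, while the fresh-arena variant measures the same 1000-allocation workload every time.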

Comment on lines +1 to +3
use criterion::{criterion_group, criterion_main, Criterion};
use oscars::alloc::arena3::ArenaAllocator;
use std::hint::black_box as bb;

Copilot AI Mar 6, 2026

This benchmark uses std::hint::black_box (aliased as bb), but the other Criterion benches in this repo import and use criterion::black_box (e.g. oscars/benches/arena2_vs_arena3.rs:1 and oscars/benches/oscars_vs_boa_gc.rs:1). To stay consistent across benches, prefer using Criterion’s black_box here too.

Suggested change
- use criterion::{criterion_group, criterion_main, Criterion};
- use oscars::alloc::arena3::ArenaAllocator;
- use std::hint::black_box as bb;
+ use criterion::{black_box as bb, criterion_group, criterion_main, Criterion};
+ use oscars::alloc::arena3::ArenaAllocator;
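
Whichever import is chosen, the role of `bb` in the benchmark is the same: it returns its argument unchanged while discouraging the optimizer from reasoning about the value. A minimal std-only sketch of that usage follows; the `sum_to` helper is hypothetical and stands in for the code under measurement.

```rust
use std::hint::black_box as bb;

/// Hypothetical stand-in for the code being measured.
fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    // Wrapping the input keeps the compiler from treating `n` as a known constant.
    let n = bb(1000u64);
    // Wrapping the result keeps the computation from being discarded as unused.
    let total = bb(sum_to(n));
    println!("{total}");
}
```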

Signed-off-by: mrhapile <allinonegaming3456@gmail.com>