A focused C++ microbenchmark repository for cache, memory, ILP, synchronization, queue, and allocator behavior.
- Build intuition for cache hierarchy and memory access patterns
- Compare synchronization and data-structure tradeoffs with reproducible microbenchmarks
- Produce evidence-based performance notes from stable runs
- CMake >= 3.20
- A C++20 compiler (clang++ or g++)
- Git + internet access (for fetching google/benchmark)
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

scripts/run_all.sh

Standard pattern:
./build/benchmark/<binary_name> --benchmark_min_time=0.3s

Examples:
./build/benchmark/bm_stride_access --benchmark_min_time=0.3s
./build/benchmark/bm_cache_levels --benchmark_min_time=0.3s

Queue tuned run:
./build/benchmark/bm_queue \
--benchmark_filter='BM_Queue(MutexTransfer/batch:64/backoff:0|SpscRingTransfer/batch:8/backoff:0)$' \
--benchmark_min_time=1s \
--benchmark_repetitions=10 \
--benchmark_report_aggregates_only=true

- bm_stride_access: locality loss from larger access stride
- bm_pointer_chasing: sequential access vs irregular pointer traversal
- bm_false_sharing: adjacent counters vs cache-line-padded counters
- bm_aos_vs_soa: layout sensitivity for dense vs sparse field usage
- bm_mutex_vs_atomic: contention scaling for shared counter updates
- bm_cache_levels: throughput drop as working set crosses cache levels
- bm_ilp: dependent vs independent instruction streams
- bm_cache_associativity: friendly stride vs conflict-prone stride
- bm_queue: mutex queue vs tuned SPSC ring transfer
- bm_memory_pool: new/delete vs locked pool vs thread-local pool
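
For readers unfamiliar with the SPSC ring that bm_queue compares against a mutex queue, here is a minimal sketch of the general technique. It is not the repository's implementation: the class name `SpscRing`, its methods, and the 64-byte cache-line assumption behind `alignas(64)` are all hypothetical, and the batch and backoff parameters seen in the benchmark filter are not modeled here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-producer/single-consumer ring buffer (illustration only).
// Exactly one thread may call try_push and exactly one thread may call try_pop.
template <typename T>
class SpscRing {
public:
    // Capacity must be a power of two so that index wrapping is a bit mask.
    explicit SpscRing(std::size_t capacity_pow2)
        : buf_(capacity_pow2), mask_(capacity_pow2 - 1) {
        assert(capacity_pow2 >= 2 && (capacity_pow2 & mask_) == 0);
    }

    // Producer side. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == buf_.size()) {
            return false;  // full
        }
        buf_[head & mask_] = value;
        // Release publishes the slot write before the new head becomes visible.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail == head) {
            return false;  // empty
        }
        out = buf_[tail & mask_];
        // Release hands the slot back to the producer after the read is done.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::size_t mask_;
    // Assumed 64-byte cache lines: keep the two indices on separate lines so
    // the producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};
```

In this sketch each index is written by only one side, so no lock or read-modify-write operation is needed. That is the usual reason an SPSC ring is expected to beat a mutex-protected queue for one-to-one transfer, though the actual gap depends on the hardware and on tuning such as batch size.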
results-summary.md: current run summary and conclusions
Generate figures from benchmark runs:
python3 scripts/generate_plots.py

