perf: introduce MARF benchmarks and tooling #6932
cylewitruk-stacks wants to merge 30 commits into stacks-network:develop
Conversation
Pull request overview
This PR adds a MARF-focused benchmarking suite to stackslib (read/write workflows + primitives) and introduces a contrib/marf-bench CLI tool to run/compare those benches across git revisions using worktrees.
Changes:
- Add a custom `cargo bench` harness under `stackslib/benches/marf/*` with allocation counting + TSV-like summary output.
- Add a `contrib/marf-bench` CLI to run the harness in the current tree or compare base/target revisions via temporary worktrees.
- Consolidate some CLI/tooling dependencies under workspace deps and wire up a `cargo marf-bench` alias.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| stackslib/benches/marf/allocator.rs | Global counting allocator wrapper (jemalloc/system) for bench allocation metrics |
| stackslib/benches/marf/common.rs | Shared measurement helpers + SQLite WAL checkpoint configuration parsing |
| stackslib/benches/marf/main.rs | Bench entrypoint dispatching primitives/read/write subcommands and printing unified summaries |
| stackslib/benches/marf/primitives.rs | Microbenchmarks for trie node encode/decode and storage/trie primitives |
| stackslib/benches/marf/read.rs | Read-heavy MARF get / get_with_proof benchmarks with configurable fixture + depths |
| stackslib/benches/marf/utils.rs | Benchmark utilities (env parsing, deterministic IDs/hashes, small trie helpers) |
| stackslib/benches/marf/write.rs | Write workflow benchmark focused on promotion-driving inserts + configurable updates |
| stackslib/Cargo.toml | Adds the marf bench target and dev-deps needed for bench compilation |
| Cargo.toml | Adds workspace deps for tooling/benchmarking and includes contrib/marf-bench as a workspace member |
| Cargo.lock | Locks new workspace/tooling dependencies introduced by the new CLI + workspace dependency additions |
| .cargo/config.toml | Adds cargo marf-bench alias and includes marf-bench in the clippy alias package set |
| contrib/marf-bench/Cargo.toml | Defines the new marf-bench binary crate |
| contrib/marf-bench/README.md | Documents CLI usage, worktree behavior, and cleanup/recovery |
| contrib/marf-bench/src/main.rs | clap CLI definition + ctrl-c/panic cleanup hooks |
| contrib/marf-bench/src/git.rs | Git helpers (revision verification, merge-base resolution, staged snapshot commit creation) |
| contrib/marf-bench/src/runner.rs | Worktree orchestration, bench overlay, build, execution, and cleanup logic |
| contrib/marf-bench/src/report.rs | Formatting/printing for single runs, comparisons, and repeated-run jitter stats |
| contrib/marf-bench/src/util.rs | Small helpers (command runner, parsing summary lines, sorting, percent deltas) |
| contrib/marf-bench/src/commands/mod.rs | Command module wiring |
| contrib/marf-bench/src/commands/run.rs | run subcommand implementation (current tree only) |
| contrib/marf-bench/src/commands/bench.rs | bench subcommand implementation (base/target comparisons + repeats) |
| contrib/marf-bench/src/commands/bench_target.rs | Bench target subcommands + CLI-to-env forwarding |
| contrib/marf-bench/src/commands/clean.rs | Cleanup subcommand for stale worktrees, cached roots, and orphan temp dirs |
| contrib/clarity-cli/Cargo.toml | Switches clap to the workspace dependency |
| contrib/stacks-inspect/Cargo.toml | Switches clap to the workspace dependency |
| contrib/tools/config-docs-generator/Cargo.toml | Switches clap to the workspace dependency |
| stacks-signer/Cargo.toml | Adds a TODO comment about switching clap to a workspace dependency |
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           develop    #6932      +/-   ##
===========================================
- Coverage    72.67%   69.94%    -2.74%
===========================================
  Files          411      412        +1
  Lines       221663   217940     -3723
  Branches         0      338      +338
===========================================
- Hits        161086   152429     -8657
- Misses       60577    65511     +4934

See 290 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
perf: add `TrieNodePatch` support to MARF benchmarks and the `marf-bench` harness (merged from …ort-to-benchmarks)
/// Set CACHE_STRATEGIES as comma-separated values (for example: noop,node256).
#[arg(long)]
cache_strategies: Option<String>,
I haven't PR'd the removal yet, but yeah it will be. This branch is old, but I could just remove this now and always use noop until the actual removal lands.
Replaced cache strategies with compression on/off during some read bench refactoring in 9476967
/// Set SQLITE_WAL_CHECKPOINT_MODE for post-setup checkpoint when SQLITE_WAL_AUTOCHECKPOINT=0 (PASSIVE|FULL|RESTART|TRUNCATE).
#[arg(long)]
sqlite_wal_checkpoint_mode: Option<String>,
Does changing the checkpoint mode affect anything, assuming that the number of pages before checkpoint matches or exceeds the size of the (single) transaction to be committed to record the hash and location of the serialized trie blob?
A transaction on the MARF database should only happen when a trie is added. For MARFs where the tries are stored externally in a .blobs file, the size of this transaction is a small O(1).
So, SQLite in WAL mode places all modifications in the WAL until a checkpoint operation finally merges them into the main DB file; by default this happens automatically every 1000 pages (~4MB), so the WAL can contain many transactions when that occurs.
RESTART is likely the best checkpointing mode for the long-running nature of the node, since it reuses the WAL's existing space instead of allocating new disk space and growing the file after every checkpoint.
The more interesting setting, though, is auto-checkpointing, which will stop and perform a merge of those 1000 pages as soon as that boundary is reached, regardless of what the node is doing (excluding ongoing DB transactions, of course). Disabling auto-checkpointing altogether for the hot DBs and manually triggering a checkpoint after block processing would ensure that checkpointing can't happen in the middle of transaction processing, keeping that cost off the "hot" path (particularly for blocks doing a lot of work (writes), which may incur multiple checkpoints during processing).
Reference: https://sqlite.org/wal.html
P.S. I haven't explicitly traced the transaction scopes/graph yet; this was just something I was interested in testing and easy to add, so that's why it's here. I was primarily thinking of the Clarity MARF'd DB, which has a number of smaller writes during block processing, IIRC.
Disabling auto-checkpointing altogether for the hot db's and manually triggering checkpointing after block processing would ensure that checkpointing can't happen in the middle of transaction processing, keeping that cost off the "hot" path (particularly for blocks doing a lot of work (writes), which may incur multiple checkpoints during processing).
I think it's a question of when we want to pay for merging the WAL back into the DB -- at the end of block-processing, or at the end of lots of blocks. IIRC all of the MARF-indexed side stores commit everything in a single transaction -- i.e. it's not the case that we have multiple transaction-commits (or even explicit checkpoints) during block-processing. The act of processing one (Stacks) block is the act of performing at most one transaction-commit per MARF'ed DB. I guess the best-possible case is if we somehow manage to avoid internal fragmentation -- either way, those pages are going to be copied in, so we might as well try to ensure that as many disk blocks of those pages represent Stacks data.
I'd be curious to know what kinds of performance characteristics you see when you vary the WAL behavior?
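For reference, the behaviors discussed in this thread map to a few SQLite pragmas (a configuration sketch only; how and where these would be issued in the node's connection setup is an assumption):

```sql
-- Use WAL journaling (the mode discussed above).
PRAGMA journal_mode = WAL;

-- Disable automatic checkpointing so a merge can't fire mid-block-processing.
PRAGMA wal_autocheckpoint = 0;

-- Manually checkpoint after block processing; RESTART reuses the WAL file's
-- existing space instead of growing it after every checkpoint.
PRAGMA wal_checkpoint(RESTART);
```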
record_case(
    &mut summary,
    "trie_read_root_with_hash",
    output_mode,
    || {
        let mut fixture = make_trie_fixture("noop");
        let mut conn = fixture.store.connection();
        conn.open_block(&fixture.tip)
            .expect("failed to open trie fixture tip");
        for _ in 0..iters {
            black_box(Trie::read_root(&mut conn).expect("read_root failed"));
        }
    },
);
qq: in record cases like this one, make_trie_fixture / make_storage_fixture and open_block are called inside the measured closure, so we're including fixture setup cost in the timing and allocation counts. Is that desired?
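If setup is meant to be excluded, one option is to build the fixture before entering the measured closure. A minimal sketch of the pattern (the names `measure` and the vector "fixture" are illustrative, not the harness's real API):

```rust
use std::time::{Duration, Instant};

// Time only the closure body, which the caller keeps free of setup work.
fn measure<F: FnMut()>(iters: u32, mut op: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed()
}

fn main() {
    // Setup happens once, outside the measured region (analogous to building
    // the trie fixture and opening the tip before timing starts).
    let data: Vec<u64> = (0..1_000).collect();

    let elapsed = measure(100, || {
        // Only the operation under test runs inside the closure.
        std::hint::black_box(data.iter().sum::<u64>());
    });
    println!("100 iters took {:?}", elapsed);
}
```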
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
    REALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
    DEALLOC_BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
    ALLOC_BYTES.fetch_add(new_size as u64, Ordering::Relaxed);
    unsafe { ALLOCATOR.realloc(ptr, layout, new_size) }
}
nit: realloc adds new_size to ALLOC_BYTES but doesn't increment ALLOC_CALLS (only REALLOC_CALLS). Since the summary output only surfaces alloc_count and alloc_bytes (realloc_calls is raw-mode only), the two visible metrics are inconsistent when reallocs are frequent.
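One way to make the two visible metrics consistent is to count a realloc as one logical allocation event. A sketch of that bookkeeping outside the real allocator (the actual fix would live in `allocator.rs`; the free function here stands in for the trait method):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static ALLOC_CALLS: AtomicU64 = AtomicU64::new(0);
static ALLOC_BYTES: AtomicU64 = AtomicU64::new(0);
static DEALLOC_BYTES: AtomicU64 = AtomicU64::new(0);
static REALLOC_CALLS: AtomicU64 = AtomicU64::new(0);

// Treat a realloc as one logical allocation event so that alloc_count stays
// consistent with alloc_bytes in the summary output.
fn record_realloc(old_size: u64, new_size: u64) {
    REALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
    ALLOC_CALLS.fetch_add(1, Ordering::Relaxed); // keep visible metrics consistent
    DEALLOC_BYTES.fetch_add(old_size, Ordering::Relaxed);
    ALLOC_BYTES.fetch_add(new_size, Ordering::Relaxed);
}

fn main() {
    record_realloc(64, 128);
    assert_eq!(
        ALLOC_CALLS.load(Ordering::Relaxed),
        REALLOC_CALLS.load(Ordering::Relaxed)
    );
    println!("alloc_bytes = {}", ALLOC_BYTES.load(Ordering::Relaxed));
}
```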
pub fn pct(base: f64, target: f64) -> f64 {
    if base == 0.0 {
        return 0.0;
    }
    ((target - base) * 100.0) / base
}
nit: pct() returns 0.0 when base == 0.0, which would show +0.0% in the comparison table if a metric goes from 0 to nonzero (e.g., a change that introduces allocations where there were none). Maybe not a big issue for our use cases, but worth mentioning.
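A sketch of one alternative: return `Option<f64>` so the zero-to-nonzero case is distinguishable, letting the comparison table render it as "n/a" or "new" instead of +0.0% (names and rendering are suggestions, not the current code):

```rust
/// Percent delta that distinguishes "genuinely unchanged" from
/// "went from zero to nonzero".
fn pct(base: f64, target: f64) -> Option<f64> {
    if base == 0.0 {
        if target == 0.0 {
            Some(0.0) // genuinely unchanged
        } else {
            None // undefined percent delta: the metric appeared from nothing
        }
    } else {
        Some((target - base) * 100.0 / base)
    }
}

fn main() {
    assert_eq!(pct(100.0, 110.0), Some(10.0));
    assert_eq!(pct(0.0, 5.0), None);
    println!("ok");
}
```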
}

/// Parse `summary` TSV lines emitted by the marf benchmark harness into typed rows.
pub fn extract_summary_lines(text: &str) -> Vec<SummaryRow> {
minor: this is tightly coupled to the harness output format. Changes to that format could silently produce empty results with no error evidence.
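A sketch of a stricter per-line parser that surfaces format drift as an error instead of silently dropping rows (the two-field row shape here is an assumption for illustration; the real `SummaryRow` has more columns):

```rust
#[derive(Debug, PartialEq)]
struct SummaryRow {
    case: String,
    ns_per_iter: f64,
}

// Return an error describing the malformed line rather than skipping it.
fn parse_summary_line(line: &str) -> Result<SummaryRow, String> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 2 {
        return Err(format!(
            "expected 2 tab-separated fields, got {} in {:?}",
            fields.len(),
            line
        ));
    }
    let ns_per_iter = fields[1]
        .parse::<f64>()
        .map_err(|e| format!("bad ns/iter in {:?}: {}", line, e))?;
    Ok(SummaryRow {
        case: fields[0].to_string(),
        ns_per_iter,
    })
}

fn main() {
    assert!(parse_summary_line("trie_read_root\t125.5").is_ok());
    assert!(parse_summary_line("not a summary line").is_err());
    println!("ok");
}
```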
@cylewitruk-stacks please remove @jcnelson as reviewer and add @aaronb-stacks or @brice-stacks
@dhaney-stacks jude is a participant via existing comments, so I don't seem to be able to remove him.
Description
This PR introduces a benchmark suite covering MARF reads/writes & some primitives.
No production code modifications; only new benchmarks and tooling.
Implemented in two layers:
- `stackslib/benches/marf/*` - a bench harness which can be used for configurable one-shot benchmarks (run via `cargo bench -p stackslib --bench marf -- <command>`).
  - Uses a `CountingAllocator` to intercept and measure heap allocations.
  - Intended for use with the `marf-bench` utility, but can be run standalone.
  - Emits the summary output that `marf-bench` uses to collect output.
- `contrib/marf-bench/` - a benchmark runner harness which provides a more friendly `clap`-based CLI for running the above benchmarks.
  - Compares `base` and `target` git revisions (commit, tag, branch, etc.) via the `bench` command.
  - Creates `base` and `target` worktrees, then builds and runs the benchmarks within each worktree separately (avoiding any side-effects on your current working tree/index/branch).

Applicable issues
#118 (benchmarking in CI) and various 100x tickets (for measurement)

Additional info (benefits, drawbacks, caveats)
Note: Added preliminary patch node benchmarks in cylewitruk-stacks#1. (This was written before the new `TrieNodePatch` node type was introduced, so it originally did not have any benchmarks for patch nodes.)

Example: Running `marf-bench bench read --repeats 15 --base ... --target ...` will give you a summarized table similar to:

Checklist