
perf: introduce MARF benchmarks and tooling#6932

Open
cylewitruk-stacks wants to merge 30 commits into stacks-network:develop from cylewitruk-stacks:perf/marf-benchmarks-and-tooling

Conversation

@cylewitruk-stacks (Contributor) commented Feb 24, 2026

Description

This PR introduces a benchmark suite covering MARF reads/writes & some primitives.

No production code modifications; only new benchmarks and tooling.

Implemented in two layers:

  1. stackslib/benches/marf/* - a bench harness which can be used for configurable one-shot benchmarks (run via cargo bench -p stackslib --bench marf -- <command>).
    • These use a custom CountingAllocator to intercept and measure heap allocations.
    • Intended to be run via the marf-bench utility described below, but can also be run standalone.
    • Supports TSV output for machine parsing (which marf-bench uses to collect output).
  2. contrib/marf-bench/ - a benchmark runner harness which provides a more friendly clap-based CLI for running the above benchmarks.
    • Can compare between base and target git revisions (commit, tag, branch, etc.) via the bench command.
    • It automates the creation of git worktrees, overlays the above benchmark files onto the selected base and target worktrees, and builds and runs the benchmarks within each worktree separately (avoiding side effects on your current working tree/index/branch).
    • See the README for more detailed info on its use.
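For illustration, here is a minimal counting allocator in the same spirit as the CountingAllocator mentioned above (a sketch under assumptions, not the PR's actual implementation; it only tracks allocations, and `allocate_and_count` is a made-up helper for demonstration):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

static ALLOC_CALLS: AtomicU64 = AtomicU64::new(0);
static ALLOC_BYTES: AtomicU64 = AtomicU64::new(0);

struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Relaxed ordering suffices: counters are only read after the bench run.
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOCATOR: CountingAllocator = CountingAllocator;

/// Allocate `n` bytes on the heap and return how many counted bytes it cost.
fn allocate_and_count(n: usize) -> u64 {
    let before = ALLOC_BYTES.load(Ordering::Relaxed);
    let v: Vec<u8> = Vec::with_capacity(n); // forces one heap allocation
    std::hint::black_box(&v);
    ALLOC_BYTES.load(Ordering::Relaxed) - before
}

fn main() {
    println!("counted {} bytes", allocate_and_count(4096));
}
```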

Applicable issues

  • Related to core epic #118 (benchmarking in CI) and various 100x tickets (for measurement)

Additional info (benefits, drawbacks, caveats)

Note: This suite was written before the new TrieNodePatch node type was introduced, so it does not yet cover patch nodes; preliminary patch-node benchmarks were added in cylewitruk-stacks#1.

Example: Running marf-bench bench read --repeats 15 --base ... --target ... will give you a summarized table similar to:

[marf-bench] Comparison summary
values: base:merge-base(origin/develop) / target:current-working-tree / %delta
benchmark  name                                  total(ms) b/t       Δ  alloc_count b/t        Δ    alloc_bytes b/t        Δ
----------------------------------------------------------------------------------------------------------------------------
read       node256/depth=1024/variant=get    8824.528/7951.251   -9.9%       11700000/0  -100.0%      17000100000/0  -100.0%
read       node256/depth=128/variant=get     8131.712/7561.920   -7.0%       9900015/21  -100.0%  13849234134/38340  -100.0%
read       node256/depth=512/variant=get     8418.888/7680.533   -8.8%       10800006/6  -100.0%  16866016971/19416  -100.0%
read       node256/depth=8192/variant=get    8929.387/7950.491  -11.0%       11700000/0  -100.0%      17000100000/0  -100.0%
read       noop/depth=1024/variant=get     15194.902/13760.344   -9.4%       15300000/0  -100.0%      27182700000/0  -100.0%
read       noop/depth=128/variant=get      14334.103/13071.203   -8.8%       13500003/9  -100.0%   24031800192/1044  -100.0%
read       noop/depth=512/variant=get      14804.237/13426.497   -9.3%       14400000/0  -100.0%    27048600000/768  -100.0%
read       noop/depth=8192/variant=get     15231.926/13896.007   -8.8%       15300000/0  -100.0%      27182700000/0  -100.0%
----------------------------------------------------------------------------------------------------------------------------

[marf-bench] Repeated comparison stats
values: base:merge-base(origin/develop) / target:current-working-tree / median/min/max %delta across 15 repeats
benchmark  name                             total Δ med   total Δ min   total Δ max   count Δ med   bytes Δ med  repeats
------------------------------------------------------------------------------------------------------------------------
read       node256/depth=1024/variant=get        -10.8%        -13.7%         -7.4%       -100.0%       -100.0%       15
read       node256/depth=128/variant=get          -7.2%         -9.7%         -2.9%       -100.0%       -100.0%       15
read       node256/depth=512/variant=get         -10.0%        -13.0%         -6.3%       -100.0%       -100.0%       15
read       node256/depth=8192/variant=get        -10.9%        -13.3%         -7.1%       -100.0%       -100.0%       15
read       noop/depth=1024/variant=get           -10.1%        -12.6%         -8.3%       -100.0%       -100.0%       15
read       noop/depth=128/variant=get             -8.5%        -10.9%         -5.8%       -100.0%       -100.0%       15
read       noop/depth=512/variant=get             -9.3%        -12.1%         -7.4%       -100.0%       -100.0%       15
read       noop/depth=8192/variant=get           -10.0%        -11.8%         -7.4%       -100.0%       -100.0%       15
------------------------------------------------------------------------------------------------------------------------

[marf-bench] Repeat confidence summary
values: base:merge-base(origin/develop) / target:current-working-tree / total_ms stability across 15 repeats
rows: total=8 stable=8 high-jitter=0 (high-jitter means min<0<max and spread>=30.0%)
high-jitter rows: none

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated

Copilot AI left a comment

Pull request overview

This PR adds a MARF-focused benchmarking suite to stackslib (read/write workflows + primitives) and introduces a contrib/marf-bench CLI tool to run/compare those benches across git revisions using worktrees.

Changes:

  • Add a custom cargo bench harness under stackslib/benches/marf/* with allocation counting + TSV-like summary output.
  • Add contrib/marf-bench CLI to run the harness in the current tree or compare base/target revisions via temporary worktrees.
  • Consolidate some CLI/tooling dependencies under workspace deps and wire up a cargo marf-bench alias.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
stackslib/benches/marf/allocator.rs Global counting allocator wrapper (jemalloc/system) for bench allocation metrics
stackslib/benches/marf/common.rs Shared measurement helpers + SQLite WAL checkpoint configuration parsing
stackslib/benches/marf/main.rs Bench entrypoint dispatching primitives/read/write subcommands and printing unified summaries
stackslib/benches/marf/primitives.rs Microbenchmarks for trie node encode/decode and storage/trie primitives
stackslib/benches/marf/read.rs Read-heavy MARF get / get_with_proof benchmarks with configurable fixture + depths
stackslib/benches/marf/utils.rs Benchmark utilities (env parsing, deterministic IDs/hashes, small trie helpers)
stackslib/benches/marf/write.rs Write workflow benchmark focused on promotion-driving inserts + configurable updates
stackslib/Cargo.toml Adds the marf bench target and dev-deps needed for bench compilation
Cargo.toml Adds workspace deps for tooling/benchmarking and includes contrib/marf-bench as a workspace member
Cargo.lock Locks new workspace/tooling dependencies introduced by the new CLI + workspace dependency additions
.cargo/config.toml Adds cargo marf-bench alias and includes marf-bench in the clippy alias package set
contrib/marf-bench/Cargo.toml Defines the new marf-bench binary crate
contrib/marf-bench/README.md Documents CLI usage, worktree behavior, and cleanup/recovery
contrib/marf-bench/src/main.rs clap CLI definition + ctrl-c/panic cleanup hooks
contrib/marf-bench/src/git.rs Git helpers (revision verification, merge-base resolution, staged snapshot commit creation)
contrib/marf-bench/src/runner.rs Worktree orchestration, bench overlay, build, execution, and cleanup logic
contrib/marf-bench/src/report.rs Formatting/printing for single runs, comparisons, and repeated-run jitter stats
contrib/marf-bench/src/util.rs Small helpers (command runner, parsing summary lines, sorting, percent deltas)
contrib/marf-bench/src/commands/mod.rs Command module wiring
contrib/marf-bench/src/commands/run.rs run subcommand implementation (current tree only)
contrib/marf-bench/src/commands/bench.rs bench subcommand implementation (base/target comparisons + repeats)
contrib/marf-bench/src/commands/bench_target.rs Bench target subcommands + CLI-to-env forwarding
contrib/marf-bench/src/commands/clean.rs Cleanup subcommand for stale worktrees, cached roots, and orphan temp dirs
contrib/clarity-cli/Cargo.toml Switches clap to the workspace dependency
contrib/stacks-inspect/Cargo.toml Switches clap to the workspace dependency
contrib/tools/config-docs-generator/Cargo.toml Switches clap to the workspace dependency
stacks-signer/Cargo.toml Adds a TODO comment about switching clap to a workspace dependency


codecov Bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.94%. Comparing base (0317850) to head (e1651b2).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6932      +/-   ##
===========================================
- Coverage    72.67%   69.94%   -2.74%     
===========================================
  Files          411      412       +1     
  Lines       221663   217940    -3723     
  Branches         0      338     +338     
===========================================
- Hits        161086   152429    -8657     
- Misses       60577    65511    +4934     

see 290 files with indirect coverage changes




/// Set CACHE_STRATEGIES as comma-separated values (for example: noop,node256).
#[arg(long)]
cache_strategies: Option<String>,
Member

Is this a no-op now?

Contributor Author

I haven't PR'd the removal yet, but yeah it will be. This branch is old, but I could just remove this now and always use noop until the actual removal lands.

@cylewitruk-stacks (Contributor Author) commented Mar 23, 2026

Replaced cache strategies with compression on/off during some read bench refactoring in 9476967


/// Set SQLITE_WAL_CHECKPOINT_MODE for post-setup checkpoint when SQLITE_WAL_AUTOCHECKPOINT=0 (PASSIVE|FULL|RESTART|TRUNCATE).
#[arg(long)]
sqlite_wal_checkpoint_mode: Option<String>,
Member

Does changing the checkpoint mode affect anything, assuming that the number of pages before checkpoint matches or exceeds the size of the (single) transaction to be committed to record the hash and location of the serialized trie blob?

Member

A transaction on the MARF database should only happen when a trie is added. For MARFs where the tries are stored externally in a .blobs file, the size of this transaction is a small O(1).

@cylewitruk-stacks (Contributor Author) commented Mar 20, 2026

SQLite in WAL mode places all modifications in the WAL until a checkpoint operation merges them into the main DB file; by default this happens automatically every 1000 pages (~4 MB), so the WAL can contain many transactions when the checkpoint occurs.

RESTART is likely the best checkpoint mode for the long-running nature of the node, since it reuses the existing WAL space instead of allocating new disk space and growing the file after every checkpoint.

The more interesting setting, though, is auto-checkpointing, which stops and merges those 1000 pages as soon as that boundary is reached, regardless of what the node is doing (excluding ongoing DB transactions, of course). Disabling auto-checkpointing altogether for the hot DBs and manually triggering a checkpoint after block processing would ensure that checkpointing can't happen in the middle of transaction processing, keeping that cost off the "hot" path, particularly for blocks doing a lot of writes, which may incur multiple checkpoints during processing.

Reference: https://sqlite.org/wal.html

P.S. I haven't explicitly traced the transaction scopes/graph yet, this was just something I was interested in testing and easy to add, so that's why it's here. I was primarily thinking of the Clarity MARF'd db which has a number of smaller writes during block processing iirc.
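The behavior described above boils down to a few pragmas (values illustrative, per the SQLite WAL docs linked above; not a quote from this PR's code):

```sql
-- Put the database in WAL mode: writes land in the -wal file first.
PRAGMA journal_mode = WAL;
-- Disable the default auto-checkpoint (normally every 1000 pages, ~4 MB).
PRAGMA wal_autocheckpoint = 0;
-- ... block processing happens here ...
-- Checkpoint manually afterwards; RESTART rewinds the WAL so its space is
-- reused instead of growing the file on every checkpoint.
PRAGMA wal_checkpoint(RESTART);
```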

Member

Disabling auto-checkpointing altogether for the hot db's and manually triggering checkpointing after block processing would ensure that checkpointing can't happen in the middle of transaction processing, keeping that cost off the "hot" path (particularly for blocks doing a lot of work (writes), which may incur multiple checkpoints during processing).

I think it's a question of when we want to pay for merging the WAL back into the DB -- at the end of block-processing, or at the end of lots of blocks. IIRC all of the MARF-indexed side stores commit everything in a single transaction -- i.e. it's not the case that we have multiple transaction-commits (or even explicit checkpoints) during block-processing. The act of processing one (Stacks) block is the act of performing at most one transaction-commit per MARF'ed DB. I guess the best-possible case is if we somehow manage to avoid internal fragmentation -- either way, those pages are going to be copied in, so we might as well try to ensure that as many disk blocks of those pages represent Stacks data.

Member

I'd be curious to know what kinds of performance characteristics you see when you vary the WAL behavior?

Comment on lines +504 to +517
record_case(
&mut summary,
"trie_read_root_with_hash",
output_mode,
|| {
let mut fixture = make_trie_fixture("noop");
let mut conn = fixture.store.connection();
conn.open_block(&fixture.tip)
.expect("failed to open trie fixture tip");
for _ in 0..iters {
black_box(Trie::read_root(&mut conn).expect("read_root failed"));
}
},
);
Contributor

qq: in record cases like this one, make_trie_fixture / make_storage_fixture and open_block are called inside the measured closure, so we're including fixture setup cost in the timing and allocation counts. Is that desired?
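For contrast, a minimal pattern that keeps setup out of the measured region (illustrative only; `measure` and the Vec fixture are stand-ins, not the harness's record_case or trie fixtures):

```rust
use std::time::{Duration, Instant};

/// Run `op` `iters` times and return the elapsed wall time; any setup the
/// caller performs beforehand is excluded from the measurement.
fn measure<F: FnMut()>(iters: u32, mut op: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed()
}

fn main() {
    // Setup outside the measured closure (a Vec stands in for the trie fixture).
    let fixture: Vec<u64> = (0..1_000).collect();
    let elapsed = measure(100, || {
        // Only the operation under test is measured.
        std::hint::black_box(fixture.iter().sum::<u64>());
    });
    println!("100 iters took {:?}", elapsed);
}
```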

Comment on lines +90 to +95
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
REALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
DEALLOC_BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
ALLOC_BYTES.fetch_add(new_size as u64, Ordering::Relaxed);
unsafe { ALLOCATOR.realloc(ptr, layout, new_size) }
}
Contributor

nit: realloc adds new_size to ALLOC_BYTES but doesn't increment ALLOC_CALLS (only REALLOC_CALLS). Since the summary output only surfaces alloc_count and alloc_bytes (realloc_calls is raw-mode only), the two visible metrics are inconsistent when reallocs are frequent.
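A hedged sketch of making the two visible metrics move together (simplified and self-contained; this is not the PR's allocator, and the `Counters` type is illustrative): count a realloc as one dealloc of the old block plus one alloc of the new one.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative counter bundle; the real harness uses free-standing statics.
#[derive(Default)]
struct Counters {
    alloc_calls: AtomicU64,
    alloc_bytes: AtomicU64,
    dealloc_calls: AtomicU64,
    dealloc_bytes: AtomicU64,
}

impl Counters {
    fn record_realloc(&self, old_size: usize, new_size: usize) {
        self.dealloc_calls.fetch_add(1, Ordering::Relaxed);
        self.dealloc_bytes.fetch_add(old_size as u64, Ordering::Relaxed);
        // Also bump alloc_calls so alloc_count and alloc_bytes stay consistent.
        self.alloc_calls.fetch_add(1, Ordering::Relaxed);
        self.alloc_bytes.fetch_add(new_size as u64, Ordering::Relaxed);
    }
}

fn main() {
    let c = Counters::default();
    c.record_realloc(64, 128);
    println!(
        "alloc_count={} alloc_bytes={}",
        c.alloc_calls.load(Ordering::Relaxed),
        c.alloc_bytes.load(Ordering::Relaxed)
    );
}
```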

Comment on lines +88 to +93
pub fn pct(base: f64, target: f64) -> f64 {
if base == 0.0 {
return 0.0;
}
((target - base) * 100.0) / base
}
Contributor

nit: pct() returns 0.0 when base == 0.0, which would show +0.0% in the comparison table if a metric goes from 0 to nonzero (e.g., a change that introduces allocations where there were none). Maybe not a big issue for our use cases, but worth mentioning.
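One possible tweak along the lines of this nit (a sketch, not the PR's code): report a 0-to-nonzero transition as an infinite delta rather than a misleading +0.0%.

```rust
/// Percent delta from `base` to `target`; a 0 -> nonzero transition is
/// surfaced as infinity instead of being silently reported as 0.0%.
pub fn pct(base: f64, target: f64) -> f64 {
    if base == 0.0 {
        if target == 0.0 { 0.0 } else { f64::INFINITY }
    } else {
        ((target - base) * 100.0) / base
    }
}

fn main() {
    println!("{:+.1}%", pct(8824.528, 7951.251)); // a real delta
    println!("{:+.1}%", pct(0.0, 5.0)); // 0 -> nonzero shows as +inf%
}
```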

}

/// Parse `summary` TSV lines emitted by the marf benchmark harness into typed rows.
pub fn extract_summary_lines(text: &str) -> Vec<SummaryRow> {
Contributor

minor: this is tightly coupled to the harness output format. Changes to that format could silently produce empty results with no error evidence.
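One hedged way to surface that failure mode (function shape and field names are illustrative, simplified from the harness's actual row type): return an error when no `summary` rows parse at all.

```rust
/// Simplified stand-in for the harness's summary row type.
#[derive(Debug)]
struct SummaryRow {
    benchmark: String,
    name: String,
    total_ms: f64,
}

/// Parse `summary`-tagged TSV lines; error loudly if none match, instead of
/// silently returning an empty Vec when the output format drifts.
fn extract_summary_lines(text: &str) -> Result<Vec<SummaryRow>, String> {
    let rows: Vec<SummaryRow> = text
        .lines()
        .filter_map(|line| {
            let mut cols = line.split('\t');
            match (cols.next(), cols.next(), cols.next(), cols.next()) {
                (Some("summary"), Some(bench), Some(name), Some(total)) => {
                    total.parse().ok().map(|total_ms| SummaryRow {
                        benchmark: bench.to_string(),
                        name: name.to_string(),
                        total_ms,
                    })
                }
                _ => None,
            }
        })
        .collect();
    if rows.is_empty() {
        return Err("no `summary` TSV lines found; harness output format may have changed".into());
    }
    Ok(rows)
}

fn main() {
    let out = "noise line\nsummary\tread\tnode256/depth=128/variant=get\t8131.712\n";
    let rows = extract_summary_lines(out).expect("summary rows");
    println!("{} {} {}", rows[0].benchmark, rows[0].name, rows[0].total_ms);
}
```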

@dhaney-stacks (Contributor)

@cylewitruk-stacks please remove @jcnelson as reviewer and add @aaronb-stacks or @brice-stacks

@cylewitruk-stacks (Contributor Author)

@dhaney-stacks jude is a participant via existing comments, so I don't seem to be able to remove him.
