perf: introduce MARF benchmarks and tooling #6932
cylewitruk-stacks wants to merge 30 commits into stacks-network:develop
Conversation
Pull request overview
This PR adds a MARF-focused benchmarking suite to stackslib (read/write workflows + primitives) and introduces a contrib/marf-bench CLI tool to run/compare those benches across git revisions using worktrees.
Changes:
- Add a custom `cargo bench` harness under `stackslib/benches/marf/*` with allocation counting + TSV-like summary output.
- Add a `contrib/marf-bench` CLI to run the harness in the current tree or compare base/target revisions via temporary worktrees.
- Consolidate some CLI/tooling dependencies under workspace deps and wire up a `cargo marf-bench` alias.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| stackslib/benches/marf/allocator.rs | Global counting allocator wrapper (jemalloc/system) for bench allocation metrics |
| stackslib/benches/marf/common.rs | Shared measurement helpers + SQLite WAL checkpoint configuration parsing |
| stackslib/benches/marf/main.rs | Bench entrypoint dispatching primitives/read/write subcommands and printing unified summaries |
| stackslib/benches/marf/primitives.rs | Microbenchmarks for trie node encode/decode and storage/trie primitives |
| stackslib/benches/marf/read.rs | Read-heavy MARF get / get_with_proof benchmarks with configurable fixture + depths |
| stackslib/benches/marf/utils.rs | Benchmark utilities (env parsing, deterministic IDs/hashes, small trie helpers) |
| stackslib/benches/marf/write.rs | Write workflow benchmark focused on promotion-driving inserts + configurable updates |
| stackslib/Cargo.toml | Adds the marf bench target and dev-deps needed for bench compilation |
| Cargo.toml | Adds workspace deps for tooling/benchmarking and includes contrib/marf-bench as a workspace member |
| Cargo.lock | Locks new workspace/tooling dependencies introduced by the new CLI + workspace dependency additions |
| .cargo/config.toml | Adds cargo marf-bench alias and includes marf-bench in the clippy alias package set |
| contrib/marf-bench/Cargo.toml | Defines the new marf-bench binary crate |
| contrib/marf-bench/README.md | Documents CLI usage, worktree behavior, and cleanup/recovery |
| contrib/marf-bench/src/main.rs | clap CLI definition + ctrl-c/panic cleanup hooks |
| contrib/marf-bench/src/git.rs | Git helpers (revision verification, merge-base resolution, staged snapshot commit creation) |
| contrib/marf-bench/src/runner.rs | Worktree orchestration, bench overlay, build, execution, and cleanup logic |
| contrib/marf-bench/src/report.rs | Formatting/printing for single runs, comparisons, and repeated-run jitter stats |
| contrib/marf-bench/src/util.rs | Small helpers (command runner, parsing summary lines, sorting, percent deltas) |
| contrib/marf-bench/src/commands/mod.rs | Command module wiring |
| contrib/marf-bench/src/commands/run.rs | run subcommand implementation (current tree only) |
| contrib/marf-bench/src/commands/bench.rs | bench subcommand implementation (base/target comparisons + repeats) |
| contrib/marf-bench/src/commands/bench_target.rs | Bench target subcommands + CLI-to-env forwarding |
| contrib/marf-bench/src/commands/clean.rs | Cleanup subcommand for stale worktrees, cached roots, and orphan temp dirs |
| contrib/clarity-cli/Cargo.toml | Switches clap to the workspace dependency |
| contrib/stacks-inspect/Cargo.toml | Switches clap to the workspace dependency |
| contrib/tools/config-docs-generator/Cargo.toml | Switches clap to the workspace dependency |
| stacks-signer/Cargo.toml | Adds a TODO comment about switching clap to a workspace dependency |
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           develop    #6932      +/-   ##
===========================================
- Coverage    72.67%   69.94%    -2.74%
===========================================
  Files          411      412        +1
  Lines       221663   217940     -3723
  Branches         0      338      +338
===========================================
- Hits        161086   152429     -8657
- Misses       60577    65511     +4934

See 290 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
perf: add `TrieNodePatch` support to MARF benchmarks and the `marf-bench` harness (merged from …ort-to-benchmarks)
/// Set CACHE_STRATEGIES as comma-separated values (for example: noop,node256).
#[arg(long)]
cache_strategies: Option<String>,
I haven't PR'd the removal yet, but yeah it will be. This branch is old, but I could just remove this now and always use noop until the actual removal lands.
Replaced cache strategies with compression on/off during some read bench refactoring in 9476967
/// Set SQLITE_WAL_CHECKPOINT_MODE for post-setup checkpoint when SQLITE_WAL_AUTOCHECKPOINT=0 (PASSIVE|FULL|RESTART|TRUNCATE).
#[arg(long)]
sqlite_wal_checkpoint_mode: Option<String>,
Does changing the checkpoint mode affect anything, assuming that the number of pages before checkpoint matches or exceeds the size of the (single) transaction to be committed to record the hash and location of the serialized trie blob?
A transaction on the MARF database should only happen when a trie is added. For MARFs where the tries are stored externally in a .blobs file, the size of this transaction is a small O(1).
So, SQLite in WAL mode places all modifications in the WAL until a checkpoint operation finally merges them into the main DB file; by default this happens automatically every 1000 pages (~4MB), so the WAL can contain many transactions when that occurs.
RESTART is likely the best checkpointing mode for the long-running nature of the node, since it reuses the WAL's existing space instead of allocating new disk space and growing the file after every checkpoint.
The more interesting setting, though, is auto-checkpointing, which will stop and perform a merge of those 1000 pages as soon as that boundary is reached, regardless of what the node is doing (excluding ongoing DB transactions, of course). Disabling auto-checkpointing altogether for the hot DBs and manually triggering a checkpoint after block processing would ensure that checkpointing can't happen in the middle of transaction processing, keeping that cost off the "hot" path (particularly for blocks doing a lot of work (writes), which may incur multiple checkpoints during processing).
Reference: https://sqlite.org/wal.html
P.S. I haven't explicitly traced the transaction scopes/graph yet; this was just something I was interested in testing and easy to add, so that's why it's here. I was primarily thinking of the Clarity MARF'd DB, which has a number of smaller writes during block processing, IIRC.
Disabling auto-checkpointing altogether for the hot db's and manually triggering checkpointing after block processing would ensure that checkpointing can't happen in the middle of transaction processing, keeping that cost off the "hot" path (particularly for blocks doing a lot of work (writes), which may incur multiple checkpoints during processing).
I think it's a question of when we want to pay for merging the WAL back into the DB -- at the end of block-processing, or at the end of lots of blocks. IIRC all of the MARF-indexed side stores commit everything in a single transaction -- i.e. it's not the case that we have multiple transaction-commits (or even explicit checkpoints) during block-processing. The act of processing one (Stacks) block is the act of performing at most one transaction-commit per MARF'ed DB. I guess the best-possible case is if we somehow manage to avoid internal fragmentation -- either way, those pages are going to be copied in, so we might as well try to ensure that as many disk blocks of those pages represent Stacks data.
I'd be curious to know what kinds of performance characteristics you see when you vary the WAL behavior?
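For reference, the behaviors discussed in this thread map to a few SQLite pragmas (a configuration sketch only; how and where these would be issued in the node's connection setup is an assumption):

```sql
-- Use WAL journaling (the mode discussed above).
PRAGMA journal_mode = WAL;

-- Disable automatic checkpointing so a merge can't fire mid-block-processing.
PRAGMA wal_autocheckpoint = 0;

-- Manually checkpoint after block processing; RESTART reuses the WAL file's
-- existing space instead of growing it after every checkpoint.
PRAGMA wal_checkpoint(RESTART);
```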
record_case(
    &mut summary,
    "trie_read_root_with_hash",
    output_mode,
    || {
        let mut fixture = make_trie_fixture("noop");
        let mut conn = fixture.store.connection();
        conn.open_block(&fixture.tip)
            .expect("failed to open trie fixture tip");
        for _ in 0..iters {
            black_box(Trie::read_root(&mut conn).expect("read_root failed"));
        }
    },
);
qq: in record cases like this one, make_trie_fixture / make_storage_fixture and open_block are called inside the measured closure, so we're including fixture setup cost in the timing and allocation counts. Is that desired?
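If setup is meant to be excluded, one option is to build the fixture before entering the measured closure. A minimal sketch of the pattern (the names `measure` and the vector "fixture" are illustrative, not the harness's real API):

```rust
use std::time::{Duration, Instant};

// Time only the closure body, which the caller keeps free of setup work.
fn measure<F: FnMut()>(iters: u32, mut op: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        op();
    }
    start.elapsed()
}

fn main() {
    // Setup happens once, outside the measured region (analogous to building
    // the trie fixture and opening the tip before timing starts).
    let data: Vec<u64> = (0..1_000).collect();

    let elapsed = measure(100, || {
        // Only the operation under test runs inside the closure.
        std::hint::black_box(data.iter().sum::<u64>());
    });
    println!("100 iters took {:?}", elapsed);
}
```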
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
    REALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
    DEALLOC_BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
    ALLOC_BYTES.fetch_add(new_size as u64, Ordering::Relaxed);
    unsafe { ALLOCATOR.realloc(ptr, layout, new_size) }
}
nit: realloc adds new_size to ALLOC_BYTES but doesn't increment ALLOC_CALLS (only REALLOC_CALLS). Since the summary output only surfaces alloc_count and alloc_bytes (realloc_calls is raw-mode only), the two visible metrics are inconsistent when reallocs are frequent.
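One way to make the two visible metrics consistent is to count a realloc as one logical allocation event. A sketch of that bookkeeping outside the real allocator (the actual fix would live in `allocator.rs`; the free function here stands in for the trait method):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static ALLOC_CALLS: AtomicU64 = AtomicU64::new(0);
static ALLOC_BYTES: AtomicU64 = AtomicU64::new(0);
static DEALLOC_BYTES: AtomicU64 = AtomicU64::new(0);
static REALLOC_CALLS: AtomicU64 = AtomicU64::new(0);

// Treat a realloc as one logical allocation event so that alloc_count stays
// consistent with alloc_bytes in the summary output.
fn record_realloc(old_size: u64, new_size: u64) {
    REALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
    ALLOC_CALLS.fetch_add(1, Ordering::Relaxed); // keep visible metrics consistent
    DEALLOC_BYTES.fetch_add(old_size, Ordering::Relaxed);
    ALLOC_BYTES.fetch_add(new_size, Ordering::Relaxed);
}

fn main() {
    record_realloc(64, 128);
    assert_eq!(
        ALLOC_CALLS.load(Ordering::Relaxed),
        REALLOC_CALLS.load(Ordering::Relaxed)
    );
    println!("alloc_bytes = {}", ALLOC_BYTES.load(Ordering::Relaxed));
}
```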
pub fn pct(base: f64, target: f64) -> f64 {
    if base == 0.0 {
        return 0.0;
    }
    ((target - base) * 100.0) / base
}
nit: pct() returns 0.0 when base == 0.0, which would show +0.0% in the comparison table if a metric goes from 0 to nonzero (e.g., a change that introduces allocations where there were none). Maybe not a big issue for our use cases, but worth mentioning.
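A sketch of one alternative: return `Option<f64>` so the zero-to-nonzero case is distinguishable, letting the comparison table render it as "n/a" or "new" instead of +0.0% (names and rendering are suggestions, not the current code):

```rust
/// Percent delta that distinguishes "genuinely unchanged" from
/// "went from zero to nonzero".
fn pct(base: f64, target: f64) -> Option<f64> {
    if base == 0.0 {
        if target == 0.0 {
            Some(0.0) // genuinely unchanged
        } else {
            None // undefined percent delta: the metric appeared from nothing
        }
    } else {
        Some((target - base) * 100.0 / base)
    }
}

fn main() {
    assert_eq!(pct(100.0, 110.0), Some(10.0));
    assert_eq!(pct(0.0, 5.0), None);
    println!("ok");
}
```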
}

/// Parse `summary` TSV lines emitted by the marf benchmark harness into typed rows.
pub fn extract_summary_lines(text: &str) -> Vec<SummaryRow> {
minor: this is tightly coupled to the harness output format. Changes to that format could silently produce empty results with no error evidence.
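A sketch of a stricter per-line parser that surfaces format drift as an error instead of silently dropping rows (the two-field row shape here is an assumption for illustration; the real `SummaryRow` has more columns):

```rust
#[derive(Debug, PartialEq)]
struct SummaryRow {
    case: String,
    ns_per_iter: f64,
}

// Return an error describing the malformed line rather than skipping it.
fn parse_summary_line(line: &str) -> Result<SummaryRow, String> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() != 2 {
        return Err(format!(
            "expected 2 tab-separated fields, got {} in {:?}",
            fields.len(),
            line
        ));
    }
    let ns_per_iter = fields[1]
        .parse::<f64>()
        .map_err(|e| format!("bad ns/iter in {:?}: {}", line, e))?;
    Ok(SummaryRow {
        case: fields[0].to_string(),
        ns_per_iter,
    })
}

fn main() {
    assert!(parse_summary_line("trie_read_root\t125.5").is_ok());
    assert!(parse_summary_line("not a summary line").is_err());
    println!("ok");
}
```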
@cylewitruk-stacks please remove @jcnelson as reviewer and add @aaronb-stacks or @brice-stacks
@dhaney-stacks jude is a participant via existing comments, so I don't seem to be able to remove him.
Description
This PR introduces a benchmark suite covering MARF reads/writes & some primitives.
No production code modifications; only new benchmarks and tooling.
Implemented in two layers:
- `stackslib/benches/marf/*` - a bench harness which can be used for configurable one-shot benchmarks (run via `cargo bench -p stackslib --bench marf -- <command>`).
  - Uses a `CountingAllocator` to intercept and measure heap allocations.
  - Intended for use with the `marf-bench` utility, but can be run standalone.
  - Emits the summary output that `marf-bench` uses to collect output.
- `contrib/marf-bench/` - a benchmark runner harness which provides a more friendly `clap`-based CLI for running the above benchmarks.
  - Compares `base` and `target` git revisions (commit, tag, branch, etc.) via the `bench` command.
  - Creates `base` and `target` worktrees, then builds and runs the benchmarks within each worktree separately (avoiding any side-effects on your current working tree/index/branch).

Applicable issues
#118 (benchmarking in CI) and various 100x tickets (for measurement)

Additional info (benefits, drawbacks, caveats)
Note: Added preliminary patch node benchmarks in cylewitruk-stacks#1. (This was written before the new `TrieNodePatch` node type was introduced, so it originally did not have any benchmarks for patch nodes.)

Example: Running `marf-bench bench read --repeats 15 --base ... --target ...` will give you a summarized table similar to:

Checklist