feat: new profiling and instrumentation utility crates stacks-profiler & stacks-profiler-macros (#6905)
Conversation
Codecov Report

❌ Patch coverage report. Additional details and impacted files:

```
@@            Coverage Diff            @@
##           develop    #6905    +/-  ##
===========================================
- Coverage    72.67%   72.34%   -0.33%
===========================================
  Files          411      425      +14
  Lines       221663   223697    +2034
  Branches         0      338     +338
===========================================
+ Hits        161086   161844     +758
- Misses       60577    61853    +1276
```

...and 313 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
Pull request overview
Adds two new workspace crates (stacks-profiler and stacks-profiler-macros) providing a lightweight, thread-local hierarchical span profiler with sampling, tagging, records/counters, and tree printing—intended for instrumentation/benchmarking usage and future wiring into runtime paths.
Changes:
- Introduces the core profiler implementation (TLS state/arena, span lifecycle, CPU-time support) plus printing utilities.
- Adds ergonomic instrumentation macros (
span!,measure!,record!,counter_add!, and conditional variants) and a#[profile]proc-macro. - Adds integration tests, examples, benchmarks, and wires the new crates + dependencies into the workspace/config.
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `stacks-profiler/src/lib.rs` | Core profiler API, TLS state wiring, span/guard implementation, tag interning. |
| `stacks-profiler/src/macros.rs` | Defines `span!`/`measure!`/record/counter macros and conditional variants. |
| `stacks-profiler/src/state.rs` | Per-thread arena/stack model and results materialization. |
| `stacks-profiler/src/types.rs` | Record/counter/tag core value types and conversions. |
| `stacks-profiler/src/platform.rs` | Platform-specific per-thread CPU-time implementation. |
| `stacks-profiler/src/print.rs` | Tree formatter trait + default ANSI pretty-printer implementation. |
| `stacks-profiler/tests/integration.rs` | Integration coverage for nesting, sampling, suppression, count-only, panic safety, threading isolation. |
| `stacks-profiler/examples/basics.rs` | Basic end-to-end usage example of spans/measure/`#[profile]`. |
| `stacks-profiler/examples/aggregation.rs` | Example demonstrating aggregation + sampling modifiers. |
| `stacks-profiler/examples/cpu_vs_wait.rs` | Example showcasing wall vs CPU vs wait and metadata attachment. |
| `stacks-profiler/benches/overhead.rs` | Criterion benchmarks for overhead and sampling modes. |
| `stacks-profiler/README.md` | Crate-level documentation and usage reference. |
| `stacks-profiler/Cargo.toml` | New profiler crate manifest and deps. |
| `stacks-profiler-macros/src/lib.rs` | `#[profile]` proc-macro implementation (sync-only; async compile-fail). |
| `stacks-profiler-macros/Cargo.toml` | Proc-macro crate manifest and deps. |
| `Cargo.toml` | Adds new workspace members and shared workspace dependencies for profiler + macro deps. |
| `Cargo.lock` | Locks newly introduced dependencies. |
| `.cargo/config.toml` | Extends the `clippy-stacks` alias to include the new crates. |
| `.gitignore` | Adds `.vite` ignore entry. |
```rust
}
let boxed: Box<str> = s.into_boxed_str();
let leaked: &'static str = Box::leak(boxed);
map.insert(leaked.into(), leaked);
leaked
```
intern_tag_str leaks the boxed string and then does map.insert(leaked.into(), leaked), which allocates/copies a second Box<str> for the key. This defeats the purpose of interning and can significantly increase allocations for high-cardinality tags. Consider changing the interner map key to &'static str (so you can map.insert(leaked, leaked)), or otherwise avoid allocating a second owned copy for the key.
Valid, will address when I'm in a place to switch back to this branch 👍
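A minimal sketch of the suggested direction, with hypothetical names (`INTERNED`, `intern_tag_str`): keying the interner by the leaked `&'static str` itself means only one owned copy is ever allocated per distinct tag, and repeated lookups return the same pointer.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

thread_local! {
    // Hypothetical interner keyed directly by the leaked slice, so only one
    // owned copy of each tag string is ever allocated.
    static INTERNED: RefCell<HashSet<&'static str>> = RefCell::new(HashSet::new());
}

fn intern_tag_str(s: &str) -> &'static str {
    INTERNED.with(|set| {
        let mut set = set.borrow_mut();
        // `HashSet<&'static str>` can be probed with a plain `&str` because
        // `&'static str: Borrow<str>`, so no temporary key allocation is needed.
        if let Some(existing) = set.get(s) {
            return *existing;
        }
        // Leak exactly one boxed copy; the set entry borrows that same allocation.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        set.insert(leaked);
        leaked
    })
}

fn main() {
    let a = intern_tag_str("tag-a");
    let b = intern_tag_str("tag-a");
    // Same underlying allocation: pointer-equal, not just string-equal.
    assert!(std::ptr::eq(a, b));
    println!("interned once: {a}");
}
```

This avoids the second `Box<str>` entirely; the trade-off is that lookups and inserts share the same leaked slice, which is exactly the interning invariant.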
```rust
#[inline(always)]
#[doc(hidden)]
pub fn end_suppression() {
    SUPPRESS_DEPTH.with(|d| d.set(d.get().wrapping_sub(1)));
```
end_suppression() uses wrapping_sub(1) on the suppression depth. If this is ever called when the depth is already 0 (e.g., due to misuse, double-drop, or manual calls to the #[doc(hidden)] API), the depth will wrap to u32::MAX and the thread will effectively remain suppressed forever. Prefer a debug assertion + saturating/checked decrement to make underflow impossible.
Suggested change:

```rust
SUPPRESS_DEPTH.with(|d| {
    let depth = d.get();
    debug_assert!(
        depth > 0,
        "end_suppression called when suppression depth is already zero"
    );
    d.set(depth.saturating_sub(1));
});
```
Valid although unlikely as the RAII guards should keep things balanced (but as mentioned, assuming it's used via the intended non-hidden APIs).
Will look at this and see whether saturating sub/add or checked.expect() feels more correct for suppression depth in edge cases.
benjamin-stacks left a comment:
This looks really really great.
I still want to have a closer look at all the meta-programming (I'm not very familiar with macro writing yet) and the timing implementation, but I really like the surface API.
I've left a few small notes on that area, but nothing huge.
I'm really looking forward to using this. We built a very similar thing at Stack Overflow many years ago (https://miniprofiler.com/), and it's super useful to this day.
```rust
/// Estimated idle time (wall − CPU). See [`platform`] for resolution caveats.
pub fn wait_time(&self) -> Duration {
    Duration::from_nanos(self.wall_time_ns.saturating_sub(self.cpu_time_ns))
```
Nit: could be de-duplicated.
Suggested change:

```rust
Duration::from_nanos(self.wait_time_ns())
```
```rust
GuardKind::Span => Profiler::end_span(),
GuardKind::Suppression => Profiler::end_suppression(),
```
This assumes that the guards are dropped in reverse order of creation. That is reasonable (and probably guaranteed when using the macros, I haven't read that far yet), but not guaranteed if someone calls begin_span manually. I think as long as it's a public method, we should either handle it correctly, or at the very least document very well what a caller is expected to do.
I think (but you've probably thought about this more, so feel free to disagree) that it shouldn't be hard (or expensive) to handle it correctly, i.e. if guards are dropped out of order, end any descendant spans alongside.
My goal is essentially that the macros should be the only instrumentation API surface area that users see/use -- the "internal" methods are public just to enable the cross-crate macro-generated calls, but they're marked #[doc(hidden)] so they won't show up in docs/intellisense.
So, my thinking was rather that if someone digs deep enough to find those and decides to use them, then that's "at your own risk" 😅
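For reference, the "end any descendant spans alongside" idea from the comment above can be sketched with a simplified, hypothetical model of the per-thread open-span stack (names like `SpanStack` are illustrative, not the crate's actual types):

```rust
// Hypothetical simplified model of the per-thread open-span stack: ending a
// span also closes any descendants still open, so guards dropped out of
// order leave the stack consistent instead of mispairing begin/end calls.
#[derive(Default)]
struct SpanStack {
    open: Vec<u32>, // ids of currently open spans, root first
}

impl SpanStack {
    fn begin(&mut self, id: u32) {
        self.open.push(id);
    }

    /// End `id` and any spans opened after it. If `id` is no longer on the
    /// stack (already closed by an ancestor's end), this is a no-op.
    fn end(&mut self, id: u32) {
        if let Some(pos) = self.open.iter().rposition(|&open_id| open_id == id) {
            // Everything above `pos` is a descendant; close it alongside.
            self.open.truncate(pos);
        }
    }
}

fn main() {
    let mut stack = SpanStack::default();
    stack.begin(1);
    stack.begin(2);
    stack.begin(3);
    // Dropping the guard for span 1 out of order closes 2 and 3 alongside.
    stack.end(1);
    assert!(stack.open.is_empty());
    println!("stack consistent");
}
```

With the macros as the only supported surface, this is belt-and-suspenders; but it shows the correction is a cheap `rposition` + `truncate` in the out-of-order case only.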
```rust
let wall_ns = end_wall.duration_since(start_wall).as_nanos() as u64;
let cpu_ns = end_cpu_ns.saturating_sub(start_cpu_ns);
```
We could only record the start and end times, and defer the computations to take_results(). Granted, these aren't particularly expensive, so it may or may not be worth it.
```rust
    id: &'static SpanId,
    tag: Option<Tag>,
) -> NodeId {
    if let Some(last) = self.node(parent).last_child
```
Wondering if it's worth it to grab self.node_mut(parent) once into a variable, and use it here as well. If last_child doesn't match (which seems likely, unless you're in a loop), you're guaranteed to need that mutable reference anyway. Might as well avoid the second call. What do you think?
iirc there was a borrow issue with that, but can try it 👍
```rust
/// Increment a named counter on the current span (aggregated by key).
///
/// If a counter with this key already exists on the span, `delta` is added to it (saturating).
/// Otherwise a new counter is created with the given value.
```
I'm wondering if these are the optimal semantics for counters. A minor refactoring could change on which span a counter is incremented (or even split a counter across multiple spans). This may or may not be what a user actually intends or expects.
Granted, this is not a problem in so far as we're not actually losing data, but I'm wondering if the default PrettyPrinter should at least have the ability to aggregate counters. Alternatively, being able to define the scope of a counter might be really nice, but it may not be feasible in a pretty way.
Is that a question you've considered?
I spent a little time trying to at least make it reasonable, but when the counters got added it was largely just a "means to an end" optimization over using the more verbose record! for stacks-bench when I needed to count e.g. Clarity execution costs or cache hit/miss.
Completely open to suggestions!
```rust
    /// });
    /// ```
    #[macro_export]
    macro_rules! span {
```
Seems like there's a (minor) footgun here.
```rust
span!("memoized lookup");
if cache.has(foo) {
    return cache.get(foo);
}
let _c = span!("one-time computation");
compute_and_cache(foo)
```

If my understanding of drop rules is correct (it may not be!), the first guard only exists in a temporary scope and is immediately dropped, causing the profiling tree to look different than expected. I assume that's exactly why you always assign `span!` results to variables.
Can we either

- do some `#[must_use]` magic to warn about this, or
- assign the result to a variable that lives for the current scope, so that if nobody takes ownership of the guard, it still works as expected?

(I don't know if the second thing is even possible, given that the new variable and the `let _x = span!()` have to live in the same scope, so it might come down to the first point.)
Yeah we should be able to add #[must_use] on ProfileGuard, good idea 👍
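A minimal sketch of the `#[must_use]` idea, with a stand-in guard type and a hypothetical thread-local span count to show the drop timing the attribute protects against:

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical bookkeeping just for this demo.
    static OPEN_SPANS: Cell<u32> = Cell::new(0);
}

// `#[must_use]` makes the compiler warn on a bare `ProfileGuard::begin();`
// statement, where the temporary would be dropped (ending the span) immediately.
#[must_use = "bind the guard (e.g. `let _guard = span!(..)`) or the span ends immediately"]
struct ProfileGuard;

impl ProfileGuard {
    fn begin() -> ProfileGuard {
        OPEN_SPANS.with(|c| c.set(c.get() + 1));
        ProfileGuard
    }
}

impl Drop for ProfileGuard {
    fn drop(&mut self) {
        OPEN_SPANS.with(|c| c.set(c.get() - 1));
    }
}

fn open_spans() -> u32 {
    OPEN_SPANS.with(|c| c.get())
}

fn main() {
    let guard = ProfileGuard::begin();
    assert_eq!(open_spans(), 1); // span stays open while the guard is bound
    drop(guard);
    assert_eq!(open_spans(), 0);
    println!("guard semantics ok");
}
```

Note `#[must_use]` only produces a warning for an unused expression statement; `let _ = span!(..)` still drops immediately without a warning, so the docs would still want to call out binding to `let _guard = ...`.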
LGTM!
I’ve added a number of minor comments and sanity checks.
One additional point to consider is the placement of these new crates. Should we move them to the contrib folder? If we decide to keep them in their current location, we should verify whether they are "covered" by the Bug Bounty Program, since the contrib folder is entirely excluded.
```rust
let ctx = NodeContext {
    stats,
    depth,
    is_last_sibling: connector == "└── ",
```
This couples the semantic meaning to a specific Unicode string. If the connector strings are ever changed, this could silently break. Maybe passing a boolean `is_last` parameter could do the job.
Alternatively, I'd also be fine with keeping the string comparison but making it a global constant reused throughout this module.
I was on the fence of whether or not to even include the printing stuff since I pretty much just used it for early testing/debugging together with the examples. So, minimal change and go the const route?
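The const route could look something like this sketch (constant names are illustrative): the connectors live in one place and the "last sibling" decision travels as a bool, so the comparison can't drift out of sync with the literal.

```rust
// Hypothetical shared constants for the tree connectors, so the
// "is last sibling" decision is derived once and the Unicode strings
// have a single definition.
const CONNECTOR_LAST: &str = "└── ";
const CONNECTOR_MID: &str = "├── ";

fn connector_for(is_last_sibling: bool) -> &'static str {
    if is_last_sibling {
        CONNECTOR_LAST
    } else {
        CONNECTOR_MID
    }
}

fn main() {
    assert_eq!(connector_for(true), "└── ");
    assert_eq!(connector_for(false), "├── ");
    println!("{}leaf", connector_for(true));
}
```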
```rust
    records: Vec::with_capacity(4),
    counters: Vec::with_capacity(4),
});
idx as NodeId
```
Probably not an issue, but wondering if it could make sense to add an assertion to guard that `idx <= u32::MAX`.
Yeah, probably a good idea from a correctness/soundness perspective, so nobody ends up with truncated ids (though that's a crapload of nodes)
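One cheap way to enforce this (a sketch with hypothetical names, not the crate's actual allocation path) is to route the conversion through `try_from`, which fails loudly on overflow instead of truncating with `as`:

```rust
// Hypothetical guard on arena growth: convert the index with `try_from` so
// exceeding u32::MAX nodes fails loudly instead of silently truncating ids.
type NodeId = u32;

fn node_id_from_index(idx: usize) -> NodeId {
    NodeId::try_from(idx).expect("profiler arena exceeded u32::MAX nodes")
}

fn main() {
    assert_eq!(node_id_from_index(0), 0);
    assert_eq!(node_id_from_index(123_456), 123_456);
    println!("ids ok");
}
```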
```rust
#[doc(hidden)]
#[inline(always)]
pub fn begin_span(id: &'static SpanId, tag: Option<Tag>) -> ProfileGuard {
    let start_wall = Instant::now();
```
Sanity check: small asymmetry between `begin_span` and `end_span`. In "begin", bookkeeping is excluded from timing, while in "end", bookkeeping is included. Maybe acceptable, but worth checking.
iirc I structured "end" that way so that untimed/unsampled spans wouldn't incur the FFI/syscall overhead of probing the clocks, since in the actual integrated marf-bench branch some of the spans sit around very tight loops (sampled).
I could probably split this up a bit more with more specific GuardKind variants and more targeted begin_timed()/end_timed(), etc. I'll have a look at it 👍
```rust
// Anonymous Block
($($t:tt)*) => {{
    let _guard = $crate::span!("scope");
    $($t)*
}};
```
Sanity check: this would match even typos in the `measure!` macro, right? Maybe we could convert the catch-all match to a `compile_error!`, and right before it we could add a `measure!({ ... })` arm for the anonymous block.
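That restructuring could look roughly like the following sketch (`measure_demo!` is a stand-in name; the real arms carry the profiling logic this sketch elides): an explicit `$block:block` arm handles the anonymous case, and a trailing `compile_error!` arm rejects anything else at compile time.

```rust
// Hypothetical restructuring: match an explicit block for the anonymous
// case, and turn everything else into a compile-time error so typo'd
// modifiers can't silently fall through to the catch-all.
macro_rules! measure_demo {
    // Named variant (profiling elided in this sketch).
    ($name:literal, $block:block) => {{
        let _name = $name;
        $block
    }};
    // Anonymous block variant.
    ($block:block) => {{
        $block
    }};
    // Anything else is rejected at compile time.
    ($($t:tt)*) => {
        compile_error!("unsupported measure! invocation; expected `measure!({ .. })` or `measure!(\"name\", { .. })`")
    };
}

fn main() {
    let x = measure_demo!("work", { 2 + 2 });
    let y = measure_demo!({ 3 * 3 });
    assert_eq!(x, 4);
    assert_eq!(y, 9);
    println!("{x} {y}");
}
```

The `compile_error!` arm only fires if it is actually matched, so well-formed invocations are unaffected.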
A third alternative would be to move it to a completely new project, as a separate open-source project. It might be useful in other Rust projects, far away from the Stacks ecosystem.
```rust
if let Some(async_token) = &input_fn.sig.asyncness {
    return syn::Error::new_spanned(
        async_token,
        "#[profile] does not support async fn; use span!/measure! inside the function body",
```
So, I understand why profiling async functions is going to be inherently weird, given that they might end up running on who knows what thread.
I'm just wondering about the guidance from this error message -- if #[profile] didn't reject async functions, wouldn't it actually do exactly the same thing as a span! at the beginning of said function?
```rust
// Name, Rate, count_only, Block
($name:literal, rate: $rate:literal, count_only, $block:block) => {{
    let _guard = $crate::span!($name, rate: $rate, count_only);
```
Should we give all these variables a more obvious name, like you did in #[profile]? This variable will appear in the autocomplete, so it might be less confusing to the user what this is if the name is a little more self-explanatory.
Yeah we can do that, the proc macro came after these and I guess I never made it back around to align var names 👍
```rust
    Some($crate::Profiler::begin_span($id, $tag_opt))
}};

(@should_sample $counter:ident, $rate:literal) => {{
```
Nit: this is emitting conditional runtime code that could be moved to compile time (since `$rate` is a literal). In particular, I wouldn't be surprised if the `is_power_of_two` call eats up any gain from splitting the logic in the first place.
Worth it to optimize? (Could totally be a follow-up)
Hm, in the proc macro it works to do it compile-time, but I'm not sure it's possible in the declarative version? These are all const/const fn, so the actual generated bytecode should be very minimal.
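For illustration, the const folding mentioned in the reply can be made explicit even in a declarative macro: because `$rate` is a literal, binding it to `const` items lets the power-of-two test (and the mask) evaluate at compile time. This is an editor's sketch with assumed names (`should_sample!`), not the crate's actual expansion:

```rust
// Hypothetical expansion: the consts fold at compile time, so the runtime
// path is one increment plus a mask (power-of-two rates) or a modulo.
macro_rules! should_sample {
    ($counter:expr, $rate:literal) => {{
        const RATE: u64 = $rate;
        const IS_POW2: bool = RATE.is_power_of_two();
        *$counter += 1;
        let n: u64 = *$counter;
        if IS_POW2 {
            (n & (RATE - 1)) == 0
        } else {
            n % RATE == 0
        }
    }};
}

fn main() {
    let mut counter: u64 = 0;
    let sampled: Vec<bool> = (0..8).map(|_| should_sample!(&mut counter, 4)).collect();
    // Rate 4: every 4th call is sampled (calls 4 and 8).
    assert_eq!(
        sampled,
        vec![false, false, false, true, false, false, false, true]
    );
    println!("{sampled:?}");
}
```

Since `IS_POW2` is a `const`, the branch is trivially foldable by the optimizer, which matches the "generated bytecode should be very minimal" expectation.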
```rust
fn materialize_node(nodes: &mut Vec<Option<Node>>, node_id: NodeId) -> ProfileStats {
    let node = nodes[node_id as usize]
        .take()
        .expect("node already materialized or missing");
```
💭 I don't think this should be allowed to panic.
I realize that this should be unreachable and is guarding against bugs only, but it feels like a profiler tool should never crash the node (or whatever application). A bug in stacks-profiler logic doesn't represent inconsistent application state as far as the profiled application is concerned.
Thoughts?
Yeah that's fair, can make the "take results" fallible 👍
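A fallible version could look like this sketch (simplified `Node` and error type are hypothetical stand-ins for the crate's actual types): the caller gets a `Result` to log and discard, rather than a panic that takes the host down.

```rust
// Hypothetical fallible variant: return an error the caller can log and
// discard instead of panicking the host application on an internal bug.
#[derive(Debug, PartialEq)]
struct Node {
    label: &'static str,
}

#[derive(Debug, PartialEq)]
enum ProfileError {
    NodeMissing(u32),
}

fn materialize_node(nodes: &mut Vec<Option<Node>>, node_id: u32) -> Result<Node, ProfileError> {
    nodes
        .get_mut(node_id as usize)
        .and_then(Option::take)
        .ok_or(ProfileError::NodeMissing(node_id))
}

fn main() {
    let mut nodes = vec![Some(Node { label: "root" })];
    let root = materialize_node(&mut nodes, 0).expect("first take succeeds");
    assert_eq!(root.label, "root");
    // Double materialization now surfaces as an error, not a panic.
    assert_eq!(
        materialize_node(&mut nodes, 0),
        Err(ProfileError::NodeMissing(0))
    );
    println!("fallible take ok");
}
```

`get_mut` also replaces the panicking index, so an out-of-range id degrades to the same error instead of an out-of-bounds panic.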
Sure, that could work as well. As a starting point, we could add it under one of our organizations. We've already taken a similar approach with a couple of utilities, for example:

We might have more flexibility placing it under the

This would also avoid "burdening" the main repository with additional packages that are general purpose.
Description
Introduces a new lightweight profiler for targeted instrumentation and result extraction, intended primarily for benchmarking/instrumentation builds, and to be feature-gated when wired into runtime paths. This profiler is used extensively by the `stacks-bench` tool (follow-up PR).

The motivation behind a custom implementation was that none of the existing instrumentation/profiling crates provided the level of control and detail which `stacks-bench` needed, and I also wanted it to be as slim and low-overhead as possible while providing ergonomic primitives to help minimize profiling clutter in core code.

This PR adds two new workspace crates:

- `stacks-profiler`: thread-local profiling library for hierarchical spans, wall/CPU/wait timing, tags, records, and counters.
- `stacks-profiler-macros`: procedural macro crate for profiler ergonomics, including `#[profile]` (re-exported by `stacks-profiler`).

These crates are added at the workspace root (rather than `contrib`) because they are intended dependencies of root crates. No existing runtime paths are wired yet, so this PR does not alter any existing behavior.

Included in this PR:

- `span!`, `measure!`, `record!`, `counter_add!`, plus conditional `*_if!` variants.
- `#[profile]` attribute macro (sync functions only; `async fn` is compile-fail).

Non-goals for this PR:

- `stackslib`, `clarity`, etc. (follow-up PRs).
- `#[profile]` support for `async` functions (not currently used in the main codebase).

Applicable issues

- #118

Checklist