feat: new profiling and instrumentation utility crates stacks-profiler & stacks-profiler-macros (#6905)
Conversation
Codecov Report

❌ Patch coverage report. Additional details and impacted files:

```
@@            Coverage Diff            @@
##           develop    #6905    +/-  ##
===========================================
- Coverage    72.67%   72.34%   -0.33%
===========================================
  Files          411      425      +14
  Lines       221663   223697    +2034
  Branches         0      338     +338
===========================================
+ Hits        161086   161844     +758
- Misses       60577    61853    +1276
```

...and 313 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
Pull request overview
Adds two new workspace crates (stacks-profiler and stacks-profiler-macros) providing a lightweight, thread-local hierarchical span profiler with sampling, tagging, records/counters, and tree printing—intended for instrumentation/benchmarking usage and future wiring into runtime paths.
Changes:
- Introduces the core profiler implementation (TLS state/arena, span lifecycle, CPU-time support) plus printing utilities.
- Adds ergonomic instrumentation macros (
span!,measure!,record!,counter_add!, and conditional variants) and a#[profile]proc-macro. - Adds integration tests, examples, benchmarks, and wires the new crates + dependencies into the workspace/config.
Reviewed changes
Copilot reviewed 17 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `stacks-profiler/src/lib.rs` | Core profiler API, TLS state wiring, span/guard implementation, tag interning. |
| `stacks-profiler/src/macros.rs` | Defines `span!`/`measure!`/record/counter macros and conditional variants. |
| `stacks-profiler/src/state.rs` | Per-thread arena/stack model and results materialization. |
| `stacks-profiler/src/types.rs` | Record/counter/tag core value types and conversions. |
| `stacks-profiler/src/platform.rs` | Platform-specific per-thread CPU-time implementation. |
| `stacks-profiler/src/print.rs` | Tree formatter trait + default ANSI pretty-printer implementation. |
| `stacks-profiler/tests/integration.rs` | Integration coverage for nesting, sampling, suppression, count-only, panic safety, threading isolation. |
| `stacks-profiler/examples/basics.rs` | Basic end-to-end usage example of spans/measure/`#[profile]`. |
| `stacks-profiler/examples/aggregation.rs` | Example demonstrating aggregation + sampling modifiers. |
| `stacks-profiler/examples/cpu_vs_wait.rs` | Example showcasing wall vs CPU vs wait and metadata attachment. |
| `stacks-profiler/benches/overhead.rs` | Criterion benchmarks for overhead and sampling modes. |
| `stacks-profiler/README.md` | Crate-level documentation and usage reference. |
| `stacks-profiler/Cargo.toml` | New profiler crate manifest and deps. |
| `stacks-profiler-macros/src/lib.rs` | `#[profile]` proc-macro implementation (sync-only; async compile-fail). |
| `stacks-profiler-macros/Cargo.toml` | Proc-macro crate manifest and deps. |
| `Cargo.toml` | Adds new workspace members and shared workspace dependencies for profiler + macro deps. |
| `Cargo.lock` | Locks newly introduced dependencies. |
| `.cargo/config.toml` | Extends the `clippy-stacks` alias to include the new crates. |
| `.gitignore` | Adds `.vite` ignore entry. |
```rust
}
let boxed: Box<str> = s.into_boxed_str();
let leaked: &'static str = Box::leak(boxed);
map.insert(leaked.into(), leaked);
leaked
```
intern_tag_str leaks the boxed string and then does map.insert(leaked.into(), leaked), which allocates/copies a second Box<str> for the key. This defeats the purpose of interning and can significantly increase allocations for high-cardinality tags. Consider changing the interner map key to &'static str (so you can map.insert(leaked, leaked)), or otherwise avoid allocating a second owned copy for the key.
Valid, will address when I'm in a place to switch back to this branch 👍
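A minimal sketch of the suggested direction, with hypothetical names (`INTERNED`, `intern_tag_str`): keying the interner by the leaked `&'static str` itself means only one owned copy is ever allocated per distinct tag, and repeated lookups return the same pointer.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

thread_local! {
    // Hypothetical interner keyed directly by the leaked slice, so only one
    // owned copy of each tag string is ever allocated.
    static INTERNED: RefCell<HashSet<&'static str>> = RefCell::new(HashSet::new());
}

fn intern_tag_str(s: &str) -> &'static str {
    INTERNED.with(|set| {
        let mut set = set.borrow_mut();
        // `HashSet<&'static str>` can be probed with a plain `&str` because
        // `&'static str: Borrow<str>`, so no temporary key allocation is needed.
        if let Some(existing) = set.get(s) {
            return *existing;
        }
        // Leak exactly one boxed copy; the set entry borrows that same allocation.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        set.insert(leaked);
        leaked
    })
}

fn main() {
    let a = intern_tag_str("tag-a");
    let b = intern_tag_str("tag-a");
    // Same underlying allocation: pointer-equal, not just string-equal.
    assert!(std::ptr::eq(a, b));
    println!("interned once: {a}");
}
```

This avoids the second `Box<str>` entirely; the trade-off is that lookups and inserts share the same leaked slice, which is exactly the interning invariant.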
```rust
#[inline(always)]
#[doc(hidden)]
pub fn end_suppression() {
    SUPPRESS_DEPTH.with(|d| d.set(d.get().wrapping_sub(1)));
```
end_suppression() uses wrapping_sub(1) on the suppression depth. If this is ever called when the depth is already 0 (e.g., due to misuse, double-drop, or manual calls to the #[doc(hidden)] API), the depth will wrap to u32::MAX and the thread will effectively remain suppressed forever. Prefer a debug assertion + saturating/checked decrement to make underflow impossible.
Suggested change:

```rust
SUPPRESS_DEPTH.with(|d| {
    let depth = d.get();
    debug_assert!(
        depth > 0,
        "end_suppression called when suppression depth is already zero"
    );
    d.set(depth.saturating_sub(1));
});
```
Valid although unlikely as the RAII guards should keep things balanced (but as mentioned, assuming it's used via the intended non-hidden APIs).
Will look at this and see whether saturating sub/add or checked.expect() feels more correct for suppression depth in edge cases.
benjamin-stacks left a comment:
This looks really really great.
I still want to have a closer look at all the meta-programming (I'm not very familiar with macro writing yet) and the timing implementation, but I really like the surface API.
I've left a few small notes on that area, but nothing huge.
I'm really looking forward to using this. We built a very similar thing at Stack Overflow many years ago (https://miniprofiler.com/), and it's super useful to this day.
```rust
/// Estimated idle time (wall − CPU). See [`platform`] for resolution caveats.
pub fn wait_time(&self) -> Duration {
    Duration::from_nanos(self.wall_time_ns.saturating_sub(self.cpu_time_ns))
```
Nit: could be de-duplicated.
Suggested change:

```rust
Duration::from_nanos(self.wait_time_ns())
```
```rust
GuardKind::Span => Profiler::end_span(),
GuardKind::Suppression => Profiler::end_suppression(),
```
This assumes that the guards are dropped in reverse order of creation. That is reasonable (and probably guaranteed when using the macros, I haven't read that far yet), but not guaranteed if someone calls begin_span manually. I think as long as it's a public method, we should either handle it correctly, or at the very least document very well what a caller is expected to do.
I think (but you've probably thought about this more, so feel free to disagree) that it shouldn't be hard (or expensive) to handle it correctly, i.e. if guards are dropped out of order, end any descendant spans alongside.
My goal is essentially that the macros should be the only instrumentation API surface area that users see/use -- the "internal" methods are public just to enable the cross-crate macro-generated calls, but they're marked #[doc(hidden)] so they won't show up in docs/intellisense.
So, my thinking was rather that if someone digs deep enough to find those and decides to use them, then that's "at your own risk" 😅
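For reference, the "end any descendant spans alongside" idea from the comment above can be sketched with a simplified, hypothetical model of the per-thread open-span stack (names like `SpanStack` are illustrative, not the crate's actual types):

```rust
// Hypothetical simplified model of the per-thread open-span stack: ending a
// span also closes any descendants still open, so guards dropped out of
// order leave the stack consistent instead of mispairing begin/end calls.
#[derive(Default)]
struct SpanStack {
    open: Vec<u32>, // ids of currently open spans, root first
}

impl SpanStack {
    fn begin(&mut self, id: u32) {
        self.open.push(id);
    }

    /// End `id` and any spans opened after it. If `id` is no longer on the
    /// stack (already closed by an ancestor's end), this is a no-op.
    fn end(&mut self, id: u32) {
        if let Some(pos) = self.open.iter().rposition(|&open_id| open_id == id) {
            // Everything above `pos` is a descendant; close it alongside.
            self.open.truncate(pos);
        }
    }
}

fn main() {
    let mut stack = SpanStack::default();
    stack.begin(1);
    stack.begin(2);
    stack.begin(3);
    // Dropping the guard for span 1 out of order closes 2 and 3 alongside.
    stack.end(1);
    assert!(stack.open.is_empty());
    println!("stack consistent");
}
```

With the macros as the only supported surface, this is belt-and-suspenders; but it shows the correction is a cheap `rposition` + `truncate` in the out-of-order case only.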
```rust
let wall_ns = end_wall.duration_since(start_wall).as_nanos() as u64;
let cpu_ns = end_cpu_ns.saturating_sub(start_cpu_ns);
```
We could only record the start and end times, and defer the computations to take_results(). Granted, these aren't particularly expensive, so it may or may not be worth it.
```rust
    id: &'static SpanId,
    tag: Option<Tag>,
) -> NodeId {
    if let Some(last) = self.node(parent).last_child
```
Wondering if it's worth it to grab self.node_mut(parent) once into a variable, and use it here as well. If last_child doesn't match (which seems likely, unless you're in a loop), you're guaranteed to need that mutable reference anyway. Might as well avoid the second call. What do you think?
iirc there was a borrow issue with that, but can try it 👍
```rust
/// Increment a named counter on the current span (aggregated by key).
///
/// If a counter with this key already exists on the span, `delta` is added to it (saturating).
/// Otherwise a new counter is created with the given value.
```
I'm wondering if these are the optimal semantics for counters. A minor refactoring could change on which span a counter is incremented (or even split a counter across multiple spans). This may or may not be what a user actually intends or expects.
Granted, this is not a problem in so far as we're not actually losing data, but I'm wondering if the default PrettyPrinter should at least have the ability to aggregate counters. Alternatively, being able to define the scope of a counter might be really nice, but it may not be feasible in a pretty way.
Is that a question you've considered?
I spent a little time trying to at least make it reasonable, but when the counters got added it was largely just a "means to an end" optimization over using the more verbose record! for stacks-bench when I needed to count e.g. Clarity execution costs or cache hit/miss.
Completely open to suggestions!
```rust
    /// });
    /// ```
    #[macro_export]
    macro_rules! span {
```
Seems like there's a (minor) footgun here.
```rust
span!("memoized lookup");
if cache.has(foo) {
    return cache.get(foo);
}
let _c = span!("one-time computation");
compute_and_cache(foo)
```

If my understanding of drop rules is correct (it may not be!), the first guard only exists in a temporary scope and is immediately dropped, causing the profiling tree to look different than expected. I assume that's exactly why you always assign `span!` results to variables.
Can we either

- do some `#[must_use]` magic to warn about this, or
- assign the result to a variable that lives for the current scope, so that if nobody takes ownership of the guard, it still works as expected?

(I don't know if the second thing is even possible, given that the new variable and the `let _x = span!()` have to live in the same scope, so it might come down to the first point.)
Yeah we should be able to add #[must_use] on ProfileGuard, good idea 👍
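A minimal sketch of the `#[must_use]` idea, with a stand-in guard type and a hypothetical thread-local span count to show the drop timing the attribute protects against:

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical bookkeeping just for this demo.
    static OPEN_SPANS: Cell<u32> = Cell::new(0);
}

// `#[must_use]` makes the compiler warn on a bare `ProfileGuard::begin();`
// statement, where the temporary would be dropped (ending the span) immediately.
#[must_use = "bind the guard (e.g. `let _guard = span!(..)`) or the span ends immediately"]
struct ProfileGuard;

impl ProfileGuard {
    fn begin() -> ProfileGuard {
        OPEN_SPANS.with(|c| c.set(c.get() + 1));
        ProfileGuard
    }
}

impl Drop for ProfileGuard {
    fn drop(&mut self) {
        OPEN_SPANS.with(|c| c.set(c.get() - 1));
    }
}

fn open_spans() -> u32 {
    OPEN_SPANS.with(|c| c.get())
}

fn main() {
    let guard = ProfileGuard::begin();
    assert_eq!(open_spans(), 1); // span stays open while the guard is bound
    drop(guard);
    assert_eq!(open_spans(), 0);
    println!("guard semantics ok");
}
```

Note `#[must_use]` only produces a warning for an unused expression statement; `let _ = span!(..)` still drops immediately without a warning, so the docs would still want to call out binding to `let _guard = ...`.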
LGTM!
I’ve added a number of minor comments and sanity checks.
One additional point to consider is the placement of these new crates. Should we move them to the contrib folder? If we decide to keep them in their current location, we should verify whether they are "covered" by the Bug Bounty Program, since the contrib folder is entirely excluded.
```rust
let ctx = NodeContext {
    stats,
    depth,
    is_last_sibling: connector == "└── ",
```
This couples the semantic meaning to a specific Unicode string. If the connector strings are ever changed, this could silently break. Maybe passing a boolean `is_last` parameter could do the job.
Alternatively, I'd also be fine with keeping the string comparison but making it a global constant reused throughout this module.
I was on the fence of whether or not to even include the printing stuff since I pretty much just used it for early testing/debugging together with the examples. So, minimal change and go the const route?
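The const route could look something like this sketch (constant names are illustrative): the connectors live in one place and the "last sibling" decision travels as a bool, so the comparison can't drift out of sync with the literal.

```rust
// Hypothetical shared constants for the tree connectors, so the
// "is last sibling" decision is derived once and the Unicode strings
// have a single definition.
const CONNECTOR_LAST: &str = "└── ";
const CONNECTOR_MID: &str = "├── ";

fn connector_for(is_last_sibling: bool) -> &'static str {
    if is_last_sibling {
        CONNECTOR_LAST
    } else {
        CONNECTOR_MID
    }
}

fn main() {
    assert_eq!(connector_for(true), "└── ");
    assert_eq!(connector_for(false), "├── ");
    println!("{}leaf", connector_for(true));
}
```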
```rust
    records: Vec::with_capacity(4),
    counters: Vec::with_capacity(4),
});
idx as NodeId
```
Probably not an issue, but wondering if it could make sense to add an assertion to guard that `idx <= u32::MAX`.
Yeah, probably a good idea from a correctness/soundness perspective, so nobody ends up with truncated ids (though that's a crapload of nodes)
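One cheap way to enforce this (a sketch with hypothetical names, not the crate's actual allocation path) is to route the conversion through `try_from`, which fails loudly on overflow instead of truncating with `as`:

```rust
// Hypothetical guard on arena growth: convert the index with `try_from` so
// exceeding u32::MAX nodes fails loudly instead of silently truncating ids.
type NodeId = u32;

fn node_id_from_index(idx: usize) -> NodeId {
    NodeId::try_from(idx).expect("profiler arena exceeded u32::MAX nodes")
}

fn main() {
    assert_eq!(node_id_from_index(0), 0);
    assert_eq!(node_id_from_index(123_456), 123_456);
    println!("ids ok");
}
```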
```rust
#[doc(hidden)]
#[inline(always)]
pub fn begin_span(id: &'static SpanId, tag: Option<Tag>) -> ProfileGuard {
    let start_wall = Instant::now();
```
Sanity check: small asymmetry between `begin_span` and `end_span`. In "begin", bookkeeping is excluded from timing, while in "end", bookkeeping is included. Maybe acceptable, but worth checking.
iirc I structured "end" that way so that untimed/unsampled spans wouldn't incur the FFI/syscall overhead of probing the clocks, since in the actual integrated marf-bench branch some of the spans sit around very tight loops (sampled).
I could probably split this up a bit more with more specific GuardKind variants and more targeted begin_timed()/end_timed(), etc. I'll have a look at it 👍
```rust
// Anonymous Block
($($t:tt)*) => {{
    let _guard = $crate::span!("scope");
    $($t)*
}};
```
Sanity check: this would match even typos in the `measure!` macro, right? Maybe we could convert the catch-all match to a `compile_error!`, and right before it we could add a `measure!({ ... })` arm for the anonymous block.
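That restructuring could look roughly like the following sketch (`measure_demo!` is a stand-in name; the real arms carry the profiling logic this sketch elides): an explicit `$block:block` arm handles the anonymous case, and a trailing `compile_error!` arm rejects anything else at compile time.

```rust
// Hypothetical restructuring: match an explicit block for the anonymous
// case, and turn everything else into a compile-time error so typo'd
// modifiers can't silently fall through to the catch-all.
macro_rules! measure_demo {
    // Named variant (profiling elided in this sketch).
    ($name:literal, $block:block) => {{
        let _name = $name;
        $block
    }};
    // Anonymous block variant.
    ($block:block) => {{
        $block
    }};
    // Anything else is rejected at compile time.
    ($($t:tt)*) => {
        compile_error!("unsupported measure! invocation; expected `measure!({ .. })` or `measure!(\"name\", { .. })`")
    };
}

fn main() {
    let x = measure_demo!("work", { 2 + 2 });
    let y = measure_demo!({ 3 * 3 });
    assert_eq!(x, 4);
    assert_eq!(y, 9);
    println!("{x} {y}");
}
```

The `compile_error!` arm only fires if it is actually matched, so well-formed invocations are unaffected.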
A third alternative would be to move it to a completely new project, as a separate open-source project. It might be useful in other Rust projects, far away from the Stacks ecosystem.
```rust
if let Some(async_token) = &input_fn.sig.asyncness {
    return syn::Error::new_spanned(
        async_token,
        "#[profile] does not support async fn; use span!/measure! inside the function body",
```
So, I understand why profiling async functions is going to be inherently weird, given that they might end up running on who knows what thread.
I'm just wondering about the guidance from this error message -- if #[profile] didn't reject async functions, wouldn't it actually do exactly the same thing as a span! at the beginning of said function?
```rust
// Name, Rate, count_only, Block
($name:literal, rate: $rate:literal, count_only, $block:block) => {{
    let _guard = $crate::span!($name, rate: $rate, count_only);
```
Should we give all these variables a more obvious name, like you did in #[profile]? This variable will appear in the autocomplete, so it might be less confusing to the user what this is if the name is a little more self-explanatory.
Yeah we can do that, the proc macro came after these and I guess I never made it back around to align var names 👍
```rust
    Some($crate::Profiler::begin_span($id, $tag_opt))
}};

(@should_sample $counter:ident, $rate:literal) => {{
```
Nit: this is emitting conditional runtime code that could be moved to compile time (since `$rate` is a literal). In particular, I wouldn't be surprised if the `is_power_of_two` call eats up any gain from splitting the logic in the first place.
Worth it to optimize? (Could totally be a follow-up)
Hm, in the proc macro it works to do it compile-time, but I'm not sure it's possible in the declarative version? These are all const/const fn, so the actual generated bytecode should be very minimal.
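For illustration, the const folding mentioned in the reply can be made explicit even in a declarative macro: because `$rate` is a literal, binding it to `const` items lets the power-of-two test (and the mask) evaluate at compile time. This is an editor's sketch with assumed names (`should_sample!`), not the crate's actual expansion:

```rust
// Hypothetical expansion: the consts fold at compile time, so the runtime
// path is one increment plus a mask (power-of-two rates) or a modulo.
macro_rules! should_sample {
    ($counter:expr, $rate:literal) => {{
        const RATE: u64 = $rate;
        const IS_POW2: bool = RATE.is_power_of_two();
        *$counter += 1;
        let n: u64 = *$counter;
        if IS_POW2 {
            (n & (RATE - 1)) == 0
        } else {
            n % RATE == 0
        }
    }};
}

fn main() {
    let mut counter: u64 = 0;
    let sampled: Vec<bool> = (0..8).map(|_| should_sample!(&mut counter, 4)).collect();
    // Rate 4: every 4th call is sampled (calls 4 and 8).
    assert_eq!(
        sampled,
        vec![false, false, false, true, false, false, false, true]
    );
    println!("{sampled:?}");
}
```

Since `IS_POW2` is a `const`, the branch is trivially foldable by the optimizer, which matches the "generated bytecode should be very minimal" expectation.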
```rust
fn materialize_node(nodes: &mut Vec<Option<Node>>, node_id: NodeId) -> ProfileStats {
    let node = nodes[node_id as usize]
        .take()
        .expect("node already materialized or missing");
```
💭 I don't think this should be allowed to panic.
I realize that this should be unreachable and is guarding against bugs only, but it feels like a profiler tool should never crash the node (or whatever application). A bug in stacks-profiler logic doesn't represent inconsistent application state as far as the profiled application is concerned.
Thoughts?
Yeah that's fair, can make the "take results" fallible 👍
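A fallible version could look like this sketch (simplified `Node` and error type are hypothetical stand-ins for the crate's actual types): the caller gets a `Result` to log and discard, rather than a panic that takes the host down.

```rust
// Hypothetical fallible variant: return an error the caller can log and
// discard instead of panicking the host application on an internal bug.
#[derive(Debug, PartialEq)]
struct Node {
    label: &'static str,
}

#[derive(Debug, PartialEq)]
enum ProfileError {
    NodeMissing(u32),
}

fn materialize_node(nodes: &mut Vec<Option<Node>>, node_id: u32) -> Result<Node, ProfileError> {
    nodes
        .get_mut(node_id as usize)
        .and_then(Option::take)
        .ok_or(ProfileError::NodeMissing(node_id))
}

fn main() {
    let mut nodes = vec![Some(Node { label: "root" })];
    let root = materialize_node(&mut nodes, 0).expect("first take succeeds");
    assert_eq!(root.label, "root");
    // Double materialization now surfaces as an error, not a panic.
    assert_eq!(
        materialize_node(&mut nodes, 0),
        Err(ProfileError::NodeMissing(0))
    );
    println!("fallible take ok");
}
```

`get_mut` also replaces the panicking index, so an out-of-range id degrades to the same error instead of an out-of-bounds panic.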
Sure, that could work as well. As a starting point, we could add it under one of our organizations. We've already taken a similar approach with a couple of utilities, for example:

We might have more flexibility placing it under the

This would also avoid "burdening" the main repository with additional packages that are general purpose.
Description
Introduces a new lightweight profiler for targeted instrumentation and result extraction, intended primarily for benchmarking/instrumentation builds, and to be feature-gated when wired into runtime paths. This profiler is used extensively by the `stacks-bench` tool (follow-up PR).

The motivation behind a custom implementation was that none of the existing instrumentation/profiling crates provided the level of control and detail which `stacks-bench` needed, and I also wanted it to be as slim and low-overhead as possible while providing ergonomic primitives to help minimize profiling clutter in core code.

This PR adds two new workspace crates:

- `stacks-profiler`: thread-local profiling library for hierarchical spans, wall/CPU/wait timing, tags, records, and counters.
- `stacks-profiler-macros`: procedural macro crate for profiler ergonomics, including `#[profile]` (re-exported by `stacks-profiler`).

These crates are added at the workspace root (rather than `contrib`) because they are intended dependencies of root crates. No existing runtime paths are wired yet, so this PR does not alter any existing behavior.

Included in this PR:

- `span!`, `measure!`, `record!`, `counter_add!`, plus conditional `*_if!` variants.
- `#[profile]` attribute macro (sync functions only; `async fn` is compile-fail).

Non-goals for this PR:

- `stackslib`, `clarity`, etc. (follow-up PRs).
- `#[profile]` support for `async` functions (not currently used in the main codebase).

Applicable issues

- #118

Checklist