Skip to content

feat: new profiling and instrumentation utility crates stacks-profiler & stacks-profiler-macros#6905

Open
cylewitruk-stacks wants to merge 2 commits intostacks-network:developfrom
cylewitruk-stacks:feat/stacks-profiler
Open

feat: new profiling and instrumentation utility crates stacks-profiler & stacks-profiler-macros#6905
cylewitruk-stacks wants to merge 2 commits intostacks-network:developfrom
cylewitruk-stacks:feat/stacks-profiler

Conversation

@cylewitruk-stacks
Copy link
Copy Markdown
Contributor

@cylewitruk-stacks cylewitruk-stacks commented Feb 16, 2026

Description

Introduces a new lightweight profiler for targeted instrumentation and result extraction, intended primarily for benchmarking/instrumentation builds, and to be feature-gated when wired into runtime paths. This profiler is used extensively by the stacks-bench tool (follow-up PR).

The motivation behind a custom implementation was that none of the existing instrumentation/profiling crates provided the level of control and detail which stacks-bench needed, and I also wanted it to be as slim and low-overhead as possible while providing ergonomic primitives to help minimize profiling clutter in core code.

This PR adds two new workspace crates:

  • stacks-profiler: thread-local profiling library for hierarchical spans, wall/CPU/wait timing, tags, records, and counters.
  • stacks-profiler-macros: procedural macro crate for profiler ergonomics, including #[profile] (re-exported by stacks-profiler).

These crates are added at workspace root (rather than contrib) because they are intended dependencies of root crates.

No existing runtime paths are wired yet, so this PR does not alter any existing behavior.

Included in this PR:

  • Core profiler state/model and platform CPU-time support.
  • Instrumentation macros: span!, measure!, record!, counter_add!, plus conditional *_if! variants.
  • #[profile] attribute macro (sync functions only; async fn is compile-fail).
  • Crate docs/readme, examples, benchmarks, and integration/doctest coverage.

Non-Goals for this PR:

  • Wiring profiler calls into stackslib, clarity, etc. (follow-up PRs).
  • #[profile] support for async functions (not currently used in the main codebase).
  • Truly supporting Windows (I added a simple impl for completeness, but the timing resolution is coarse).

Applicable issues

  • Related to core epic #118.

Checklist

  • Test coverage for new or modified code paths
  • Changelog is updated

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Feb 16, 2026

CLA assistant check
All committers have signed the CLA.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 40.63604% with 336 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.34%. Comparing base (0317850) to head (7ce2e88).
⚠️ Report is 222 commits behind head on develop.

Files with missing lines Patch % Lines
stacks-profiler/src/print.rs 0.00% 115 Missing ⚠️
stacks-profiler/src/lib.rs 49.28% 107 Missing ⚠️
stacks-profiler-macros/src/lib.rs 0.00% 54 Missing ⚠️
stacks-profiler/src/types.rs 0.00% 54 Missing ⚠️
stacks-profiler/src/state.rs 94.91% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6905      +/-   ##
===========================================
- Coverage    72.67%   72.34%   -0.33%     
===========================================
  Files          411      425      +14     
  Lines       221663   223697    +2034     
  Branches         0      338     +338     
===========================================
+ Hits        161086   161844     +758     
- Misses       60577    61853    +1276     
Files with missing lines Coverage Δ
stacks-profiler/src/macros.rs 100.00% <100.00%> (ø)
stacks-profiler/src/platform.rs 100.00% <100.00%> (ø)
stacks-profiler/src/state.rs 94.91% <94.91%> (ø)
stacks-profiler-macros/src/lib.rs 0.00% <0.00%> (ø)
stacks-profiler/src/types.rs 0.00% <0.00%> (ø)
stacks-profiler/src/lib.rs 49.28% <49.28%> (ø)
stacks-profiler/src/print.rs 0.00% <0.00%> (ø)

... and 313 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0317850...7ce2e88. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds two new workspace crates (stacks-profiler and stacks-profiler-macros) providing a lightweight, thread-local hierarchical span profiler with sampling, tagging, records/counters, and tree printing—intended for instrumentation/benchmarking usage and future wiring into runtime paths.

Changes:

  • Introduces the core profiler implementation (TLS state/arena, span lifecycle, CPU-time support) plus printing utilities.
  • Adds ergonomic instrumentation macros (span!, measure!, record!, counter_add!, and conditional variants) and a #[profile] proc-macro.
  • Adds integration tests, examples, benchmarks, and wires the new crates + dependencies into the workspace/config.

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
stacks-profiler/src/lib.rs Core profiler API, TLS state wiring, span/guard implementation, tag interning.
stacks-profiler/src/macros.rs Defines span!/measure!/record/counter macros and conditional variants.
stacks-profiler/src/state.rs Per-thread arena/stack model and results materialization.
stacks-profiler/src/types.rs Record/counter/tag core value types and conversions.
stacks-profiler/src/platform.rs Platform-specific per-thread CPU-time implementation.
stacks-profiler/src/print.rs Tree formatter trait + default ANSI pretty-printer implementation.
stacks-profiler/tests/integration.rs Integration coverage for nesting, sampling, suppression, count-only, panic safety, threading isolation.
stacks-profiler/examples/basics.rs Basic end-to-end usage example of spans/measure/#[profile].
stacks-profiler/examples/aggregation.rs Example demonstrating aggregation + sampling modifiers.
stacks-profiler/examples/cpu_vs_wait.rs Example showcasing wall vs CPU vs wait and metadata attachment.
stacks-profiler/benches/overhead.rs Criterion benchmarks for overhead and sampling modes.
stacks-profiler/README.md Crate-level documentation and usage reference.
stacks-profiler/Cargo.toml New profiler crate manifest and deps.
stacks-profiler-macros/src/lib.rs #[profile] proc-macro implementation (sync-only; async compile-fail).
stacks-profiler-macros/Cargo.toml Proc-macro crate manifest and deps.
Cargo.toml Adds new workspace members and shared workspace dependencies for profiler + macro deps.
Cargo.lock Locks newly introduced dependencies.
.cargo/config.toml Extends clippy-stacks alias to include the new crates.
.gitignore Adds .vite ignore entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +109 to +113
}
let boxed: Box<str> = s.into_boxed_str();
let leaked: &'static str = Box::leak(boxed);
map.insert(leaked.into(), leaked);
leaked
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intern_tag_str leaks the boxed string and then does map.insert(leaked.into(), leaked), which allocates/copies a second Box<str> for the key. This defeats the purpose of interning and can significantly increase allocations for high-cardinality tags. Consider changing the interner map key to &'static str (so you can map.insert(leaked, leaked)), or otherwise avoid allocating a second owned copy for the key.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid, will address when I'm in a place to switch back to this branch 👍

#[inline(always)]
#[doc(hidden)]
pub fn end_suppression() {
SUPPRESS_DEPTH.with(|d| d.set(d.get().wrapping_sub(1)));
Copy link

Copilot AI Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

end_suppression() uses wrapping_sub(1) on the suppression depth. If this is ever called when the depth is already 0 (e.g., due to misuse, double-drop, or manual calls to the #[doc(hidden)] API), the depth will wrap to u32::MAX and the thread will effectively remain suppressed forever. Prefer a debug assertion + saturating/checked decrement to make underflow impossible.

Suggested change
SUPPRESS_DEPTH.with(|d| d.set(d.get().wrapping_sub(1)));
SUPPRESS_DEPTH.with(|d| {
let depth = d.get();
debug_assert!(
depth > 0,
"end_suppression called when suppression depth is already zero"
);
d.set(depth.saturating_sub(1));
});

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

@cylewitruk-stacks cylewitruk-stacks Feb 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid although unlikely as the RAII guards should keep things balanced (but as mentioned, assuming it's used via the intended non-hidden APIs).

Will look at this and see whether saturating sub/add or checked.expect() feels more correct for suppression depth in edge cases.

Copy link
Copy Markdown
Contributor

@benjamin-stacks benjamin-stacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really really great.

I still want to have a closer look at all the meta-programming (I'm not very familiar with macro writing yet) and the timing implementation, but I really like the surface API.

I've left a few small notes on that area, but nothing huge.

I'm really looking forward to using this. We built a very similar thing at Stack Overflow many years ago (https://miniprofiler.com/), and it's super useful to this day.


/// Estimated idle time (wall − CPU). See [`platform`] for resolution caveats.
pub fn wait_time(&self) -> Duration {
Duration::from_nanos(self.wall_time_ns.saturating_sub(self.cpu_time_ns))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could be de-duplicated.

Suggested change
Duration::from_nanos(self.wall_time_ns.saturating_sub(self.cpu_time_ns))
Duration::from_nanos(self.wait_time_ns())

Comment on lines +516 to +517
GuardKind::Span => Profiler::end_span(),
GuardKind::Suppression => Profiler::end_suppression(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that the guards are dropped in reverse order of creation. That is reasonable (and probably guaranteed when using the macros, I haven't read that far yet), but not guaranteed if someone calls begin_span manually. I think as long as it's a public method, we should either handle it correctly, or at the very least document very well what a caller is expected to do.

I think (but you've probably thought about this more, so feel free to disagree) that it shouldn't be hard (or expensive) to handle it correctly, i.e. if guards are dropped out of order, end any descendant spans alongside.

Copy link
Copy Markdown
Contributor Author

@cylewitruk-stacks cylewitruk-stacks Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal is essentially that the macros should be the only instrumentation API surface area that users see/use -- the "internal" methods are public just to enable the cross-crate macro-generated calls, but they're marked #[doc(hidden)] so they won't show up in docs/intellisense.

So, my thinking was rather that if someone digs deep enough to find those and decides to use them, then that's "at your own risk" 😅

Comment on lines +372 to +373
let wall_ns = end_wall.duration_since(start_wall).as_nanos() as u64;
let cpu_ns = end_cpu_ns.saturating_sub(start_cpu_ns);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could only record the start and end times, and defer the computations to take_results(). Granted, these aren't particularly expensive, so it may or may not be worth it.

id: &'static SpanId,
tag: Option<Tag>,
) -> NodeId {
if let Some(last) = self.node(parent).last_child
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it's worth it to grab self.node_mut(parent) once into a variable, and use it here as well. If last_child doesn't match (which seems likely, unless you're in a loop), you're guaranteed to need that mutable reference anyway. Might as well avoid the second call. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc there was a borrow issue with that, but can try it 👍

/// Increment a named counter on the current span (aggregated by key).
///
/// If a counter with this key already exists on the span, `delta` is added to it (saturating).
/// Otherwise a new counter is created with the given value.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if these are the optimal semantics for counters. A minor refactoring would change on what span a counter is incremented (or even split a counter that across multiple spans). This may or may not be what a user actually intends or expects.

Granted, this is not a problem in so far as we're not actually losing data, but I'm wondering if the default PrettyPrinter should at least have the ability to aggregate counters. Alternatively, being able to define the scope of a counter might be really nice, but it may not be feasible in a pretty way.

Is that a question you've considered?

Copy link
Copy Markdown
Contributor Author

@cylewitruk-stacks cylewitruk-stacks Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a little time trying to at least make it reasonable, but when the counters got added it was largely just a "means to an end" optimization over using the more verbose record! for stacks-bench when I needed to count e.g. Clarity execution costs or cache hit/miss.

Completely open to suggestions!

/// });
/// ```
#[macro_export]
macro_rules! span {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seem like there's a (minor) footgun here.

span!("memoized lookup");
if cache.has(foo) {
  return cache.get(foo);
}
let _c = span!("one-time computation");
compute_and_cache(foo)

If my understanding of drop rules is correct (it may not be!), the first guard only exists in a temporary scope and is immediately dropped, causing the profiling tree to look different than expected. I assume that's exactly why you always assign span! results to variables.

Can we either

  • do some [must_use] magic to warn about this, or
  • assign the result to a variable that lives for the current scope, so that if nobody takes ownership of the guard, it still works as excpect?

(I don't know if the second thing is even possible, given that the new variable and the let _x = span!() have to live in the same scope, so it might come down to the first point)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we should be able to add #[must_use] on ProfileGuard, good idea 👍

Copy link
Copy Markdown
Contributor

@federico-stacks federico-stacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I’ve added a number of minor comments and sanity checks.

One additional point to consider is the placement of these new crates. Should we move them to the contrib folder? If we decide to keep them in their current location, we should verify whether they are "covered" by the Bug Bounty Program, since the contrib folder is entirely excluded.

let ctx = NodeContext {
stats,
depth,
is_last_sibling: connector == "└── ",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This couples the semantic meaning to a specific Unicode string. If the connector strings are ever changed, this could silently breaks. Maybe passing a bool is_last parameter, could do the job.

Alternatively, I could also be fine using the string comparison, making it a global constant reused along this module.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the fence of whether or not to even include the printing stuff since I pretty much just used it for early testing/debugging together with the examples. So, minimal change and go the const route?

records: Vec::with_capacity(4),
counters: Vec::with_capacity(4),
});
idx as NodeId
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not an issue, but wondering if it could make sense add some assertion to guard that idx <= u32::MAX

Copy link
Copy Markdown
Contributor Author

@cylewitruk-stacks cylewitruk-stacks Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably a good idea from a correctness/soundness perspective, so nobody ends up with truncated ids (though that's a crapload of nodes)

#[doc(hidden)]
#[inline(always)]
pub fn begin_span(id: &'static SpanId, tag: Option<Tag>) -> ProfileGuard {
let start_wall = Instant::now();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanity-check: small asymmetry between begin_span and end_span. In "begin", bookkeping is excluded from timing, while in the "end" bookkeeping is included. Maybe acceptable but worth checking

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc I structured "end" that way so that untimed/unsampled spans wouldn't incur the FFI/syscall overhead of probing the clocks, since in the actual integrated marf-bench branch some of the spans sit around very tight loops (sampled).

I could probably split this up a bit more with more specific GuardKind variants and more targeted begin_timed()/end_timed(), etc. I'll have a look at it 👍

Comment on lines +100 to +104
// Anonymous Block
($($t:tt)*) => {{
let _guard = $crate::span!("scope");
$($t)*
}};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanity-check: this would match even typos in measure! macro, right? Maybe we could convert the catch-all match to a compiler_error!, and right before it we could add measure!({ ... }) for Anonymous Block

@benjamin-stacks
Copy link
Copy Markdown
Contributor

Should we move them to the contrib folder?

A third alternative would be to move it to a completely new project, as a separate open-source project. It might be useful in other Rust projects, far away from the Stacks ecosystem.

if let Some(async_token) = &input_fn.sig.asyncness {
return syn::Error::new_spanned(
async_token,
"#[profile] does not support async fn; use span!/measure! inside the function body",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I understand why profiling async functions is going to be inherently weird, given that they might end up running on who knows what thread.

I'm just wondering about the guidance from this error message -- if #[profile] didn't reject async functions, wouldn't it actually do exactly the same thing as a span! at the beginning of said function?


// Name, Rate, count_only, Block
($name:literal, rate: $rate:literal, count_only, $block:block) => {{
let _guard = $crate::span!($name, rate: $rate, count_only);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we give all these variables a more obvious name, like you did in #[profile]? This variable will appear in the autocomplete, so it might be less confusing to the user what this is if the name is a little more self-explanatory.

Copy link
Copy Markdown
Contributor Author

@cylewitruk-stacks cylewitruk-stacks Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can do that, the proc macro came after these and I guess I never made it back around to align var names 👍

Some($crate::Profiler::begin_span($id, $tag_opt))
}};

(@should_sample $counter:ident, $rate:literal) => {{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is emitting a conditional runtime code that could be moved to compile time (since $rate is literal). In particular, I wouldn't surprised if the is_power_of_two call eats up any gain from splitting the logic in the first place.

Worth it to optimize? (Could totally be a follow-up)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, in the proc macro it works to do it compile-time, but I'm not sure it's possible in the declarative version? These are all const/const fn, so the actual generated bytecode should be very minimal.

fn materialize_node(nodes: &mut Vec<Option<Node>>, node_id: NodeId) -> ProfileStats {
let node = nodes[node_id as usize]
.take()
.expect("node already materialized or missing");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I don't think this should be allowed to panic.

I realize that this should be unreachable and is guarding against bugs only, but it feels like a profiler tool should never crash the node (or whatever application). A bug in stacks-profiler logic isn't representing inconsistent application state as far as the profiled application is concerned.

Thoughts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fair, can make the "take results" fallible 👍

@federico-stacks
Copy link
Copy Markdown
Contributor

Should we move them to the contrib folder?

A third alternative would be to move it to a completely new project, as a separate open-source project. It might be useful in other Rust projects, far away from the Stacks ecosystem.

Sure, that could work as well. As a starting point, we could add it under one of our organizations. We’ve already taken a similar approach with a couple of utilities, for example:

We might have more flexibility placing it under the stx-labs organization, as we did with pinny-rs.

This would also avoid “burdening” the main repository with additional packages that are general purpose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants