[Experiment] feat(core): add fast_clock cached monotonic clock with benchmark #25370
Draft
A coarse, cached monotonic clock for hot-path metric instrumentation. Reading recent_millis() is a single relaxed atomic load; the cached value is refreshed by a background thread every 25 ms. Intended for histogram binning where ms resolution is sufficient and the per-call cost of Instant::now() shows up in profiles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tant/Utc
Microbenchmark comparing the read cost of fast_clock::recent_millis() against the Instant::now(), Instant::elapsed().as_millis() and Utc::now().timestamp_millis() patterns used elsewhere in Vector for histogram-binning timestamps. Run: cargo bench --bench fast_clock -p vector-common
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…riant
Adds a wall-clock companion to recent_millis. The cached value is refreshed from SystemTime::now() on each updater tick, so reads cost a single AtomicI64 relaxed load (~0.8 ns) versus ~41 ns for Utc::now().timestamp_millis(). Suitable for source-lag-time histograms where ms precision and up to 25 ms of staleness are both acceptable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the per-send Utc::now().timestamp_millis() call in source_sender::Output::send and send_batch with fast_clock::recent_unix_millis(). This is the call exercised on every event in syslog_log2metric_tag_cardinality_limit_blackhole and similar SMP experiments; on an M4 Max the read cost drops from ~41 ns to ~0.8 ns (microbenchmark). Behavior change: the lag-time reference is now stale by up to 25 ms, which is well within the noise of a histogram metric.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
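For readers without the diff open, the mechanism described in the commit messages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module from this PR: only the names recent_millis and recent_unix_millis and the 25 ms tick come from the description, while ensure_started, unix_now_millis, lag_millis, TICK and the two statics are hypothetical names chosen for the sketch. The actual module may start its updater differently (for example lazily on first read).

```rust
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::sync::Once;
use std::thread;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Refresh period stated in the PR description (25 ms).
const TICK: Duration = Duration::from_millis(25);

// Cached values. Only the updater thread writes them after start-up;
// readers perform a single relaxed load. (Hypothetical names.)
static MONO_MILLIS: AtomicU64 = AtomicU64::new(0);
static UNIX_MILLIS: AtomicI64 = AtomicI64::new(0);
static START: Once = Once::new();

/// Uncached wall-clock read in Unix milliseconds (hypothetical helper).
/// Negative if the system clock is set before the Unix epoch.
pub fn unix_now_millis() -> i64 {
    match SystemTime::now().duration_since(UNIX_EPOCH) {
        Ok(d) => d.as_millis() as i64,
        Err(e) => -(e.duration().as_millis() as i64),
    }
}

/// Starts the background updater exactly once (hypothetical entry point).
pub fn ensure_started() {
    START.call_once(|| {
        let origin = Instant::now();
        // Seed the wall-clock cache so early readers do not observe 0.
        UNIX_MILLIS.store(unix_now_millis(), Ordering::Relaxed);
        thread::Builder::new()
            .name("fast-clock".into())
            .spawn(move || loop {
                thread::sleep(TICK);
                MONO_MILLIS.store(origin.elapsed().as_millis() as u64, Ordering::Relaxed);
                UNIX_MILLIS.store(unix_now_millis(), Ordering::Relaxed);
            })
            .expect("failed to spawn clock updater thread");
    });
}

/// Milliseconds since the updater started, stale by up to roughly one tick.
pub fn recent_millis() -> u64 {
    MONO_MILLIS.load(Ordering::Relaxed)
}

/// Unix time in milliseconds, stale by up to roughly one tick.
pub fn recent_unix_millis() -> i64 {
    UNIX_MILLIS.load(Ordering::Relaxed)
}

/// Lag of an event relative to the cached wall clock, clamped at zero so a
/// stale cache never yields a negative lag (hypothetical helper).
pub fn lag_millis(event_unix_millis: i64) -> i64 {
    (recent_unix_millis() - event_unix_millis).max(0)
}
```

In this sketch a caller would invoke ensure_started() once at start-up and then call recent_unix_millis() on the hot path. Relaxed ordering is enough here because each cached value is an independent timestamp and no other memory is published through it.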
Summary
Adds vector_common::fast_clock, a coarse cached monotonic clock for hot-path metric instrumentation. Reading recent_millis() / recent_unix_millis() is a single relaxed atomic load (~1 ns); the cached values are refreshed every 25 ms by a background thread. Intended for histogram binning where ms resolution is sufficient and the per-call cost of Instant::now() / Utc::now() shows up in profiles.

This PR adds the primitive plus one real migration: source_sender::Output::send and send_batch now use fast_clock::recent_unix_millis() for their lag-time reference timestamp instead of Utc::now().timestamp_millis(). This is the call exercised on every event in syslog_log2metric_tag_cardinality_limit_blackhole and similar SMP experiments.

Motivation: a recent revert (#25221) of #24987 showed that adding a couple of Instant::now() calls per source send was enough to cause a ~6.5% ingress-throughput regression on syslog_log2metric_tag_cardinality_limit_blackhole in the SMP Regression Detector. A cached-atomic clock lets us instrument those paths without that cost.

Vector configuration
n/a (library addition + benchmark + internal migration)
How did you test this PR?
Run the benchmark: cargo bench --bench fast_clock -p vector-common
Results on my machine
Hardware: Apple M4 Max, 16 cores, 64 GiB RAM
OS: macOS 26.4.1 (arm64)
Toolchain: repo-default
fast_clock benchmark cases: fast_clock::recent_millis(), fast_clock::recent_unix_millis(), Instant::now(), Instant::elapsed().as_millis(), Utc::now().timestamp_millis()

Caveat: these numbers are on Apple Silicon (arm64). The SMP Regression Detector and Vector production both run on x86_64 Linux. Absolute numbers there will differ (the Linux vDSO path for clock_gettime(CLOCK_REALTIME) is typically 15-25 ns), but the order-of-magnitude shape should hold.

TODO: Run the SMP Regression Detector on this branch. The lag-time reference in source_sender is the per-event call exercised by syslog_log2metric_tag_cardinality_limit_blackhole (and similar source-heavy experiments), so we should see a measurable ingress-throughput improvement vs master.

Unit tests
The first confirms the background updater is ticking forward and that recent_unix_millis() stays close to SystemTime::now(). The second confirms the existing lag-time tests (emits_lag_time_for_log/metric/trace) still pass after the migration.

Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changelog label to this PR.

References
Notes / Open questions
coarsetime crate. coarsetime (~15M downloads, ISC license) does the same thing as a third-party dep. I went with a hand-rolled ~50-line module to avoid a new dep and keep test injection simple, but I'm open to swapping if reviewers prefer a vendored solution.
Instant::now() calls in lib/vector-buffers/src/topology/channel/limited_queue.rs (per-push/pop utilization recording) and src/utilization.rs, both of which would also need a small change to vector_common::stats::TimeEwma to accept a u64-millis reference instead of Instant.

🤖 Generated with Claude Code