
[Experiment] feat(core): add fast_clock cached monotonic clock with benchmark #25370

Draft
pront wants to merge 5 commits into master from pavlos/fast-clock-bench

@pront commented May 5, 2026

Summary

Adds vector_common::fast_clock, a coarse cached monotonic clock for hot-path metric instrumentation. Reading recent_millis() / recent_unix_millis() is a single relaxed atomic load (~1 ns); the cached values are refreshed every 25 ms by a background thread. Intended for histogram binning where ms resolution is sufficient and the per-call cost of Instant::now() / Utc::now() shows up in profiles.
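The mechanism described above can be sketched with the standard library alone. This is a minimal illustration, not the module added in this PR: `CACHED_MILLIS`, `TICK`, and `ensure_updater` are hypothetical names, and the explicit start call is an assumption (the real module may start its updater differently).

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;
use std::thread;
use std::time::{Duration, Instant};

/// Refresh interval; 25 ms matches the cadence described in the summary.
const TICK: Duration = Duration::from_millis(25);

/// Cached milliseconds elapsed since the updater was started.
static CACHED_MILLIS: AtomicU64 = AtomicU64::new(0);

/// Guards one-time startup of the background updater.
static STARTED: OnceLock<()> = OnceLock::new();

/// Spawns the updater thread on first call (hypothetical helper).
/// Later calls are no-ops.
pub fn ensure_updater() {
    STARTED.get_or_init(|| {
        let origin = Instant::now();
        thread::Builder::new()
            .name("fast-clock-sketch".into())
            .spawn(move || loop {
                // Pay for the real clock read here, once per tick,
                // instead of on every hot-path call.
                let elapsed = origin.elapsed().as_millis() as u64;
                CACHED_MILLIS.store(elapsed, Ordering::Relaxed);
                thread::sleep(TICK);
            })
            .expect("failed to spawn clock updater");
    });
}

/// Hot-path read: a single relaxed atomic load. The value may be stale
/// by roughly one tick, plus any scheduling delay of the updater thread.
pub fn recent_millis() -> u64 {
    CACHED_MILLIS.load(Ordering::Relaxed)
}
```

Relaxed ordering is enough in this sketch because readers only need some recent value of a single counter; nothing else is published through it.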

This PR adds the primitive plus one real migration: source_sender::Output::send and send_batch now use fast_clock::recent_unix_millis() for their lag-time reference timestamp instead of Utc::now().timestamp_millis(). This is the call exercised on every event in syslog_log2metric_tag_cardinality_limit_blackhole and similar SMP experiments.
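As a rough illustration of how a cached reference timestamp feeds a lag-time measurement, the sketch below reads the reference once and subtracts each event's timestamp from it. The function names and the choice to skip negative lags are assumptions made for the example; they are not taken from `source_sender`.

```rust
/// Lag in seconds between a reference "now" and an event timestamp,
/// both in Unix milliseconds. Returns `None` when the event carries no
/// timestamp or appears to come from the future relative to the
/// reference (possible when the reference is a slightly stale cache).
pub fn lag_seconds(reference_unix_millis: i64, event_unix_millis: Option<i64>) -> Option<f64> {
    let event = event_unix_millis?;
    let lag_ms = reference_unix_millis.checked_sub(event)?;
    if lag_ms < 0 {
        return None;
    }
    Some(lag_ms as f64 / 1000.0)
}

/// Computes lags for a batch using one shared reference value, so the
/// clock is consulted once per batch rather than once per event.
pub fn batch_lags(reference_unix_millis: i64, events: &[Option<i64>]) -> Vec<f64> {
    events
        .iter()
        .filter_map(|ts| lag_seconds(reference_unix_millis, *ts))
        .collect()
}
```

With a cached clock, an event stamped a few milliseconds after the last refresh produces a small negative difference, which is why the sketch treats that case explicitly.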

Motivation: a recent revert (#25221) of #24987 showed that adding a couple of Instant::now() calls per source send was enough to cause a ~6.5% ingress-throughput regression on syslog_log2metric_tag_cardinality_limit_blackhole in the SMP Regression Detector. A cached-atomic clock lets us instrument those paths without that cost.

Vector configuration

n/a (library addition + benchmark + internal migration)

How did you test this PR?

Run the benchmark

cargo bench --bench fast_clock -p vector-common
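For readers who want a feel for what such a comparison measures without running the bench target, here is a std-only timing loop. It is a simplified stand-in, not the benchmark in this PR: `ns_per_call`, `compare`, and the `CACHED` static are hypothetical, and a plain loop lacks the warm-up and statistical treatment a proper benchmark harness provides.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Stand-in for the cached clock value; never updated in this sketch.
static CACHED: AtomicU64 = AtomicU64::new(0);

/// Average nanoseconds per call of `f` over `iters` iterations.
/// `black_box` keeps the optimizer from discarding the result.
pub fn ns_per_call<T>(iters: u32, mut f: impl FnMut() -> T) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

/// Returns (relaxed atomic load, `Instant::now()`) costs in ns per call.
pub fn compare(iters: u32) -> (f64, f64) {
    let cached = ns_per_call(iters, || CACHED.load(Ordering::Relaxed));
    let instant = ns_per_call(iters, Instant::now);
    (cached, instant)
}
```

Numbers from a loop like this are only meaningful in a release build, and they vary with hardware and OS.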

Results on my machine

Hardware: Apple M4 Max, 16 cores, 64 GiB RAM
OS: macOS 26.4.1 (arm64)
Toolchain: repo-default

| Clock | Median | vs fast_clock |
| --- | --- | --- |
| fast_clock::recent_millis() | 0.77 ns | 1.0× |
| fast_clock::recent_unix_millis() | 0.78 ns | 1.0× |
| Instant::now() | 16.3 ns | ~21× slower |
| Instant::elapsed().as_millis() | 23.0 ns | ~30× slower |
| Utc::now().timestamp_millis() | 41.2 ns | ~53× slower |

Caveat: these numbers are on Apple Silicon (arm64). The SMP Regression Detector and Vector production both run on x86_64 Linux. Absolute numbers there will differ (the Linux vDSO path for clock_gettime(CLOCK_REALTIME) is typically 15-25 ns), but the order-of-magnitude shape should hold.

TODO: Run the SMP Regression Detector on this branch. The lag-time reference in source_sender is the per-event call exercised by syslog_log2metric_tag_cardinality_limit_blackhole (and similar source-heavy experiments), so we should see a measurable ingress-throughput improvement vs master.

Unit tests

cargo test -p vector-common fast_clock
cargo test -p vector-core --lib source_sender

The first confirms the background updater is ticking forward and that recent_unix_millis() stays close to SystemTime::now(). The second confirms the existing lag-time tests (emits_lag_time_for_log/metric/trace) still pass after the migration.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes / Open questions

  1. Lag-time precision. The lag-time reference timestamp is now stale by up to ~25 ms (the updater tick). For a histogram metric covering lag values in the ms-to-seconds range this is well within the noise, but flagging it explicitly for review.
  2. coarsetime crate. coarsetime (~15M downloads, ISC license) provides the same functionality as a third-party dependency. I went with a hand-rolled ~50-line module to avoid a new dep and keep test injection simple, but I'm open to swapping if reviewers prefer the off-the-shelf crate.
  3. Tick cadence. Default 25 ms. It could be made configurable, but it is unclear there's a use case, so it stays a const for now.
  4. Further migration candidates. Once SMP confirms the win on this PR, follow-ups could migrate the Instant::now() calls in lib/vector-buffers/src/topology/channel/limited_queue.rs (per-push/pop utilization recording) and src/utilization.rs, both of which would also need a small change to vector_common::stats::TimeEwma to accept a u64-millis reference instead of Instant.
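To make item 4 more concrete, here is a generic time-weighted EWMA that takes a `u64` millisecond reference. It is a hypothetical sketch: `MillisEwma`, its half-life parameter, and its weighting formula are assumptions for illustration and do not reflect the actual signature or math of `vector_common::stats::TimeEwma`.

```rust
/// Time-weighted exponential moving average keyed on a millisecond
/// reference (hypothetical type, for illustration only).
pub struct MillisEwma {
    half_life_ms: f64,
    /// (current average, reference of the last update in ms)
    state: Option<(f64, u64)>,
}

impl MillisEwma {
    pub fn new(half_life_ms: f64) -> Self {
        Self { half_life_ms, state: None }
    }

    /// Folds `value` into the average, weighting it by the time elapsed
    /// since the previous update. Returns the new average.
    pub fn update(&mut self, value: f64, reference_ms: u64) -> f64 {
        let avg = match self.state {
            // The first sample seeds the average.
            None => value,
            Some((prev, last)) => {
                // saturating_sub guards against a reference that did not advance.
                let dt = reference_ms.saturating_sub(last) as f64;
                // Weight of the new sample grows with elapsed time.
                let alpha = 1.0 - 0.5f64.powf(dt / self.half_life_ms);
                prev + alpha * (value - prev)
            }
        };
        self.state = Some((avg, reference_ms));
        avg
    }
}
```

One thing this sketch surfaces: with a clock that only advances every 25 ms, many consecutive updates see an elapsed time of zero, and under this particular weighting those samples contribute nothing. Whether that matters for the real `TimeEwma` depends on its formula, so it may be worth checking during the follow-up.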

🤖 Generated with Claude Code

pront and others added 5 commits May 5, 2026 10:04
A coarse, cached monotonic clock for hot-path metric instrumentation.
Reading recent_millis() is a single relaxed atomic load; the cached
value is refreshed by a background thread every 25ms. Intended for
histogram binning where ms resolution is sufficient and the per-call
cost of Instant::now() shows up in profiles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tant/Utc

Microbenchmark comparing the read-cost of fast_clock::recent_millis()
against Instant::now(), Instant::elapsed().as_millis() and
Utc::now().timestamp_millis() patterns used elsewhere in Vector for
histogram-binning timestamps.

Run: cargo bench --bench fast_clock -p vector-common

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…riant

Adds a wall-clock companion to recent_millis. The cached value is
refreshed from SystemTime::now() on each updater tick, so reads cost a
single AtomicI64 relaxed load (~0.8 ns) versus ~41 ns for
Utc::now().timestamp_millis(). Suitable for source-lag-time histograms
where ms precision and up-to-25ms staleness are both acceptable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the per-send Utc::now().timestamp_millis() call in
source_sender::Output::send and send_batch with
fast_clock::recent_unix_millis(). This is the call exercised on every
event in syslog_log2metric_tag_cardinality_limit_blackhole and similar
SMP experiments; on M4 Max the read cost drops from ~41 ns to ~0.8 ns
(microbenchmark). Behavior change: lag time reference is now stale by
up to 25 ms, which is well within the noise of a histogram metric.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pront changed the title from "feat(core): add fast_clock cached monotonic clock with benchmark" to "[Experiment] feat(core): add fast_clock cached monotonic clock with benchmark" on May 5, 2026
@github-actions (bot) added the "domain: core" label (anything related to core crates, i.e. vector-core, core-common, etc.) on May 5, 2026

Labels

domain: core (anything related to core crates, i.e. vector-core, core-common, etc.)
work in progress
