[Experiment] feat(core): add fast_clock cached monotonic clock with benchmark by pront · Pull Request #25370 · vectordotdev/vector

pront · 2026-05-05T14:09:21Z

Summary

Adds vector_common::fast_clock, a coarse cached monotonic clock for hot-path metric instrumentation. Reading recent_millis() / recent_unix_millis() is a single relaxed atomic load (~1 ns); the cached values are refreshed every 25 ms by a background thread. Intended for histogram binning where ms resolution is sufficient and the per-call cost of Instant::now() / Utc::now() shows up in profiles.
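The mechanism described above can be sketched with the standard library alone. This is a minimal illustration, not the module added by this PR: the names `init`, `RECENT_MILLIS`, `UPDATER`, and `TICK` are assumptions, and the real module may start its updater differently.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical names for illustration; the PR's module layout may differ.
const TICK: Duration = Duration::from_millis(25);
static RECENT_MILLIS: AtomicU64 = AtomicU64::new(0);
static UPDATER: OnceLock<()> = OnceLock::new();

/// Starts the background updater once; later calls are no-ops.
fn init() {
    UPDATER.get_or_init(|| {
        let start = Instant::now();
        thread::Builder::new()
            .name("fast-clock-sketch".into())
            .spawn(move || loop {
                // Publish elapsed milliseconds, then sleep for one tick.
                let elapsed = start.elapsed().as_millis() as u64;
                RECENT_MILLIS.store(elapsed, Ordering::Relaxed);
                thread::sleep(TICK);
            })
            .expect("failed to spawn clock updater");
    });
}

/// Milliseconds since `init()`, stale by up to roughly one tick.
/// The read itself is a single relaxed atomic load.
fn recent_millis() -> u64 {
    RECENT_MILLIS.load(Ordering::Relaxed)
}
```

Relaxed ordering is enough in this sketch because readers only need some recent value and the counter does not guard any other memory.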

This PR adds the primitive plus one real migration: source_sender::Output::send and send_batch now use fast_clock::recent_unix_millis() for their lag-time reference timestamp instead of Utc::now().timestamp_millis(). This is the call exercised on every event in syslog_log2metric_tag_cardinality_limit_blackhole and similar SMP experiments.
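To illustrate what a lag-time reference is used for, here is a hedged sketch of the arithmetic. The helper `lag_seconds` is hypothetical and is not the code in `source_sender`; it assumes both timestamps are Unix milliseconds and that negative lag is simply skipped.

```rust
/// Hypothetical helper: lag in seconds between a reference timestamp and an
/// event timestamp, both expressed in Unix milliseconds.
/// Returns `None` when the event appears to be newer than the reference.
fn lag_seconds(reference_unix_millis: i64, event_unix_millis: i64) -> Option<f64> {
    let lag = reference_unix_millis.checked_sub(event_unix_millis)?;
    // A cached reference can trail the real clock, so a freshly stamped
    // event may look like it came from the future; skip it in that case.
    (lag >= 0).then(|| lag as f64 / 1000.0)
}
```

With a cached reference, the negative-lag branch becomes slightly more likely for events stamped within the last tick, which is one reason a guard of this kind is worth having.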

Motivation: a recent revert (#25221) of #24987 showed that adding a couple of Instant::now() calls per source send was enough to cause a ~6.5% ingress-throughput regression on syslog_log2metric_tag_cardinality_limit_blackhole in the SMP Regression Detector. A cached-atomic clock lets us instrument those paths without that cost.

Vector configuration

n/a (library addition + benchmark + internal migration)

How did you test this PR?

Run the benchmark

cargo bench --bench fast_clock -p vector-common

Results on my machine

Hardware: Apple M4 Max, 16 cores, 64 GiB RAM
OS: macOS 26.4.1 (arm64)
Toolchain: repo-default

| Clock | Median | vs `fast_clock` |
| --- | --- | --- |
| `fast_clock::recent_millis()` | 0.77 ns | 1.0× |
| `fast_clock::recent_unix_millis()` | 0.78 ns | 1.0× |
| `Instant::now()` | 16.3 ns | ~21× slower |
| `Instant::elapsed().as_millis()` | 23.0 ns | ~30× slower |
| `Utc::now().timestamp_millis()` | 41.2 ns | ~53× slower |

Caveat: these numbers are on Apple Silicon (arm64). The SMP Regression Detector and Vector production both run on x86_64 Linux. Absolute numbers there will differ (the Linux vDSO path for clock_gettime(CLOCK_REALTIME) is typically 15-25 ns), but the order-of-magnitude shape should hold.
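For a quick sanity check on another platform without the benchmark harness, a rough timing loop like the following may be enough to see the shape of the gap. This is a std-only sketch and not the benchmark in this PR; `ns_per_call` and `compare_clocks` are hypothetical names, and a plain loop is far less rigorous than a proper benchmark framework.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Average nanoseconds per call of `f` over `iters` iterations.
fn ns_per_call<T>(iters: u32, mut f: impl FnMut() -> T) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the call.
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

/// Returns (relaxed atomic load, `Instant::now()`) cost in ns per call.
fn compare_clocks() -> (f64, f64) {
    static CACHED: AtomicU64 = AtomicU64::new(0);
    let atomic = ns_per_call(1_000_000, || CACHED.load(Ordering::Relaxed));
    let instant = ns_per_call(1_000_000, Instant::now);
    (atomic, instant)
}
```

Build in release mode before reading anything into the numbers, and treat them as indicative only.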

TODO: Run the SMP Regression Detector on this branch. The lag-time reference in source_sender is the per-event call exercised by syslog_log2metric_tag_cardinality_limit_blackhole (and similar source-heavy experiments), so we should see a measurable ingress-throughput improvement vs master.

Unit tests

cargo test -p vector-common fast_clock
cargo test -p vector-core --lib source_sender

The first confirms the background updater is ticking forward and that recent_unix_millis() stays close to SystemTime::now(). The second confirms the existing lag-time tests (emits_lag_time_for_log/metric/trace) still pass after the migration.
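The "stays close to SystemTime::now()" property can be sketched as follows. This is an illustration under assumptions, not the PR's test: the lazy `Once`-based start, the helper names, and the tolerance used by a caller are all hypothetical.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::Once;
use std::thread;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

static RECENT_UNIX_MILLIS: AtomicI64 = AtomicI64::new(0);
static START: Once = Once::new();

/// Current wall-clock time in Unix milliseconds, read from the OS.
fn system_unix_millis() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as i64)
        .unwrap_or(0)
}

/// Cached wall-clock time; starts the updater lazily (a simplification).
fn recent_unix_millis() -> i64 {
    START.call_once(|| {
        // Seed synchronously so the first read is never zero.
        RECENT_UNIX_MILLIS.store(system_unix_millis(), Ordering::Relaxed);
        thread::spawn(|| loop {
            thread::sleep(Duration::from_millis(25));
            RECENT_UNIX_MILLIS.store(system_unix_millis(), Ordering::Relaxed);
        });
    });
    RECENT_UNIX_MILLIS.load(Ordering::Relaxed)
}

/// Absolute difference between the system and cached wall clocks, in ms.
fn drift_millis() -> i64 {
    (system_unix_millis() - recent_unix_millis()).abs()
}
```

A test built on this would assert that `drift_millis()` stays under a generous bound, since thread scheduling can delay an individual tick well beyond 25 ms on a loaded machine.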

Change Type

Is this a breaking change?

Yes
No

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

Related: feat(sources): add source latency metric and fix source lag time on large batches #24987 (the change that triggered the regression)
Related: revert(sources): revert "add source latency metric and fix source lag time on large batches (#24987)" #25221 (the revert)

Notes / Open questions

Lag-time precision. The lag-time reference timestamp is now stale by up to ~25 ms (the updater tick). For a histogram metric covering lag values in the ms-to-seconds range this is well within the noise, but flagging it explicitly for review.
coarsetime crate. coarsetime (~15M downloads, ISC license) provides the same functionality as a third-party dep. I went with a hand-rolled ~50-line module to avoid a new dep and keep test injection simple, but I'm open to swapping if reviewers prefer the established crate.
Tick cadence. Default 25 ms. Could be configurable, but it's unclear there's a use case, so keeping it const for now.
Further migration candidates. Once SMP confirms the win on this PR, follow-ups could migrate the Instant::now() calls in lib/vector-buffers/src/topology/channel/limited_queue.rs (per-push/pop utilization recording) and src/utilization.rs, both of which would also need a small change to vector_common::stats::TimeEwma to accept a u64-millis reference instead of Instant.
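To make the last point concrete, here is a hedged sketch of a time-weighted EWMA keyed on a `u64` millisecond reference rather than an `Instant`. `MillisEwma` is a hypothetical type written for illustration; it does not reproduce the API or internals of `vector_common::stats::TimeEwma`.

```rust
/// Hypothetical time-weighted EWMA driven by a `u64` millisecond timestamp.
struct MillisEwma {
    half_life_ms: f64,
    /// (current average, reference timestamp of the last update in ms)
    state: Option<(f64, u64)>,
}

impl MillisEwma {
    fn new(half_life_ms: f64) -> Self {
        Self { half_life_ms, state: None }
    }

    /// Folds `point` into the average, weighting by time since the last update.
    fn update(&mut self, point: f64, now_ms: u64) -> f64 {
        let average = match self.state {
            None => point,
            Some((avg, last_ms)) => {
                // saturating_sub guards against a reference that did not advance.
                let elapsed = now_ms.saturating_sub(last_ms) as f64;
                let alpha =
                    1.0 - (-elapsed * std::f64::consts::LN_2 / self.half_life_ms).exp();
                avg + alpha * (point - avg)
            }
        };
        self.state = Some((average, now_ms));
        average
    }
}
```

One consequence visible in this sketch: two updates that observe the same cached tick have zero elapsed time, so the second sample carries no weight. Any real migration would need to decide whether that behavior is acceptable for utilization tracking.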

🤖 Generated with Claude Code

A coarse, cached monotonic clock for hot-path metric instrumentation. Reading recent_millis() is a single relaxed atomic load; the cached value is refreshed by a background thread every 25ms. Intended for histogram binning where ms resolution is sufficient and the per-call cost of Instant::now() shows up in profiles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tant/Utc
Microbenchmark comparing the read-cost of fast_clock::recent_millis() against Instant::now(), Instant.elapsed().as_millis() and Utc::now().timestamp_millis() patterns used elsewhere in Vector for histogram-binning timestamps.
Run: cargo bench --bench fast_clock -p vector-common
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…riant
Adds a wall-clock companion to recent_millis. The cached value is refreshed from SystemTime::now() on each updater tick, so reads cost a single AtomicI64 relaxed load (~0.8 ns) versus ~41 ns for Utc::now().timestamp_millis(). Suitable for source-lag-time histograms where ms precision and up-to-25ms staleness are both acceptable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the per-send Utc::now().timestamp_millis() call in source_sender::Output::send and send_batch with fast_clock::recent_unix_millis(). This is the call exercised on every event in syslog_log2metric_tag_cardinality_limit_blackhole and similar SMP experiments; on M4 Max the read cost drops from ~41 ns to ~0.8 ns (microbenchmark).
Behavior change: lag time reference is now stale by up to 25 ms, which is well within the noise of a histogram metric.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pront and others added 5 commits May 5, 2026 10:04

chore(vector-common): cargo fmt fast_clock test assert

8869ac4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pront added the work in progress label May 5, 2026

pront changed the title ~~feat(core): add fast_clock cached monotonic clock with benchmark~~ [Experiment] feat(core): add fast_clock cached monotonic clock with benchmark May 5, 2026

github-actions Bot added the domain: core Anything related to core crates i.e. vector-core, core-common, etc label May 5, 2026

pront wants to merge 5 commits into master from pavlos/fast-clock-bench
