Improve metric contention perf #8077
Conversation
```java
// (AggregatorHolder), and so if a recording thread encounters an odd value,
// all it needs to do is release the "read lock" it just obtained (decrementing by 2),
// and then grab and record against the new current interval (AggregatorHolder).
private final AtomicInteger activeRecordingThreads = new AtomicInteger(0);
```
TODO: update javadoc comments to reflect current design before merging
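For context, here is a minimal sketch of the even/odd "read lock" protocol the comment describes. Class and method names (`AggregatorHolder`, `record`, `swapForCollection`) are illustrative, not the SDK's actual code:

```java
import java.util.concurrent.atomic.AtomicInteger;

class ReadLockSketch {
  static class AggregatorHolder {
    // Even value: half of it is the count of active recording threads.
    // Odd value: this holder has been retired by a collection; recorders
    // must re-read the current holder and try again.
    final AtomicInteger activeRecordingThreads = new AtomicInteger(0);

    void recordValue(double value) {
      // aggregate here
    }
  }

  private volatile AggregatorHolder holder = new AggregatorHolder();

  void record(double value) {
    while (true) {
      AggregatorHolder current = holder;
      // "Acquire the read lock" by adding 2; an odd result means a
      // collection is rotating this holder out.
      if (current.activeRecordingThreads.addAndGet(2) % 2 == 0) {
        try {
          current.recordValue(value);
        } finally {
          current.activeRecordingThreads.addAndGet(-2); // release read lock
        }
        return;
      }
      // Holder was retired: release what we just acquired and retry against
      // the new current holder.
      current.activeRecordingThreads.addAndGet(-2);
    }
  }

  AggregatorHolder swapForCollection() {
    AggregatorHolder old = holder;
    // Make the counter odd so recorders see a retired holder, then publish
    // the replacement. A real collector would then wait for the counter to
    // drop to exactly 1, meaning no recorder still holds the "read lock".
    old.activeRecordingThreads.incrementAndGet();
    holder = new AggregatorHolder();
    return old;
  }
}
```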
```java
  forThread().unlock();
}

@SuppressWarnings("ThreadPriorityCheck")
```
TODO: remove leftover annotation from a previous revision
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##               main    #8077      +/-   ##
============================================
+ Coverage     90.20%   90.21%   +0.01%
- Complexity     7594     7595       +1
============================================
  Files           841      841
  Lines         22915    22943      +28
  Branches       2290     2296       +6
============================================
+ Hits          20670    20698      +28
- Misses         1529     1531       +2
+ Partials        716      714       -2
```
```java
// all it needs to do is release the "read lock" it just obtained (decrementing by 2),
// and then grab and record against the new current interval (AggregatorHolder).
private final AtomicInteger activeRecordingThreads = new AtomicInteger(0);
private final ReentrantLock[] locks;
```
Thinking out loud:

Initially I went with a striped set of atomics here (`AtomicInteger[]`), but a striped set of `ReentrantLock`s was more performant in the benchmark:
| Benchmark | Temporality | Cardinality | Instrument Type | With Striped AtomicInteger (ops/s) | With Striped ReentrantLock (ops/s) | Change % |
|---|---|---|---|---|---|---|
| threads1 | DELTA | 1 | COUNTER_SUM | 10624.568 | 8113.116 | -23.6% |
| threads1 | DELTA | 1 | UP_DOWN_COUNTER_SUM | 7051.987 | 7116.562 | +0.9% |
| threads1 | DELTA | 1 | GAUGE_LAST_VALUE | 3004.456 | 3579.052 | +19.1% |
| threads1 | DELTA | 1 | HISTOGRAM_EXPLICIT | 6193.367 | 5108.527 | -17.5% |
| threads1 | DELTA | 1 | HISTOGRAM_BASE2_EXPONENTIAL | 4068.654 | 3767.116 | -7.4% |
| threads1 | DELTA | 100 | COUNTER_SUM | 7345.512 | 7705.679 | +4.9% |
| threads1 | DELTA | 100 | UP_DOWN_COUNTER_SUM | 6728.076 | 7870.198 | +17.0% |
| threads1 | DELTA | 100 | GAUGE_LAST_VALUE | 2816.403 | 2998.816 | +6.5% |
| threads1 | DELTA | 100 | HISTOGRAM_EXPLICIT | 6707.058 | 4143.646 | -38.2% |
| threads1 | DELTA | 100 | HISTOGRAM_BASE2_EXPONENTIAL | 3486.534 | 3668.988 | +5.2% |
| threads1 | CUMULATIVE | 1 | COUNTER_SUM | 14545.199 | 15211.867 | +4.6% |
| threads1 | CUMULATIVE | 1 | UP_DOWN_COUNTER_SUM | 14764.813 | 15272.696 | +3.4% |
| threads1 | CUMULATIVE | 1 | GAUGE_LAST_VALUE | 5054.376 | 7315.585 | +44.7% |
| threads1 | CUMULATIVE | 1 | HISTOGRAM_EXPLICIT | 8409.192 | 6646.129 | -21.0% |
| threads1 | CUMULATIVE | 1 | HISTOGRAM_BASE2_EXPONENTIAL | 3602.837 | 4202.725 | +16.7% |
| threads1 | CUMULATIVE | 100 | COUNTER_SUM | 7977.291 | 8553.166 | +7.2% |
| threads1 | CUMULATIVE | 100 | UP_DOWN_COUNTER_SUM | 8651.882 | 9148.823 | +5.7% |
| threads1 | CUMULATIVE | 100 | GAUGE_LAST_VALUE | 8066.028 | 8658.896 | +7.4% |
| threads1 | CUMULATIVE | 100 | HISTOGRAM_EXPLICIT | 6841.747 | 5590.956 | -18.3% |
| threads1 | CUMULATIVE | 100 | HISTOGRAM_BASE2_EXPONENTIAL | 3769.735 | 4061.193 | +7.7% |
| threads4 | DELTA | 1 | COUNTER_SUM | 1036.584 | 3642.403 | +251.4% |
| threads4 | DELTA | 1 | UP_DOWN_COUNTER_SUM | 953.509 | 2202.933 | +131.0% |
| threads4 | DELTA | 1 | GAUGE_LAST_VALUE | 1063.733 | 1886.270 | +77.3% |
| threads4 | DELTA | 1 | HISTOGRAM_EXPLICIT | 1657.467 | 2254.040 | +36.0% |
| threads4 | DELTA | 1 | HISTOGRAM_BASE2_EXPONENTIAL | 928.849 | 1623.778 | +74.8% |
| threads4 | DELTA | 100 | COUNTER_SUM | 1593.081 | 4987.529 | +213.1% |
| threads4 | DELTA | 100 | UP_DOWN_COUNTER_SUM | 3318.371 | 4880.035 | +47.1% |
| threads4 | DELTA | 100 | GAUGE_LAST_VALUE | 2630.933 | 3816.537 | +45.1% |
| threads4 | DELTA | 100 | HISTOGRAM_EXPLICIT | 2153.202 | 5494.981 | +155.2% |
| threads4 | DELTA | 100 | HISTOGRAM_BASE2_EXPONENTIAL | 2039.287 | 3385.470 | +66.0% |
| threads4 | CUMULATIVE | 1 | COUNTER_SUM | 4189.110 | 6061.089 | +44.7% |
| threads4 | CUMULATIVE | 1 | UP_DOWN_COUNTER_SUM | 4060.842 | 6635.251 | +63.4% |
| threads4 | CUMULATIVE | 1 | GAUGE_LAST_VALUE | 1386.341 | 1971.709 | +42.2% |
| threads4 | CUMULATIVE | 1 | HISTOGRAM_EXPLICIT | 1588.722 | 3745.206 | +135.7% |
| threads4 | CUMULATIVE | 1 | HISTOGRAM_BASE2_EXPONENTIAL | 1159.695 | 1818.146 | +56.8% |
| threads4 | CUMULATIVE | 100 | COUNTER_SUM | 4060.759 | 7510.675 | +85.0% |
| threads4 | CUMULATIVE | 100 | UP_DOWN_COUNTER_SUM | 4674.983 | 6670.130 | +42.7% |
| threads4 | CUMULATIVE | 100 | GAUGE_LAST_VALUE | 4033.773 | 6777.530 | +68.0% |
| threads4 | CUMULATIVE | 100 | HISTOGRAM_EXPLICIT | 3511.330 | 9629.410 | +174.3% |
| threads4 | CUMULATIVE | 100 | HISTOGRAM_BASE2_EXPONENTIAL | 3415.420 | 5692.648 | +66.7% |
But this might not be the whole picture:

- The benchmarks show improvement in the `threads4` cumulative case, despite the cumulative path being unchanged here. So there's definitely some noise / variance in the benchmark.
- The lock approach should never have contention if each recording thread is obtaining its own lock, and this should be true some of the time. But it's not true all the time, since I use `threadId % locks.length` to select the lock (see the sketch below). This means there could be cases where recording thread ids hit the same lock and are then blocked waiting for each other to complete. The `AtomicInteger[]` design may be more expensive per operation, but it allows multiple recording threads to access the same underlying aggregator concurrently.
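For illustration, a minimal sketch of the striped-lock selection described above (the wrapper class is hypothetical; only the `threadId % locks.length` selection mirrors this PR):

```java
import java.util.concurrent.locks.ReentrantLock;

class StripedLockSketch {
  private final ReentrantLock[] locks;

  StripedLockSketch(int stripes) {
    locks = new ReentrantLock[stripes];
    for (int i = 0; i < stripes; i++) {
      locks[i] = new ReentrantLock();
    }
  }

  void record(Runnable recordOperation) {
    // Two threads whose ids collide modulo locks.length share a stripe and
    // block each other; threads on distinct stripes proceed in parallel.
    ReentrantLock lock = locks[(int) (Thread.currentThread().getId() % locks.length)];
    lock.lock();
    try {
      recordOperation.run();
    } finally {
      lock.unlock();
    }
  }
}
```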
What are your thoughts on the single-threaded perf regression vs the multi-threaded perf improvement?
Changes the `DoubleExplicitBucketHistogramAggregator` locking mechanism to use a striped set of cells approach.

Summary view of perf changes to `MetricRecordBenchmark` ops/s on my machine:

Raw benchmark output before and after:

Before (on my machine):

After (on my machine):
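For illustration, a minimal sketch of what a "striped set of cells" recording path can look like. The shape and names are hypothetical; the PR's actual `DoubleExplicitBucketHistogramAggregator` differs in detail:

```java
import java.util.concurrent.atomic.AtomicLongArray;

class StripedHistogramCellsSketch {
  private final double[] boundaries;     // sorted bucket upper boundaries
  private final AtomicLongArray[] cells; // one bucket-count array per stripe

  StripedHistogramCellsSketch(double[] boundaries, int stripes) {
    this.boundaries = boundaries;
    this.cells = new AtomicLongArray[stripes];
    for (int i = 0; i < stripes; i++) {
      cells[i] = new AtomicLongArray(boundaries.length + 1);
    }
  }

  void record(double value) {
    int bucket = findBucket(value);
    // Each thread tends to land on its own stripe, spreading CAS traffic on
    // hot buckets across cells instead of one shared counter.
    int stripe = (int) (Thread.currentThread().getId() % cells.length);
    cells[stripe].incrementAndGet(bucket);
  }

  // Collection sums the same bucket index across all stripes.
  long bucketCount(int bucket) {
    long sum = 0;
    for (AtomicLongArray cell : cells) {
      sum += cell.get(bucket);
    }
    return sum;
  }

  private int findBucket(double value) {
    for (int i = 0; i < boundaries.length; i++) {
      if (value <= boundaries[i]) {
        return i;
      }
    }
    return boundaries.length; // overflow bucket
  }
}
```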