obs/ash: Performance improvements#170900
Open
alyshanjahani-crl wants to merge 3 commits into
Open
Conversation
Contributor
|
Merging to
After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here |
Member
ccee925 to
78d0d15
Compare
Collaborator
Author
|
The KV0 roachtest ( I did one run after including the fixes in this commit with ASH enabled, and observed a throughput of The KV50 roachtest is where we saw more of a regression and filed the linked issue and lead to this PR as a result: The microbenchmark added in this PR showed: |
Previously, SetWorkState was called unconditionally at the top of wait(), meaning every latch acquisition paid the ASH registration cost even when no conflicting latches existed. In the common uncontended case, the iterator finds no conflicts and returns immediately, making the SetWorkState/clearWorkState pair pure overhead. Move the SetWorkState call into iterAndWait so it only fires when the goroutine actually needs to block on a conflicting latch. This also passes the full Guard into iterAndWait (instead of just the poison.Policy) so the workload metadata is available at the call site. Part of: cockroachdb#170822 Epic: CRDB-63845 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
The global retiredWorkStates mutex was a contention point under high QPS: every clearWorkState call across all goroutines serialized on the same lock to append to the retired list. Replace the single global retired list with per-shard retired lists embedded in workStateMapShard. Each shard's retired list has its own mutex and a proportional cap (maxRetiredWorkStates / numShards), eliminating cross-shard contention on the retire path. Part of: cockroachdb#170822 Epic: CRDB-63845 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
ASH's access pattern — unique, never-reused goroutine IDs as keys — hits the worst case for syncutil.Map (a fork of sync.Map). Every SetWorkState inserts a new key, forcing the slow mutex path. When the sampler calls rangeWorkStates (via Range), it promotes the dirty map to read and sets dirty=nil. The very next Store then triggers dirtyLocked(), which allocates a new map and copies all non-expunged entries from read. Over a 30-minute kv50 benchmark, this creates synchronized 1-second bursts where every goroutine serializes through the mutex during the dirty map rebuild. Replace syncutil.Map with a plain map[int64]*WorkState protected by a per-shard syncutil.Mutex. This eliminates: - The dirty map rebuild cycle triggered by Range() - Stale expunged entry accumulation (delete actually removes keys) - The double lock acquisition per SetWorkState (Load then Store) rangeWorkStates now snapshots each shard's entries (value-copied) under the shard lock, then iterates the snapshot outside the lock. This keeps lock hold time proportional to shard size rather than callback execution time. BenchmarkSetWorkStateWithRange (unique goids + continuous Range) shows a 37% latency improvement and 63% fewer allocations per op. Part of: cockroachdb#170822 Epic: CRDB-63845 Release note: None Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
78d0d15 to
b37abe5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See individual commits for details.
Fixes: #170822
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-63845
Release note: None