Ssd/one queue#170144

Closed
stevendanna wants to merge 5 commits into cockroachdb:master from stevendanna:ssd/one-queue

@stevendanna
Collaborator

No description provided.

stevendanna and others added 5 commits May 11, 2026 11:05
Add an explicit BurstFrac field to ResourceGroupConfig and wire it
through to groupInfo. Previously, the burst bucket refill fraction
was derived from weight/100 in refillRMGroupBurstBuckets. Making it
an explicit config field decouples burst budget from fair-share
weight, which is needed for the upcoming single-queue unification
where the system tenant will have a high weight but a burst fraction
of 1.0 while non-system tenants will have weight=1 with a burst
fraction of 0.2.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
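
The decoupling described in the commit above can be illustrated with a minimal sketch. It assumes a simplified `ResourceGroupConfig` with only the two fields under discussion; `legacyBurstFrac` and the system weight of 1000 are hypothetical stand-ins, not code from this PR.

```go
package main

import "fmt"

// ResourceGroupConfig is a simplified stand-in for the real type; only the
// two fields discussed in the commit message are modeled here.
type ResourceGroupConfig struct {
	Weight    uint32  // fair-share weight
	BurstFrac float64 // explicit burst bucket refill fraction
}

// legacyBurstFrac mirrors the old derivation described above (weight/100).
// The function name is hypothetical.
func legacyBurstFrac(weight uint32) float64 {
	return float64(weight) / 100
}

func main() {
	// BurstFrac values come from the commit message. The system weight of
	// 1000 is a placeholder for "a high weight".
	configs := map[string]ResourceGroupConfig{
		"system":     {Weight: 1000, BurstFrac: 1.0},
		"non-system": {Weight: 1, BurstFrac: 0.2},
	}
	for name, cfg := range configs {
		fmt.Printf("%s: weight-derived=%.2f explicit=%.2f\n",
			name, legacyBurstFrac(cfg.Weight), cfg.BurstFrac)
	}
}
```

Under the old derivation, a weight of 1 would yield a burst fraction of 0.01 and a high weight would yield a fraction above 1.0, which is why the two knobs need to be separate.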
Replace the two WorkQueue refill methods (refillBurstBuckets for
serverless, refillRMGroupBurstBuckets for RM) with a single
refillGroupBurstBuckets(rate100, cap100) that iterates all groups
and scales per-group amounts by group.burstFrac.

Both strategies now recover the 100%-CPU rate from tier-0 canBurst
tokens (rate100 = tokens[0][canBurst] / canBurstTarget) and forward
it to the queue. serverlessStrategy gains a canBurstTarget field
and applies the same rate100 to both queues.

Semantic change: burst budget is now "BurstFrac of 100%-CPU rate"
rather than "noBurst/4 of per-tier rate". At the default settings
(target=0.8, BurstFrac=0.2) the two agree; at non-default settings
they diverge, since the 100%-CPU framing does not depend on the target.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
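
A rough sketch of the unified refill described in the commit above, under stated assumptions: the `group` struct, `recoverRate100`, the token counts, and the choice to scale the per-group cap by `burstFrac` as well as the refill amount are illustrative, not taken from the PR's code.

```go
package main

import (
	"fmt"
	"math"
)

// group is a hypothetical, pared-down stand-in for groupInfo.
type group struct {
	name        string
	burstFrac   float64
	burstTokens int64
}

// recoverRate100 recovers the 100%-CPU token rate from the canBurst rate:
// rate100 = tokens[0][canBurst] / canBurstTarget.
func recoverRate100(canBurstTokens int64, canBurstTarget float64) int64 {
	return int64(math.Round(float64(canBurstTokens) / canBurstTarget))
}

// refillGroupBurstBuckets iterates all groups and scales the per-group
// refill amount and cap by burstFrac (scaling the cap is an assumption).
func refillGroupBurstBuckets(groups []*group, rate100, cap100 int64) {
	for _, g := range groups {
		add := int64(math.Round(float64(rate100) * g.burstFrac))
		limit := int64(math.Round(float64(cap100) * g.burstFrac))
		g.burstTokens += add
		if g.burstTokens > limit {
			g.burstTokens = limit
		}
	}
}

func main() {
	// Hypothetical numbers: 800 canBurst tokens per interval at target 0.8.
	rate100 := recoverRate100(800, 0.8)
	groups := []*group{
		{name: "system", burstFrac: 1.0},
		{name: "tenant", burstFrac: 0.2},
	}
	refillGroupBurstBuckets(groups, rate100, 2*rate100)
	for _, g := range groups {
		fmt.Printf("%s: %d burst tokens\n", g.name, g.burstTokens)
	}
}
```

Because each group's share is a fraction of `rate100`, changing the utilization target changes `tokens[0][canBurst]` and `canBurstTarget` together and leaves the per-group burst budget unchanged.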
Define the ResourceGroupConfig that will be used for the system tenant
when the CPU time token AC collapses to a single WorkQueue. The system
tenant gets Weight=MaxUint32 (effectively infinite priority in
fair-share ordering), BurstFrac=1.0 (full 100%-CPU burst budget), and
MaxCPU=true (always qualifies for burst regardless of bucket fullness).

The constant is not yet installed in the default holder seed; it will
be activated in a follow-up commit that routes all work to a single
queue.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
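
To make the effect of these three values concrete, here is a small sketch. The field values come from the commit message above; the ordering key (`normalizedUsage`, CPU used divided by weight) and the burst check (`qualifiesForBurst` with a fullness threshold) are assumptions about how weight and MaxCPU might be consumed, not the PR's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// ResourceGroupConfig is a simplified stand-in for the real type.
type ResourceGroupConfig struct {
	Weight    uint32
	BurstFrac float64
	MaxCPU    bool
}

// systemTenantGroupConfig uses the values stated in the commit message.
var systemTenantGroupConfig = ResourceGroupConfig{
	Weight:    math.MaxUint32,
	BurstFrac: 1.0,
	MaxCPU:    true,
}

// normalizedUsage is a hypothetical fair-share key: the group with the
// lowest used/weight ratio is served first.
func normalizedUsage(used uint64, cfg ResourceGroupConfig) float64 {
	return float64(used) / float64(cfg.Weight)
}

// qualifiesForBurst is a hypothetical check: MaxCPU short-circuits the
// bucket-fullness test. The threshold parameter is an assumption.
func qualifiesForBurst(cfg ResourceGroupConfig, fullness, threshold float64) bool {
	return cfg.MaxCPU || fullness >= threshold
}

func main() {
	tenant := ResourceGroupConfig{Weight: 1, BurstFrac: 0.2}
	// Even after using far more CPU, the system tenant sorts first.
	fmt.Println(normalizedUsage(1_000_000_000, systemTenantGroupConfig) <
		normalizedUsage(10, tenant))
	// The system tenant qualifies for burst even with an empty bucket.
	fmt.Println(qualifiesForBurst(systemTenantGroupConfig, 0.0, 0.9))
	fmt.Println(qualifiesForBurst(tenant, 0.0, 0.9))
}
```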
Remove the modeStrategy interface, serverlessStrategy, and rmStrategy
types. Replace with direct methods on cpuTimeTokenAllocator:

- computeTargets(noBurstFracs, burstDelta): pure transform from values
- readTargetSettings(): reads KVCPUTimeUtilTarget and mirrors to both
  tiers
- refillBurst(): loops over all queues unconditionally
- setMode(mode): sets currentMode only

Both modes now use a single utilization target setting. The old
app_tenant setting (key admission.cpu_time_tokens.target_util.app_tenant,
default 0.80) is retained as the registration key and given the
canonical name admission.cpu_time_tokens.target_util via WithName.
Existing serverless clusters that have tuned the app_tenant setting
keep their value. The previously unused KVCPUTimeUtilTarget (key
admission.cpu_time_tokens.target_util, default 0.75) and the
per-tier KVCPUTimeSystemUtilGoal are removed.

System tenant priority now comes entirely from weight and MaxCPU
config, not from a higher per-tier utilization target.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Route all CPU time token AC work through a single WorkQueue. Remove
the two-tier resource architecture that previously gave the system
tenant a separate queue with higher utilization targets.

Structural changes:
- Remove resourceTier type, systemTenant/appTenant constants, and
  numResourceTiers. The granter has [numBurstQualifications] buckets
  (canBurst, noBurst) instead of [numResourceTiers][numBurstQualifications].
- Remove cpuTimeTokenChildGranter; cpuTimeTokenGranter implements the
  granter interface directly.
- Collapse rates, tokenCounts, capacities, minimums, targetUtilizations
  from 2D to 1D.
- Collapse per-tenant metrics from per-tier arrays to single counters.
  Per-bucket metrics go from 4 to 2 counters.

Routing changes:
- GetKVWorkQueue/GetCTTWorkQueue always return the single queue.
- Remove useResourceGroup bool and setUseResourceGroup. Resource
  manager mode routing is determined by reading cpuTimeTokenACMode
  directly in groupKeyForWorkLocked.
- Remove activeMode, currentMode, setMode, configureQueue.

System tenant priority comes from systemTenantGroupConfig
(Weight=MaxUint32, MaxCPU=true, BurstFrac=1.0) installed in the
config holder seed, not from a separate granter tier.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
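
The 2D-to-1D collapse listed under the structural changes above can be sketched as follows. The type and constant names follow the commit message, but the `tryGet` behavior (admit when the caller's bucket is positive, then deduct from every bucket) is an assumption made for illustration.

```go
package main

import "fmt"

type burstQualification int

const (
	canBurst burstQualification = iota
	noBurst
	numBurstQualifications
)

// cpuTimeTokenGranter holds one bucket per burst qualification. Before this
// change the array would have been indexed by resource tier as well:
// [numResourceTiers][numBurstQualifications]int64.
type cpuTimeTokenGranter struct {
	tokens [numBurstQualifications]int64
}

// tryGet admits work when the bucket for the given qualification still has
// tokens, then deducts from all buckets (assumed behavior for this sketch).
func (g *cpuTimeTokenGranter) tryGet(q burstQualification, n int64) bool {
	if g.tokens[q] <= 0 {
		return false
	}
	for i := range g.tokens {
		g.tokens[i] -= n
	}
	return true
}

func main() {
	g := &cpuTimeTokenGranter{}
	g.tokens[canBurst] = 10 // hypothetical starting levels
	g.tokens[noBurst] = 0
	fmt.Println(g.tryGet(canBurst, 4)) // burst-qualified work is admitted
	fmt.Println(g.tryGet(noBurst, 1))  // non-burst work must wait
	fmt.Println(len(g.tokens))         // 2 buckets instead of 2x2
}
```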
@trunk-io
Contributor

trunk-io Bot commented May 11, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@blathers-crl

blathers-crl Bot commented May 11, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable
