Skip to content

sql: add always-on per-query SQL CPU measurement#169997

Merged
trunk-io[bot] merged 3 commits into
cockroachdb:masterfrom
DrewKimball:drewk/per-query-sql-cpu
May 21, 2026
Merged

sql: add always-on per-query SQL CPU measurement#169997
trunk-io[bot] merged 3 commits into
cockroachdb:masterfrom
DrewKimball:drewk/per-query-sql-cpu

Conversation

@DrewKimball
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball commented May 8, 2026

sql: propagate local KV CPU from mutation and vecindex operators

Add grunning-based CPUStopWatch instrumentation to mutation
operators (insert/update/delete/upsert and variants, deleteRange,
insertFastPath) and vecstore.Txn. Expose the result via a new
localKVCPUTime() method on the mutationPlanNode interface and a
new LocalKVCPUTime field on vecstore.KVStats.

planNodeToRowSource and the vector search processors populate
ComponentStats.KV.LocalKVCPUTime from these sources so that the
vectorized stats collector's CPU subtraction applies above
wrapped mutations and vector index ops.

Informs #168272
Release note: None

sql: propagate LocalKVCPUTime from all KV operators for query-level CPU

Add a LocalKVCPUTime field to the Metrics proto. Populate it from
all KV-reading operators (ColBatchScan, ColBatchDirectScan,
ColIndexJoin, tableReader, joinReader, invertedJoiner,
zigzagJoiner) and from wrapped mutationPlanNodes. Accumulate it
into topLevelQueryStats via DistSQLReceiver.pushMeta and forward
it through forwardInnerQueryStats for cascades and triggers.

Informs #168272
Release note: None

sql: add always-on per-query SQL CPU measurement

Add a RawSQLCPUTime field to the Metrics proto and capture goroutine
grunning at the gateway, parallel post-query check workers, each
processor goroutine (via a new grunningMetaOutput wrapper injected
by FlowBase.StartInternal), the row-engine and vectorized outboxes,
the vectorized HashRouter, the ParallelUnorderedSynchronizer worker
goroutines, and the row-based router output goroutines. Each capture
site emits a Metrics.RawSQLCPUTime entry; DistSQLReceiver.pushMeta
accumulates them into topLevelQueryStats.rawSQLCPUTime, to which the
gateway adds its own grunning. A topLevelQueryStats.sqlCPUTime()
helper returns the corrected per-query SQL CPU time by subtracting
LocalKVCPUTime from the raw total.

EXPLAIN ANALYZE now displays SQL CPU time unconditionally, removing
prior vectorized-only and non-mutation restrictions in
instrumentation.go. The always-on override is skipped when the
DeterministicExplain testing knob is set so that test output remains
stable (the trace-derived SQL CPU time is already zeroed by
ComponentStats.MakeDeterministic in that case).

Fixes #168272

Release note (sql change): EXPLAIN ANALYZE now displays the SQL CPU
time line for all queries, including mutations and queries that run
through the row-by-row execution engine. Previously this line was
only shown for non-mutation queries that ran through the vectorized
execution engine.

@DrewKimball DrewKimball requested review from a team as code owners May 8, 2026 19:32
@DrewKimball DrewKimball requested review from mw5h and removed request for a team May 8, 2026 19:32
@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented May 8, 2026

😎 Merged successfully - details.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I think we also need to add CPUStopWatch to ParallelUnorderedSynchronizer input goroutines and the row-based router goroutine.

@yuzefovich reviewed 38 files and all commit messages, and made 7 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball, mw5h, and ZhouXing19).


-- commits line 50 at r3:
nit: probably deserves a release note.


pkg/sql/delete_range.go line 147 at r1 (raw file):

			runErr := params.p.txn.Run(ctx, b)
			if delta := d.cpuStopWatch.Stop(); delta > 0 {
				atomic.AddInt64(&d.localKVCPUTimeAccum, int64(delta))

nit: is there a reason to use atomics for this? If we do need it, then we probably also need to do atomic read too, right?


pkg/sql/instrumentation.go line 1192 at r3 (raw file):

					nodeStats.MaxAllocatedMem.MaybeAdd(stats.Exec.MaxAllocatedMem)
					nodeStats.MaxAllocatedDisk.MaybeAdd(stats.Exec.MaxAllocatedDisk)
					if noMutations && !makeDeterministic {

nit: we probably need to make SQL CPU time deterministic (i.e. zero) when the testing knob is set.


pkg/sql/execinfrapb/data.proto line 339 at r2 (raw file):

    // inside KV calls for this flow, as measured by the grunning library. This
    // is the portion of SQL goroutine CPU that overlapped with KV work, not
    // the CPU consumed on KV servers (see KVCPUTime for that). The gateway

How do things work with a distributed TableReader stage?


pkg/sql/opt/exec/explain/output_test.go line 177 at r3 (raw file):

	db := sqlutils.MakeSQLRunner(tc.Conns[0])

	runQuery := func(query string, hideCPU bool) {

nit: we should try vectorized on and off.


pkg/sql/opt/exec/explain/output_test.go line 195 at r3 (raw file):

			}
		}
		if hideCPU {

nit: looks like hideCPU is always false now.

Copy link
Copy Markdown
Collaborator

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I have similar comments on the read and write with atomics, and one possible missed part for accounting.

@ZhouXing19 reviewed 38 files and all commit messages, and made 5 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on DrewKimball and mw5h).


-- commits line 19 at r2:
Claude noted that the kv read/write in pkg/sql/sequence.go were not accounted -- is this a miss or it's intentional?


pkg/sql/delete.go line 272 at r1 (raw file):

func (d *deleteNode) localKVCPUTime() int64 {
	return d.run.td.localKVCPUTime

Seconding Yahor's point -- In kvBatchMetrics we use atomic.LoadInt64(h.atomics.localKVCPUTime). I wonder if we should be aligned for the read of the field. Ditto for deleteRangeNode, deleteSwapNode, updateNode, updateSwapNode and etc.


Thinking about this more -- maybe what this commit introduced don't need atomic, as they are all called from the plan node's execution goroutine; whereas in kvBatchMetrics, atomics are genuinely needed because the sendFunc can be called from KV streamer worker goroutines. Is this correct...?


pkg/sql/insert_fast_path.go line 377 at r1 (raw file):

	// Accumulate CPU time from the BatchResponse.
	if br.CPUTime > 0 {
		n.run.kvCPUTime += br.CPUTime

Similar issue -- the localKVCPUTime is accumulated with atomic.AddInt64 but kvCPUTime is simple addition. Shall we align them? Ditto for tableWriterBase.


pkg/sql/vecindex/vecstore/store_txn.go line 737 at r1 (raw file):

	err := tx.kv.Run(ctx, b)
	if delta := tx.cpuStopWatch.Stop(); delta > 0 {
		tx.kvStats.LocalKVCPUTime += int64(delta)

nit: Should it be atomic.AddInt64 as well? (Or at least should the writes be consistent with using atomic or not?)

Add grunning-based CPUStopWatch instrumentation to mutation
operators (insert/update/delete/upsert and variants, deleteRange,
insertFastPath) and vecstore.Txn. Expose the result via a new
localKVCPUTime() method on the mutationPlanNode interface and a
new LocalKVCPUTime field on vecstore.KVStats.

planNodeToRowSource and the vector search processors populate
ComponentStats.KV.LocalKVCPUTime from these sources so that the
vectorized stats collector's CPU subtraction applies above
wrapped mutations and vector index ops.

Informs cockroachdb#168272
Release note: None
Add a LocalKVCPUTime field to the Metrics proto. Populate it from
all KV-reading operators (ColBatchScan, ColBatchDirectScan,
ColIndexJoin, tableReader, joinReader, invertedJoiner,
zigzagJoiner) and from wrapped mutationPlanNodes. Accumulate it
into topLevelQueryStats via DistSQLReceiver.pushMeta and forward
it through forwardInnerQueryStats for cascades and triggers.

Informs cockroachdb#168272
Release note: None
@DrewKimball DrewKimball force-pushed the drewk/per-query-sql-cpu branch from d5be5ed to b62eb44 Compare May 19, 2026 20:43
Copy link
Copy Markdown
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

I think we also need to add CPUStopWatch to ParallelUnorderedSynchronizer input goroutines and the row-based router goroutine.

Good point, Done.

@DrewKimball made 11 comments and resolved 4 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on mw5h, yuzefovich, and ZhouXing19).


-- commits line 19 at r2:

Previously, ZhouXing19 (Jane Xing) wrote…

Claude noted that the kv read/write in pkg/sql/sequence.go were not accounted -- is this a miss or it's intentional?

Hm, call it a bit of both. I think it's OK to leave sequences plus a few other things as follow ups. I'll open an issue for it.


-- commits line 50 at r3:

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: probably deserves a release note.

Done.


pkg/sql/delete.go line 272 at r1 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

Seconding Yahor's point -- In kvBatchMetrics we use atomic.LoadInt64(h.atomics.localKVCPUTime). I wonder if we should be aligned for the read of the field. Ditto for deleteRangeNode, deleteSwapNode, updateNode, updateSwapNode and etc.


Thinking about this more -- maybe what this commit introduced don't need atomic, as they are all called from the plan node's execution goroutine; whereas in kvBatchMetrics, atomics are genuinely needed because the sendFunc can be called from KV streamer worker goroutines. Is this correct...?

Yeah, I think you're right that we don't need atomics at all for this CPU accounting. In fact, needing atomics is probably indicative of a bug, since you shouldn't be mixing goroutines for a goroutine-local measure. I'm inclined to just leave the atomic load for kvBatchMetrics simply because it doesn't matter, and it makes localKVCPUTime symmetrical with kvCPUTime, but I don't feel strongly.


pkg/sql/delete_range.go line 147 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: is there a reason to use atomics for this? If we do need it, then we probably also need to do atomic read too, right?

Good point, removed the atomics.


pkg/sql/insert_fast_path.go line 377 at r1 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

Similar issue -- the localKVCPUTime is accumulated with atomic.AddInt64 but kvCPUTime is simple addition. Shall we align them? Ditto for tableWriterBase.

Done.


pkg/sql/instrumentation.go line 1192 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we probably need to make SQL CPU time deterministic (i.e. zero) when the testing knob is set.

I think this is handled by ComponentStats.MakeDeterministic. There was one place that did need to check the deterministic setting, where we override the top level query stat CPU metric.


pkg/sql/execinfrapb/data.proto line 339 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

How do things work with a distributed TableReader stage?

In that case, each flow emits its own metric with LocalKVCPUTime set, and it's all aggregated on the gateway. Does that answer the question?


pkg/sql/vecindex/vecstore/store_txn.go line 737 at r1 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: Should it be atomic.AddInt64 as well? (Or at least should the writes be consistent with using atomic or not?)

Done.


pkg/sql/opt/exec/explain/output_test.go line 177 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we should try vectorized on and off.

Done.


pkg/sql/opt/exec/explain/output_test.go line 195 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: looks like hideCPU is always false now.

Done. Removed the param.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@yuzefovich reviewed 34 files and all commit messages, made 2 comments, and resolved 2 discussions.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on DrewKimball, mw5h, and ZhouXing19).


pkg/sql/execinfrapb/data.proto line 339 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

In that case, each flow emits its own metric with LocalKVCPUTime set, and it's all aggregated on the gateway. Does that answer the question?

Yeah, thanks.

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented May 19, 2026

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

Add a RawSQLCPUTime field to the Metrics proto and capture goroutine
grunning at the gateway, parallel post-query check workers, each
processor goroutine (via a new grunningMetaOutput wrapper injected
by FlowBase.StartInternal), the row-engine and vectorized outboxes,
the vectorized HashRouter, the ParallelUnorderedSynchronizer worker
goroutines, and the row-based router output goroutines. Each capture
site emits a Metrics.RawSQLCPUTime entry; DistSQLReceiver.pushMeta
accumulates them into topLevelQueryStats.rawSQLCPUTime, to which the
gateway adds its own grunning. A topLevelQueryStats.sqlCPUTime()
helper returns the corrected per-query SQL CPU time by subtracting
LocalKVCPUTime from the raw total.

EXPLAIN ANALYZE now displays SQL CPU time unconditionally, removing
prior vectorized-only and non-mutation restrictions in
instrumentation.go. The always-on override is skipped when the
DeterministicExplain testing knob is set so that test output remains
stable (the trace-derived SQL CPU time is already zeroed by
ComponentStats.MakeDeterministic in that case).

Fixes cockroachdb#168272

Release note (sql change): EXPLAIN ANALYZE now displays the SQL CPU
time line for all queries, including mutations and queries that run
through the row-by-row execution engine. Previously this line was
only shown for non-mutation queries that ran through the vectorized
execution engine.
@DrewKimball DrewKimball force-pushed the drewk/per-query-sql-cpu branch from e8d9623 to bbd03c9 Compare May 20, 2026 21:52
@DrewKimball
Copy link
Copy Markdown
Collaborator Author

TFYRs!

/trunk merge

@trunk-io trunk-io Bot merged commit 66625da into cockroachdb:master May 21, 2026
25 checks passed
@DrewKimball DrewKimball deleted the drewk/per-query-sql-cpu branch May 21, 2026 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: collect SQL CPU Time on all query executions as a top-level query stat

4 participants