Report gRPC status code in client-computed stats #10805
9ca27a6 to
b39ed92
Compare
Benchmarks
Startup
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 60 metrics, 11 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.058 s) : 0, 1058195
Total [baseline] (8.831 s) : 0, 8830555
Agent [candidate] (1.061 s) : 0, 1060565
Total [candidate] (8.814 s) : 0, 8813956
section iast
Agent [baseline] (1.234 s) : 0, 1234495
Total [baseline] (9.568 s) : 0, 9567534
Agent [candidate] (1.236 s) : 0, 1236223
Total [candidate] (9.551 s) : 0, 9550536
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.193 ms) : 0, 1193
crashtracking [candidate] (1.189 ms) : 0, 1189
BytebuddyAgent [baseline] (628.675 ms) : 0, 628675
BytebuddyAgent [candidate] (627.709 ms) : 0, 627709
AgentMeter [baseline] (29.115 ms) : 0, 29115
AgentMeter [candidate] (29.103 ms) : 0, 29103
GlobalTracer [baseline] (256.959 ms) : 0, 256959
GlobalTracer [candidate] (257.032 ms) : 0, 257032
AppSec [baseline] (31.671 ms) : 0, 31671
AppSec [candidate] (31.553 ms) : 0, 31553
Debugger [baseline] (58.87 ms) : 0, 58870
Debugger [candidate] (58.644 ms) : 0, 58644
Remote Config [baseline] (593.265 µs) : 0, 593
Remote Config [candidate] (587.928 µs) : 0, 588
Telemetry [baseline] (8.667 ms) : 0, 8667
Telemetry [candidate] (8.69 ms) : 0, 8690
Flare Poller [baseline] (6.345 ms) : 0, 6345
Flare Poller [candidate] (10.099 ms) : 0, 10099
section iast
crashtracking [baseline] (1.217 ms) : 0, 1217
crashtracking [candidate] (1.203 ms) : 0, 1203
BytebuddyAgent [baseline] (802.126 ms) : 0, 802126
BytebuddyAgent [candidate] (803.751 ms) : 0, 803751
AgentMeter [baseline] (11.595 ms) : 0, 11595
AgentMeter [candidate] (11.609 ms) : 0, 11609
GlobalTracer [baseline] (248.829 ms) : 0, 248829
GlobalTracer [candidate] (248.352 ms) : 0, 248352
AppSec [baseline] (26.584 ms) : 0, 26584
AppSec [candidate] (26.564 ms) : 0, 26564
Debugger [baseline] (62.578 ms) : 0, 62578
Debugger [candidate] (62.997 ms) : 0, 62997
Remote Config [baseline] (536.586 µs) : 0, 537
Remote Config [candidate] (547.541 µs) : 0, 548
Telemetry [baseline] (14.907 ms) : 0, 14907
Telemetry [candidate] (15.415 ms) : 0, 15415
Flare Poller [baseline] (4.694 ms) : 0, 4694
Flare Poller [candidate] (4.279 ms) : 0, 4279
IAST [baseline] (25.303 ms) : 0, 25303
IAST [candidate] (25.323 ms) : 0, 25323
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.054 s) : 0, 1054372
Total [baseline] (11.037 s) : 0, 11036900
Agent [candidate] (1.069 s) : 0, 1069188
Total [candidate] (11.089 s) : 0, 11089326
section appsec
Agent [baseline] (1.244 s) : 0, 1244267
Total [baseline] (11.08 s) : 0, 11080146
Agent [candidate] (1.248 s) : 0, 1247556
Total [candidate] (11.185 s) : 0, 11184541
section iast
Agent [baseline] (1.228 s) : 0, 1228354
Total [baseline] (11.24 s) : 0, 11240347
Agent [candidate] (1.226 s) : 0, 1225849
Total [candidate] (11.234 s) : 0, 11233644
section profiling
Agent [baseline] (1.18 s) : 0, 1180236
Total [baseline] (10.978 s) : 0, 10978266
Agent [candidate] (1.18 s) : 0, 1179879
Total [candidate] (11.028 s) : 0, 11027695
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.209 ms) : 0, 1209
crashtracking [candidate] (1.203 ms) : 0, 1203
BytebuddyAgent [baseline] (626.084 ms) : 0, 626084
BytebuddyAgent [candidate] (635.074 ms) : 0, 635074
AgentMeter [baseline] (28.956 ms) : 0, 28956
AgentMeter [candidate] (29.201 ms) : 0, 29201
GlobalTracer [baseline] (256.365 ms) : 0, 256365
GlobalTracer [candidate] (258.307 ms) : 0, 258307
AppSec [baseline] (31.522 ms) : 0, 31522
AppSec [candidate] (31.75 ms) : 0, 31750
Debugger [baseline] (59.301 ms) : 0, 59301
Debugger [candidate] (60.049 ms) : 0, 60049
Remote Config [baseline] (588.919 µs) : 0, 589
Remote Config [candidate] (610.702 µs) : 0, 611
Telemetry [baseline] (8.589 ms) : 0, 8589
Telemetry [candidate] (8.786 ms) : 0, 8786
Flare Poller [baseline] (5.766 ms) : 0, 5766
Flare Poller [candidate] (8.03 ms) : 0, 8030
section appsec
crashtracking [baseline] (1.188 ms) : 0, 1188
crashtracking [candidate] (1.189 ms) : 0, 1189
BytebuddyAgent [baseline] (658.091 ms) : 0, 658091
BytebuddyAgent [candidate] (657.681 ms) : 0, 657681
AgentMeter [baseline] (11.971 ms) : 0, 11971
AgentMeter [candidate] (12.127 ms) : 0, 12127
GlobalTracer [baseline] (257.385 ms) : 0, 257385
GlobalTracer [candidate] (259.237 ms) : 0, 259237
IAST [baseline] (23.816 ms) : 0, 23816
IAST [candidate] (24.063 ms) : 0, 24063
AppSec [baseline] (177.084 ms) : 0, 177084
AppSec [candidate] (177.469 ms) : 0, 177469
Debugger [baseline] (65.399 ms) : 0, 65399
Debugger [candidate] (66.167 ms) : 0, 66167
Remote Config [baseline] (571.637 µs) : 0, 572
Remote Config [candidate] (570.689 µs) : 0, 571
Telemetry [baseline] (9.05 ms) : 0, 9050
Telemetry [candidate] (9.124 ms) : 0, 9124
Flare Poller [baseline] (3.569 ms) : 0, 3569
Flare Poller [candidate] (3.635 ms) : 0, 3635
section iast
crashtracking [baseline] (1.194 ms) : 0, 1194
crashtracking [candidate] (1.188 ms) : 0, 1188
BytebuddyAgent [baseline] (797.113 ms) : 0, 797113
BytebuddyAgent [candidate] (794.976 ms) : 0, 794976
AgentMeter [baseline] (11.355 ms) : 0, 11355
AgentMeter [candidate] (11.311 ms) : 0, 11311
GlobalTracer [baseline] (247.559 ms) : 0, 247559
GlobalTracer [candidate] (247.17 ms) : 0, 247170
IAST [baseline] (25.142 ms) : 0, 25142
IAST [candidate] (25.123 ms) : 0, 25123
AppSec [baseline] (27.341 ms) : 0, 27341
AppSec [candidate] (27.129 ms) : 0, 27129
Debugger [baseline] (64.378 ms) : 0, 64378
Debugger [candidate] (64.126 ms) : 0, 64126
Remote Config [baseline] (536.912 µs) : 0, 537
Remote Config [candidate] (530.224 µs) : 0, 530
Telemetry [baseline] (13.492 ms) : 0, 13492
Telemetry [candidate] (14.067 ms) : 0, 14067
Flare Poller [baseline] (4.24 ms) : 0, 4240
Flare Poller [candidate] (4.22 ms) : 0, 4220
section profiling
crashtracking [baseline] (1.158 ms) : 0, 1158
crashtracking [candidate] (1.158 ms) : 0, 1158
BytebuddyAgent [baseline] (681.502 ms) : 0, 681502
BytebuddyAgent [candidate] (681.217 ms) : 0, 681217
AgentMeter [baseline] (8.642 ms) : 0, 8642
AgentMeter [candidate] (8.588 ms) : 0, 8588
GlobalTracer [baseline] (215.355 ms) : 0, 215355
GlobalTracer [candidate] (215.224 ms) : 0, 215224
AppSec [baseline] (31.788 ms) : 0, 31788
AppSec [candidate] (32.004 ms) : 0, 32004
Debugger [baseline] (63.021 ms) : 0, 63021
Debugger [candidate] (63.664 ms) : 0, 63664
Remote Config [baseline] (588.746 µs) : 0, 589
Remote Config [candidate] (590.42 µs) : 0, 590
Telemetry [baseline] (10.494 ms) : 0, 10494
Telemetry [candidate] (9.856 ms) : 0, 9856
Flare Poller [baseline] (3.481 ms) : 0, 3481
Flare Poller [candidate] (3.518 ms) : 0, 3518
ProfilingAgent [baseline] (93.447 ms) : 0, 93447
ProfilingAgent [candidate] (93.546 ms) : 0, 93546
Profiling [baseline] (94.01 ms) : 0, 94010
Profiling [candidate] (94.103 ms) : 0, 94103
Load
Parameters
See matching parameters
Summary
Found 6 performance improvements and 1 performance regressions! Performance is the same for 12 metrics, 17 unstable metrics.
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section baseline
no_agent (1.168 ms) : 1156, 1179
. : milestone, 1168,
iast (3.164 ms) : 3126, 3202
. : milestone, 3164,
iast_FULL (6.102 ms) : 6040, 6165
. : milestone, 6102,
iast_GLOBAL (3.439 ms) : 3383, 3495
. : milestone, 3439,
profiling (1.971 ms) : 1954, 1988
. : milestone, 1971,
tracing (1.751 ms) : 1737, 1765
. : milestone, 1751,
section candidate
no_agent (1.202 ms) : 1189, 1215
. : milestone, 1202,
iast (3.053 ms) : 3016, 3089
. : milestone, 3053,
iast_FULL (5.774 ms) : 5716, 5831
. : milestone, 5774,
iast_GLOBAL (3.56 ms) : 3503, 3617
. : milestone, 3560,
profiling (1.926 ms) : 1910, 1942
. : milestone, 1926,
tracing (1.782 ms) : 1767, 1797
. : milestone, 1782,
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section baseline
no_agent (18.142 ms) : 17956, 18329
. : milestone, 18142,
appsec (19.762 ms) : 19562, 19963
. : milestone, 19762,
code_origins (18.827 ms) : 18634, 19020
. : milestone, 18827,
iast (17.82 ms) : 17644, 17995
. : milestone, 17820,
profiling (19.264 ms) : 19073, 19455
. : milestone, 19264,
tracing (18.877 ms) : 18690, 19064
. : milestone, 18877,
section candidate
no_agent (18.155 ms) : 17967, 18343
. : milestone, 18155,
appsec (19.433 ms) : 19235, 19630
. : milestone, 19433,
code_origins (17.708 ms) : 17534, 17882
. : milestone, 17708,
iast (18.626 ms) : 18438, 18813
. : milestone, 18626,
profiling (18.516 ms) : 18329, 18703
. : milestone, 18516,
tracing (17.657 ms) : 17481, 17832
. : milestone, 17657,
Dacapo
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section baseline
no_agent (1.48 ms) : 1469, 1492
. : milestone, 1480,
appsec (3.799 ms) : 3579, 4019
. : milestone, 3799,
iast (2.257 ms) : 2189, 2325
. : milestone, 2257,
iast_GLOBAL (2.315 ms) : 2246, 2385
. : milestone, 2315,
profiling (2.088 ms) : 2033, 2142
. : milestone, 2088,
tracing (2.072 ms) : 2019, 2126
. : milestone, 2072,
section candidate
no_agent (1.477 ms) : 1466, 1489
. : milestone, 1477,
appsec (3.829 ms) : 3608, 4049
. : milestone, 3829,
iast (2.263 ms) : 2194, 2332
. : milestone, 2263,
iast_GLOBAL (2.312 ms) : 2242, 2382
. : milestone, 2312,
profiling (2.119 ms) : 2063, 2176
. : milestone, 2119,
tracing (2.067 ms) : 2014, 2120
. : milestone, 2067,
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~07ec9497bd, baseline=1.61.0-SNAPSHOT~c1e9ac6389
dateFormat X
axisFormat %s
section baseline
no_agent (14.9 s) : 14900000, 14900000
. : milestone, 14900000,
appsec (14.944 s) : 14944000, 14944000
. : milestone, 14944000,
iast (18.74 s) : 18740000, 18740000
. : milestone, 18740000,
iast_GLOBAL (17.647 s) : 17647000, 17647000
. : milestone, 17647000,
profiling (14.846 s) : 14846000, 14846000
. : milestone, 14846000,
tracing (15.202 s) : 15202000, 15202000
. : milestone, 15202000,
section candidate
no_agent (15.513 s) : 15513000, 15513000
. : milestone, 15513000,
appsec (15.216 s) : 15216000, 15216000
. : milestone, 15216000,
iast (18.152 s) : 18152000, 18152000
. : milestone, 18152000,
iast_GLOBAL (17.727 s) : 17727000, 17727000
. : milestone, 17727000,
profiling (15.099 s) : 15099000, 15099000
. : milestone, 15099000,
tracing (15.043 s) : 15043000, 15043000
. : milestone, 15043000,
21ec5e9 to
9f3f544
Compare
# Conflicts: # dd-trace-core/src/main/java/datadog/trace/common/metrics/MetricKey.java
When client-computed stats (CCS) are enabled, the agent **merges** the stats it computes itself from raw spans with the stats pre-computed by the tracer. For gRPC spans without CCS, the agent resolves the status code from the span's tags via [`getGRPCStatusCode()`](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/trace/stats/aggregation.go#L167-L221), which always returns a numeric string (e.g. `4`) or an empty string. With CCS enabled, the agent uses [`GRPCStatusCode`](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/trace/stats/aggregation.go#L160) without translation, so the tracer has to send the numeric form itself. This change mimics the agent's aggregation, and matches what the agent expects, in [`NewAggregationFromGroup`](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/trace/stats/aggregation.go#L146-L165). Protocol-wise, [`ClientGroupedStats.GRPC_status_code`](https://github.com/DataDog/datadog-agent/blob/47938ea8c9b9894dcb03dc3f81cf2c6e408f1b6c/pkg/proto/datadog/trace/stats.proto#L103) is a `string`.
9f3f544 to
a3832a0
Compare
....84/src/main/java/datadog/trace/instrumentation/armeria/grpc/client/GrpcClientDecorator.java
....84/src/main/java/datadog/trace/instrumentation/armeria/grpc/server/GrpcServerDecorator.java
amarziali left a comment
Thanks for fixing that. It looks good. I left a minor comment.
b0bf34a to
07ec949
Compare
/merge
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
devflow unqueued this merge request: It did not become mergeable within the expected time.
What Does This Do
Reports the gRPC status code via Client Computed Stats.
The agent has supported this status code since 7.65.0 (DataDog/datadog-agent#34220), which is also the minimum agent version required for CCS.
The current gRPC instrumentations capture the status code by name but not its numeric value, so this change adds a new span tag holding the numeric value, which the client-side aggregation then reads.
  span.setTag("status.code", status.getCode().name());
  span.setTag("grpc.status.code", status.getCode().name());
+ span.setTag("rpc.grpc.status_code", status.getCode().value());

This affects the gRPC and Armeria instrumentations.
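As a minimal sketch of the name-versus-number relationship behind the new tag, the example below builds the three status tags for one call outcome. `SketchGrpcCode` and `GrpcStatusTagsSketch` are hypothetical names introduced only for illustration: the enum is a stand-in that lists the standard gRPC status codes and is not `io.grpc.Status.Code`, and a plain map stands in for the span. Only the three tag names are taken from the change above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the gRPC status codes (assumption: the real
// instrumentations read these from io.grpc.Status, which is not used here).
enum SketchGrpcCode {
  OK(0), CANCELLED(1), UNKNOWN(2), INVALID_ARGUMENT(3), DEADLINE_EXCEEDED(4),
  NOT_FOUND(5), ALREADY_EXISTS(6), PERMISSION_DENIED(7), RESOURCE_EXHAUSTED(8),
  FAILED_PRECONDITION(9), ABORTED(10), OUT_OF_RANGE(11), UNIMPLEMENTED(12),
  INTERNAL(13), UNAVAILABLE(14), DATA_LOSS(15), UNAUTHENTICATED(16);

  private final int value;

  SketchGrpcCode(int value) {
    this.value = value;
  }

  int value() {
    return value;
  }
}

// Hypothetical helper: a map stands in for the span's tag store.
final class GrpcStatusTagsSketch {
  private GrpcStatusTagsSketch() {}

  static Map<String, Object> statusTags(SketchGrpcCode code) {
    Map<String, Object> tags = new LinkedHashMap<>();
    tags.put("status.code", code.name());           // existing tag: code name
    tags.put("grpc.status.code", code.name());      // existing tag: code name
    tags.put("rpc.grpc.status_code", code.value()); // new tag: numeric value
    return tags;
  }
}
```

For `SketchGrpcCode.DEADLINE_EXCEEDED`, the two existing tags hold the name and the new tag holds the integer 4.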
Note: an additional system test will be added in DataDog/system-tests#6483.
Motivation
Completeness of CCS.
Additional notes
When client-computed stats (CCS) are enabled, the agent merges stats it computes itself from raw spans with stats pre-computed by the tracer.
For gRPC spans, without Client Computed Stats (metrics) the agent resolves the status code from the span's tags via `getGRPCStatusCode()`, which always returns a numeric string (e.g. `4`) or an empty string. With CCS enabled, the code uses `GRPCStatusCode` without translation.

flowchart TB
subgraph tracer["dd-trace-java"]
span["gRPC span<br>grpc.status.code = 'DEADLINE_EXCEEDED'<br>rpc.grpc.status_code = 4"]
span -->|raw spans| v04["POST /v0.4/traces<br>msgpack"]
span --> agg["ConflatingMetricsAggregator<br>reads rpc.grpc.status_code<br>GRPCStatusCode = '4'"]
agg -->|pre-computed stats| v06["POST /v0.6/stats<br>msgpack · GRPCStatusCode: '4'"]
end
subgraph agent["datadog-agent"]
v04 --> agentPath["NewAggregationFromSpan<br>getGRPCStatusCode<br>meta[grpc.status.code]='DEADLINE_EXCEEDED' → '4'"]
v06 --> ccsPath["NewAggregationFromGroup<br>GRPCStatusCode → '4'"]
agentPath --> k1["key{GRPCStatusCode:'4',...}"]
ccsPath --> k2["key{GRPCStatusCode:'4',...}"]
end

This change mimics the aggregation of the agent, and what is expected from the agent, in `NewAggregationFromGroup`.
Protocol-wise, ClientGroupedStats.GRPC_status_code is a `string`.
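To illustrate why both paths need to end up with the same value, here is a rough sketch that models the two routes from the flowchart. Everything in it is an assumption made for illustration: `GrpcStatsKeySketch` and its methods are hypothetical, `fromSpanTags` only loosely imitates what the description attributes to the agent's `getGRPCStatusCode()` (numeric string or empty string out), and the real agent and tracer code may resolve values differently.

```java
import java.util.Map;

// Hypothetical model of the two aggregation paths; not the agent's or tracer's code.
final class GrpcStatsKeySketch {
  private GrpcStatsKeySketch() {}

  // Standard gRPC status names in numeric order: array index == numeric code.
  private static final String[] NAMES = {
    "OK", "CANCELLED", "UNKNOWN", "INVALID_ARGUMENT", "DEADLINE_EXCEEDED",
    "NOT_FOUND", "ALREADY_EXISTS", "PERMISSION_DENIED", "RESOURCE_EXHAUSTED",
    "FAILED_PRECONDITION", "ABORTED", "OUT_OF_RANGE", "UNIMPLEMENTED",
    "INTERNAL", "UNAVAILABLE", "DATA_LOSS", "UNAUTHENTICATED"
  };

  // Raw-span path (simplified assumption): translate the tag to a numeric string.
  static String fromSpanTags(Map<String, String> meta) {
    String raw = meta.get("grpc.status.code");
    if (raw == null || raw.isEmpty()) {
      return "";
    }
    try {
      return String.valueOf(Integer.parseInt(raw)); // value was already numeric
    } catch (NumberFormatException ignored) {
      // not a number: fall through to the name lookup
    }
    for (int i = 0; i < NAMES.length; i++) {
      if (NAMES[i].equalsIgnoreCase(raw)) {
        return String.valueOf(i);
      }
    }
    return "";
  }

  // Tracer side: render the numeric span tag as the string carried in the stats payload.
  static String tracerStatsField(Object rpcGrpcStatusCodeTag) {
    if (rpcGrpcStatusCodeTag instanceof Number) {
      return String.valueOf(((Number) rpcGrpcStatusCodeTag).intValue());
    }
    return "";
  }

  // Pre-computed stats path: the value sent by the tracer is used without translation.
  static String fromClientStats(String grpcStatusCode) {
    return grpcStatusCode == null ? "" : grpcStatusCode;
  }
}
```

In this model, a span tagged `grpc.status.code = DEADLINE_EXCEEDED` and a stats entry built from `rpc.grpc.status_code = 4` both produce the same status value. A tracer that sent the name instead would have it passed through unchanged on the pre-computed path, giving a different key.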