Skip to content

[c-930] use metrics v2 to trace relevant code paths#10

Open
dbrajovic wants to merge 12 commits intov1.x-injfrom
c-930/trace-everything
Open

[c-930] use metrics v2 to trace relevant code paths#10
dbrajovic wants to merge 12 commits intov1.x-injfrom
c-930/trace-everything

Conversation

@dbrajovic
Copy link
Copy Markdown

@dbrajovic dbrajovic commented Mar 30, 2026

Introduces injective metrics and traces code paths of interest


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

Summary by CodeRabbit

  • New Features

    • Added metrics/observability instrumentation across consensus, mempool, state storage, and state sync.
  • Chores

    • Bumped Go toolchain and updated numerous dependencies, including observability and gRPC-related modules.
  • Refactor

    • Adjusted node and subsystem constructors to accept optional meter wiring and streamlined constructor call-sites.
  • Tests

    • Updated test call sites to accommodate new constructor parameters and context-aware recheck calls.

@linear
Copy link
Copy Markdown

linear bot commented Mar 30, 2026

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 30, 2026

Hey there and thank you for opening this pull request! 👋🏼

We require pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.

Details:

No release type found in pull request title "[c-930] use metrics v2 to trace relevant code paths". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat
 - fix
 - build
 - chore
 - ci
 - docs
 - refactor
 - perf
 - test
 - revert
 - spec
 - merge

General format: type(scope): msg
Breaking change: type(scope)!: msg
Multi-scope change: type: msg
Types: feat, fix, build, chore, ci, docs, refactor, perf, test, revert, spec, merge.
Example: fix(cmd/cometbft/commands/debug): execute p.Signal only when p is not nil

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f41d679b-ba2e-4c98-be1d-c6a92b5c9e39

📥 Commits

Reviewing files that changed from the base of the PR and between 54c7b49 and ab25bba.

📒 Files selected for processing (2)
  • mempool/clist_mempool.go
  • node/node.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • mempool/clist_mempool.go

📝 Walkthrough

Walkthrough

Metrics instrumentation (Injective/cometbft meters) is threaded into node construction and subsystems (state, mempool, consensus, statesync). Numerous methods are wrapped with timing contexts; constructors and test call sites updated to accept an extra meter parameter. Dependencies were bumped (Go 1.23.9 + module upgrades).

Changes

Cohort / File(s) Summary
Node & setup
node/node.go, node/setup.go, node/node_test.go
Add meter metrics.Meter param/field; pass sub-meters into state/mempool/consensus/statesync; update NewNode/NewNodeWithCliParams signatures and tests to include trailing meter (often nil).
Consensus
internal/consensus/reactor.go, internal/consensus/state.go
Add injmetrics.Meter fields; provide ReactorMeter/StateMeter options; wrap Receive and handleMsg with meter.FuncTimingCtx(...).
Mempool
mempool/clist_mempool.go, mempool/clist_mempool_test.go
Add metrics.Meter + WithMeter option; instrument key methods with timing contexts; CheckTx uses named returns and async callback defers stop; recheckTxs now accepts ctx and tests updated to pass contexts.
State store
state/store.go
Add Meter injmetrics.Meter to StoreOptions, default to nil-meter; instrument many store methods with timing spans; change import alias to cmtmetrics.
State sync
statesync/reactor.go, statesync/reactor_test.go
Add meter injmetrics.Meter field and NewReactor(..., meter) param; instrument Receive and Sync; tests updated to pass trailing nil.
RPC / Tutorials / Examples
rpc/test/helpers.go, test/e2e/node/main.go, abci/tutorials/abci-v2-forum-app/forum.go
Update NewNode/NewCometBFT call sites to single-line and include extra trailing nil meter argument.
Dependencies
go.mod
Bump Go toolchain to go 1.23.9; upgrade many direct/indirect modules; add github.com/InjectiveLabs/metrics/v2 and github.com/grpc-ecosystem/grpc-gateway/v2; add replace for github.com/cometbft/cometbft/api./api.

Sequence Diagram

sequenceDiagram
    participant Caller as Caller
    participant NewNode as NewNode()
    participant Node as Node
    participant Subsystems as Subsystems\n(State/Mempool/Consensus/Statesync)
    participant Meter as Meter\n(NewNilMeter / Injected)
    participant Method as InstrumentedMethod

    Caller->>NewNode: NewNode(ctx, ..., meter?)
    NewNode->>Meter: meter := param or NewNilMeter()
    NewNode->>Subsystems: create subsystems with meter.SubMeter("svc")
    Subsystems->>Meter: request sub-meter (svc-tagged)
    Meter-->>Subsystems: return scoped sub-meter
    Caller->>Node: call method (e.g., CheckTx / Receive / Load)
    Node->>Method: Method starts meter.FuncTimingCtx(tag)
    Method->>Method: execute logic
    Method-->>Node: defer stop() after execution
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through modules, needle fine,

Wired meters to each tiny line,
Timed the hops from state to sync,
Counted beats before a blink,
A rabbit marks each tick — divine.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 73.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the primary change: introducing metrics v2 instrumentation for code path tracing throughout the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch c-930/trace-everything

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
node/node.go (1)

277-303: ⚠️ Potential issue | 🔴 Critical

This is a source-incompatible change to exported constructors.

Adding meter ahead of the variadic options changes the call shape of both NewNode and NewNodeWithCliParams, breaking all existing downstream callers. All call sites must now include a positional meter argument before options. Preserve the current signatures and inject tracing via an Option or a separate helper instead.

Additionally, the struct literal stores a fresh metrics.NewNilMeter() instead of using the normalized meter parameter passed to the function, discarding the caller-provided meter value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/node.go` around lines 277 - 303, The exported constructors NewNode and
NewNodeWithCliParams were changed to take a positional meter argument before
variadic options (breaking callers) and the implementation overwrites the passed
meter with metrics.NewNilMeter(); revert the public API to its original
signature (keep meter injected via an Option or helper rather than a new
positional parameter) by removing the new positional meter parameter from
NewNode/NewNodeWithCliParams and instead accept a Meter through an Option (e.g.,
WithMeter) or a separate setter; also fix the Node construction code that
currently sets metrics.NewNilMeter() so it uses the normalized meter value
provided via the Option (or falls back to metrics.NewNilMeter() only when no
Option was supplied) to ensure caller-provided meters are preserved.
🧹 Nitpick comments (2)
internal/consensus/reactor.go (1)

272-273: Rejected messages are still invisible to this span.

Because the timer starts here, any MsgFromProto / ValidateBasic failure returns before tracing begins. If this span is supposed to represent the full receive path, start it earlier and fall back to a generic tag until the concrete message type is known.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/consensus/reactor.go` around lines 272 - 273, Start the receive
timing span before any MsgFromProto/ValidateBasic work so rejected messages are
included: move the conR.meter.FuncTimingCtx(context.Background(), "Receive",
...) call to before MsgFromProto and ValidateBasic and initially use a generic
tag (e.g. injmetrics.Tag("msg","unknown") or similar) until fmt.Sprintf("%T",
msg) is available; once MsgFromProto succeeds, replace or emit an updated tag
with the concrete type (using the meter API or by ending and restarting the span
if needed) so the span covers the full receive path and still records the real
message type when known.
node/node_test.go (1)

491-492: Please add one constructor case with a non-nil meter.

All of the updated tests still pass nil, so the new meter plumbing is never exercised. A tiny fake meter assertion in this file would catch forwarding regressions immediately.

Also applies to: 535-536, 568-569, 602-603, 624-625, 648-649

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@node/node_test.go` around lines 491 - 492, Add one test constructor
invocation that passes a non-nil fake meter to NewNode (instead of the nil meter
cases currently used) so the new metrics plumbing is exercised; create a tiny
fake meter implementation in node_test.go, pass it where
DefaultMetricsProvider(config.Instrumentation) (or the metrics argument) is
provided to NewNode, and add a simple assertion that the fake meter was
forwarded (e.g., the created Node exposes/holds the fake meter or the fake meter
observed a call). Reference NewNode and DefaultMetricsProvider (and the existing
CustomReactors usage) when locating the call sites to update.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@go.mod`:
- Line 38: Replace the vulnerable module versions in go.mod: bump
google.golang.org/grpc from v1.75.0 to v1.79.3 (or later) and bump
go.opentelemetry.io/otel/sdk from v1.38.0 to v1.40.0 (or later); after updating
the module lines for these symbols, run go mod tidy and re-run tests/build to
ensure dependency graph and imports resolve cleanly.

In `@internal/consensus/state.go`:
- Around line 226-229: The StateMeter option currently allows replacing the safe
default meter with nil causing panics; update StateMeter (used to configure
cs.meter) to ignore nil inputs—either do nothing when meter is nil or assign
injmetrics.NewNilMeter() instead—so NewState's default
(injmetrics.NewNilMeter()) is preserved; locate StateMeter and ensure the setter
only sets cs.meter when the provided meter is non-nil (or substitutes
injmetrics.NewNilMeter()) to prevent handleMsg from calling
cs.meter.FuncTimingCtx on nil.

In `@mempool/clist_mempool.go`:
- Around line 259-262: The WithMeter option currently overwrites the defensive
default metrics.NewNilMeter(), so change WithMeter(func mem *CListMempool) to
only set mem.meter when the provided meter is non-nil (i.e., if m != nil) to
preserve the nil-safe default; reference the WithMeter function and the
CListMempool.meter field (used by GetSenders, CheckTx, ReapMaxBytesMaxGas,
ReapMaxTxs, GetTxByHash, Update, recheckTxs) and ensure NewCListMempool's
metrics.NewNilMeter() remains in place unless a real meter is provided.
- Around line 798-800: The timing context returned by mem.meter.FuncTimingCtx is
being discarded—change the call so the derived context is assigned (e.g., ctx,
stop := mem.meter.FuncTimingCtx(context.Background(), "Update")) and ensure that
same ctx is passed into recheckTxs (and any subsequent calls within Update) so
the span/trace from FuncTimingCtx is propagated; keep the defer stop() to end
the timing span.

In `@node/node.go`:
- Line 598: NewNodeWithCliParams normalizes a meter but then the Node struct
literal unconditionally sets meter: metrics.NewNilMeter(), which discards the
caller-supplied/normalized meter; replace the hardcoded metrics.NewNilMeter() in
the Node initialization with the local normalized meter variable (the one
computed earlier in NewNodeWithCliParams) so the Node.meter field uses the
normalized value instead of always creating a fresh nil meter.

In `@node/setup.go`:
- Line 124: DefaultNewNode currently passes nil as the final Provider argument
to NewNodeWithCliParams, making the metrics Meter unreachable; replace that nil
with a concrete metrics Provider by constructing and passing the default
provider (use the package's DefaultMetricsProvider or the provider constructor
exposed by the metrics package) so DefaultNewNode forwards a real Provider to
NewNodeWithCliParams rather than nil.

---

Outside diff comments:
In `@node/node.go`:
- Around line 277-303: The exported constructors NewNode and
NewNodeWithCliParams were changed to take a positional meter argument before
variadic options (breaking callers) and the implementation overwrites the passed
meter with metrics.NewNilMeter(); revert the public API to its original
signature (keep meter injected via an Option or helper rather than a new
positional parameter) by removing the new positional meter parameter from
NewNode/NewNodeWithCliParams and instead accept a Meter through an Option (e.g.,
WithMeter) or a separate setter; also fix the Node construction code that
currently sets metrics.NewNilMeter() so it uses the normalized meter value
provided via the Option (or falls back to metrics.NewNilMeter() only when no
Option was supplied) to ensure caller-provided meters are preserved.

---

Nitpick comments:
In `@internal/consensus/reactor.go`:
- Around line 272-273: Start the receive timing span before any
MsgFromProto/ValidateBasic work so rejected messages are included: move the
conR.meter.FuncTimingCtx(context.Background(), "Receive", ...) call to before
MsgFromProto and ValidateBasic and initially use a generic tag (e.g.
injmetrics.Tag("msg","unknown") or similar) until fmt.Sprintf("%T", msg) is
available; once MsgFromProto succeeds, replace or emit an updated tag with the
concrete type (using the meter API or by ending and restarting the span if
needed) so the span covers the full receive path and still records the real
message type when known.

In `@node/node_test.go`:
- Around line 491-492: Add one test constructor invocation that passes a non-nil
fake meter to NewNode (instead of the nil meter cases currently used) so the new
metrics plumbing is exercised; create a tiny fake meter implementation in
node_test.go, pass it where DefaultMetricsProvider(config.Instrumentation) (or
the metrics argument) is provided to NewNode, and add a simple assertion that
the fake meter was forwarded (e.g., the created Node exposes/holds the fake
meter or the fake meter observed a call). Reference NewNode and
DefaultMetricsProvider (and the existing CustomReactors usage) when locating the
call sites to update.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cb66a95d-b185-48b4-81ca-e1a883f28e67

📥 Commits

Reviewing files that changed from the base of the PR and between dd0e3a3 and 74c0343.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • go.work is excluded by !**/*.work
📒 Files selected for processing (13)
  • abci/tutorials/abci-v2-forum-app/forum.go
  • go.mod
  • internal/consensus/reactor.go
  • internal/consensus/state.go
  • mempool/clist_mempool.go
  • node/node.go
  • node/node_test.go
  • node/setup.go
  • rpc/test/helpers.go
  • state/store.go
  • statesync/reactor.go
  • statesync/reactor_test.go
  • test/e2e/node/main.go

github.com/stretchr/testify v1.11.1
golang.org/x/crypto v0.41.0
golang.org/x/net v0.43.0
google.golang.org/grpc v1.75.0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import json
import urllib.request

queries = [
    ("google.golang.org/grpc", "1.75.0"),
    ("go.opentelemetry.io/otel/sdk", "1.38.0"),
]

for pkg, version in queries:
    body = json.dumps({
        "package": {"ecosystem": "Go", "name": pkg},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    print(json.dumps({
        "package": pkg,
        "version": version,
        "vulns": payload.get("vulns", []),
    }, indent=2))
PY

Repository: InjectiveLabs/cometbft

Length of output: 2648


🏁 Script executed:

cat go.mod | head -50

Repository: InjectiveLabs/cometbft

Length of output: 1787


🏁 Script executed:

sed -n '38p;141p' go.mod

Repository: InjectiveLabs/cometbft

Length of output: 147


🌐 Web query:

GO-2026-4762 GHSA-p77j-4mvh-x3m3 google.golang.org/grpc authorization bypass

💡 Result:

GO-2026-4762 (GHSA-p77j-4mvh-x3m3, CVE-2026-33186) is a critical authorization bypass vulnerability (CVSS 9.1) in google.golang.org/grpc versions before 1.79.3. The gRPC-Go server accepts HTTP/2 requests with a :path pseudo-header missing the leading slash (e.g., "Service/Method" instead of "/Service/Method"). It routes these to the correct handler, but authorization interceptors (e.g., google.golang.org/grpc/authz) evaluate the raw non-canonical path. Deny rules using canonical paths (with "/") fail to match, allowing bypass if a fallback allow rule exists. Impacts servers using path-based authorization interceptors with deny rules for canonical paths and fallback allows. Attackers send crafted HTTP/2 frames. Fixed in v1.79.3: rejects non-canonical :path with codes.Unimplemented before interceptors. Upgrade to v1.79.3 or later. Mitigate with validating interceptors or default-deny policies.

Citations:


🌐 Web query:

GO-2026-4394 GHSA-9h8m-3fm2-qjrq go.opentelemetry.io/otel PATH hijack RCE

💡 Result:

GO-2026-4394 (GHSA-9h8m-3fm2-qjrq, CVE-2026-24051) is a high-severity vulnerability in the OpenTelemetry Go SDK (go.opentelemetry.io/otel/sdk) enabling arbitrary code execution (RCE) via PATH hijacking on macOS/Darwin systems. Summary: The resource detection code in sdk/resource/host_id.go executes the 'ioreg' system command without an absolute path, relying on the PATH environment variable. An attacker able to modify PATH (local low-priv attacker) can place a malicious executable named 'ioreg' in a directory prioritized in PATH, leading to RCE in the application's context when resource.WithHostID or similar host detection is used. Affected versions: v1.21.0 to v1.39.0 (Go modules all versions in that range). Patched in: v1.40.0 via commit d45961b (uses absolute path for ioreg on macOS). Severity: High (CVSS 7.0: AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H). Official sources confirm the issue only affects macOS due to ioreg usage; other platforms use file reads or different safe methods. Mitigation: Upgrade to v1.40.0 or later. Avoid using resource.WithHostID if possible. Do not run in environments where PATH is attacker-controllable.

Citations:


Upgrade the vulnerable dependency pins before merging.

google.golang.org/grpc v1.75.0 contains a critical authorization-bypass vulnerability (GO-2026-4762 / GHSA-p77j-4mvh-x3m3, CVSS 9.1). Servers using path-based authorization interceptors are at risk when HTTP/2 requests with non-canonical :path pseudo-headers bypass deny rules. Upgrade to v1.79.3 or later.

go.opentelemetry.io/otel/sdk v1.38.0 contains a high-severity PATH-hijacking RCE vulnerability (GO-2026-4394 / GHSA-9h8m-3fm2-qjrq, CVSS 7.0) on macOS systems due to unqualified command execution. Upgrade to v1.40.0 or later.

Both versions are now in this diff.

🧰 Tools
🪛 OSV Scanner (2.3.5)

[CRITICAL] 38-38: google.golang.org/grpc 1.75.0: Authorization bypass in gRPC-Go via missing leading slash in :path in google.golang.org/grpc

(GO-2026-4762)


[CRITICAL] 38-38: google.golang.org/grpc 1.75.0: gRPC-Go has an authorization bypass via missing leading slash in :path

(GHSA-p77j-4mvh-x3m3)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@go.mod` at line 38, Replace the vulnerable module versions in go.mod: bump
google.golang.org/grpc from v1.75.0 to v1.79.3 (or later) and bump
go.opentelemetry.io/otel/sdk from v1.38.0 to v1.40.0 (or later); after updating
the module lines for these symbols, run go mod tidy and re-run tests/build to
ensure dependency graph and imports resolve cleanly.

@dbrajovic dbrajovic requested review from kakysha and maxim-inj March 31, 2026 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant