Skip to content

Conversation

@mateeullahmalik
Copy link
Contributor

No description provided.

@roomote
Copy link

roomote bot commented Dec 11, 2025

Rooviewer Clock   See task on Roo Cloud

All issues have been addressed. The lastNonPostponedState returning Unspecified is handled correctly in recoverFromPostponed which defaults to Active.

  • Is there a specific reason why only CpuCoresTotal, MemTotalGb and DiskTotalGb are validated to be strictly positive (> 0)? Other values like UptimeSeconds and PeersCount should also likely be non-negative, although their type float64 allows negative values. Also DiskFreeGb and MemFreeGb shouldn't be negative. Consider adding checks for these fields as well to ensure data integrity.
  • SupernodeMetrics struct uses uint32 for PeersCount and OpenPorts. However, SupernodeMetricsState uses uint64 for ReportCount and int64 for Height. It is fine, but in metrics.proto file PeersCount is uint32. The SupernodeMetrics struct in x/supernode/v1/types/metrics.pb.go correctly reflects this.

However, in evaluateCompliance in x/supernode/v1/keeper/metrics_validation.go, m.PeersCount is accessed but not validated. It is implicitly >= 0 because it is unsigned, but maybe we should ensure it's not excessively large or 0 if that's invalid?

More importantly, m.UptimeSeconds is a float64. This should probably be validated to be non-negative.

  • It seems that m.CpuUsagePercent, m.MemUsagePercent and m.DiskUsagePercent are validated to be between 0 and 100. But m.CpuCoresTotal, m.MemTotalGb and m.DiskTotalGb are only checked against the minimum requirements. It might be safer to also check they are not infinity or NaN, which is done by checkFinite, but checkFinite appends to issues but doesn't return or stop execution.

So subsequent checks like if m.CpuCoresTotal < params.MinCpuCores might still run with NaN or Inf if they weren't caught/returned early. Although m.CpuCoresTotal < params.MinCpuCores handles Inf correctly (Inf is not < finite), NaN comparisons always yield false, so NaN < MinCpuCores is false, which might bypass the check if we rely only on the < check. But checkFinite adds an issue for NaN, so that's good.

However, for UptimeSeconds and PeersCount, there are no checks at all. UptimeSeconds should be finite and >= 0.

  • The HandleMetricsStaleness function iterates over ALL supernodes. This could be a performance issue if there are many supernodes. Consider if we can optimize this, perhaps by using an index or only checking a subset per block, or if the number of supernodes is expected to be small enough.
  • lastNonPostponedState logic might return Unspecified if no prior history exists, potentially setting an invalid state on recovery. (Handled correctly in recoverFromPostponed which defaults to Active)
  • MemFreeGb > MemTotalGb check depends on potentially inconsistent reported data.
  • HandleMetricsStaleness logic for lastHeight == 0 relies on sn.States order/existence which might be affected by pruning.
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mateeullahmalik mateeullahmalik force-pushed the supernode-self-report branch 4 times, most recently from d21a47e to e90126c Compare December 11, 2025 12:27
j-rafique
j-rafique previously approved these changes Dec 11, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a comprehensive supernode metrics self-reporting system. Supernodes can now report structured performance metrics (CPU, memory, storage, version, network connectivity) which are validated against configurable thresholds. Non-compliant nodes are automatically transitioned to a new POSTPONED state, and can recover by submitting compliant reports. An end-block staleness check ensures active supernodes report metrics regularly or face postponement.

Key Changes

  • Introduces MsgReportSupernodeMetrics for structured metrics reporting with compliance validation
  • Adds SuperNodeStatePostponed state for temporarily non-compliant supernodes with automatic recovery
  • Implements end-block staleness enforcement based on configurable intervals and grace periods

Reviewed changes

Copilot reviewed 49 out of 52 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
x/supernode/v1/types/tx.pb.go Generated protobuf code for new MsgReportSupernodeMetrics and response types
x/supernode/v1/types/query.pb.go Generated protobuf code for GetMetrics query endpoint
x/supernode/v1/types/params.pb.go Generated protobuf code for 11 new metrics-related parameters
x/supernode/v1/types/supernode_state.pb.go Generated protobuf code adding SUPERNODE_STATE_POSTPONED enum value
x/supernode/v1/types/metrics.pb.go New generated protobuf code for SupernodeMetrics and SupernodeMetricsState structures
x/supernode/v1/types/msg_report_supernode_metrics.go Validation logic for metrics reporting message
x/supernode/v1/types/params.go Parameter defaults, validation, and WithDefaults method for backward compatibility
x/supernode/v1/types/msg_update_params.go Removed inline validation to support partial parameter updates
x/supernode/v1/types/keys.go Added MetricsStateKey prefix for metrics storage
x/supernode/v1/types/events.go Added EventTypeMetricsReported, EventTypeSupernodePostponed, EventTypeSupernodeRecovered
x/supernode/v1/types/codec.go Registered MsgReportSupernodeMetrics in codec
x/supernode/v1/types/expected_keepers.go Added SetMetricsState and GetMetricsState methods to SupernodeKeeper interface
x/supernode/v1/keeper/msg_server_report_supernode_metrics.go Handler for metrics reporting with compliance evaluation and state transitions
x/supernode/v1/keeper/metrics_validation.go Core compliance checking logic against parameter thresholds
x/supernode/v1/keeper/metrics_store.go Storage and retrieval of SupernodeMetricsState
x/supernode/v1/keeper/metrics_state.go State transition helpers for POSTPONED and recovery
x/supernode/v1/keeper/metrics_staleness.go End-block staleness checking with overdue threshold enforcement
x/supernode/v1/keeper/query_get_metrics.go Query handler to retrieve latest metrics for a validator
x/supernode/v1/keeper/abci.go Begin/EndBlocker delegation to staleness handler
x/supernode/v1/keeper/params.go Applied WithDefaults in GetParams and SetParams for backward compatibility
x/supernode/v1/keeper/msg_update_params.go Implemented parameter merging to support partial governance updates
x/supernode/v1/keeper/msg_server_register_supernode.go Removed legacy Metrics field initialization
x/supernode/v1/keeper/query_get_top_super_nodes_for_block.go Updated filtering to exclude POSTPONED by default unless explicitly requested
x/supernode/v1/module/module.go Applied WithDefaults in genesis validation and init, enabled BeginBlock/EndBlock
x/supernode/v1/README.md Comprehensive documentation of metrics system, parameters, and state transitions
tests/systemtests/supernode_metrics_test.go E2E tests for metrics reporting, postponement, and recovery
tests/systemtests/supernode_metrics_staleness_test.go E2E tests for staleness detection and recovery flows
tests/systemtests/cli.go Added FundAddressWithNode helper for multi-node testing
go.mod Promoted github.com/Masterminds/semver/v3 to direct dependency
Various test files Unit and system tests covering new functionality

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@roomote
Copy link

roomote bot commented Dec 16, 2025

Rooviewer Clock   Follow along on Roo Cloud

Reviewing your PR now. I'll share feedback in a few minutes!

@roomote
Copy link

roomote bot commented Dec 16, 2025

Rooviewer Clock   Follow along on Roo Cloud

Reviewing your PR now. Feedback coming soon!

@a-ok123 a-ok123 self-requested a review December 16, 2025 14:42
a-ok123
a-ok123 previously approved these changes Dec 16, 2025
mateeullahmalik and others added 4 commits December 17, 2025 17:58
* Implement supernode metrics reporting and postponement

* Fix metrics helper wiring and postpone utils

* Simplify metrics report compliance aggregation

* Ensure deterministic metrics validation and add tests

* Handle partial supernode params updates

* Default missing supernode params

* Fix metrics compliance tests

* Extend governance proposal wait in system tests
@mateeullahmalik mateeullahmalik merged commit 0615b38 into master Dec 17, 2025
5 of 6 checks passed
@mateeullahmalik mateeullahmalik deleted the supernode-self-report branch December 19, 2025 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants