fix: ensure time_since_last_block accuracy with polling fallback #28

auricom · 2025-12-22T08:25:56Z

Problem

Two critical issues were identified in the metrics system:

- time_since_last_block_seconds showing outdated large values: The metric was only updated by the verifier's WebSocket subscription. When that subscription failed, the metric showed increasingly large values (e.g., 220,366 seconds = ~2.5 days) even though the drift exporter's HTTP polling was correctly seeing new blocks at height 3,709,309.
- block_time_seconds_bucket corruption after ~190 blocks: The initial fix attempt incorrectly used RecordBlockTime() in the drift exporter, which corrupted the block time histogram by recording invalid inter-block times from polling data.

Solution

Implemented a proper separation of concerns between real-time and polling metrics:

Added new UpdateLastBlockTime() method in pkg/metrics/metrics.go:
- Updates last block arrival time and resets time_since_last_block
- Does NOT record block time histogram (unlike RecordBlockTime())
- Designed specifically for pollers that can't measure inter-block time
Updated drift exporter in pkg/exporters/drift/drift.go:
- Changed from RecordBlockTime() to UpdateLastBlockTime()
- Added calls when recording both reference node and full node heights
- Provides redundancy for time_since_last_block metric

Changes Made

pkg/metrics/metrics.go: Added UpdateLastBlockTime() method
pkg/exporters/drift/drift.go: Updated to use UpdateLastBlockTime() instead of RecordBlockTime()

Impact

✅ time_since_last_block_seconds now stays accurate even if WebSocket fails
✅ block_time_seconds_bucket continues to work correctly for SLO calculations
✅ No more metric corruption after prolonged running
✅ Proper redundancy between real-time and polling metrics
✅ Maintains backward compatibility and all existing functionality

Testing

Build passes: go build -o ev-metrics .
Tests pass: go test ./...
No breaking changes to existing API

gemini-code-assist · 2025-12-22T08:26:13Z

Summary of Changes

Hello @auricom, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the reliability and accuracy of block-related metrics within the system. It addresses critical issues where "time_since_last_block_seconds" could become stale and "block_time_seconds_bucket" could be corrupted, by implementing a clear separation of concerns for metric updates. A new method, "UpdateLastBlockTime()", is introduced for polling mechanisms, ensuring metrics are consistently updated and accurate, even in the event of real-time data source failures, without introducing side effects to other metrics.

Highlights

time_since_last_block_seconds accuracy: Ensures the "time_since_last_block_seconds" metric remains accurate by updating it via polling mechanisms, even if the primary WebSocket subscription fails, preventing it from showing outdated large values.
block_time_seconds_bucket integrity: Prevents corruption of the "block_time_seconds_bucket" histogram by no longer using "RecordBlockTime()" with polling data, which previously recorded invalid inter-block times.
Dedicated polling metric update: Introduces a new "UpdateLastBlockTime()" method specifically for pollers, allowing them to update the last block arrival time and reset the "time_since_last_block" metric without affecting the block time histogram.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new UpdateLastBlockTime method in the metrics package, designed to update the last block arrival time and reset the 'time since last block' metric, particularly for pollers to ensure metric accuracy even if WebSocket subscriptions fail. This new method is then integrated into the drift exporter, where it's called after recording reference node and full node block heights. The reviewer suggests refactoring the UpdateLastBlockTime method by extracting its core logic into an unexported helper function, updateLastBlockTimeUnsafe, to reduce code duplication and improve maintainability, proposing that both UpdateLastBlockTime and the existing RecordBlockTime method could utilize this helper.

gemini-code-assist · 2025-12-22T08:28:05Z

pkg/metrics/metrics.go

+// UpdateLastBlockTime updates the last block arrival time and resets time since last block metric
+// without recording block time histogram. This is used by pollers that can't measure inter-block time.
+func (m *Metrics) UpdateLastBlockTime(chainID string, arrivalTime time.Time) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+
+	// update last seen arrival time
+	m.lastBlockArrivalTime[chainID] = arrivalTime
+	// reset time since last block to 0
+	m.TimeSinceLastBlock.WithLabelValues(chainID).Set(0)
+}


To improve maintainability and reduce code duplication, you can extract the common logic from UpdateLastBlockTime and RecordBlockTime into an unexported helper function. This new function would handle updating lastBlockArrivalTime and resetting TimeSinceLastBlock. Both UpdateLastBlockTime and RecordBlockTime (which is outside the diff but can be refactored separately) can then call this helper while holding the lock. This makes the code cleaner and ensures consistency.

// UpdateLastBlockTime updates the last block arrival time and resets time since last block metric
// without recording block time histogram. This is used by pollers that can't measure inter-block time.
func (m *Metrics) UpdateLastBlockTime(chainID string, arrivalTime time.Time) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.updateLastBlockTimeUnsafe(chainID, arrivalTime)
}

// updateLastBlockTimeUnsafe is an unexported helper that updates the last block arrival time
// and resets the time since last block gauge.
// This function is not thread-safe and should be called with a lock held.
func (m *Metrics) updateLastBlockTimeUnsafe(chainID string, arrivalTime time.Time) {
	// update last seen arrival time
	m.lastBlockArrivalTime[chainID] = arrivalTime
	// reset time since last block to 0
	m.TimeSinceLastBlock.WithLabelValues(chainID).Set(0)
}

fix: ensure time_since_last_block accuracy with polling fallback

b27dc0d

gemini-code-assist bot reviewed Dec 22, 2025

gemini assist

2566db1

auricom merged commit e20a6b2 into main Dec 22, 2025
3 checks passed

auricom deleted the claude/connection_fix branch December 22, 2025 14:08
