
feat(sequencer): catchup from base#3057

Draft
julienrbrt wants to merge 21 commits into main from julien/catchup-base

Conversation

@julienrbrt
Member

Closes: #3016

Opus did a pretty good job. I'll be improving it.

@github-actions
Contributor

github-actions bot commented Feb 9, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build ✅ passed · Format ⏩ skipped · Lint ✅ passed · Breaking ✅ passed · Updated (UTC): Feb 18, 2026, 5:02 PM

@claude
Contributor

claude bot commented Feb 9, 2026

Claude finished @julienrbrt's task in 2m 2s


Code Review

  • Gathering context
  • Reading changed files
  • Analyzing code quality & architecture
  • Reviewing error handling & resilience
  • Checking security
  • Evaluating performance
  • Reviewing tests
  • Writing review feedback

Summary

This PR implements a sequencer catch-up mechanism that allows the single sequencer to resynchronize after extended downtime by replaying missed DA epochs with forced-inclusion transactions only. The syncer's DAHeight advancement logic is also updated to stay aligned with the epoch being replayed (rather than jumping to the submission height). The design and motivation are sound. Below are findings ranging from correctness bugs to minor style suggestions.


Critical Issues

1. Timestamp non-monotonicity during catch-up (potential chain halt)

File: pkg/sequencers/single/sequencer.go:370-377

When catch-up produces multiple blocks for a single DA epoch, each block receives a timestamp derived from currentDAEndTime adjusted by remaining forced txs:

timestamp = daEndTime.Add(-time.Duration(remainingForcedTxs) * time.Millisecond)

If an epoch has, say, 3 forced txs split across 2 blocks (first block gets 2 txs, second gets 1), the timestamps for the two blocks would be:

  • Block 1: before the batch is taken, cachedForcedInclusionTxs holds 3 txs (TxIndex=0), so remaining=3 → daEndTime - 3ms
  • Block 2: 1 tx remains → daEndTime - 1ms

While this gives different timestamps, the logic is fragile: the millisecond spacing is arbitrary, the value of remainingForcedTxs is computed on the already-decremented TxIndex, and the formula does not guarantee strict increase relative to the previous block's timestamp. If currentState.LastBlockTime from the previous block happens to be within a few milliseconds of the computed value, the execution layer could reject the block.

For an epoch with 0 forced txs (empty epoch), c.currentDAEndTime is not updated (line 481-483 only sets it when !forcedTxsEvent.Timestamp.IsZero()). After the empty epoch, the next block reuses the timestamp from the prior non-empty epoch, which may be older than currentState.LastBlockTime, causing a definite monotonicity violation.

Recommendation: Track lastProducedTimestamp and guarantee each successive timestamp is at least 1ns (or 1 block-time unit) greater than the previous one, even during catch-up. Alternatively, clamp timestamp = max(computedDATimestamp, lastBlockTime + minimumIncrement).


2. Unsynchronized access to currentDAEndTime

File: pkg/sequencers/single/sequencer.go:71

currentDAEndTime is a plain time.Time field (a struct with three fields internally). It is written in fetchNextDAEpoch (line 482) and read in GetNextBatch (line 370). While the single sequencer's block-production loop typically calls these sequentially in the same goroutine, isCatchingUp() is a public-ish method (unexported but named suggestively) that could be called from monitoring code concurrently. More importantly, if fetchNextDAEpoch and GetNextBatch are ever invoked from different goroutines (e.g., during testing or future refactors), there is a data race on time.Time.

The catchUpState was correctly made an atomic.Int32, but currentDAEndTime was not given equivalent protection.

Recommendation: Replace currentDAEndTime time.Time with an atomic.Pointer[time.Time] (or protect it with a mutex).


Medium Issues

3. Syncer DAHeight catch-up logic can under-advance

File: block/internal/syncing/syncer.go:784-805

The catch-up epoch advancement in TrySyncNextBlock advances DAHeight to nextEpochStart = epochEnd + 1. However, types.CalculateEpochBoundaries computes the epoch for newState.DAHeight, which is the value before this block was applied. On every block during catch-up, the syncer advances exactly one epoch. This is correct as long as every catch-up block corresponds to exactly one DA epoch.

The subtle risk is that after catch-up ends and the gap closes (gap <= epochSize), the code falls back to newState.DAHeight = event.DaHeight, which may still skip multiple DA heights in one step. This is functionally fine, but it deserves a comment noting that the threshold gap > epochSize was deliberately set to one epoch rather than zero, to handle the normal case where a block is submitted at a DA height slightly ahead of the current epoch end.
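
The advancement rule described above can be sketched like this; the boundary arithmetic assumes the usual fixed-size epoch layout (daStartHeight opens epoch 0) and stands in for types.CalculateEpochBoundaries:

```go
package main

import "fmt"

// nextDAHeight sketches the rule discussed above: when the gap to the
// event's DA height exceeds one epoch, advance exactly one epoch so
// forced-inclusion verification sees each epoch in order; otherwise jump
// directly to the event height.
func nextDAHeight(eventDAHeight, currentDAHeight, daStartHeight, epochSize uint64) uint64 {
	if epochSize == 0 || eventDAHeight <= currentDAHeight {
		return eventDAHeight
	}
	if gap := eventDAHeight - currentDAHeight; gap > epochSize {
		epochIdx := (currentDAHeight - daStartHeight) / epochSize
		epochEnd := daStartHeight + (epochIdx+1)*epochSize - 1
		return epochEnd + 1 // advance exactly one epoch per catch-up block
	}
	return eventDAHeight
}

func main() {
	// Catch-up: current height 3, event far ahead, epoch size 10.
	fmt.Println(nextDAHeight(100, 3, 0, 10)) // 10 (start of the next epoch)
	// Gap within one epoch: jump directly to the event height.
	fmt.Println(nextDAHeight(12, 8, 0, 10)) // 12
}
```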

4. ErrForceInclusionNotConfigured resets catch-up state unconditionally

File: pkg/sequencers/single/sequencer.go:470-474

} else if errors.Is(err, block.ErrForceInclusionNotConfigured) {
    c.cachedForcedInclusionTxs = [][]byte{}
    c.catchUpState.Store(catchUpDone)
    return 0, nil
}

This code path is reached on every call to fetchNextDAEpoch when forced inclusion is not configured. However, updateCatchUpState already handles this case (returns early when epochSize == 0, leaving catchUpDone set). The catchUpDone store here is harmless but redundant. What's not handled is if ErrForceInclusionNotConfigured is returned while catchUpState == catchUpInProgress — in that case catch-up is silently aborted without a log message.

5. updateCatchUpState uses epoch-number arithmetic that may be misleading

File: pkg/sequencers/single/sequencer.go:565-578

currentEpoch := types.CalculateEpochNumber(currentDAHeight, daStartHeight, epochSize)
latestEpoch := types.CalculateEpochNumber(latestDAHeight, daStartHeight, epochSize)
missedEpochs := latestEpoch - currentEpoch
if missedEpochs <= 1 {

The threshold <= 1 means catch-up is skipped if the sequencer is within the same or next epoch. This is fine for the intended use case, but note that if latestEpoch - currentEpoch == 1, the sequencer may still have missed the entire prior epoch's worth of forced txs (e.g., if it stopped at the very start of an epoch). The comment explains the "no catch-up within one epoch" design choice, but it's worth documenting why one missed epoch is acceptable (presumably because the single sequencer would have processed forced txs from the prior epoch before stopping).

6. GetLatestDAHeight called without a per-call timeout

The daClient.GetLatestDAHeight(ctx) call in updateCatchUpState (line 548) passes through the ctx from fetchNextDAEpoch, which ultimately comes from the block-production loop's context. This is fine for cancellation, but there is no timeout on the individual call. The daClient may internally add a timeout (the concrete client in block/internal/da/client.go adds a timeout via defaultTimeout), but the mock in tests does not. If the DA RPC hangs, it will block the block-production loop indefinitely.


Minor Issues

7. Magic constant time.Millisecond in timestamp computation

File: pkg/sequencers/single/sequencer.go:376

timestamp = daEndTime.Add(-time.Duration(remainingForcedTxs) * time.Millisecond)

This constant is unexplained. If the intent is to spread multiple catch-up blocks within a DA epoch, it should be a named constant with a comment. A related magic-constant concern (the 30*time.Second threshold) was already raised by Gemini.

8. isCatchingUp() is unexported but resembles a public API

File: pkg/sequencers/single/sequencer.go:429

The method isCatchingUp() is unexported, but is used in tests via seq.isCatchingUp(). Tests in the same package can access it, so that's valid. However, if external callers (e.g., RPC handlers, metrics) ever need this, the exported name IsCatchingUp() would be needed. Currently it doesn't exist. If it's intended as internal-only, the name is fine; if external access is anticipated, export it now.

9. Test logger inconsistency

File: pkg/sequencers/single/sequencer_test.go (multiple locations)

Gemini's observation stands: some tests use zerolog.New(zerolog.NewConsoleWriter()) (e.g., test at line ~1250) while others use zerolog.New(zerolog.NewTestWriter(t)) or zerolog.Nop(). This causes noisy console output during go test. Standardize on zerolog.New(zerolog.NewTestWriter(t)) throughout the new catch-up tests.

10. RetrieveForcedInclusion missing tracing in tracedClient

File: block/internal/da/tracing.go

client.go adds RetrieveForcedInclusion and GetLatestDAHeight methods. GetLatestDAHeight is correctly traced (line 126-138 in tracing.go). However, RetrieveForcedInclusion is not traced in tracedClient. This means spans are missing for forced inclusion retrieval, which is a new hot path in this feature.


11. Empty catch-up block skips currentDAEndTime update

File: pkg/sequencers/single/sequencer.go:481-483

if !forcedTxsEvent.Timestamp.IsZero() {
    c.currentDAEndTime = forcedTxsEvent.Timestamp.UTC()
}

If a DA epoch has no forced txs (and thus no timestamp, or a zero timestamp), currentDAEndTime is not updated. The next catch-up block will use the previous epoch's timestamp. Combined with issue #1, this can cause timestamp regression.

12. Catch-up detection is skipped on GetLatestDAHeight failure

File: pkg/sequencers/single/sequencer.go:549-553

if err != nil {
    c.logger.Warn().Err(err).
        Msg("failed to get latest DA height for catch-up detection, skipping check")
    return
}

When the DA client fails, catchUpState has already been set to catchUpDone (line 537). This means the sequencer will proceed with normal (non-catch-up) operation after a transient DA failure. This is likely the desired behavior (fail open), but the comment on line 536 ("Optimistically mark as done; overridden to catchUpInProgress below if catch-up is actually needed") doesn't make this explicit. Consider a note that if the check fails, we assume we're caught up to be conservative.


Code Quality Observations

Positive:

  • The catchUpState state machine using atomic.Int32 with well-named constants (catchUpUnchecked, catchUpInProgress, catchUpDone) is clean and self-documenting.
  • The syncer's incremental epoch advancement logic in TrySyncNextBlock is well-commented with a clear explanation of why jumping directly to event.DaHeight would be incorrect.
  • Comprehensive unit tests covering detection, mempool skipping, timestamp usage, DA-head exit, and multi-epoch replay. The test coverage for the happy path is good.
  • The design decision to enter catch-up based on epoch number gap (not wall-clock time) is robust.
  • Using atomic.Int32 with CompareAndSwap-style patterns avoids the data race that existed on the timestamp field.

Concerns:

  • The currentDAEndTime field is not protected by the same atomic mechanisms as catchUpState. Given the other fields use atomic.*, this stands out as an oversight.
  • The timestamp monotonicity issue in catch-up is the most likely source of production bugs; most EVM chains enforce strictly increasing block timestamps.

@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a critical "catch-up" feature for the sequencer, designed to enhance its resilience and consistency following periods of inactivity or downtime. By intelligently replaying missed Data Availability (DA) epochs, the sequencer can resynchronize its state, ensuring that all forced inclusion transactions are processed in the correct order and with accurate timestamps, thereby maintaining a consistent view of the blockchain's history, even when operating in a decentralized environment. This mechanism allows the sequencer to seamlessly transition back to normal operation without compromising data integrity.

Highlights

  • Sequencer Catch-up Mechanism: Introduced a new mechanism allowing the sequencer to "catch up" on missed Data Availability (DA) epochs after extended downtime.
  • Forced Inclusion Priority: During catch-up, the sequencer temporarily processes only forced inclusion transactions, ensuring consistency with base sequencing nodes and temporarily skipping mempool transactions.
  • DA Height Synchronization: Modified syncer.go to incrementally advance the DAHeight by one epoch during catch-up, preventing issues with forced inclusion transaction verification.
  • Timestamp Alignment: Block timestamps generated during catch-up are now aligned with the DA epoch's end timestamp for historical accuracy.
  • Comprehensive Testing: Added extensive unit tests covering various catch-up scenarios, including detection, mempool skipping, timestamp usage, exit conditions, and multi-epoch replay.


Changelog
  • block/internal/syncing/syncer.go
    • Updated the logic for DAHeight updates to handle large discrepancies between event.DaHeight and newState.DAHeight. When a significant gap (more than one epoch) is detected, newState.DAHeight is now advanced by exactly one epoch per block, rather than jumping directly to event.DaHeight. This ensures correct verification of forced inclusion transactions during sequencer catch-up.
    • Added detailed comments explaining the new DAHeight update strategy and its importance for catch-up blocks and forced inclusion verification.
  • pkg/sequencers/single/sequencer.go
    • Added catchingUp (boolean) and currentDAEndTime (time.Time) to the Sequencer struct to manage the catch-up state and store the DA epoch's end timestamp.
    • Modified GetNextBatch to conditionally skip fetching transactions from the mempool when the sequencer is in catchingUp mode. This ensures that only forced inclusion transactions are included in batches during catch-up.
    • Updated GetNextBatch to use sequencer.currentDAEndTime as the block timestamp when catchingUp is true, aligning block timestamps with the historical DA epoch.
    • Introduced a public IsCatchingUp() method to query the sequencer's current catch-up status.
    • Modified fetchNextDAEpoch to store the forcedTxsEvent.Timestamp in currentDAEndTime and to call a new updateCatchUpState method to manage the catchingUp flag. It also now exits catch-up mode if StatusHeightFromFuture is returned by the DA client.
    • Implemented a new private method updateCatchUpState that determines whether the sequencer should be in catch-up mode based on the timestamp of the fetched DA epoch. It enters catch-up if the epoch timestamp is significantly in the past (more than one epoch's wall-clock duration) and exits if it's recent.
  • pkg/sequencers/single/sequencer_test.go
    • Added a comprehensive suite of tests (TestSequencer_CatchUp_...) to validate the new catch-up functionality. These tests cover:
    • Detection of old epochs and entering catch-up mode.
    • Skipping mempool transactions during catch-up.
    • Using DA epoch timestamps for blocks during catch-up.
    • Exiting catch-up mode when the DA head is reached (either a recent epoch or HeightFromFuture).
    • Scenarios where catch-up is not triggered (recent epoch, no forced inclusion configured).
    • Multi-epoch replay scenarios.
    • Correct advancement of the internal checkpoint (DAHeight) during catch-up.
Activity
  • The pull request addresses and closes issue https://github.com/evstack/ev-node/issues/3016.
  • The author indicates that this work builds upon a previous attempt by "Opus", aiming for improvement.

@codecov

codecov bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 62.71186% with 44 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.25%. Comparing base (a16e4c9) to head (53dde2d).

Files with missing lines Patch % Lines
block/internal/syncing/syncer.go 22.22% 13 Missing and 1 partial ⚠️
block/internal/da/tracing.go 0.00% 10 Missing ⚠️
block/internal/da/client.go 0.00% 9 Missing ⚠️
pkg/sequencers/single/sequencer.go 91.89% 4 Missing and 2 partials ⚠️
block/internal/executing/executor.go 28.57% 3 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3057      +/-   ##
==========================================
+ Coverage   61.16%   61.25%   +0.08%     
==========================================
  Files         113      113              
  Lines       11444    11552     +108     
==========================================
+ Hits         7000     7076      +76     
- Misses       3655     3686      +31     
- Partials      789      790       +1     
Flag Coverage Δ
combined 61.25% <62.71%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust catch-up mechanism for the sequencer, designed to handle restarts after extended downtime. While the overall approach for consuming and verifying catch-up blocks in the syncer, including the incremental advancement of DAHeight, is well-implemented and tested, the implementation of catch-up mode in the single sequencer has significant flaws. Specifically, it produces non-monotonic block timestamps when multiple blocks are generated for a single DA epoch or when empty epochs are encountered, which will likely cause the execution layer to reject blocks and halt the chain. Additionally, there is a data race on the new catch-up state fields due to a lack of synchronization primitives. Minor suggestions for code clarity and testing experience were also noted.

Comment on lines 362 to 365
timestamp := time.Now()
if c.catchingUp && !c.currentDAEndTime.IsZero() {
	timestamp = c.currentDAEndTime
}

security-high

During catch-up mode, the sequencer uses the timestamp from the last fetched DA epoch as the block timestamp. However, multiple blocks can be produced for a single DA epoch (e.g., if the epoch contains many transactions that don't fit in one block). In this case, all blocks produced for that epoch will have the exact same timestamp (c.currentDAEndTime). Furthermore, if an empty DA epoch is encountered, c.currentDAEndTime is not updated, causing the next block to reuse the timestamp from the previous non-empty epoch.

Most blockchain execution layers (like EVM) require that each new block has a timestamp strictly greater than its parent. If the sequencer produces multiple blocks with the same timestamp, the execution layer will reject them, halting the block production process and causing a Denial of Service.

Comment on lines 60 to 63

security-medium

The fields catchingUp and currentDAEndTime are accessed concurrently without proper synchronization. GetNextBatch and fetchNextDAEpoch (which update these fields) can be called by the block production loop, while IsCatchingUp() can be called concurrently by other components (e.g., RPC, monitoring, or metrics).

In Go, concurrent read/write access to non-atomic fields is a data race, which leads to undefined behavior. Specifically, a race on time.Time (which is a struct) can result in a corrupted timestamp being read, potentially leading to invalid blocks and a Denial of Service.

Comment on lines 533 to 535

medium

The value 30*time.Second is used as a minimum threshold for the catch-up logic. To improve readability and maintainability, consider defining this as a named constant at the package level, for example: const minCatchUpThreshold = 30 * time.Second, and using that constant here. This makes the intent of the value clearer and centralizes its definition.


medium

Using zerolog.New(zerolog.NewConsoleWriter()) in tests can lead to verbose output in the console during test runs. It's generally better to use a logger that integrates with the testing framework, such as zerolog.New(zerolog.NewTestWriter(t)), which only shows logs for failed tests, or zerolog.Nop() if logs are not needed. This practice is followed in other new tests in this file (e.g., TestSequencer_CatchUp_UsesDATimestamp). Applying this consistently would improve the developer experience. This comment also applies to lines 1309, 1457, and 1676.

Suggested change:
- logger := zerolog.New(zerolog.NewConsoleWriter())
+ logger := zerolog.New(zerolog.NewTestWriter(t))


Development

Successfully merging this pull request may close these issues.

[FEATURE] Sequencer catchup on restart from base
