
Conversation

@julienrbrt (Member)
ref: #2906


github-actions bot commented Jan 5, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed | Format: ⏩ skipped | Lint: ✅ passed | Breaking: ✅ passed | Updated (UTC): Jan 9, 2026, 3:37 PM


claude bot commented Jan 5, 2026

Claude finished @julienrbrt's task in 4m 34s.


Comprehensive Code Review

I've completed a thorough review of the async epoch fetching implementation. Here's my analysis across all key areas:

Review Summary

This PR introduces a well-designed async prefetching mechanism for DA blocks that significantly improves performance by caching upcoming epoch data. The implementation is solid overall with good concurrency patterns and proper resource cleanup. However, there are several areas that need attention.


1. Code Quality & Architecture

✅ Strengths

  • Clean separation of concerns: AsyncBlockRetriever handles prefetching, ForcedInclusionRetriever orchestrates DA fetching
  • Good use of interfaces and dependency injection
  • Proper context handling throughout
  • Idiomatic Go code style

⚠️ Issues Found

CRITICAL: Fragile Key Parsing (async_block_retriever.go:318)

var height uint64
_, err := fmt.Sscanf(key.String(), "/block/%d", &height)

This parsing is fragile and error-prone. The recommended approach is to use key.Name() with strconv.ParseUint:

heightStr := key.Name()
height, err := strconv.ParseUint(heightStr, 10, 64)

This is more robust and follows datastore key conventions.
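The pattern can be exercised without the go-datastore dependency: a stand-in that mirrors what key.Name() returns (the last path segment) followed by strconv.ParseUint. Here parseHeightKey is a hypothetical helper for illustration, not code from the PR:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHeightKey mirrors the suggested pattern: take the last path
// segment of a datastore key (what key.Name() returns) and parse it
// as an unsigned decimal height.
func parseHeightKey(key string) (uint64, error) {
	segments := strings.Split(strings.Trim(key, "/"), "/")
	name := segments[len(segments)-1] // equivalent of key.Name()
	return strconv.ParseUint(name, 10, 64)
}

func main() {
	h, err := parseHeightKey("/block/12345")
	fmt.Println(h, err)
}
```

Unlike Sscanf, this keeps working if the key namespace ever changes, since only the final segment is inspected.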

Missing Documentation (async_block_retriever.go:22-28)
The AsyncBlockRetriever interface lacks godoc comments. All exported interfaces should be documented:

// AsyncBlockRetriever provides background prefetching of DA blocks with caching.
// It maintains an in-memory cache and automatically prefetches blocks within a
// configurable window ahead of the current DA height.
type AsyncBlockRetriever interface { ... }

Package Organization (forced_inclusion_retriever.go:36-44)
The NewForcedInclusionRetriever function creates and starts the async fetcher internally. Consider whether this is the right level of control - callers have no visibility into or control over the async fetcher's lifecycle independently.


2. Error Handling & Resilience

✅ Strengths

  • Proper error wrapping with context
  • Graceful handling of StatusHeightFromFuture for retry
  • Good use of errors.Is() for error type checking

⚠️ Issues Found

Silent Error in Cleanup (async_block_retriever.go:305-306)

if err != nil {
    f.logger.Debug().Err(err).Msg("failed to query cache for cleanup")
    return
}

This returns early on any query error, so a persistent query failure silently blocks all cleanup at Debug verbosity. Cleanup genuinely cannot proceed without the query results, but the failure should at least be visible at Warn level:

if err != nil {
    f.logger.Warn().Err(err).Msg("failed to query cache for cleanup - skipping this cycle")
    return
}

Context Cancellation in Prefetch (async_block_retriever.go:223)
The fetchAndCacheBlock method uses f.ctx which may be cancelled during shutdown. While this is intentional, consider whether partial writes to cache during shutdown could cause issues.

Error Accumulation in RetrieveForcedIncludedTxs (forced_inclusion_retriever.go:144-170)
Errors are accumulated with errors.Join() but the function continues processing, which could lead to partial data being returned. The current behavior of returning an empty event on any error is correct, but the logic could be clearer.


3. Security

✅ Strengths

  • No credential handling
  • Input validation for DA heights
  • Bounds checking on array access

⚠️ Issues Found

Potential Memory Exhaustion (async_block_retriever.go:243-248)

for _, blob := range result.Data {
    if len(blob) > 0 {
        block.Blobs = append(block.Blobs, blob)
    }
}

There's no validation on the total size or count of blobs returned from DA. A malicious or buggy DA layer could return massive blobs leading to memory exhaustion. Consider adding:

  • Maximum blob size validation
  • Maximum number of blobs per height validation
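A sketch of such validation, assuming illustrative limits and a hypothetical validateBlobs helper (neither the limits nor the function exist in the PR):

```go
package main

import "fmt"

const (
	maxBlobSize       = 2 * 1024 * 1024 // illustrative 2 MiB per-blob cap
	maxBlobsPerHeight = 1024            // illustrative per-height count cap
)

// validateBlobs rejects oversized responses before anything is
// appended to the in-memory cache, bounding worst-case memory use.
func validateBlobs(blobs [][]byte) error {
	if len(blobs) > maxBlobsPerHeight {
		return fmt.Errorf("too many blobs: %d > %d", len(blobs), maxBlobsPerHeight)
	}
	for i, b := range blobs {
		if len(b) > maxBlobSize {
			return fmt.Errorf("blob %d too large: %d bytes", i, len(b))
		}
	}
	return nil
}

func main() {
	err := validateBlobs([][]byte{make([]byte, 10)})
	fmt.Println(err == nil)
}
```

Running the check before the append loop means a bad DA response is rejected wholesale rather than partially cached.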

DoS Risk in Cache (async_block_retriever.go:277)
The cache uses an in-memory datastore with no size limits beyond prefetch window cleanup. If cleanup fails or DA heights contain very large blobs, memory could grow unbounded. Consider:

  • Setting a maximum cache size
  • Implementing LRU eviction
  • Adding memory pressure monitoring
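To sketch the size-cap idea: since the prefetcher only reads forward, the lowest cached height is the least useful entry, so a toy bounded cache can evict it when over capacity. boundedCache is hypothetical; the PR uses an uncapped MapDatastore:

```go
package main

import "fmt"

// boundedCache is a toy size-capped cache keyed by DA height.
// When over capacity it evicts the lowest height first, since a
// forward-reading prefetcher will not ask for it again.
type boundedCache struct {
	maxEntries int
	entries    map[uint64][]byte
}

func newBoundedCache(max int) *boundedCache {
	return &boundedCache{maxEntries: max, entries: make(map[uint64][]byte)}
}

func (c *boundedCache) Put(height uint64, data []byte) {
	c.entries[height] = data
	for len(c.entries) > c.maxEntries {
		lowest, first := uint64(0), true
		for h := range c.entries {
			if first || h < lowest {
				lowest, first = h, false
			}
		}
		delete(c.entries, lowest)
	}
}

func (c *boundedCache) Get(height uint64) ([]byte, bool) {
	d, ok := c.entries[height]
	return d, ok
}

func main() {
	c := newBoundedCache(2)
	c.Put(100, []byte("a"))
	c.Put(101, []byte("b"))
	c.Put(102, []byte("c")) // evicts height 100
	_, ok := c.Get(100)
	fmt.Println(ok)
}
```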

4. Performance & Resource Efficiency

✅ Strengths

  • Excellent use of background prefetching to reduce latency
  • Proper use of atomic operations for currentDAHeight
  • Good cache hit optimization in RetrieveForcedIncludedTxs

⚠️ Issues Found

Inefficient Cleanup Query (async_block_retriever.go:302-331)
The cleanup iterates all cache keys on every prefetch cycle. With a large prefetch window, this could become expensive. Consider:

  • Only running cleanup periodically (e.g., every 10 prefetch cycles)
  • Tracking oldest key to avoid full scans
  • Using a TTL-based cache instead

Redundant Cache Checks (async_block_retriever.go:202-207)

_, err := f.cache.Get(f.ctx, key)
if err == nil {
    continue
}

This performs a full cache Get just to check existence. Consider tracking prefetched heights in a map or bloom filter for O(1) existence checks.
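A minimal sketch of the map-based tracker idea (prefetchTracker is a hypothetical type, not part of the PR):

```go
package main

import (
	"fmt"
	"sync"
)

// prefetchTracker records which heights have been cached, giving an
// O(1), allocation-free existence check instead of a full cache Get
// that deserializes the stored value.
type prefetchTracker struct {
	mu      sync.RWMutex
	heights map[uint64]struct{}
}

func newPrefetchTracker() *prefetchTracker {
	return &prefetchTracker{heights: make(map[uint64]struct{})}
}

func (t *prefetchTracker) MarkFetched(h uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.heights[h] = struct{}{}
}

func (t *prefetchTracker) Has(h uint64) bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	_, ok := t.heights[h]
	return ok
}

func main() {
	t := newPrefetchTracker()
	t.MarkFetched(42)
	fmt.Println(t.Has(42), t.Has(43))
}
```

The tracker must be updated in the same places the cache is written and cleaned, otherwise it drifts from the real cache contents.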

Serialization Overhead (async_block_retriever.go:262-274)
Every cache write serializes to protobuf. For hot paths, consider whether the serialization overhead is justified, or if keeping raw BlockData in memory would be more efficient.

Blocking Prefetch Loop (async_block_retriever.go:198-211)
Prefetch happens sequentially for all blocks in the window. Consider parallelizing with a worker pool:

var wg sync.WaitGroup
for i := uint64(0); i < f.prefetchWindow; i++ {
    wg.Add(1)
    go func(h uint64) {
        defer wg.Done()
        // Check cache and fetch
    }(currentHeight + i)
}
wg.Wait()

5. Testing & Reliability

✅ Strengths

  • Good test coverage for core functionality
  • Tests use mocks appropriately
  • Tests verify both success and error paths

⚠️ Issues Found

Flaky Test: Time-Based Polling (async_block_retriever_test.go:85-98)

for i := 0; i < 40; i++ {
    block, err = fetcher.GetCachedBlock(ctx, 100)
    require.NoError(t, err)
    if block != nil {
        break
    }
    time.Sleep(50 * time.Millisecond)
}

This test could be flaky on slow CI systems. Consider using a channel or synchronization primitive to signal when prefetch completes rather than polling.

Missing Test: Restart Scenario (Addressed in review comments)
As noted by tac0turtle, there should be a test for restart scenarios where the cache is lost. While the parent component handles this, unit tests should verify the component behaves correctly when starting with an empty cache.

Missing Test: Concurrent Access
No tests verify concurrent access to the cache or concurrent calls to UpdateCurrentHeight. Add race detection tests:

func TestAsyncBlockRetriever_ConcurrentUpdates(t *testing.T) {
    // Test concurrent UpdateCurrentHeight and GetCachedBlock calls
}

Missing Test: Cache Cleanup
No tests verify that cleanupOldBlocks actually removes old entries from cache. Add a test that:

  1. Populates cache with old heights
  2. Updates current height forward
  3. Verifies old heights are removed
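The outline above, sketched against a toy map-backed cache (cleanupOld and its retention rule are illustrative, not the PR's implementation):

```go
package main

import "fmt"

// cleanupOld deletes cached heights older than current minus window,
// mirroring the three-step test outline: populate, advance, verify.
func cleanupOld(cache map[uint64][]byte, current, window uint64) {
	for h := range cache {
		if h+window < current {
			delete(cache, h)
		}
	}
}

func main() {
	cache := map[uint64][]byte{90: {1}, 95: {1}, 100: {1}}
	cleanupOld(cache, 100, 5) // heights below 95 should be evicted
	_, has90 := cache[90]
	_, has95 := cache[95]
	fmt.Println(has90, has95)
}
```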

6. Concurrency Patterns & Safety

✅ Strengths

  • Proper use of atomic.Uint64 for currentDAHeight
  • Good goroutine lifecycle management with context and WaitGroup
  • Thread-safe cache with dsync.MutexWrap

⚠️ Issues Found

Compare-and-Swap Loop Could Spin (async_block_retriever.go:114-125)

for {
    current := f.currentDAHeight.Load()
    if height <= current {
        return
    }
    if f.currentDAHeight.CompareAndSwap(current, height) {
        return
    }
}

While unlikely, this infinite loop could spin if CompareAndSwap fails repeatedly. Consider adding a maximum retry count or using a simple mutex here since contention is expected to be low.

Race Condition in Cleanup (async_block_retriever.go:293-331)
The cleanup function queries the cache and then deletes entries. Between query and delete, the cache could be modified by prefetch operations. This is benign (cleanup just skips the entry), but the race should be documented or the logic should use transactions if the datastore supports them.


7. Documentation & Developer Experience

⚠️ Issues Found

Missing Package Documentation (async_block_retriever.go:1)
The package lacks a package-level comment explaining its purpose and architecture.

Unclear Prefetch Window Semantics (async_block_retriever.go:72-74)
The default prefetch window of 10 blocks is not explained. Add documentation:

// prefetchWindow determines how many DA blocks ahead to prefetch.
// A larger window reduces cache misses but increases memory usage.
// For epoch-based systems, this should typically be >= epoch size.
if prefetchWindow == 0 {
    prefetchWindow = 10 // Default: prefetch next 10 blocks
}

Missing Metric/Observability
No metrics are exported for:

  • Cache hit/miss rates
  • Prefetch success/failure rates
  • Cache size
  • Cleanup operations

Consider adding prometheus metrics or at least debug logs with counts.
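In lieu of full Prometheus integration, a dependency-free sketch of hit/miss counters (cacheMetrics and its method names are hypothetical, not from the PR):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheMetrics is a stand-in for the suggested Prometheus counters,
// safe to bump from concurrent prefetch and read paths.
type cacheMetrics struct {
	hits, misses atomic.Uint64
}

func (m *cacheMetrics) RecordHit()  { m.hits.Add(1) }
func (m *cacheMetrics) RecordMiss() { m.misses.Add(1) }

// HitRate returns hits / (hits + misses), or 0 with no samples.
func (m *cacheMetrics) HitRate() float64 {
	h, mi := m.hits.Load(), m.misses.Load()
	if h+mi == 0 {
		return 0
	}
	return float64(h) / float64(h+mi)
}

func main() {
	var m cacheMetrics
	m.RecordHit()
	m.RecordHit()
	m.RecordMiss()
	fmt.Printf("%.2f\n", m.HitRate())
}
```

Even periodic debug logs of these counters would make cache effectiveness observable in production.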


8. Integration Issues

Lifecycle Management (sequencer.go:114, based/sequencer.go:82)
Sequencers create ForcedInclusionRetriever which internally starts an async fetcher, but there's no explicit cleanup in sequencer lifecycle. While the retriever has a Stop() method, it's unclear if/when sequencers call it. Verify sequencers properly stop the retriever on shutdown.

Missing Call to Stop (single/sequencer.go)
The single sequencer creates a new fiRetriever at line 114 and potentially recreates it at line 187, but I don't see where fiRetriever.Stop() is called during sequencer shutdown. This could leak goroutines. Add:

func (c *Sequencer) Stop() {
    if c.fiRetriever != nil {
        c.fiRetriever.Stop()
    }
}

Summary of Required Changes

High Priority

  1. Fix fragile key parsing with key.Name() and strconv.ParseUint - block/internal/da/async_block_retriever.go:318
  2. Add lifecycle management: ensure sequencers call fiRetriever.Stop() on shutdown - pkg/sequencers/single/sequencer.go
  3. Add blob size validation to prevent memory exhaustion - block/internal/da/async_block_retriever.go:243-248
  4. Add test for concurrent access and race conditions - block/internal/da/async_block_retriever_test.go

Medium Priority

  1. Optimize cleanup to run periodically instead of every prefetch - block/internal/da/async_block_retriever.go:293-331
  2. Add cache size limits and memory pressure handling - block/internal/da/async_block_retriever.go:83
  3. Add comprehensive godoc for exported interfaces and functions
  4. Consider parallelizing prefetch operations for better performance

Low Priority

  1. Add observability metrics for cache performance
  2. Improve test stability by removing time-based polling
  3. Add test coverage for cache cleanup functionality

Positive Notes

  • The async prefetching architecture is well-designed and should significantly reduce DA fetch latency
  • Concurrency patterns are generally sound with proper atomic operations
  • Error handling is comprehensive with good retry logic
  • Integration with existing sequencer code is clean
  • Test coverage demonstrates good understanding of edge cases

The implementation is production-ready with the high-priority fixes applied. The medium and low priority items are optimizations that would improve robustness and observability.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements an asynchronous epoch fetching mechanism to optimize the retrieval of forced inclusion transactions from the Data Availability (DA) layer. By introducing an AsyncEpochFetcher that prefetches and caches upcoming epoch data in the background, the system can significantly reduce the latency associated with fetching this data, particularly at epoch transitions. This enhancement improves the overall performance and responsiveness of sequencers and the block syncing process by providing immediate access to frequently needed DA epoch information.

Highlights

  • Introduced AsyncEpochFetcher: A new component for background prefetching and caching of Data Availability (DA) epoch data, designed to reduce latency at epoch boundaries.
  • Integrated with ForcedInclusionRetriever: The existing ForcedInclusionRetriever now utilizes the AsyncEpochFetcher to first check for cached epoch data before performing a synchronous fetch.
  • Deployment across Sequencers and Syncer: The AsyncEpochFetcher is instantiated and started in the createSequencer functions for EVM, gRPC, and TestApp, as well as within the Syncer component, ensuring widespread performance benefits.
  • Serialization and Deserialization: The AsyncEpochFetcher includes custom serialization and deserialization logic for ForcedInclusionEvent to efficiently store and retrieve cached epoch data.
  • Comprehensive Testing: New unit tests have been added for the AsyncEpochFetcher to ensure its correct functionality, including caching, background operations, and graceful shutdown.


@gemini-code-assist bot left a comment:

Code Review

This pull request introduces an AsyncEpochFetcher component designed to prefetch Data Availability (DA) epoch data in the background, thereby improving performance for sequencers and the block syncer. The new fetcher includes an in-memory cache for storing prefetched epochs, manages background fetching, tracks the current DA height, and cleans up old cached data.

The ForcedInclusionRetriever has been updated to leverage this AsyncEpochFetcher, prioritizing cached data before making direct DA client calls. The AsyncEpochFetcher is integrated into the evm, grpc, and testapp applications, the block syncer, and the single sequencer, with corresponding unit tests added and existing tests updated.

Review comments indicate that the mu mutex in AsyncEpochFetcher is redundant because dsync.MutexWrap already provides thread-safety, leading to double-locking, and suggest replacing the fragile fmt.Sscanf key parsing with key.Name() and strconv.ParseUint for robustness.


// In-memory cache for prefetched epoch data
cache ds.Batching
mu sync.RWMutex

medium

The cache field is already wrapped with dsync.MutexWrap, which makes it thread-safe. The additional mu field is redundant and leads to double-locking. Please remove this mutex and all its usages (f.mu.Lock/Unlock/RLock/RUnlock) throughout the file.

Comment on lines 374 to 378
var epochEnd uint64
_, err := fmt.Sscanf(key.String(), "/epoch/%d", &epochEnd)
if err != nil {
    continue
}

medium

Using fmt.Sscanf to parse the key string is fragile. A more robust and idiomatic way to handle datastore keys is to use key.Name() to get the last component of the key and then parse it using strconv.ParseUint. You will need to import the strconv package.

Suggested change:

-var epochEnd uint64
-_, err := fmt.Sscanf(key.String(), "/epoch/%d", &epochEnd)
-if err != nil {
-    continue
-}
+epochEndStr := key.Name()
+epochEnd, err := strconv.ParseUint(epochEndStr, 10, 64)
+if err != nil {
+    continue
+}

codecov bot commented Jan 5, 2026

Codecov Report

❌ Patch coverage is 61.58730% with 121 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.76%. Comparing base (453a8a4) to head (a7fdca7).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
...quencers/common/forced_inclusion_retriever_mock.go 0.00% 62 Missing ⚠️
block/internal/da/async_block_retriever.go 74.54% 31 Missing and 11 partials ⚠️
block/internal/da/forced_inclusion_retriever.go 84.61% 8 Missing and 4 partials ⚠️
pkg/sequencers/single/sequencer.go 57.14% 3 Missing ⚠️
block/internal/syncing/syncer.go 0.00% 1 Missing ⚠️
block/public.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2952      +/-   ##
==========================================
- Coverage   58.74%   57.76%   -0.99%     
==========================================
  Files          93       97       +4     
  Lines        8863     9306     +443     
==========================================
+ Hits         5207     5376     +169     
- Misses       3067     3326     +259     
- Partials      589      604      +15     
Flag: combined | Coverage Δ: 57.76% <61.58%> (-0.99%) ⬇️


@julienrbrt julienrbrt marked this pull request as ready for review January 8, 2026 19:25
@julienrbrt julienrbrt requested review from alpe and tac0turtle January 8, 2026 23:39
logger: logger.With().Str("component", "async_block_retriever").Logger(),
namespace: namespace,
daStartHeight: daStartHeight,
cache: dsync.MutexWrap(ds.NewMapDatastore()),
A contributor commented:
are restarts a concern here or is the height reset so we query it again?

@julienrbrt (Member, Author) replied on Jan 9, 2026:

we would lose the cache yeah, but the cache is only 2 epochs, so it will be rebuilt up fast.
Additionally, if there is no cache, we hit the da directly

@tac0turtle (Contributor) left a comment:

utACK, left two comments overall nice job!!

@julienrbrt julienrbrt added this pull request to the merge queue Jan 9, 2026
@julienrbrt julienrbrt removed this pull request from the merge queue due to a manual request Jan 9, 2026
@julienrbrt julienrbrt enabled auto-merge January 9, 2026 15:38
@julienrbrt julienrbrt added this pull request to the merge queue Jan 9, 2026
Merged via the queue into main with commit aaae087 Jan 9, 2026
34 of 35 checks passed
@julienrbrt julienrbrt deleted the julien/aync-epochs branch January 9, 2026 16:02