
Conversation

@julienrbrt
Member

Closes: #2890

Add submission strategies.

@github-actions
Contributor

github-actions bot commented Jan 13, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Jan 16, 2026, 11:59 AM

@claude
Contributor

claude bot commented Jan 13, 2026

Claude finished @julienrbrt's task in 1m 56s.


Code Review Complete

I've completed a comprehensive code review of this PR implementing DA submission batching strategies. Overall, this is a solid implementation that introduces an important optimization for DA submissions. The code is well-structured with good test coverage. Below are my findings organized by priority and category.


Architecture & Design ✅

Strengths:

  • Clean separation of concerns with the BatchingStrategy interface allowing pluggable strategies
  • Four well-designed strategy implementations (immediate, size-based, time-based, adaptive)
  • Good use of the existing cache infrastructure for marshalled data caching
  • Proper integration with submitter loop using non-blocking mutex patterns

Observations:

  • The adaptive strategy balances latency and throughput well and is a sensible choice for most use cases
  • Configuration is well-documented with helpful comments explaining each parameter

Concurrency & Thread Safety ⚠️

Issue 1: Race condition potential in marshalling cache (Medium Priority)

In pending_base.go:88-90, the clearMarshalledCacheUpTo method uses Range to iterate and delete:

func (pb *pendingBase[T]) clearMarshalledCacheUpTo(height uint64) {
    pb.marshalledCache.Range(func(key, _ any) bool {
        if h := key.(uint64); h <= height {
            pb.marshalledCache.Delete(h)
        }
        return true
    })
}

While sync.Map is thread-safe for individual operations, concurrent goroutines might be reading from the cache (in GetPendingHeaders/GetPendingData) while this cleanup is happening. That is memory-safe with sync.Map, but consider:

  1. The iteration could miss newly added entries
  2. New entries at the same height could be added between the check and delete

Recommendation: Document this behavior or consider more explicit synchronization if exact-once cleanup is critical.
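If exact-once cleanup ever becomes critical, one option is to replace the sync.Map with a plain map guarded by a mutex, so cleanup and reads cannot interleave. A minimal sketch, assuming hypothetical cacheMu and marshalled fields in place of the current sync.Map:

func (pb *pendingBase[T]) clearMarshalledCacheUpTo(height uint64) {
    pb.cacheMu.Lock()
    defer pb.cacheMu.Unlock()
    // marshalled is assumed to be a map[uint64][]byte; holding the lock
    // guarantees no reader observes a partially cleaned cache.
    for h := range pb.marshalled {
        if h <= height {
            delete(pb.marshalled, h)
        }
    }
}

The trade-off is that every cache read then contends on the mutex, which sync.Map specifically avoids.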


Performance & Resource Management 🚀

Positive Changes:

  • The marshalling cache (marshalledCache in pending_base.go) is a strong addition: it eliminates redundant marshalling when the batching strategy checks sizes multiple times
  • Cache cleanup in setLastSubmittedHeight prevents unbounded memory growth
  • Non-blocking submission pattern using TryLock() and goroutines prevents blocking the ticker loop

Issue 2: Potential memory pressure during high throughput (Low Priority)

The marshalling cache in pendingBase is never bounded except by clearing submitted items. During periods of DA layer issues or delays, it could grow significantly. Consider adding one of the following (a sketch follows the list):

  • Maximum cache size limit
  • TTL-based eviction
  • Monitoring/metrics for cache size
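A minimal sketch of the first option, assuming a hypothetical maxCacheEntries field and storeMarshalled helper (neither exists in the PR). It caps the cache at the first N pending heights, since the lowest heights are submitted first and benefit most from staying cached:

func (pb *pendingBase[T]) storeMarshalled(height uint64, b []byte) {
    pb.marshalledCache.Store(height, b)
    // Naive O(n) scan per store; fine for a sketch, use a counter in practice.
    n, highest := 0, uint64(0)
    pb.marshalledCache.Range(func(key, _ any) bool {
        n++
        if h := key.(uint64); h > highest {
            highest = h
        }
        return true
    })
    if n > pb.maxCacheEntries {
        // Drop the newest entry; evicted items are simply re-marshalled on demand.
        pb.marshalledCache.Delete(highest)
    }
}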

Issue 3: Goroutine coordination in submitter (Medium Priority)

In submitter.go:183-228 and 237-283, each ticker tick may spawn a new submission goroutine:

if s.headerSubmissionMtx.TryLock() {
    go func() {
        defer s.headerSubmissionMtx.Unlock()
        // submission logic
    }()
}

Concerns:

  1. No bounded goroutine pool - rapid ticker could spawn many goroutines during lock contention
  2. No tracking of active submission goroutines during shutdown
  3. Potential for goroutine leaks if context is cancelled while goroutine is waiting

Recommendation:

  • Add goroutine tracking with a sync.WaitGroup (see the sketch after this list)
  • Consider using a worker pool pattern
  • Ensure goroutines check context in all operations
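A minimal sketch combining these points, assuming a hypothetical s.wg sync.WaitGroup field on the submitter:

if s.headerSubmissionMtx.TryLock() {
    s.wg.Add(1)
    go func() {
        defer s.wg.Done()
        defer s.headerSubmissionMtx.Unlock()
        // Bail out early on shutdown instead of starting an expensive submission.
        select {
        case <-s.ctx.Done():
            return
        default:
        }
        // submission logic, re-checking s.ctx before each expensive step
    }()
}

// During shutdown, after cancelling s.ctx:
// s.wg.Wait() // blocks until in-flight submissions drain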

Error Handling & Resilience ⚠️

Issue 4: Silent marshalling errors lose accuracy (Addressed in review comments)

This has already been discussed in the review comments. The current implementation in submitter.go doesn't handle marshalling errors during size estimation, which could lead to incorrect batching decisions. While the actual submission will catch these errors, the strategy might make suboptimal choices.

Positive: The subsequent actual submission catches marshalling errors properly.

Issue 5: Division by zero potential (Low Priority)

In batching_strategy.go:65 and 129:

threshold := int(float64(maxBlobSize) * s.sizeThreshold)

If maxBlobSize is 0, the computed threshold is 0, which does not panic, but it would cause the strategy to never submit (or to always submit, depending on the direction of the comparison). Consider adding validation in NewBatchingStrategy or the strategy constructors, or a guard at the call site, for example:
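A call-site guard is the smallest fix (a sketch against the snippet above):

if maxBlobSize == 0 {
    // Nothing can ever fit; returning false avoids a zero threshold
    // that would make the comparison below meaningless.
    return false
}
threshold := int(float64(maxBlobSize) * s.sizeThreshold)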

Issue 6: Missing context checks in submission goroutines (Medium Priority)

The submission goroutines in submitter.go don't check for context cancellation before performing expensive operations like fetching and marshalling headers/data. Add context checks:

select {
case <-s.ctx.Done():
    return
default:
    // proceed with submission
}

Code Quality & Go Idioms ✅

Strengths:

  • Good use of atomic operations for lock-free reads (lastHeaderSubmit, lastDataSubmit)
  • Proper use of sync.Mutex with TryLock() pattern
  • Clean test organization with table-driven tests
  • Good use of generics in pendingBase[T]

Issue 7: Type assertion without check (Low Priority)

In pending_base.go:115 and 127:

return val.([]byte)  // line 115
if h := key.(uint64); h <= height {  // line 127

While these should always be the correct types, defensive programming suggests using the two-value form:

if b, ok := val.([]byte); ok {
    return b
}
return nil // fall back (e.g. re-marshal) on a cache miss or unexpected type

Configuration & Defaults ✅

Excellent:

  • Comprehensive configuration with sensible defaults (illustrated after this list)
  • BatchMinItems default of 1 allows immediate submission when needed
  • BatchSizeThreshold of 0.8 (80%) provides good balance
  • Default "time" strategy is conservative and safe

Issue 8: MaxBlobSize reduction (Addressed)

I see that DefaultMaxBlobSize was reduced from 8MB to 7MB in consts.go. This leaves headroom for envelope overhead and padding, but note that:

  • Configuration comments in config.go:197 still reference the 8MB Celestia limit
  • Test files use 8 * 1024 * 1024 directly in several places

Recommendation: Consider using the constant everywhere or documenting why tests use 8MB.


Testing & Coverage 📊

Strengths:

  • Comprehensive unit tests for all batching strategies
  • Good edge case coverage (empty batches, threshold boundaries)
  • Integration test patterns follow project conventions
  • Comparison tests demonstrate strategy differences well

Coverage Analysis:
The codecov report shows 81.89% patch coverage with 44 lines missing. Key gaps:

  • submitter.go: 71.95% - submission error paths and edge cases
  • da_submitter.go: 72.22% - retry logic and error handling paths
  • batching_strategy.go: 93.22% - very good coverage

Recommendations:

  1. Add tests for concurrent access to marshalling cache
  2. Test submission cancellation scenarios
  3. Test cache cleanup edge cases
  4. Add integration tests for different batching strategies

Security Considerations ✅

Good:

  • No sensitive data logged in error messages
  • Proper validation of configuration parameters
  • Safe defaults that don't expose the system to resource exhaustion

Note: The marshalling cache unbounded growth (mentioned above) could be considered a potential DoS vector under adversarial conditions, though unlikely in normal operation.


Documentation & Developer Experience ✅

Excellent:

  • Configuration parameters have clear comments explaining purpose and examples
  • Batching strategy interface is well-documented
  • Good inline comments explaining complex logic
  • CHANGELOG.md updated appropriately

Summary of Priority Issues

High Priority:

  • None identified

Medium Priority:

  1. Goroutine lifecycle management in submitter (no WaitGroup tracking for submission goroutines)
  2. Missing context cancellation checks in submission goroutines
  3. Marshalling cache race condition documentation/handling

Low Priority:

  1. Marshalling cache unbounded growth during DA delays
  2. Type assertions without checks
  3. Division by zero edge cases (mostly handled by defaults)
  4. Test coverage gaps in error paths

Addressed Issues from Previous Reviews ✅

I see that several concerns raised by earlier reviewers have been addressed:

  • ✅ Double marshalling eliminated by caching in pending_base.go
  • ✅ MaxBlobSize reduced to account for padding
  • ✅ Marshalling done once and cached for size estimation

Final Recommendation

This PR represents a solid implementation of DA batching strategies that will improve submission efficiency. The core logic is sound, well-tested, and follows Go best practices. The identified issues are primarily related to robustness under edge cases (high load, DA failures, cancellation) rather than fundamental design problems.

Suggested actions before merge:

  1. Track submission goroutines with a sync.WaitGroup
  2. Add context checks before expensive operations in submission goroutines
  3. Document the marshalling cache cleanup behavior
  4. Consider adding metrics for marshalling cache size
  5. Add integration test for cancellation scenarios

The code is production-ready with the understanding that the medium-priority issues should be addressed in a follow-up PR if not in this one.

Excellent work on this feature! 👏


@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the Data Availability (DA) layer submission process by introducing a robust and configurable batching system. Instead of immediate submission, the system can now intelligently decide when to send data based on various strategies, such as accumulating a certain amount of data, waiting for a specific time interval, or an adaptive combination of both. This enhancement aims to improve the efficiency and cost-effectiveness of data submissions, particularly for DA layers like Celestia, by better utilizing blob capacity and reducing transaction overhead. The maximum blob size has also been updated to reflect current DA layer capabilities.

Highlights

  • Flexible DA Submission Strategies: Introduced an interface and four concrete implementations for batching strategies: Immediate, Size-Based, Time-Based, and Adaptive. This allows configurable logic to determine when to submit data to the Data Availability layer, optimizing for latency, throughput, or cost (a plausible shape of the interface is sketched after this list).
  • Increased Default Max Blob Size: The DefaultMaxBlobSize has been increased from 2MB to 8MB across the codebase, aligning with Celestia's current blob size limits. This change allows for larger batches of data to be submitted in a single transaction.
  • Configurable Batching Parameters: New configuration flags and fields have been added to DAConfig to control the chosen batching strategy, size thresholds, maximum delays, and minimum item counts, providing fine-grained control over DA submission behavior.
  • Integrated into Submitter Loop: The DA submission loop in submitter.go has been refactored to utilize the new batching strategies. It now periodically checks the strategy to decide whether to submit pending headers and data, improving efficiency by avoiding premature or undersized submissions.
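As a rough illustration of the first highlight, the strategy interface plausibly looks something like this (a sketch only; the actual method set in batching_strategy.go may differ):

// BatchingStrategy decides when pending items should be submitted to the DA layer.
type BatchingStrategy interface {
    // ShouldSubmit reports whether the current batch should be sent now, given
    // its byte size, item count, time since the last submission, and the blob limit.
    ShouldSubmit(currentSize, itemCount int, sinceLast time.Duration, maxBlobSize uint64) bool
}

Each of the four strategies then implements ShouldSubmit with its own trigger: immediately, on a size threshold, on a time interval, or on an adaptive combination of the two.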



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces configurable batching strategies for DA submissions, a valuable feature for optimizing costs and latency. The implementation is well-structured, adding immediate, size, time, and adaptive strategies, along with comprehensive tests. My review focuses on improving the efficiency and robustness of the new logic in the submitter, particularly around size estimation and data fetching.

// Wait if current utilization is below minimum threshold
// Use epsilon for floating point comparison
const epsilon = 0.001
currentUtilization := float64(currentSize) / float64(maxBlobSize)

medium

There's a potential for division by zero if maxBlobSize is 0. In Go, floating-point division by zero results in +Inf or NaN rather than a panic, but this can lead to unexpected behavior in the comparison that follows. It would be safer to add a guard against this, similar to the pattern used in calculateBatchMetrics.
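A guard in that spirit (a sketch):

if maxBlobSize == 0 {
    return false // avoids a +Inf/NaN utilization; nothing could be submitted anyway
}
currentUtilization := float64(currentSize) / float64(maxBlobSize)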

Comment on lines 198 to 201
data, err := h.MarshalBinary()
if err == nil {
    totalSize += len(data)
}

medium

Errors from h.MarshalBinary() are silently ignored. This could lead to an inaccurate totalSize, causing the batching strategy to make a suboptimal decision (e.g., delaying a submission). It's better to log these errors for visibility.

Additionally, this size estimation logic is duplicated for data submission. Consider extracting it into a shared helper function to improve maintainability.

data, err := h.MarshalBinary()
if err != nil {
    s.logger.Warn().Err(err).Msg("failed to marshal header for size estimation")
    continue
}
totalSize += len(data)

Comment on lines 256 to 259
data, err := d.MarshalBinary()
if err == nil {
    totalSize += len(data)
}

medium

Similar to the header submission logic, errors from d.MarshalBinary() are silently ignored here. This can lead to inaccurate size estimations and suboptimal batching. It's better to log these errors for improved diagnostics and robustness.

data, err := d.MarshalBinary()
if err != nil {
    s.logger.Warn().Err(err).Msg("failed to marshal data for size estimation")
    continue
}
totalSize += len(data)

@codecov

codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 79.83871% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.13%. Comparing base (52825bf) to head (3d6996b).
⚠️ Report is 1 commits behind head on main.

Files with missing lines                         Patch %   Lines
block/internal/submitting/submitter.go           71.95%    18 missing, 5 partials ⚠️
block/internal/submitting/da_submitter.go        60.97%    10 missing, 6 partials ⚠️
block/internal/cache/pending_data.go             78.94%    2 missing, 2 partials ⚠️
block/internal/submitting/batching_strategy.go   93.22%    2 missing, 2 partials ⚠️
block/internal/cache/pending_headers.go          89.47%    1 missing, 1 partial ⚠️
block/internal/cache/manager.go                  87.50%    1 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2973      +/-   ##
==========================================
+ Coverage   58.77%   59.13%   +0.36%     
==========================================
  Files         101      102       +1     
  Lines        9685     9822     +137     
==========================================
+ Hits         5692     5808     +116     
- Misses       3381     3397      +16     
- Partials      612      617       +5     
Flag       Coverage Δ
combined   59.13% <79.83%> (+0.36%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@julienrbrt julienrbrt marked this pull request as ready for review January 15, 2026 12:40

@tac0turtle tac0turtle left a comment


Left two comments. I think we should clean up the marshalling flow so we don't marshal data and then never use it; that wastes CPU cycles. Not a large issue in our case, but we should strive to reduce CPU cycles.

    return nil, fmt.Errorf("failed to sign envelope: %w", err)
}
// Create the envelope and marshal it
envelope, err := header.MarshalDAEnvelope(envelopeSignature)

We can open an issue for this, as we are marshalling the header twice in this flow: once in the cache (this PR; previously elsewhere) and again in the envelope, yet we sign over the encoded data. If we made this a function that takes the encoded header, it would remove the need to pass headers around on top of the marshalled data.


@tac0turtle tac0turtle left a comment


nice job

@tac0turtle tac0turtle enabled auto-merge January 15, 2026 17:24
@julienrbrt
Copy link
Member Author

checking e2e, will fix!

@tac0turtle tac0turtle added this pull request to the merge queue Jan 16, 2026
Merged via the queue into main with commit 7d30f97 Jan 16, 2026
26 of 27 checks passed
@tac0turtle tac0turtle deleted the julien/da-strategy branch January 16, 2026 12:24
alpe added a commit that referenced this pull request Jan 16, 2026
* main:
  fix: inconsistent state detection and rollback (#2983)
  chore: improve graceful shutdown restarts (#2985)
  feat(submitting): add posting strategies (#2973)


Development

Successfully merging this pull request may close these issues.

[FEATURE] bump blob size to 8mb
