The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @alpe's task in 2m 3s
Code Review: Block Production Throughput Optimization
Summary
This PR targets the sequencer hot path with a set of well-motivated optimizations. The intent is solid and several changes are straightforwardly correct. However, there are a few issues ranging from data-race bugs to correctness concerns that need to be addressed before merging.
Critical Issues
1. Data race on
Summary of Changes
Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly optimizes the block production throughput for sequencers by systematically identifying and removing bottlenecks related to inter-process communication (RPCs), disk I/O, and blocking CPU-intensive tasks. The changes aim to streamline the block creation and propagation process, leading to a more efficient and responsive system. A new benchmark was also added to measure and analyze these performance improvements, ensuring the changes are quantifiable and maintainable.
Highlights
Code Review
This pull request introduces significant performance optimizations by implementing caching mechanisms and asynchronous operations in the block production and EVM execution paths.
In block/internal/executing/executor.go, a lastBlockCacheEntry and associated fields were added to cache the previous block's header, data, and signature, reducing redundant store reads during CreateBlock. Signer information is also now cached. The ProduceBlock function was refactored to conditionally save/delete pending blocks only for Raft-enabled setups and to skip self-validation for sequencer-produced blocks. P2P block broadcasting was made asynchronous to avoid blocking block production.
In execution/evm/execution.go, pre-computed constants were introduced to reduce allocations, and a prevBlockInfo cache was added to EngineClient to avoid eth_getBlockByNumber RPC calls during ExecuteTxs. The saveExecMeta calls were made asynchronous, and the TxHash computation was removed from saveExecMeta. Additionally, updateForkchoiceState was added to update in-memory state without an immediate RPC call.
Test helper functions across execution/evm/test/test_helpers.go, execution/evm/test_helpers.go, test/e2e/evm_contract_e2e_test.go, and test/e2e/evm_test_common.go were updated to use testing.TB instead of *testing.T to support benchmarking, and a new benchmark file test/e2e/evm_contract_bench_test.go was added to measure contract roundtrip latency with OpenTelemetry tracing.
Review comments highlighted that the asynchronous saveExecMeta calls should use context.Background() instead of the parent's context to ensure they complete even if the parent's context is canceled, as they are intended to be 'fire-and-forget' operations.
| // This allows resuming the payload build if we crash before completing | ||
| c.saveExecMeta(ctx, blockHeight, timestamp.Unix(), newPayloadID[:], nil, nil, txs, ExecStageStarted) | ||
| // Save ExecMeta with payloadID for crash recovery (Stage="started") — async. | ||
| go c.saveExecMeta(ctx, blockHeight, timestamp.Unix(), newPayloadID[:], nil, nil, txs, ExecStageStarted) |
Calling saveExecMeta in a new goroutine with the parent's context can be problematic. If the context passed to ExecuteTxs is request-scoped, it might be canceled after ExecuteTxs returns, causing saveExecMeta to fail unexpectedly. For a true 'fire-and-forget' operation that should outlive the parent function, it's safer to use a new, detached context like context.Background().
| go c.saveExecMeta(ctx, blockHeight, timestamp.Unix(), newPayloadID[:], nil, nil, txs, ExecStageStarted) | |
| go c.saveExecMeta(context.Background(), blockHeight, timestamp.Unix(), newPayloadID[:], nil, nil, txs, ExecStageStarted) |
| // 4. Save ExecMeta (Promoted) | ||
| c.saveExecMeta(ctx, blockHeight, blockTimestamp, payloadID[:], blockHash[:], payloadResult.ExecutionPayload.StateRoot.Bytes(), txs, ExecStagePromoted) | ||
| // 5. Save ExecMeta (Promoted) — async, best-effort. | ||
| go c.saveExecMeta(ctx, blockHeight, blockTimestamp, payloadID[:], blockHash[:], payloadResult.ExecutionPayload.StateRoot.Bytes(), txs, ExecStagePromoted) |
Similar to the other async call to saveExecMeta, using the parent's context in this goroutine can lead to issues if the context is short-lived. For a 'fire-and-forget' operation, it's safer to use a background context to ensure it completes even if the parent function's context is canceled.
| go c.saveExecMeta(ctx, blockHeight, blockTimestamp, payloadID[:], blockHash[:], payloadResult.ExecutionPayload.StateRoot.Bytes(), txs, ExecStagePromoted) | |
| go c.saveExecMeta(context.Background(), blockHeight, blockTimestamp, payloadID[:], blockHash[:], payloadResult.ExecutionPayload.StateRoot.Bytes(), txs, ExecStagePromoted) |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3079 +/- ##
==========================================
+ Coverage 61.11% 61.19% +0.08%
==========================================
Files 113 113
Lines 11444 11468 +24
==========================================
+ Hits 6994 7018 +24
- Misses 3661 3666 +5
+ Partials 789 784 -5
| } | ||
| if err := e.savePendingBlock(ctx, header, data); err != nil { | ||
| return fmt.Errorf("failed to save block data: %w", err) | ||
| // Only persist pending block for raft crash recovery — skip for non-raft |
This is wrong, we do want to persist it every time.
| e.lastBlockMu.Unlock() | ||
|
|
||
| // P2P broadcast is fire-and-forget — doesn't block next block production. | ||
| go func() { |
This one makes sense given we just log anyway.
EDIT: maybe it isn't preferred as we could broadcast out of order, and this will fail go-header verification.
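One way to keep the broadcast off the hot path without risking reordering is a single worker draining a buffered queue, instead of one goroutine per block. The sketch below is only an illustration of that idea; orderedBroadcaster and its send callback are hypothetical names, not part of the PR or of go-header.

```go
package main

import "fmt"

// orderedBroadcaster is a hypothetical helper: exactly one worker goroutine
// drains the queue, so sends happen in the order heights were enqueued.
type orderedBroadcaster struct {
	queue chan uint64
	done  chan struct{}
}

func newOrderedBroadcaster(buf int, send func(height uint64)) *orderedBroadcaster {
	b := &orderedBroadcaster{
		queue: make(chan uint64, buf),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(b.done)
		for h := range b.queue {
			send(h)
		}
	}()
	return b
}

// Enqueue returns immediately unless the buffer is full, in which case it
// blocks until the worker catches up (backpressure instead of reordering).
func (b *orderedBroadcaster) Enqueue(height uint64) { b.queue <- height }

// Close stops accepting work and waits for all queued sends to finish.
func (b *orderedBroadcaster) Close() {
	close(b.queue)
	<-b.done
}

// collect enqueues heights 1..n and returns the order in which they were sent.
func collect(n int) []uint64 {
	var sent []uint64
	b := newOrderedBroadcaster(4, func(h uint64) { sent = append(sent, h) })
	for h := uint64(1); h <= uint64(n); h++ {
		b.Enqueue(h)
	}
	b.Close() // after Close returns, reading sent is safe
	return sent
}

func main() {
	fmt.Println(collect(5))
}
```

The trade-off is that a slow peer network eventually blocks the producer once the buffer fills, which may or may not be acceptable here.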
|
|
||
| // lastBlockCacheEntry caches the last produced block's header hash, data hash, | ||
| // and signature to avoid store reads in CreateBlock. | ||
| type lastBlockCacheEntry struct { |
This should be removed. We already cache at the store level, so this isn't an issue (see cached_store.go).
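As a rough illustration of what caching at the store level means, the sketch below wraps a store in a read-through cache so that callers such as a block builder need no cache of their own. It does not reproduce cached_store.go; blockStore, slowStore, and cachedStore are hypothetical, and the map is unbounded where a real cache would likely be size-limited.

```go
package main

import (
	"fmt"
	"sync"
)

// blockStore is a hypothetical, simplified store interface.
type blockStore interface {
	GetBlockData(height uint64) (string, error)
}

// slowStore stands in for a disk-backed store and counts its reads.
type slowStore struct{ reads int }

func (s *slowStore) GetBlockData(height uint64) (string, error) {
	s.reads++
	return fmt.Sprintf("block-%d", height), nil
}

// cachedStore is a read-through decorator: the first read of a height hits
// the inner store, later reads are served from memory.
type cachedStore struct {
	mu    sync.Mutex
	inner blockStore
	cache map[uint64]string
}

func newCachedStore(inner blockStore) *cachedStore {
	return &cachedStore{inner: inner, cache: make(map[uint64]string)}
}

func (c *cachedStore) GetBlockData(height uint64) (string, error) {
	// Holding the lock across the inner read keeps the sketch simple,
	// at the cost of serializing concurrent misses.
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.cache[height]; ok {
		return v, nil
	}
	v, err := c.inner.GetBlockData(height)
	if err != nil {
		return "", err
	}
	c.cache[height] = v
	return v, nil
}

func main() {
	inner := &slowStore{}
	store := newCachedStore(inner)
	for i := 0; i < 3; i++ {
		store.GetBlockData(10)
	}
	fmt.Println("inner reads:", inner.reads)
}
```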
Optimize block production throughput
Reduces per-block overhead by eliminating redundant RPCs, store I/O, and blocking operations on the sequencer hot path.
Changes
Eliminated RPCs
- prevBlockInfo) → skip eth_getBlockByNumber RPC
- reconcileExecutionAtHeight on sequencer (always fails: block doesn't exist yet) → skip 1 store read + 1 eth RPC
- setFinalWithHeight ForkchoiceUpdated → skip 1 engine RPC per block (state carried by next block's FCU)
Eliminated store I/O
- lastBlockCacheEntry) → skip GetBlockData + GetSignature reads in CreateBlock
- savePendingBlock/deletePendingBlock on non-raft nodes (raft crash recovery only)
- saveExecMeta async (best-effort, non-blocking)
Eliminated CPU / blocking work
- ValidateBlock on sequencer: self-produced blocks don't need self-validation
- saveExecMeta (unused for decisions)
- pubKey + validatorHash after first computation (immutable)
- errgroup.Wait()
- zeroHashHex, emptyWithdrawals, emptyBlobHashes, emptyExecReqs)
Impact