[DO NOT MERGE] moonshot2 #3083

Closed

alpe wants to merge 11 commits into main from alex/ai_moonshot2

Conversation

@alpe alpe (Contributor) commented Feb 19, 2026

Overview

@github-actions github-actions bot (Contributor) commented Feb 19, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed · Format: ⏩ skipped · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Feb 19, 2026, 2:39 PM

@claude claude bot (Contributor) commented Feb 19, 2026

Claude finished @alpe's task in 7m 21s.


Code Review

Tasks

  • Gather context and understand the PR
  • Review changed files
    • block/internal/executing/executor.go - Core execution with caching
    • pkg/store/ - Storage optimizations
    • types/ - Data structures and hashing
    • Benchmark and test files
  • Compile findings and post review

Summary

This PR introduces targeted performance optimizations to the block production hot path: in-memory caching of last-block hashes, amortized pending-limit checks, sequential P2P broadcasting, and store-key allocation improvements. The direction is solid. However, there are a few correctness issues introduced alongside the optimizations that are worth addressing.

Note on Gemini's review: Gemini's security findings about PrevHeaderHash/PrevDataHash cache poisoning in types/p2p_envelope.go do not apply to this PR — those proto changes were reverted in commit 8cdd819. The current PR does not touch p2p_envelope.go. Gemini appears to be reviewing an earlier version.


Issues

1. hasPendingBlock not set to true after savePendingBlock — regression in crash recovery (Medium)

block/internal/executing/executor.go:508

After savePendingBlock is called (line 508), hasPendingBlock remains false. If ApplyBlock, signing, or any step after it fails (returning an error), the next iteration of the execution loop will:

  1. See hasPendingBlock.Load() == false → skip getPendingBlock()
  2. Create a new block with a new timestamp from the sequencer
  3. Overwrite the pending metadata with this new block

The old code unconditionally called getPendingBlock() on every ProduceBlock call, so the same pending block was always retried. The new code loses this invariant for in-process failures. A crash-restart correctly recovers via initializeState, but mid-run failures create a new block instead.

Fix: add e.hasPendingBlock.Store(true) immediately after line 508.

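A minimal sketch of that fix, assuming the ProduceBlock flow described above; the savePendingBlock signature and error handling are assumptions, not copied from the file:

// Persist the pending block as before...
if err := e.savePendingBlock(ctx, header, data); err != nil { // assumed signature
	return fmt.Errorf("save pending block: %w", err)
}
// ...then record it in memory so a failure in ApplyBlock, signing, or any
// later step makes the next loop iteration retry this same block via
// getPendingBlock() instead of creating a new one.
e.hasPendingBlock.Store(true)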


2. deletePendingBlock removed — stale metadata on every restart (Medium)

block/internal/executing/executor.go:585-589 / block/internal/executing/pending.go:80

In the original code, deletePendingBlock(batch) was added to the commit batch so that the pending block metadata was atomically erased on successful commit. That call was removed and replaced with only e.hasPendingBlock.Store(false) (in-memory only).

Effect: after every successful block commit, m/pending_header and m/pending_data remain in the store. On the next restart, initializeState detects the metadata, sets hasPendingBlock = true, and ProduceBlock triggers a getPendingBlock() read — which loads the previously-committed (stale) block, compares heights, discards it, and then creates a new one. This is one extra store read per restart, but it also means every restart is treated as if there was an in-progress pending block.

The simplest fix is to re-add e.deletePendingBlock(batch) back into the commit batch before batch.Commit().

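A sketch of the restored call, assuming the commit sequence described above; the exact batch and helper signatures are assumptions:

// Queue deletion of m/pending_header and m/pending_data in the same batch as
// the block commit, so the metadata cannot outlive a successful commit.
if err := e.deletePendingBlock(batch); err != nil { // assumed signature
	return fmt.Errorf("delete pending block metadata: %w", err)
}
if err := batch.Commit(); err != nil {
	return fmt.Errorf("commit block: %w", err)
}
// Keep the in-memory fast path consistent with the store.
e.hasPendingBlock.Store(false)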


3. signHeader returns bytes that are never used (Low)

block/internal/executing/executor.go:528 and executor.go:799-818

The signHeader method was changed to return (Signature, []byte, error), where the second return value (the signing payload) is immediately discarded:

signature, _, err := e.signHeader(&header.Header)

The comment on the function says "The caller can reuse headerBytes in SaveBlockDataFromBytes to avoid a redundant MarshalBinary call", but ProduceBlock still calls batch.SaveBlockData(header, data, &signature) (line 547), not SaveBlockDataFromBytes. The signing payload bz from AggregatorNodeSignatureBytesProvider is also not the same as header.MarshalBinary(), so it cannot be reused for the Raft marshal path either.

Either use the returned bytes where intended, or revert signHeader to returning (Signature, error) to avoid the dead API surface.
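If the revert is preferred, a sketch of the narrower signature; the provider and signer calls below mirror the review's description but are not verified against the file:

// signHeader signs the aggregator payload for the given header and returns
// only the signature, keeping the API surface minimal until a caller
// actually consumes the payload bytes.
func (e *Executor) signHeader(header *types.Header) (types.Signature, error) {
	bz, err := e.options.AggregatorNodeSignatureBytesProvider(header)
	if err != nil {
		return nil, fmt.Errorf("build signature payload: %w", err)
	}
	sig, err := e.signer.Sign(bz)
	if err != nil {
		return nil, fmt.Errorf("sign header: %w", err)
	}
	return types.Signature(sig), nil
}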


4. AssertValidSequence removes DACommitment check — reduced defense-in-depth (Low)

types/state.go:92-116

The replacement of blockProducer.ValidateBlock with currentState.AssertValidSequence removes the Validate(header, data) call, which verifies that header.DataHash == data.DACommitment(). The justification ("we just produced this block") is sound for performance, but this means bugs in CreateBlock's DataHash computation (e.g., wrong DACommitment() used in the empty-tx path) would now go undetected until peers reject the block.

Worth considering: add a testing.Short() guard in tests to run the full validation, or add a single unit test specifically for AssertValidSequence to document what it does and doesn't check.
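A hypothetical unit test in that spirit; the helpers sampleState and sampleBlock are placeholders, and the real AssertValidSequence arguments may differ from what is shown:

func TestAssertValidSequenceScope(t *testing.T) {
	state := sampleState(t, 9)         // placeholder: chain state at height 9
	header, data := sampleBlock(t, 10) // placeholder: the next produced block
	if err := state.AssertValidSequence(&header.Header, data); err != nil {
		t.Fatalf("block at height 10 should follow state at height 9: %v", err)
	}
	// Intentionally documents the gap: header.DataHash == data.DACommitment()
	// is only verified by the full Validate(header, data) path.
}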


5. Amortized pending check can silently overshoot the configured limit (Low)

block/internal/executing/executor.go:449-469

When MaxPendingHeadersAndData > 64, the limit is checked only every 64 blocks. If the pending count is already at MaxPendingHeadersAndData - 1, the system can produce up to 63 more blocks before detecting the breach, potentially overshooting by pendingCheckInterval - 1. The comment acknowledges this, but the condition is:

shouldCheck := e.config.Node.MaxPendingHeadersAndData <= pendingCheckInterval ||
    e.pendingCheckCounter%pendingCheckInterval == 0

This means limits ≤ 64 are checked every block (correct), but limits > 64 can overshoot. Consider documenting this as an intentional tradeoff (soft limit, not hard), or reducing the interval when the delta between current count and limit is small.
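One possible shape for the adaptive variant; lastObservedPendingCount is a hypothetical field recording the count from the last real check, and the types are assumed:

// Shrink the interval once the last observed count gets close to the limit,
// so the soft limit can only be overshot while there is plenty of headroom.
headroom := e.config.Node.MaxPendingHeadersAndData - e.lastObservedPendingCount
interval := uint64(pendingCheckInterval)
if headroom < pendingCheckInterval {
	interval = 1 // near the limit: go back to checking every block
}
shouldCheck := e.pendingCheckCounter%interval == 0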


6. Cache warm-up errors silently discarded in initializeState (Low)

block/internal/executing/executor.go:301-311

if state.LastBlockHeight > 0 {
    h, d, err := e.store.GetBlockData(e.ctx, state.LastBlockHeight)
    if err == nil {
        // ...
    }
    // err != nil is silently ignored
}

If GetBlockData or GetSignature fails at startup, the cold-start path is used silently. This is safe (it falls back to the store on first block), but a warning log would help operators diagnose unexpected startup behavior (e.g., corrupt or pruned block data).
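A sketch of the suggested warning, assuming a structured logger on the executor; the logging call shown is illustrative, not the project's actual API:

if state.LastBlockHeight > 0 {
	h, d, err := e.store.GetBlockData(e.ctx, state.LastBlockHeight)
	if err != nil {
		// Still fall back to the cold-start path, but make the fallback
		// visible so corrupt or pruned block data is easy to spot at startup.
		e.logger.Warn("could not warm last-block cache; using cold start",
			"height", state.LastBlockHeight, "err", err)
	} else {
		e.lastHeaderHash = h.Hash()
		e.lastDataHash = d.Hash()
		// (GetSignature / lastSignature warm-up elided in this sketch)
	}
}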


Positive highlights

  • hasPendingBlock atomic flag: clean optimization that avoids a store hit on every ProduceBlock call in the common case.
  • lastHeaderHash/lastDataHash/lastSignature cache: well-designed — populated in initializeState for warm starts, updated atomically after each commit, cold-start fallback preserved.
  • heightKey and GenerateKey optimizations in pkg/store: measurable reduction in allocations for hot paths. The ds.RawKey switch is appropriate given that all keys are constructed with controlled, well-formed inputs (a sketch of the pre-sized key builder follows this list).
  • Signer.MarshalledPubKey() caching: correctly uses a pointer receiver and initializes via NewSigner; the FromProto paths also correctly populate the cache from the wire bytes.
  • TxsByteSize(): a genuinely useful addition — avoids full proto serialization just to get a byte count for metrics.
  • Benchmark stubs (stubSequencer, stubExecClient, stubBroadcaster): well-designed for zero-overhead benchmarking; the removal of mock framework overhead is the right call for b.N loops.
  • Sequential broadcast ordering: the reason is correctly documented. The errgroup-based concurrent approach had a real, if subtle, ordering risk.
  • timestamppb inline construction: valid micro-optimization; semantically equivalent.
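For reference, a minimal sketch of the pre-sized strings.Builder approach behind the GenerateKey highlight; the function name and signature here are illustrative and may not match pkg/store/kv.go:

package store

import "strings"

// generateKeySketch joins path segments into a datastore key using a single
// allocation: the final length is computed up front and the builder is
// pre-grown before any writes.
func generateKeySketch(fields []string) string {
	size := 0
	for _, f := range fields {
		size += len(f) + 1 // one '/' separator per segment
	}
	var b strings.Builder
	b.Grow(size)
	for _, f := range fields {
		b.WriteByte('/')
		b.WriteString(f)
	}
	return b.String()
}

Pairing this with ds.RawKey skips the path.Clean normalization that ds.NewKey performs, which is safe only because every segment is produced internally.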

@alpe alpe changed the base branch from main to alex/bench_phase1_start February 19, 2026 08:07
@gemini-code-assist gemini-code-assist bot (Contributor)

Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of the block production pipeline by implementing extensive caching strategies and optimizing data handling. It also upgrades the P2P protocol to streamline block verification for peers and introduces robust new benchmarking tools and refactored test utilities to ensure the stability and efficiency of these changes.

Highlights

  • Performance Optimizations in Block Production: Implemented extensive caching within the Executor to reduce redundant store reads and cryptographic operations, including lastHeaderHash, lastDataHash, lastSignature, cachedPubKey, and cachedValidatorHash. Introduced an atomic flag hasPendingBlock to avoid unnecessary datastore lookups and amortized NumPendingHeaders/NumPendingData checks to reduce expensive store scans. Optimized SaveBlockData by adding SaveBlockDataFromBytes to prevent re-marshalling, and refactored P2P broadcasting to be sequential for correct validation order. Also, TxsByteSize() was added for efficient metrics reporting, and key generation in pkg/store was optimized for minimal allocations.
  • P2P Protocol Enhancements: Added PrevHeaderHash and PrevDataHash fields to P2PSignedHeader and P2PData protobuf messages. These fields carry pre-computed hashes of the previous block's header and data, allowing peers to skip re-computation during verification and significantly improving P2P validation efficiency.
  • Testing Infrastructure Improvements: Introduced executor_benchmark_test.go with lightweight stubs (stubSequencer, stubExecClient, stubBroadcaster) for precise ProduceBlock performance benchmarking. Added evm_contract_bench_test.go for end-to-end EVM contract roundtrip benchmarking, featuring an in-process OTLP/HTTP collector for detailed trace analysis. Numerous test helper functions were updated to accept testing.TB, making them compatible with both *testing.T and *testing.B contexts, and go.uber.org/zap was integrated for improved test logging. (A short sketch of the testing.TB pattern follows this list.)
  • Refactoring and Code Quality: Removed unused golang.org/x/sync/errgroup import. Introduced AssertValidSequence in types/state.go for lightweight validation of self-produced blocks, avoiding expensive re-computations. Optimized State.ToProto() to avoid timestamppb.New allocation and cached marshalled public keys in types.Signer to prevent repeated serialization.
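A short sketch of the testing.TB pattern called out in the testing-infrastructure highlight; the fake node and constructor are hypothetical stand-ins:

package testhelpers

import "testing"

type fakeNode struct{}

func (n *fakeNode) Stop() {}

func startFakeNode() (*fakeNode, error) { return &fakeNode{}, nil }

// setupNode can be called from both tests and benchmarks because it depends
// only on testing.TB, the interface shared by *testing.T and *testing.B.
func setupNode(tb testing.TB) *fakeNode {
	tb.Helper()
	node, err := startFakeNode()
	if err != nil {
		tb.Fatalf("start node: %v", err)
	}
	tb.Cleanup(node.Stop)
	return node
}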


Changelog
  • block/internal/executing/executor.go
    • Removed unused errgroup import.
    • Added fields for caching pending block status, last block hashes, signature, signer public key, and validator hash.
    • Initialized cached values and signer info in initializeState.
    • Modified pending limit checks to be amortized.
    • Updated ProduceBlock to use cached values and skip redundant store lookups for pending blocks.
    • Removed explicit savePendingBlock call, relying on final commit.
    • Changed signHeader to return serialized header bytes and updated its call.
    • Replaced blockProducer.ValidateBlock with currentState.AssertValidSequence for self-produced blocks.
    • Removed errgroup for P2P broadcasting and broadcast header then data sequentially.
    • Updated last-block cache after successful commit.
    • Modified CreateBlock to use cached last block info and signer info.
    • Optimized transaction conversion in ApplyBlock.
    • Updated recordBlockMetrics to use data.TxsByteSize() for block size.
  • block/internal/executing/executor_benchmark_test.go
    • Added new file for Executor benchmarking.
    • Implemented stubSequencer, stubExecClient, and stubBroadcaster for lightweight testing.
    • Added newBenchExecutorWithStubs and newBenchExecutorWithMocks helper functions.
    • Included BenchmarkProduceBlock for performance testing.
  • execution/evm/test/go.mod
    • Added go.uber.org/zap dependency.
    • Moved go.uber.org/zap from indirect to direct dependency.
  • execution/evm/test/test_helpers.go
    • Imported go.uber.org/zap and zaptest.
    • Updated getTestScopedDockerSetup, SetupTestRethNode, waitForRethContainer to accept testing.TB.
    • Modified SetupTestRethNode to use reth.NodeBuilder and zap logger.
  • execution/evm/test_helpers.go
    • Updated GetRandomTransaction to accept testing.TB.
  • pkg/store/batch.go
    • Changed ds.NewKey to ds.RawKey for height key.
    • Added SaveBlockDataFromBytes function to save pre-serialized block data.
    • Modified SaveBlockData to use SaveBlockDataFromBytes.
    • Changed ds.NewKey to ds.RawKey for header, data, signature, and index keys.
    • Changed ds.NewKey to ds.RawKey for state at height key.
  • pkg/store/keys.go
    • Added heightKey helper for efficient key generation.
    • Updated GetHeaderKey, getHeaderKey, GetDataKey, getDataKey, GetSignatureKey, getSignatureKey, getStateAtHeightKey to use heightKey.
    • Changed getHeightKey to return a static string.
  • pkg/store/kv.go
    • Removed path import.
    • Optimized GenerateKey to pre-calculate size and use strings.Builder for minimal allocations.
  • pkg/store/tracing.go
    • Added SaveBlockDataFromBytes to tracedBatch to wrap the underlying call.
  • pkg/store/tracing_test.go
    • Added SaveBlockDataFromBytes to tracingMockBatch.
  • pkg/store/types.go
    • Added SaveBlockDataFromBytes method to Batch interface.
  • proto/evnode/v1/evnode.proto
    • Added prev_header_hash to P2PSignedHeader.
    • Added prev_data_hash to P2PData.
  • test/e2e/evm_contract_bench_test.go
    • Added new file for end-to-end EVM contract benchmarking.
    • Implemented otlpCollector for in-process OpenTelemetry trace collection.
    • Added BenchmarkEvmContractRoundtrip for performance testing with trace analysis.
  • test/e2e/evm_contract_e2e_test.go
    • Updated setupTestSequencer and deployContract to accept testing.TB.
    • Added extraArgs parameter to setupTestSequencer.
  • test/e2e/evm_test_common.go
    • Updated numerous test helper functions to accept testing.TB.
    • Added extraArgs parameter to setupSequencerNode and setupSequencerOnlyTest.
  • test/e2e/go.mod
    • Added go.opentelemetry.io/proto/otlp dependency.
    • Removed go.opentelemetry.io/proto/otlp from indirect dependencies.
  • test/e2e/sut_helper.go
    • Updated SystemUnderTest struct and NewSystemUnderTest constructor to use testing.TB.
    • Updated AwaitNodeUp, AwaitNodeLive, AwaitNBlocks, MustCopyFile, NodeID to accept testing.TB.
  • types/data.go
    • Added cachedHash field to Data struct.
    • Added TxsByteSize() method for efficient transaction byte size calculation.
  • types/hashing.go
    • Added cachedHash check to Header.Hash() and Data.Hash().
    • Added SetCachedHash methods for Header and Data.
  • types/header.go
    • Added cachedHash field to Header struct.
  • types/p2p_envelope.go
    • Added PrevHeaderHash to P2PSignedHeader and PrevDataHash to P2PData.
    • Modified Verify methods to use cached hashes from untrusted headers/data.
    • Updated MarshalBinary for P2PSignedHeader and P2PData to include new fields.
    • Updated UnmarshalBinary for P2PSignedHeader and P2PData to handle new fields.
  • types/serialization.go
    • Updated SignedHeader.ToProto, SignedHeader.FromProto, SignedHeader.ToDAEnvelopeProto, SignedHeader.FromDAEnvelopeProto to use Signer.MarshalledPubKey() and cache marshalled public key.
    • Optimized State.ToProto to avoid timestamppb.New allocation.
    • Updated SignedData.ToProto and SignedData.FromProto to use Signer.MarshalledPubKey() and cache marshalled public key.
  • types/signer.go
    • Added marshalledPubKey field to Signer struct for caching.
    • Added MarshalledPubKey() method to Signer.
    • Modified NewSigner to pre-cache marshalled public key.
  • types/state.go
    • Added AssertValidSequence method for lightweight block validation.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/benchmark.yml
Activity
  • The pull request is marked as '[DO NOT MERGE]', indicating it is a work-in-progress or experimental branch.
  • The pull request description is empty, suggesting the author is using it for internal development or testing.
  • No human comments or reviews have been recorded on this pull request.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces comprehensive performance optimizations to the block execution and storage pathways, focusing on reducing allocations, caching, and minimizing redundant computations. However, a critical cache poisoning vulnerability has been introduced in the P2P verification logic due to insecure handling of the new PrevHeaderHash and PrevDataHash fields. This flaw allows malicious peers to bypass core integrity and signature checks, compromising the node's state, and requires immediate attention. Additionally, while a potential race condition in P2P broadcasting was addressed, there is one suggestion to simplify a small portion of the code for improved maintainability.

Comment on lines 49 to 53
func (p *P2PSignedHeader) Verify(untrusted *P2PSignedHeader) error {
	if len(untrusted.PrevHeaderHash) > 0 {
		p.Header.SetCachedHash(untrusted.PrevHeaderHash)
	}
	return p.SignedHeader.Verify(untrusted.SignedHeader)
security-critical · critical

A critical cache poisoning vulnerability exists here. The Verify method trusts the PrevHeaderHash field from an untrusted peer (untrusted) and uses it to set the cached hash of a local, trusted header object (p). Any subsequent operation calling p.Header.Hash() will use this attacker-controlled value, bypassing signature and integrity checks. This can lead to the acceptance of invalid blocks and chain corruption. The input from the untrusted peer must never be used to modify the state of a trusted local object.

Comment on lines 132 to 136
func (p *P2PData) Verify(untrusted *P2PData) error {
	if len(untrusted.PrevDataHash) > 0 {
		p.Data.SetCachedHash(untrusted.PrevDataHash)
	}
	return p.Data.Verify(untrusted.Data)
security-critical · critical

This section contains the same critical cache poisoning vulnerability as found earlier with headers. The Verify method for P2PData trusts the PrevDataHash from an untrusted peer and uses it to poison the hash cache of a local, trusted data object (p.Data). This allows an attacker to bypass data integrity checks, which could lead to the acceptance of blocks with invalid transactions. The input from the untrusted peer must be considered malicious and should not be used to modify trusted local data structures.
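A hedged sketch of one safer direction for both Verify paths: treat the peer-supplied hash as a claim to verify rather than a value to cache on the trusted object. Names follow the quoted snippet; the remedy shown is illustrative only, and since this PR reverted the proto changes it documents the concern rather than a required fix:

func (p *P2PSignedHeader) Verify(untrusted *P2PSignedHeader) error {
	if len(untrusted.PrevHeaderHash) > 0 {
		// Recompute the trusted header's hash locally and only compare it with
		// the peer's claim; never write peer input into the trusted cache.
		if !bytes.Equal(p.Header.Hash(), untrusted.PrevHeaderHash) {
			return fmt.Errorf("prev_header_hash does not match trusted header")
		}
	}
	return p.SignedHeader.Verify(untrusted.SignedHeader)
}

The same shape applies to P2PData.Verify with PrevDataHash. It gives up the skipped recomputation the field was meant to buy, which is the trade-off the finding implies.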

Comment on lines 303 to 322
// Cache static signer info — computed once, reused every CreateBlock call.
if e.signer != nil {
	pubKey, err := e.signer.GetPublic()
	if err != nil {
		return fmt.Errorf("failed to cache public key: %w", err)
	}
	e.cachedPubKey = pubKey

	vHash, err := e.options.ValidatorHasherProvider(e.genesis.ProposerAddress, pubKey)
	if err != nil {
		return fmt.Errorf("failed to cache validator hash: %w", err)
	}
	e.cachedValidatorHash = vHash
} else {
	vHash, err := e.options.ValidatorHasherProvider(e.genesis.ProposerAddress, nil)
	if err != nil {
		return fmt.Errorf("failed to cache validator hash: %w", err)
	}
	e.cachedValidatorHash = vHash
}
medium

The logic for caching the validator hash is duplicated in the if and else branches. This can be refactored to remove the duplication by making a single call to ValidatorHasherProvider after the if block. The pubKey variable will be correctly nil if e.signer is nil, which seems to be handled by the provider.

	// Cache static signer info — computed once, reused every CreateBlock call.
	var pubKey crypto.PubKey
	if e.signer != nil {
		var err error
		pubKey, err = e.signer.GetPublic()
		if err != nil {
			return fmt.Errorf("failed to cache public key: %w", err)
		}
		e.cachedPubKey = pubKey
	}

	vHash, err := e.options.ValidatorHasherProvider(e.genesis.ProposerAddress, pubKey)
	if err != nil {
		return fmt.Errorf("failed to cache validator hash: %w", err)
	}
	e.cachedValidatorHash = vHash

@alpe alpe force-pushed the alex/bench_phase1_start branch 3 times, most recently from 7150135 to 8efaece on February 19, 2026 09:56
@alpe alpe force-pushed the alex/bench_phase1_start branch from 53311f4 to 7dea42f on February 19, 2026 11:26
@alpe alpe force-pushed the alex/ai_moonshot2 branch from daedae4 to dbe6446 on February 19, 2026 12:05
Base automatically changed from alex/bench_phase1_start to main February 19, 2026 16:20
@alpe alpe closed this Feb 19, 2026
@julienrbrt julienrbrt deleted the alex/ai_moonshot2 branch February 19, 2026 16:40