
[WIP] Start phase1 benchmarks#3081

Draft
alpe wants to merge 1 commit into alex/bench_e2e_start from alex/bench_phase1_start

Conversation


@alpe commented Feb 18, 2026

Overview

@github-actions

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build      Format      Lint       Breaking   Updated (UTC)
✅ passed  ⏩ skipped  ✅ passed  ✅ passed  Feb 18, 2026, 2:29 PM


claude bot commented Feb 18, 2026

Claude finished @alpe's task in 1m 9s


Code Review

  • Gather context and read changed files
  • Review benchmark test file
  • Review workflow file
  • Post final review

Overall Assessment

A solid start for phase1 benchmarks. The BenchmarkProduceBlock test is well-structured with proper use of b.Helper(), b.ReportAllocs(), and b.ResetTimer(). A few issues worth addressing before merging.


block/internal/executing/executor_benchmark_test.go

Issues

  1. BatchData duplicates Batch.Transactions — GetNextBatchResponse is returned with both Batch.Transactions and BatchData set to the same txs slice (lines 85–88). This appears redundant and may skew benchmark results by inflating the apparent work per iteration. Confirm whether both fields are intentionally populated by the sequencer in production.

  2. exec.ctx / exec.cancel fields are set directly (lines 80–81) — This accesses unexported struct fields from within the same package (package executing), which is acceptable here, but it introduces coupling to the internal lifecycle of Executor. The benchmark bypasses Run() / Start() lifecycle methods. This is fine for microbenchmarks, but worth a brief comment explaining that the executor is being driven manually to avoid goroutine overhead.

  3. StartTime is set to the past (line 53) — time.Now().Add(-time.Hour) is intentional to ensure the genesis block is "old enough," but there's no comment explaining why. Add one to aid future readers.

  4. newBenchExecutor only creates a fresh executor per benchmark subcase, not per b.N iteration — For most benchmarks this is fine, but since ProduceBlock is stateful (it increments block height and writes to the store), the store accumulates state across iterations. This is by design in standard Go benchmarks (setup is amortized), but it means later iterations may behave differently (e.g., store lookups grow). Not necessarily a bug, but worth a comment documenting this deliberate choice.

  5. Missing larger-payload test cases — The two specs ("empty batch" and "single tx") are minimal. To identify scaling behavior, consider adding a case with many small transactions (e.g., 100×32-byte txs) and one with large payloads.

Minor / Nits

  • cfg.Node.MaxPendingHeadersAndData = 100000 — Very large value set to prevent back-pressure. A short comment explaining the intent would help.
  • The map iteration order for specs is non-deterministic. Go's b.Run subtests handle this fine, but a slice of named structs preserves a deterministic run order, which makes output diffs easier to read when comparing benchmark runs.

.github/workflows/benchmark.yml

Issues

  1. Both benchmark steps share a single job — evm-benchmark runs the EVM benchmarks and the Block Executor benchmarks in the same job. If the EVM benchmark step fails (e.g., a build failure), the Block Executor benchmark never runs. Consider splitting into separate jobs, or at minimum using continue-on-error: true per step if isolation is desired.

  2. block_executor_output.txt is written to the repo root — The go test output is redirected to block_executor_output.txt (line 47) which resolves relative to the runner's working directory (repo root), while the EVM benchmark output goes into test/e2e/output.txt. These should be co-located or use a consistent naming scheme under a dedicated benchmark-results/ path.

  3. go test output includes timing metadata that may vary — The > block_executor_output.txt redirect captures full go test output including build logs. If any === RUN lines appear (e.g., from test setup), benchmark-action may fail to parse. Confirm this is clean with -run='^$' (it should be, just verify locally).

  4. No -count= flag on the Block Executor benchmark — The EVM benchmark doesn't use one either, but for stateful benchmarks like ProduceBlock, running with -count=3 would give more statistically stable results and help benchmark-action detect regressions more reliably.

  5. permissions: {} at top level, but contents: write at job level — This is correct practice (deny-by-default, grant per job). Good.

  6. auto-push: true commits benchmark results to the repo on every push to main — Confirm a dedicated gh-pages branch or similar is configured as the data store for benchmark-action. If not configured, it will push to the current branch, which is undesirable.


Summary

Area                     Status
Go benchmark structure   Good
Mock setup               Good
Test case coverage       Needs expansion
CI workflow isolation    Improve
Output file placement    Minor inconsistency
Documentation/comments   Add a few clarifying comments

@alpe changed the base branch from main to alex/bench_e2e_start on February 18, 2026, 14:29
@gemini-code-assist

Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request initiates the first phase of performance benchmarking for EVM contract operations within the ev-node system. It establishes a framework for measuring transaction roundtrip latency and integrates OpenTelemetry for granular performance analysis. The changes also involve making existing test utilities more versatile for use in both testing and benchmarking contexts, alongside necessary dependency updates.

Highlights

  • New EVM Contract Benchmark: Introduced a new benchmark, BenchmarkEvmContractRoundtrip, to measure the store-retrieve latency of EVM contracts against a real Reth node, providing performance insights into contract interactions.
  • OpenTelemetry Tracing Integration: Integrated OpenTelemetry tracing into the new benchmark to collect detailed performance data, allowing for a hierarchical timing report of ev-node components like Engine API calls, executor, and sequencer.
  • In-process OTLP/HTTP Collector: Implemented a lightweight, in-process OTLP/HTTP receiver (otlpCollector) to efficiently collect trace spans from ev-node during benchmark execution.
  • Refactored Test Helpers for Benchmarks: Updated numerous test helper functions across execution/evm/test, test/e2e, and test/e2e/sut_helper.go to accept testing.TB instead of *testing.T, making them reusable for both tests and benchmarks.
  • Dependency Updates: Adjusted go.mod files to explicitly include go.uber.org/zap and go.opentelemetry.io/proto/otlp as direct dependencies, supporting new logging and tracing functionalities.


Changelog
  • block/internal/executing/executor_benchmark_test.go
    • Added newBenchExecutor function to create a mocked Executor for benchmarking.
    • Added BenchmarkProduceBlock function to benchmark block production with varying transaction batches.
  • execution/evm/test/go.mod
    • Added go.uber.org/zap as a direct dependency.
    • Removed go.uber.org/zap from indirect dependencies.
  • execution/evm/test/test_helpers.go
    • Imported go.uber.org/zap and go.uber.org/zap/zaptest.
    • Updated getTestScopedDockerSetup to accept testing.TB.
    • Updated SetupTestRethNode to accept testing.TB and integrated zap logger for Reth node setup.
    • Updated waitForRethContainer to accept testing.TB.
  • execution/evm/test_helpers.go
    • Updated GetRandomTransaction to accept testing.TB.
  • test/e2e/evm_contract_bench_test.go
    • Added new file for BenchmarkEvmContractRoundtrip to measure contract store/retrieve latency.
    • Implemented otlpCollector for in-process OpenTelemetry trace collection.
    • Added printCollectedTraceReport to analyze and log trace breakdowns.
    • Added waitForReceipt helper function.
  • test/e2e/evm_contract_e2e_test.go
    • Updated setupTestSequencer to accept testing.TB and extraArgs.
    • Updated deployContract to accept testing.TB.
  • test/e2e/evm_test_common.go
    • Updated mustGetAvailablePort to accept testing.TB.
    • Updated createPassphraseFile to accept testing.TB.
    • Updated createJWTSecretFile to accept testing.TB.
    • Updated getNodeP2PAddress to accept testing.TB.
    • Updated setupSequencerNode to accept testing.TB and extraArgs.
    • Updated setupSequencerNodeLazy to accept testing.TB.
    • Updated setupFullNode to accept testing.TB.
    • Updated submitTransactionAndGetBlockNumber to accept testing.TB.
    • Updated setupCommonEVMTest to accept testing.TB.
    • Updated checkBlockInfoAt to accept testing.TB.
    • Updated setupSequencerOnlyTest to accept testing.TB and extraArgs.
    • Updated restartDAAndSequencer to accept testing.TB.
    • Updated restartDAAndSequencerLazy to accept testing.TB.
    • Updated restartSequencerNode to accept testing.TB.
    • Updated verifyNoBlockProduction to accept testing.TB.
  • test/e2e/go.mod
    • Added go.opentelemetry.io/proto/otlp as a direct dependency.
    • Removed go.opentelemetry.io/proto/otlp from indirect dependencies.
  • test/e2e/sut_helper.go
    • Updated SystemUnderTest struct field t type to testing.TB.
    • Updated NewSystemUnderTest constructor to accept testing.TB.
    • Updated AwaitNodeUp to accept testing.TB.
    • Updated AwaitNodeLive to accept testing.TB.
    • Updated AwaitNBlocks to accept testing.TB.
    • Updated MustCopyFile to accept testing.TB.
    • Updated NodeID to accept testing.TB.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/benchmark.yml
Activity
  • The pull request is currently marked as [WIP], indicating ongoing development.
  • The description body is empty, suggesting that detailed context and rationale are yet to be provided by the author.

@codecov

codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.13%. Comparing base (b3b0881) to head (7150135).

Additional details and impacted files
@@                   Coverage Diff                    @@
##           alex/bench_e2e_start    #3081      +/-   ##
========================================================
- Coverage                 61.14%   61.13%   -0.02%     
========================================================
  Files                       113      113              
  Lines                     11444    11444              
========================================================
- Hits                       6998     6996       -2     
- Misses                     3657     3658       +1     
- Partials                    789      790       +1     
Flag      Coverage Δ
combined  61.13% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@gemini-code-assist bot left a comment

Code Review

This pull request introduces benchmarks for the ProduceBlock function. The overall structure is good, but there are a couple of areas for improvement to ensure the benchmarks are robust and deterministic. I've pointed out that using time.Now() can lead to non-deterministic tests and should be replaced with a fixed time. More critically, the mock setup for GetNextBatch uses a static timestamp, which could cause benchmark failures if timestamp monotonicity is required for blocks. I've suggested a fix to generate an increasing timestamp for each block produced in the benchmark loop.

Comment on lines +84 to +89
mockSeq.EXPECT().GetNextBatch(mock.Anything, mock.AnythingOfType("sequencer.GetNextBatchRequest")).
	Return(&coreseq.GetNextBatchResponse{
		Batch:     &coreseq.Batch{Transactions: txs},
		Timestamp: time.Now(),
		BatchData: txs,
	}, nil)

Severity: high

The mock for GetNextBatch is configured to return a response with Timestamp: time.Now(). This timestamp is evaluated only once when the mock expectation is set up. Consequently, every call to ProduceBlock within the benchmark loop will receive the same timestamp. If there is a requirement for block timestamps to be monotonically increasing, this will cause ProduceBlock to fail after the first iteration.

To fix this, use RunAndReturn to generate a new, increasing timestamp for each call. This ensures the benchmark correctly simulates the production of a sequence of valid blocks.

Suggested change

-	mockSeq.EXPECT().GetNextBatch(mock.Anything, mock.AnythingOfType("sequencer.GetNextBatchRequest")).
-		Return(&coreseq.GetNextBatchResponse{
-			Batch:     &coreseq.Batch{Transactions: txs},
-			Timestamp: time.Now(),
-			BatchData: txs,
-		}, nil)
+	var blockNum uint64
+	mockSeq.EXPECT().GetNextBatch(mock.Anything, mock.AnythingOfType("sequencer.GetNextBatchRequest")).
+		RunAndReturn(func(_ context.Context, _ coreseq.GetNextBatchRequest) (*coreseq.GetNextBatchResponse, error) {
+			blockNum++
+			timestamp := gen.StartTime.Add(time.Duration(blockNum) * cfg.Node.BlockTime.Duration)
+			return &coreseq.GetNextBatchResponse{
+				Batch:     &coreseq.Batch{Transactions: txs},
+				Timestamp: timestamp,
+				BatchData: txs,
+			}, nil
+		})

Comment on the genesis setup:

gen := genesis.Genesis{
	ChainID:       "bench-chain",
	InitialHeight: 1,
	StartTime:     time.Now().Add(-time.Hour),

Severity: medium

Using time.Now() in tests can lead to non-deterministic behavior. It's better to use a fixed timestamp to ensure the benchmark is repeatable and not affected by the time it is run.

Suggested change

-	StartTime: time.Now().Add(-time.Hour),
+	StartTime: time.Date(2023, 1, 1, 12, 0, 0, 0, time.UTC),
