feat(tests,ci): Verify filled benchmark fixtures against EELS via json_loader #2894

Merged
danceratopz merged 3 commits into
ethereum:forks/amsterdam from
danceratopz:verify-benchmarks-against-eels
May 27, 2026


@danceratopz danceratopz commented May 21, 2026

🗒️ Description

Benchmark tests are filled with an external EVM (geth, evmone, ...) because the EELS EVM is extremely slow at filling these compute-heavy tests. This has led to a consensus testing gap: the canonical spec (EELS) is not part of the benchmark workflow.

This PR updates the bench-gas recipe to additionally run the externally generated fixtures against EELS via the json_loader; this is then verified upon push in CI.

It does not update the target fork used for benchmarks (currently Osaka); this should be bumped!

bench-gas recipe / benchmark.yaml workflow

This PR adds a Step 2 to bench-gas: after filling with geth, re-execute the resulting blockchain_test fixtures against EELS via the json_loader. EELS' state_transition validates each block's state root against the header internally (src/ethereum/forks/osaka/fork.py:266), and the lastblockhash assertion provides the final cryptographic check. Any gas, state, or opcode-behavior divergence between geth and EELS will now fail the verify step.
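To illustrate why a divergence fails the verify step, here is a toy sketch of the two checks described above: a per-block state root comparison against the header, followed by a final `lastblockhash` comparison. This is not EELS code. `ToyHeader`, `toy_state_root`, `apply_block` and `verify_chain` are hypothetical names, and SHA-256 over sorted entries stands in for the real Merkle Patricia Trie root and keccak256 block hash.

```python
import hashlib
from dataclasses import dataclass


class InvalidBlock(Exception):
    """Raised when a computed state root disagrees with the block header."""


@dataclass(frozen=True)
class ToyHeader:
    parent_hash: bytes
    state_root: bytes

    def block_hash(self) -> bytes:
        # Stand-in for hashing the RLP-encoded header.
        return hashlib.sha256(self.parent_hash + self.state_root).digest()


def toy_state_root(state: dict[str, int]) -> bytes:
    # Stand-in for the state trie root: a hash over the sorted entries.
    encoded = ";".join(f"{key}={value}" for key, value in sorted(state.items()))
    return hashlib.sha256(encoded.encode()).digest()


def apply_block(state: dict[str, int], header: ToyHeader, writes: dict[str, int]) -> dict[str, int]:
    # "Execute" the block, then compare the resulting root with the header.
    new_state = {**state, **writes}
    if toy_state_root(new_state) != header.state_root:
        raise InvalidBlock("state root mismatch")
    return new_state


def verify_chain(genesis: dict[str, int], blocks: list, last_block_hash: bytes) -> bool:
    # Per-block root checks first, then the final lastblockhash comparison.
    state = dict(genesis)
    head = None
    for header, writes in blocks:
        state = apply_block(state, header, writes)
        head = header
    return head is not None and head.block_hash() == last_block_hash
```

In this toy model, a filler that computed a different post-state would produce a header whose root the verifier cannot reproduce, so the mismatch surfaces at the first diverging block rather than only at the end.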

Modifications:

  1. feat(tests): Add --allow-post-state-hash option to json_loader: previously the json_loader xfailed any fixture without a full postState dict. Benchmark fixtures only carry postStateHash (since include_full_post_state_in_output defaults to False). With the new flag, EELS' state transition runs against these fixtures and lastblockhash validates the resulting state root via the block header. Behavior is unchanged when the flag is absent, so the existing json-loader recipe is unaffected.
  2. feat(bench-gas): Verify filled fixtures against EELS via json_loader: extends the recipe with a clearly labeled Step 2. Symlinks the filled output under tests/json_loader/ so the path-scoped conftest applies, then runs pytest with --allow-post-state-hash --fork Osaka under PyPy 3.11.
  3. ci(bench): Install PyPy for bench-gas json_loader verify: pre-installs PyPy 3.11 in the CI job so the recipe doesn't pay the on-demand download cost. Gated to the bench-gas matrix entry.
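
The behavior change in item 1 can be sketched as a small decision function. This is a hedged illustration of the described semantics, not the actual json_loader implementation: `Action` and `choose_action` are hypothetical names, and the fixture is modeled as a plain dict keyed by the `postState` / `postStateHash` field names mentioned above.

```python
from enum import Enum


class Action(Enum):
    XFAIL = "xfail"
    CHECK_FULL_POST_STATE = "check_full_post_state"
    CHECK_VIA_LAST_BLOCK_HASH = "check_via_last_block_hash"


def choose_action(fixture: dict, allow_post_state_hash: bool) -> Action:
    """Decide how a fixture is handled, per the PR description (assumed logic)."""
    if "postState" in fixture:
        # A full post-state dict is present: behavior is the same with or without the flag.
        return Action.CHECK_FULL_POST_STATE
    if allow_post_state_hash and "postStateHash" in fixture:
        # Only a hash is present: run the state transition and rely on lastblockhash.
        return Action.CHECK_VIA_LAST_BLOCK_HASH
    # Without the flag, hash-only fixtures are xfailed as before.
    return Action.XFAIL
```

Keeping the flag opt-in means a hash-only fixture is still xfailed by default, which matches the claim that the existing json-loader recipe is unaffected.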

PyPy gives a ~9x speedup over CPython on this workload (1082 passed in 2:08 vs 19:32 locally), so the verification adds about two minutes to the existing CI job rather than twenty. The bottleneck is pure-Python Merkle Patricia Trie hashing and RLP encoding inside EELS' state transition, which is exactly the workload PyPy's JIT was designed for; the fill step still runs on CPython since it bottlenecks on subprocess I/O to the EVM binary, not on hot Python loops.
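
As a quick sanity check of the ~9x figure, the two wall-clock times quoted above can be converted to seconds and divided. The helper name `to_seconds` is only for illustration; the inputs are the timings from the local run reported in this description.

```python
def to_seconds(clock: str) -> int:
    """Convert a 'minutes:seconds' string into whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


pypy_seconds = to_seconds("2:08")      # 128 seconds under PyPy
cpython_seconds = to_seconds("19:32")  # 1172 seconds under CPython

speedup = cpython_seconds / pypy_seconds
print(f"speedup: {speedup:.2f}x")  # prints "speedup: 9.16x"
```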

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

Some fixture formats include `postStateHash` but omit the full
`postState` dict (e.g. benchmark fixtures, which default to
`include_full_post_state_in_output=False`). Without this flag the
json_loader xfails such fixtures and EELS' state transition never runs.

With `--allow-post-state-hash`, state transition runs and the existing
`lastblockhash` assertion validates the post-state root via the block
header. Behavior is unchanged when the flag is absent.

The existing `bench-gas` recipe only checked that the configured EVM
(geth via `EVM_BIN`) can fill benchmark fixtures. It said nothing about
whether EELS and the configured EVM agree on the resulting state.

This adds a Step 2 that symlinks the filled output under
`tests/json_loader/` so the json_loader conftest applies, then runs
`pytest --allow-post-state-hash --fork Osaka` against the fixtures.
The EELS state transition validates each block's state root against
the header internally; the `lastblockhash` assertion provides the final
cryptographic check.

Step 2 runs under PyPy 3.11 for a ~9x speedup over CPython on this
workload (`1082 passed in 2:08` vs `19:32`). Step 1 stays on CPython
since filling bottlenecks on subprocess I/O to the EVM binary, not on
hot Python loops, so PyPy doesn't help there.

The new `tests/json_loader/bench_gas_fixtures` symlink is gitignored.

The new Step 2 in `bench-gas` runs the EELS verification under
`uv run --python pypy3.11`. Pre-install PyPy via `uv` so the recipe
doesn't pay the on-demand download cost mid-run. Gated to the
`bench-gas` matrix entry; the other recipes don't need it.
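
For readers who want to reproduce Step 2 outside the recipe, the symlink-then-verify flow described in these commit messages can be sketched as below. This is an assumption-laden illustration, not the recipe itself: `link_fixtures` and `verify_command` are hypothetical helpers, the location of the filled output is left to the caller, and passing the symlink path as a positional pytest argument is an assumption. Only the symlink name and the `uv run --python pypy3.11`, `--allow-post-state-hash` and `--fork Osaka` pieces come from the text above.

```python
from pathlib import Path


def link_fixtures(filled_dir: Path, json_loader_dir: Path,
                  name: str = "bench_gas_fixtures") -> Path:
    """Symlink the filled output under the json_loader test directory."""
    link = json_loader_dir / name
    if link.is_symlink():
        # Replace a stale link left over from a previous run.
        link.unlink()
    link.symlink_to(filled_dir.resolve(), target_is_directory=True)
    return link


def verify_command(link: Path) -> list[str]:
    """Build (but do not run) the verification command."""
    return [
        "uv", "run", "--python", "pypy3.11",
        "pytest", "--allow-post-state-hash", "--fork", "Osaka",
        str(link),  # assumption: fixtures are selected by path
    ]
```

The resulting list could be handed to `subprocess.run(..., check=True)` so that a failing verification fails the calling script as well.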
@danceratopz danceratopz added C-feat Category: an improvement or new feature A-test-benchmark Area: execution_testing.benchmark and tests/benchmark A-tooling Area: Improvements or changes to auxiliary tooling such as uv, ruff, mypy, ... A-ci Area: Continuous Integration labels May 21, 2026
@danceratopz danceratopz marked this pull request as draft May 21, 2026 09:51
@LouisTsai-Csie LouisTsai-Csie self-requested a review May 21, 2026 09:58
codecov Bot commented May 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.44%. Comparing base (bbacd74) to head (2c60971).
⚠️ Report is 3 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2894   +/-   ##
================================================
  Coverage            90.43%   90.44%           
================================================
  Files                  535      535           
  Lines                32413    32439   +26     
  Branches              3012     3012           
================================================
+ Hits                 29312    29338   +26     
  Misses                2573     2573           
  Partials               528      528           
Flag Coverage Δ
unittests 90.44% <ø> (+<0.01%) ⬆️

@danceratopz danceratopz marked this pull request as ready for review May 27, 2026 07:43
@danceratopz

I considered adding the same check to the benchmark artifact generation/release flows, but I'm leaving those as-is for now because the release flow is in a bit of flux (see #2888).

@danceratopz danceratopz merged commit d89c7ce into ethereum:forks/amsterdam May 27, 2026
18 checks passed