feat(tests,ci): Verify filled benchmark fixtures against EELS via json_loader #2894
Merged
danceratopz merged 3 commits intoMay 27, 2026
Conversation
Some fixture formats include `postStateHash` but omit the full `postState` dict (e.g. benchmark fixtures, which default to `include_full_post_state_in_output=False`). Without this flag the json_loader xfails such fixtures and EELS' state transition never runs. With `--allow-post-state-hash`, state transition runs and the existing `lastblockhash` assertion validates the post-state root via the block header. Behavior is unchanged when the flag is absent.
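
The flag's effect can be sketched as follows. This is a minimal illustration, not the json_loader's actual code: `pytest_addoption` and `parser.addoption` are standard pytest hooks, while `fixture_action` is a hypothetical helper name introduced here, and the fixtures are assumed to be plain dicts keyed by `postState` / `postStateHash`.

```python
from typing import Any, Mapping


def pytest_addoption(parser: Any) -> None:
    """Register the opt-in flag (pytest calls this hook from a conftest.py)."""
    parser.addoption(
        "--allow-post-state-hash",
        action="store_true",
        default=False,
        help="Run fixtures that only carry postStateHash instead of xfailing.",
    )


def fixture_action(fixture: Mapping[str, Any], allow_post_state_hash: bool) -> str:
    """Hypothetical helper: decide whether a fixture is run or xfailed."""
    if "postState" in fixture:
        # A full post-state dict is present: behavior is the same with or
        # without the flag.
        return "run"
    if "postStateHash" in fixture and allow_post_state_hash:
        # Hash-only fixture: run the state transition and rely on the
        # lastblockhash assertion to cover the post-state root.
        return "run"
    return "xfail"
```

With the flag absent, `fixture_action({"postStateHash": "0x..."}, False)` yields `"xfail"`, which matches the unchanged default behavior described above.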
The existing `bench-gas` recipe only checked that the configured EVM (geth via `EVM_BIN`) can fill benchmark fixtures. It said nothing about whether EELS and the configured EVM agree on the resulting state. This adds a Step 2 that symlinks the filled output under `tests/json_loader/` so the json_loader conftest applies, then runs `pytest --allow-post-state-hash --fork Osaka` against the fixtures. The EELS state transition validates each block's state root against the header internally; the `lastblockhash` assertion provides the final cryptographic check. Step 2 runs under PyPy 3.11 for a ~9x speedup over CPython on this workload (`1082 passed in 2:08` vs `19:32`). Step 1 stays on CPython since filling bottlenecks on subprocess I/O to the EVM binary, not on hot Python loops, so PyPy doesn't help there. The new `tests/json_loader/bench_gas_fixtures` symlink is gitignored.
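
Step 2 can be pictured roughly as below. This is a sketch under stated assumptions, not the recipe itself: `link_fixtures` and `verify_command` are hypothetical names, the symlink name `bench_gas_fixtures` is taken from the commit message, and joining `uv run --python pypy3.11` with the pytest flags into one argument list is an assumption about how the recipe composes them.

```python
import tempfile
from pathlib import Path


def link_fixtures(filled_dir: Path, loader_dir: Path,
                  name: str = "bench_gas_fixtures") -> Path:
    """Symlink the filled output under the json_loader test directory."""
    link = loader_dir / name
    if link.is_symlink():
        # Replace a stale link left behind by a previous run.
        link.unlink()
    link.symlink_to(filled_dir.resolve(), target_is_directory=True)
    return link


def verify_command(link: Path) -> list[str]:
    """Build the verification command (assumed composition, see lead-in)."""
    return [
        "uv", "run", "--python", "pypy3.11",
        "pytest", "--allow-post-state-hash", "--fork", "Osaka",
        str(link),
    ]


if __name__ == "__main__":
    # Demonstrate with throwaway directories rather than a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        filled = Path(tmp) / "filled"
        loader = Path(tmp) / "tests" / "json_loader"
        filled.mkdir()
        loader.mkdir(parents=True)
        print(" ".join(verify_command(link_fixtures(filled, loader))))
```

Placing the link inside `tests/json_loader/` matters because the conftest is path-scoped: pytest only applies it to tests collected beneath that directory.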
The new Step 2 in `bench-gas` runs the EELS verification under `uv run --python pypy3.11`. Pre-install PyPy via `uv` so the recipe doesn't pay the on-demand download cost mid-run. Gated to the `bench-gas` matrix entry; the other recipes don't need it.
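
As a quick check of the ~9x figure behind the PyPy choice, the two wall-clock timings quoted in the commit message above (`2:08` under PyPy, `19:32` under CPython) can be converted to seconds and compared. `to_seconds` is a helper name introduced only for this illustration.

```python
def to_seconds(clock: str) -> int:
    """Convert a pytest-style 'M:SS' duration into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


cpython_s = to_seconds("19:32")  # 1172 seconds
pypy_s = to_seconds("2:08")      # 128 seconds
speedup = cpython_s / pypy_s     # 9.15625

print(f"speedup: {speedup:.1f}x, saved: {cpython_s - pypy_s} s")
```

The saved time (1044 seconds, a little over 17 minutes) is what makes pre-installing PyPy in the CI job worthwhile.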
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## forks/amsterdam #2894 +/- ##
================================================
Coverage 90.43% 90.44%
================================================
Files 535 535
Lines 32413 32439 +26
Branches 3012 3012
================================================
+ Hits 29312 29338 +26
Misses 2573 2573
Partials 528 528
SamWilsn
approved these changes
May 26, 2026
I considered adding the same check to the benchmark artifact generation/release flows, but am leaving them as-is for now because the release flow is in a bit of flux (#2888).
🗒️ Description
Benchmark tests are filled with an external EVM (geth, evmone,...) as the EELS EVM is extremely slow at filling these compute-heavy tests. This has led to a consensus testing gap: The canonical spec (EELS) is not part of the benchmark workflow.
This PR updates the `bench-gas` recipe to additionally run the externally generated fixtures against EELS via the `json_loader`; this is then verified upon push in CI.
It does not update the target fork used for benchmarks (currently Osaka); this should be bumped!
`bench-gas` recipe / `benchmark.yaml` workflow
This PR adds a Step 2 to `bench-gas`: after filling with geth, re-execute the resulting `blockchain_test` fixtures against EELS via the `json_loader`. EELS' `state_transition` validates each block's state root against the header internally (`src/ethereum/forks/osaka/fork.py:266`), and the `lastblockhash` assertion provides the final cryptographic check. Any gas, state, or opcode-behavior divergence between geth and EELS will now fail the verify step.
Modifications:
- `feat(tests): Add --allow-post-state-hash option to json_loader`: previously the json_loader xfailed any fixture without a full `postState` dict. Benchmark fixtures only carry `postStateHash` (since `include_full_post_state_in_output` defaults to `False`). With the new flag, EELS' state transition runs against these fixtures and `lastblockhash` validates the resulting state root via the block header. Behavior is unchanged when the flag is absent, so the existing `json-loader` recipe is unaffected.
- `feat(bench-gas): Verify filled fixtures against EELS via json_loader`: extends the recipe with a clearly-labeled Step 2. Symlinks the filled output under `tests/json_loader/` so the path-scoped conftest applies, then runs pytest with `--allow-post-state-hash --fork Osaka` under PyPy 3.11.
- `ci(bench): Install PyPy for bench-gas json_loader verify`: pre-installs PyPy 3.11 in the CI job so the recipe doesn't pay the on-demand download cost. Gated to the `bench-gas` matrix entry.
PyPy gives a ~9x speedup over CPython on this workload (`1082 passed in 2:08` vs `19:32` locally), so the verification adds about two minutes to the existing CI job rather than twenty. The bottleneck is pure-Python Merkle Patricia Trie hashing and RLP encoding inside EELS' state transition, which is exactly the workload PyPy's JIT was designed for; the `fill` step still runs on CPython since it bottlenecks on subprocess I/O to the EVM binary, not on hot Python loops.
🔗 Related Issues or PRs
N/A.
✅ Checklist
`just static`
`type(scope):`
Cute Animal Picture