feat(tooling, tests): detect and mark gas-checking tests by leolara · Pull Request #2906 · ethereum/execution-specs

leolara · 2026-05-23T04:26:29Z

🗒️ Description

Adds a pytest plugin that identifies tests asserting specific gas values, and applies a @pytest.mark.gas_check marker to the tests it finds.

The plugin (gas_taint.py) is enabled with --detect-gas-checks in fill mode only. It uses two detection modes:

post[addr].storage[slot] — taint propagation. A GasTainted(int) carrier is returned by Bytecode.gas_cost, per-fork gas_costs / opcode_gas_calculator / transaction_intrinsic_cost_calculator, and propagated through arithmetic. Number.__new__ and FixedSizeHexNumber.__new__ are patched so taint survives HashInt / HexNumber construction.
Receipt / header / block-level / benchmark fields — field-presence. The field name itself (cumulative_gas_used, expected_gas_used, blob_gas_used, …) is the assertion signal; if set, flag.

OOG-expecting tests (tx.error, block_exception, or @pytest.mark.exception_test) are excluded — their gas value lives on tx.gas_limit, an input, not a sink.

Worker results are aggregated to the master via workeroutput and pytest_testnodedown. Output is a JSON report; default path is gas_check_report.json.

Block.expected_gas_used and BaseTest.expected_benchmark_gas_used change from int to HexNumber so they pass through the patched Number.__new__. Both are internal assertion-only fields, not serialised to fixtures, so the type change has no fixture impact.

scripts/mark_tests.py is an idempotent script that applies a configurable @pytest.mark.<name> decorator to test functions listed in a JSON file. Three input shapes accepted: bare mapping, bare list of node IDs, or wrapped form with embedded marker name. Used here to apply @pytest.mark.gas_check to the 54 functions identified by a full-suite scan; re-running is a no-op.

Result on the current suite

Full-suite run (fill --detect-gas-checks --until=Amsterdam -m "not slow" -n 8 tests/): 184,683 passed, 0 failures, 0 errors. 12,890 parametrized test cases flagged, collapsing to 54 unique functions across 31 files / 16 EIP directories.

Sink kind	Hits	Detection
storage	10,184	taint
header.blob_gas_used	5,784	presence
receipt.cumulative_gas_used	1,338	presence
block_expected_gas_used	68	presence

🔗 Related Issues or PRs

N/A.

✅ Checklist

All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
just static
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered updating the online docs in the ./docs/ directory.
All: Set appropriate labels for the changes (only maintainers can apply labels).
Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

Adds a pytest plugin enabled with `--detect-gas-checks` (fill mode only) that identifies tests asserting specific gas values. The plugin uses two detection modes, chosen per sink: - **Storage in post** (`post[addr].storage[slot]`) — taint propagation. Wraps `Bytecode.gas_cost`, per-fork `gas_costs` / `opcode_gas_calculator` / `transaction_intrinsic_cost_calculator`, and patches `Number.__new__` + `FixedSizeHexNumber.__new__` so a `GasTainted(int)` carrier survives `HashInt` / `HexNumber` construction. Tests whose post-storage holds a tainted value are flagged. - **Receipt / header / block-level / benchmark fields** — field-presence. The field name (`cumulative_gas_used`, `expected_gas_used`, `blob_gas_used`, etc.) is itself the assertion signal; if set, flag. Tests with `tx.error`, `block_exception`, or the `exception_test` marker are excluded so OOG-expecting tests don't pollute the output. Worker results are aggregated to the master via `workeroutput` / `pytest_testnodedown`. `Block.expected_gas_used` and `BaseTest.expected_benchmark_gas_used` change from `int` to `HexNumber`. Both are internal assertion-only and not serialised to fixtures, so the type change has no fixture impact and brings them under the same constructor path as the other gas sinks. The benchmark validator widens a local var type to accommodate the union. Also adds `scripts/mark_tests.py`, an idempotent script that applies a configurable `@pytest.mark.<name>` decorator to test functions listed in a JSON file. Accepts three input shapes (bare mapping, bare list, wrapped with embedded marker name); `--marker` overrides any embedded value.

Applies `@pytest.mark.gas_check` to the 54 test functions identified by running `fill --detect-gas-checks` on the full suite. Produced by `scripts/mark_tests.py gas_check_report.json`; re-running the script is idempotent. Affects 31 files across 16 EIP directories; one decorator each, inserted at the top of the existing decorator stack.

codecov · 2026-05-23T04:37:58Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.44%. Comparing base (55d774b) to head (b4211a1).
⚠️ Report is 4 commits behind head on forks/amsterdam.

Additional details and impacted files

@@               Coverage Diff                @@
##           forks/amsterdam    #2906   +/-   ##
================================================
  Coverage            90.44%   90.44%           
================================================
  Files                  535      535           
  Lines                32439    32439           
  Branches              3012     3012           
================================================
  Hits                 29338    29338           
  Misses                2573     2573           
  Partials               528      528

Flag	Coverage Δ
unittests	`90.44% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

38 tests covering: - ``GasTainted`` carrier: construction, arithmetic propagation (incl. reflected ops), origin union/dedup, and the NotImplemented forward path that fixes ``bytes_literal * GasTainted(n)``. - ``install_taint`` / ``uninstall_taint``: taint reaches Bytecode and fork gas-cost outputs, survives ``HashInt`` / ``HexNumber`` / ``ZeroPaddedHexNumber`` and Pydantic ``Storage`` validation, doesn't leak onto plain ints, and uninstall reverts. - ``collect_taint_hits``: storage taint detection, presence-based receipt / header / block-level / benchmark detection, the BenchmarkTest MRO gate that prevents filler-default false positives, and the OOG exclusions (tx.error, block_exception, exception_test).

marioevz

Nice! A couple of comments.

One thing that I've noticed is that this requires the test to be already using Bytecode.gas_cost, which is ok, but another step could be that we also catch (perhaps with a separate script) tests that are sensible to gas changes and do not use this construct (think static_tests). It might be unfeasible but worth exploring in a future PR.

marioevz · 2026-05-25T17:50:46Z

+    return accounts
+
+
+def _make_test(


My main concern with the tests that use this function (and the mock that it returns) is that, e.g. if we change post from BlockchainTest or StateTest, this unit test will not detect it because SimpleNamespace will still have a post field in it – I know it's unlikely that we rename post but we can make this unit test resist these types of changes.

Can we use the type of unit testing used in packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_benchmarking.py or packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py, where a mock pytest file is created and then the results analyzed?

marioevz · 2026-05-25T17:52:50Z

+
+
+def _is_oog_test(test: Any, node: pytest.Item | None) -> bool:
+    tx = getattr(test, "tx", None)


Is there any way we can infer the type here instead of trying to fetch attributes? It's a genuine question as it might be not possible and then simply making the unit tests more robust like I recommend in the other comment might be the way to go.

marioevz · 2026-05-25T17:55:09Z

+
+::
+
+    uv run python scripts/mark_tests.py REPORT.json --marker gas_check


A couple of requests for this script:

Make it available in the just command (so we can just mark_gas_tests ./tests/amsterdam/... and such)

Allow a test file as input argument, so we can reduce the scope of which tests we want to analyze to a single file or function.

Replace ``Any`` and ``getattr``-based field access in the sink walker with direct attribute access against ``BaseTest`` / ``StateTest`` / ``BlockchainTest`` / ``BenchmarkTest`` / ``Alloc`` / ``Transaction`` / ``Block``. Spec-type-specific access (``test.tx``, ``test.blocks``) now dispatches via ``isinstance`` rather than ``getattr(test, "...", None)``, so a future rename of ``post`` / ``tx`` / ``expected_receipt`` / etc. fails type-checking instead of silently returning empty hits. Also tightens the few remaining ``Any``s where possible: the wrapped opcode-gas calculator is now typed as ``Callable[[OpcodeBase], int]``, and ``pytest_testnodedown`` uses ``xdist.workermanage.WorkerController`` (stub added) and ``object | None``, matching the upstream xdist hook. The unit tests built with ``SimpleNamespace`` fakes no longer satisfy the new typed signature and will be replaced in a follow-up commit.

Replace the SimpleNamespace-based fakes in the walker tests with real ``StateTest`` / ``BlockchainTest`` / ``BenchmarkTest`` Pydantic instances so that field renames or retypings on the spec models fail the unit tests instead of silently masking them. The storage and mixed-sink tests now use the ``taint_installed`` fixture because Pydantic ``Storage`` validation strips the ``GasTainted`` subclass without the ``FixedSizeHexNumber.__new__`` patch. Carrier tests (``TestGasTaintedCarrier``) and install tests (``TestTaintInstallation``) are unchanged — they already exercised the real types.

Add four end-to-end tests that load the real fill plugin chain into a pytester subprocess and exercise the bits the unit tests deliberately skip: - ``--detect-gas-checks`` / ``--gas-check-report`` appear in ``fill --help``. - A synthetic test using ``CodeGasMeasure`` is detected and recorded as a ``storage`` hit with gas-derived origins. - A synthetic test marked ``@pytest.mark.exception_test`` is excluded from the report. - Without the flag, no report file is produced. Co-located with ``test_benchmarking.py`` and added to ``--ignore`` in ``test-tests`` / ``test-tests-pypy`` so unit-test runs stay fast; runs under ``test-tests-bench`` alongside the existing pytester suite. Needs ``EVM_BIN`` (defaults to ``evm``) on the runner.

Add a negative integration test that exercises a synthetic state test which stores a literal value (no ``CodeGasMeasure``, no ``expected_receipt``, no header verify) and asserts the resulting report is empty. Guards against false positives: a test that simply runs successfully without touching any of the detector's triggers must not appear in the JSON output.

…ipes - ``just mark_gas_tests <path-or-selector>`` runs ``fill --detect-gas-checks`` against the given target, then pipes the report through ``scripts/mark_tests.py`` to apply ``@pytest.mark.gas_check`` to the detected tests. Accepts any pytest path/selector, so the workflow can be scoped to a single file or function. - ``just mark_gas_tests_test`` runs the pytester end-to-end tests for the gas_taint plugin in their own ``[gas check]`` group, instead of piggybacking on ``test-tests-bench`` (which is for benchmark framework tests, not plugin e2e).

leolara · 2026-05-26T13:35:50Z

@marioevz I have addressed your comments.

I was thinking abotu it more like a temporary fix, but this way is more useful for the long run.

I made the code typed
I made the tests with real object of the type that should be
I added pytester to test the plugin like in the benchmarks
I added the just command that you said

leolara added 2 commits May 23, 2026 11:02

leolara marked this pull request as ready for review May 23, 2026 08:21

marioevz reviewed May 25, 2026

View reviewed changes

leolara added 5 commits May 26, 2026 18:48

leolara mentioned this pull request May 29, 2026

feat(tooling): detect and mark out-of-gas-by-design tests #2941

Draft

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(tooling, tests): detect and mark gas-checking tests#2906

feat(tooling, tests): detect and mark gas-checking tests#2906
leolara wants to merge 8 commits into
ethereum:forks/amsterdamfrom
leolara:leolara/mark-gas-checking-tests

leolara commented May 23, 2026

Uh oh!

codecov Bot commented May 23, 2026 •

edited

Loading

Uh oh!

marioevz left a comment

Uh oh!

marioevz May 25, 2026

Uh oh!

marioevz May 25, 2026

Uh oh!

marioevz May 25, 2026

Uh oh!

leolara commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants



		def _is_oog_test(test: Any, node: pytest.Item \| None) -> bool:
		tx = getattr(test, "tx", None)


		::

		uv run python scripts/mark_tests.py REPORT.json --marker gas_check

Conversation

leolara commented May 23, 2026

🗒️ Description

Result on the current suite

🔗 Related Issues or PRs

✅ Checklist

Cute Animal Picture

Uh oh!

codecov Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

marioevz left a comment

Choose a reason for hiding this comment

Uh oh!

marioevz May 25, 2026

Choose a reason for hiding this comment

Uh oh!

marioevz May 25, 2026

Choose a reason for hiding this comment

Uh oh!

marioevz May 25, 2026

Choose a reason for hiding this comment

Uh oh!

leolara commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 23, 2026 •

edited

Loading