Skip to content

feat(tooling, tests): detect and mark gas-checking tests#2906

Open
leolara wants to merge 8 commits into
ethereum:forks/amsterdamfrom
leolara:leolara/mark-gas-checking-tests
Open

feat(tooling, tests): detect and mark gas-checking tests#2906
leolara wants to merge 8 commits into
ethereum:forks/amsterdamfrom
leolara:leolara/mark-gas-checking-tests

Conversation

@leolara
Copy link
Copy Markdown
Member

@leolara leolara commented May 23, 2026

🗒️ Description

Adds a pytest plugin that identifies tests asserting specific gas values, and applies a @pytest.mark.gas_check marker to the tests it finds.

The plugin (gas_taint.py) is enabled with --detect-gas-checks in fill mode only. It uses two detection modes:

  • post[addr].storage[slot] — taint propagation. A GasTainted(int) carrier is returned by Bytecode.gas_cost, per-fork gas_costs / opcode_gas_calculator / transaction_intrinsic_cost_calculator, and propagated through arithmetic. Number.__new__ and FixedSizeHexNumber.__new__ are patched so taint survives HashInt / HexNumber construction.
  • Receipt / header / block-level / benchmark fields — field-presence. The field name itself (cumulative_gas_used, expected_gas_used, blob_gas_used, …) is the assertion signal; if set, flag.

OOG-expecting tests (tx.error, block_exception, or @pytest.mark.exception_test) are excluded — their gas value lives on tx.gas_limit, an input, not a sink.

Worker results are aggregated to the master via workeroutput and pytest_testnodedown. Output is a JSON report; default path is gas_check_report.json.

Block.expected_gas_used and BaseTest.expected_benchmark_gas_used change from int to HexNumber so they pass through the patched Number.__new__. Both are internal assertion-only fields, not serialised to fixtures, so the type change has no fixture impact.

scripts/mark_tests.py is an idempotent script that applies a configurable @pytest.mark.<name> decorator to test functions listed in a JSON file. Three input shapes accepted: bare mapping, bare list of node IDs, or wrapped form with embedded marker name. Used here to apply @pytest.mark.gas_check to the 54 functions identified by a full-suite scan; re-running is a no-op.

Result on the current suite

Full-suite run (fill --detect-gas-checks --until=Amsterdam -m "not slow" -n 8 tests/): 184,683 passed, 0 failures, 0 errors. 12,890 parametrized test cases flagged, collapsing to 54 unique functions across 31 files / 16 EIP directories.

Sink kind Hits Detection
storage 10,184 taint
header.blob_gas_used 5,784 presence
receipt.cumulative_gas_used 1,338 presence
block_expected_gas_used 68 presence

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

leolara added 2 commits May 23, 2026 11:02
Adds a pytest plugin enabled with `--detect-gas-checks` (fill mode only)
that identifies tests asserting specific gas values. The plugin uses two
detection modes, chosen per sink:

- **Storage in post** (`post[addr].storage[slot]`) — taint propagation.
  Wraps `Bytecode.gas_cost`, per-fork `gas_costs` /
  `opcode_gas_calculator` / `transaction_intrinsic_cost_calculator`, and
  patches `Number.__new__` + `FixedSizeHexNumber.__new__` so a
  `GasTainted(int)` carrier survives `HashInt` / `HexNumber`
  construction. Tests whose post-storage holds a tainted value are
  flagged.
- **Receipt / header / block-level / benchmark fields** — field-presence.
  The field name (`cumulative_gas_used`, `expected_gas_used`,
  `blob_gas_used`, etc.) is itself the assertion signal; if set, flag.

Tests with `tx.error`, `block_exception`, or the `exception_test` marker
are excluded so OOG-expecting tests don't pollute the output. Worker
results are aggregated to the master via `workeroutput` /
`pytest_testnodedown`.

`Block.expected_gas_used` and `BaseTest.expected_benchmark_gas_used`
change from `int` to `HexNumber`. Both are internal assertion-only and
not serialised to fixtures, so the type change has no fixture impact and
brings them under the same constructor path as the other gas sinks.
The benchmark validator widens a local var type to accommodate the
union.

Also adds `scripts/mark_tests.py`, an idempotent script that applies a
configurable `@pytest.mark.<name>` decorator to test functions listed in
a JSON file. Accepts three input shapes (bare mapping, bare list,
wrapped with embedded marker name); `--marker` overrides any embedded
value.
Applies `@pytest.mark.gas_check` to the 54 test functions identified by
running `fill --detect-gas-checks` on the full suite. Produced by
`scripts/mark_tests.py gas_check_report.json`; re-running the script is
idempotent.

Affects 31 files across 16 EIP directories; one decorator each, inserted
at the top of the existing decorator stack.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.44%. Comparing base (55d774b) to head (b4211a1).
⚠️ Report is 4 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2906   +/-   ##
================================================
  Coverage            90.44%   90.44%           
================================================
  Files                  535      535           
  Lines                32439    32439           
  Branches              3012     3012           
================================================
  Hits                 29338    29338           
  Misses                2573     2573           
  Partials               528      528           
Flag Coverage Δ
unittests 90.44% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

38 tests covering:

- ``GasTainted`` carrier: construction, arithmetic propagation (incl.
  reflected ops), origin union/dedup, and the NotImplemented forward
  path that fixes ``bytes_literal * GasTainted(n)``.
- ``install_taint`` / ``uninstall_taint``: taint reaches Bytecode and
  fork gas-cost outputs, survives ``HashInt`` / ``HexNumber`` /
  ``ZeroPaddedHexNumber`` and Pydantic ``Storage`` validation, doesn't
  leak onto plain ints, and uninstall reverts.
- ``collect_taint_hits``: storage taint detection, presence-based
  receipt / header / block-level / benchmark detection, the
  BenchmarkTest MRO gate that prevents filler-default false positives,
  and the OOG exclusions (tx.error, block_exception, exception_test).
@leolara leolara marked this pull request as ready for review May 23, 2026 08:21
Copy link
Copy Markdown
Member

@marioevz marioevz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A couple of comments.

One thing that I've noticed is that this requires the test to be already using Bytecode.gas_cost, which is ok, but another step could be that we also catch (perhaps with a separate script) tests that are sensible to gas changes and do not use this construct (think static_tests). It might be unfeasible but worth exploring in a future PR.

return accounts


def _make_test(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern with the tests that use this function (and the mock that it returns) is that, e.g. if we change post from BlockchainTest or StateTest, this unit test will not detect it because SimpleNamespace will still have a post field in it – I know it's unlikely that we rename post but we can make this unit test resist these types of changes.

Can we use the type of unit testing used in packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_benchmarking.py or packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py, where a mock pytest file is created and then the results analyzed?



def _is_oog_test(test: Any, node: pytest.Item | None) -> bool:
tx = getattr(test, "tx", None)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can infer the type here instead of trying to fetch attributes? It's a genuine question as it might be not possible and then simply making the unit tests more robust like I recommend in the other comment might be the way to go.

Comment thread scripts/mark_tests.py

::

uv run python scripts/mark_tests.py REPORT.json --marker gas_check
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of requests for this script:

  • Make it available in the just command (so we can just mark_gas_tests ./tests/amsterdam/... and such)
  • Allow a test file as input argument, so we can reduce the scope of which tests we want to analyze to a single file or function.

leolara added 5 commits May 26, 2026 18:48
Replace ``Any`` and ``getattr``-based field access in the sink walker
with direct attribute access against ``BaseTest`` / ``StateTest`` /
``BlockchainTest`` / ``BenchmarkTest`` / ``Alloc`` / ``Transaction`` /
``Block``. Spec-type-specific access (``test.tx``, ``test.blocks``) now
dispatches via ``isinstance`` rather than ``getattr(test, "...", None)``,
so a future rename of ``post`` / ``tx`` / ``expected_receipt`` / etc.
fails type-checking instead of silently returning empty hits.

Also tightens the few remaining ``Any``s where possible: the wrapped
opcode-gas calculator is now typed as ``Callable[[OpcodeBase], int]``,
and ``pytest_testnodedown`` uses ``xdist.workermanage.WorkerController``
(stub added) and ``object | None``, matching the upstream xdist hook.

The unit tests built with ``SimpleNamespace`` fakes no longer satisfy
the new typed signature and will be replaced in a follow-up commit.
Replace the SimpleNamespace-based fakes in the walker tests with real
``StateTest`` / ``BlockchainTest`` / ``BenchmarkTest`` Pydantic instances
so that field renames or retypings on the spec models fail the unit
tests instead of silently masking them.

The storage and mixed-sink tests now use the ``taint_installed`` fixture
because Pydantic ``Storage`` validation strips the ``GasTainted``
subclass without the ``FixedSizeHexNumber.__new__`` patch.

Carrier tests (``TestGasTaintedCarrier``) and install tests
(``TestTaintInstallation``) are unchanged — they already exercised the
real types.
Add four end-to-end tests that load the real fill plugin chain into a
pytester subprocess and exercise the bits the unit tests deliberately
skip:

- ``--detect-gas-checks`` / ``--gas-check-report`` appear in
  ``fill --help``.
- A synthetic test using ``CodeGasMeasure`` is detected and recorded as
  a ``storage`` hit with gas-derived origins.
- A synthetic test marked ``@pytest.mark.exception_test`` is excluded
  from the report.
- Without the flag, no report file is produced.

Co-located with ``test_benchmarking.py`` and added to ``--ignore`` in
``test-tests`` / ``test-tests-pypy`` so unit-test runs stay fast; runs
under ``test-tests-bench`` alongside the existing pytester suite. Needs
``EVM_BIN`` (defaults to ``evm``) on the runner.
Add a negative integration test that exercises a synthetic state test
which stores a literal value (no ``CodeGasMeasure``, no
``expected_receipt``, no header verify) and asserts the resulting
report is empty. Guards against false positives: a test that simply
runs successfully without touching any of the detector's triggers must
not appear in the JSON output.
…ipes

- ``just mark_gas_tests <path-or-selector>`` runs ``fill
  --detect-gas-checks`` against the given target, then pipes the report
  through ``scripts/mark_tests.py`` to apply ``@pytest.mark.gas_check``
  to the detected tests. Accepts any pytest path/selector, so the
  workflow can be scoped to a single file or function.

- ``just mark_gas_tests_test`` runs the pytester end-to-end tests for
  the gas_taint plugin in their own ``[gas check]`` group, instead of
  piggybacking on ``test-tests-bench`` (which is for benchmark
  framework tests, not plugin e2e).
@leolara
Copy link
Copy Markdown
Member Author

leolara commented May 26, 2026

@marioevz I have addressed your comments.

I was thinking abotu it more like a temporary fix, but this way is more useful for the long run.

  • I made the code typed
  • I made the tests with real object of the type that should be
  • I added pytester to test the plugin like in the benchmarks
  • I added the just command that you said

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants