feat(tooling, tests): detect and mark gas-checking tests#2906
Conversation
Adds a pytest plugin enabled with `--detect-gas-checks` (fill mode only) that identifies tests asserting specific gas values. The plugin uses two detection modes, chosen per sink: - **Storage in post** (`post[addr].storage[slot]`) — taint propagation. Wraps `Bytecode.gas_cost`, per-fork `gas_costs` / `opcode_gas_calculator` / `transaction_intrinsic_cost_calculator`, and patches `Number.__new__` + `FixedSizeHexNumber.__new__` so a `GasTainted(int)` carrier survives `HashInt` / `HexNumber` construction. Tests whose post-storage holds a tainted value are flagged. - **Receipt / header / block-level / benchmark fields** — field-presence. The field name (`cumulative_gas_used`, `expected_gas_used`, `blob_gas_used`, etc.) is itself the assertion signal; if set, flag. Tests with `tx.error`, `block_exception`, or the `exception_test` marker are excluded so OOG-expecting tests don't pollute the output. Worker results are aggregated to the master via `workeroutput` / `pytest_testnodedown`. `Block.expected_gas_used` and `BaseTest.expected_benchmark_gas_used` change from `int` to `HexNumber`. Both are internal assertion-only and not serialised to fixtures, so the type change has no fixture impact and brings them under the same constructor path as the other gas sinks. The benchmark validator widens a local var type to accommodate the union. Also adds `scripts/mark_tests.py`, an idempotent script that applies a configurable `@pytest.mark.<name>` decorator to test functions listed in a JSON file. Accepts three input shapes (bare mapping, bare list, wrapped with embedded marker name); `--marker` overrides any embedded value.
Applies `@pytest.mark.gas_check` to the 54 test functions identified by running `fill --detect-gas-checks` on the full suite. Produced by `scripts/mark_tests.py gas_check_report.json`; re-running the script is idempotent. Affects 31 files across 16 EIP directories; one decorator each, inserted at the top of the existing decorator stack.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## forks/amsterdam #2906 +/- ##
================================================
Coverage 90.44% 90.44%
================================================
Files 535 535
Lines 32439 32439
Branches 3012 3012
================================================
Hits 29338 29338
Misses 2573 2573
Partials 528 528
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
38 tests covering: - ``GasTainted`` carrier: construction, arithmetic propagation (incl. reflected ops), origin union/dedup, and the NotImplemented forward path that fixes ``bytes_literal * GasTainted(n)``. - ``install_taint`` / ``uninstall_taint``: taint reaches Bytecode and fork gas-cost outputs, survives ``HashInt`` / ``HexNumber`` / ``ZeroPaddedHexNumber`` and Pydantic ``Storage`` validation, doesn't leak onto plain ints, and uninstall reverts. - ``collect_taint_hits``: storage taint detection, presence-based receipt / header / block-level / benchmark detection, the BenchmarkTest MRO gate that prevents filler-default false positives, and the OOG exclusions (tx.error, block_exception, exception_test).
marioevz
left a comment
There was a problem hiding this comment.
Nice! A couple of comments.
One thing that I've noticed is that this requires the test to be already using Bytecode.gas_cost, which is ok, but another step could be that we also catch (perhaps with a separate script) tests that are sensible to gas changes and do not use this construct (think static_tests). It might be unfeasible but worth exploring in a future PR.
| return accounts | ||
|
|
||
|
|
||
| def _make_test( |
There was a problem hiding this comment.
My main concern with the tests that use this function (and the mock that it returns) is that, e.g. if we change post from BlockchainTest or StateTest, this unit test will not detect it because SimpleNamespace will still have a post field in it – I know it's unlikely that we rename post but we can make this unit test resist these types of changes.
Can we use the type of unit testing used in packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_benchmarking.py or packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/tests/test_filler.py, where a mock pytest file is created and then the results analyzed?
|
|
||
|
|
||
| def _is_oog_test(test: Any, node: pytest.Item | None) -> bool: | ||
| tx = getattr(test, "tx", None) |
There was a problem hiding this comment.
Is there any way we can infer the type here instead of trying to fetch attributes? It's a genuine question as it might be not possible and then simply making the unit tests more robust like I recommend in the other comment might be the way to go.
|
|
||
| :: | ||
|
|
||
| uv run python scripts/mark_tests.py REPORT.json --marker gas_check |
There was a problem hiding this comment.
A couple of requests for this script:
- Make it available in the
justcommand (so we canjust mark_gas_tests ./tests/amsterdam/...and such) - Allow a test file as input argument, so we can reduce the scope of which tests we want to analyze to a single file or function.
Replace ``Any`` and ``getattr``-based field access in the sink walker with direct attribute access against ``BaseTest`` / ``StateTest`` / ``BlockchainTest`` / ``BenchmarkTest`` / ``Alloc`` / ``Transaction`` / ``Block``. Spec-type-specific access (``test.tx``, ``test.blocks``) now dispatches via ``isinstance`` rather than ``getattr(test, "...", None)``, so a future rename of ``post`` / ``tx`` / ``expected_receipt`` / etc. fails type-checking instead of silently returning empty hits. Also tightens the few remaining ``Any``s where possible: the wrapped opcode-gas calculator is now typed as ``Callable[[OpcodeBase], int]``, and ``pytest_testnodedown`` uses ``xdist.workermanage.WorkerController`` (stub added) and ``object | None``, matching the upstream xdist hook. The unit tests built with ``SimpleNamespace`` fakes no longer satisfy the new typed signature and will be replaced in a follow-up commit.
Replace the SimpleNamespace-based fakes in the walker tests with real ``StateTest`` / ``BlockchainTest`` / ``BenchmarkTest`` Pydantic instances so that field renames or retypings on the spec models fail the unit tests instead of silently masking them. The storage and mixed-sink tests now use the ``taint_installed`` fixture because Pydantic ``Storage`` validation strips the ``GasTainted`` subclass without the ``FixedSizeHexNumber.__new__`` patch. Carrier tests (``TestGasTaintedCarrier``) and install tests (``TestTaintInstallation``) are unchanged — they already exercised the real types.
Add four end-to-end tests that load the real fill plugin chain into a pytester subprocess and exercise the bits the unit tests deliberately skip: - ``--detect-gas-checks`` / ``--gas-check-report`` appear in ``fill --help``. - A synthetic test using ``CodeGasMeasure`` is detected and recorded as a ``storage`` hit with gas-derived origins. - A synthetic test marked ``@pytest.mark.exception_test`` is excluded from the report. - Without the flag, no report file is produced. Co-located with ``test_benchmarking.py`` and added to ``--ignore`` in ``test-tests`` / ``test-tests-pypy`` so unit-test runs stay fast; runs under ``test-tests-bench`` alongside the existing pytester suite. Needs ``EVM_BIN`` (defaults to ``evm``) on the runner.
Add a negative integration test that exercises a synthetic state test which stores a literal value (no ``CodeGasMeasure``, no ``expected_receipt``, no header verify) and asserts the resulting report is empty. Guards against false positives: a test that simply runs successfully without touching any of the detector's triggers must not appear in the JSON output.
…ipes - ``just mark_gas_tests <path-or-selector>`` runs ``fill --detect-gas-checks`` against the given target, then pipes the report through ``scripts/mark_tests.py`` to apply ``@pytest.mark.gas_check`` to the detected tests. Accepts any pytest path/selector, so the workflow can be scoped to a single file or function. - ``just mark_gas_tests_test`` runs the pytester end-to-end tests for the gas_taint plugin in their own ``[gas check]`` group, instead of piggybacking on ``test-tests-bench`` (which is for benchmark framework tests, not plugin e2e).
|
@marioevz I have addressed your comments. I was thinking abotu it more like a temporary fix, but this way is more useful for the long run.
|
🗒️ Description
Adds a pytest plugin that identifies tests asserting specific gas values, and applies a
@pytest.mark.gas_checkmarker to the tests it finds.The plugin (
gas_taint.py) is enabled with--detect-gas-checksin fill mode only. It uses two detection modes:post[addr].storage[slot]— taint propagation. AGasTainted(int)carrier is returned byBytecode.gas_cost, per-forkgas_costs/opcode_gas_calculator/transaction_intrinsic_cost_calculator, and propagated through arithmetic.Number.__new__andFixedSizeHexNumber.__new__are patched so taint survivesHashInt/HexNumberconstruction.cumulative_gas_used,expected_gas_used,blob_gas_used, …) is the assertion signal; if set, flag.OOG-expecting tests (
tx.error,block_exception, or@pytest.mark.exception_test) are excluded — their gas value lives ontx.gas_limit, an input, not a sink.Worker results are aggregated to the master via
workeroutputandpytest_testnodedown. Output is a JSON report; default path isgas_check_report.json.Block.expected_gas_usedandBaseTest.expected_benchmark_gas_usedchange frominttoHexNumberso they pass through the patchedNumber.__new__. Both are internal assertion-only fields, not serialised to fixtures, so the type change has no fixture impact.scripts/mark_tests.pyis an idempotent script that applies a configurable@pytest.mark.<name>decorator to test functions listed in a JSON file. Three input shapes accepted: bare mapping, bare list of node IDs, or wrapped form with embedded marker name. Used here to apply@pytest.mark.gas_checkto the 54 functions identified by a full-suite scan; re-running is a no-op.Result on the current suite
Full-suite run (
fill --detect-gas-checks --until=Amsterdam -m "not slow" -n 8 tests/): 184,683 passed, 0 failures, 0 errors. 12,890 parametrized test cases flagged, collapsing to 54 unique functions across 31 files / 16 EIP directories.🔗 Related Issues or PRs
N/A.
✅ Checklist
just statictype(scope):.mkdocs servelocally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.@ported_frommarker.Cute Animal Picture