
feat(ci): overhaul fixture releases #2888

Open
spencer-tb wants to merge 3 commits into ethereum:forks/amsterdam from spencer-tb:ci/overhaul-fixture-releases


@spencer-tb
Contributor

@spencer-tb spencer-tb commented May 20, 2026

EL Client Dev Reviewers

Summary

A big heads-up on how fixture releases will work going forward. We would appreciate your feedback on the consensus/devnet features before this lands.

The aim here is to consolidate to two release tracks that you'll consume:

  • Consensus (tests-consensus@vX.Y.Z) is the mainnet "must pass" set: all tests for all forks up to the last mainnet fork, cut frequently (weekly) from the latest development branch. It replaces the old stable/develop split, and passing it means you won't break mainnet. Over time it'll absorb all of ethereum/legacy-tests, so the only tests you'll need will come from one repo: EELS.
  • Devnet (tests-<feat>-devnet@vX.Y.Z, e.g. tests-bal-devnet@v7.0.0) covers the fork/feature under active development. It fills all forks up to the development fork and is meant as an advisory/non-blocking gate on your devnet branch; passing it is the bar for inclusion in that devnet.

Both will be downloadable via gh release download <tag> or consume cache --input=<feature>@<version|latest>. For consensus, X = fork number (e.g. BPO2 = 20), Y = a behaviour/spec change, Z = new tests only; for devnet X = devnet number.

Below are some items we have discussed, and will not add or change:

  • We do not intend to create a test release repo with extracted tests, like https://github.com/erigontech/eest-fixtures. Why? Our fixtures are getting large, requiring file splits or LFS. We feel a simple tarball extract is reasonable and an easy step for all clients to add within their CI.
  • Splitting tarballs by fixture type. We intend to keep all test types within one fixture release tarball.

Questions

A few specific things we'd like your feedback on:

  • Are you happy with the naming, i.e. consensus for mainnet releases? Consensus could sound too CL-like.
  • Is the tests release versioning scheme clear for each release type?
  • Cadence? Would weekly consensus releases be too noisy, bi-weekly? Twice a week?

EELS Reviewers

🗒️ Description

This PR describes in detail and updates our test fixture release process and strategy, following alignment with @danceratopz and other EL clients.

Key Changes

  • Release configs are cleaned up:
    • evm-impl.yaml is merged into evm.yaml; each entry now carries its impl, repo, ref, evm-bin and x-dist. Client aliases (consensus, benchmark) are added so feature.yaml/workflows can reference a client by name, and build-evm-base resolves which builder to run via the impl field.
    • The ethereumjs t8n tool is removed from the release/CI path (the local CLI wrapper is kept for local use only).
  • We now only have consensus, devnet and benchmark features (in forks/amsterdam).
    • devnet is specifically chosen so we don't ever need to update feature.yaml for future devnet features; the devnet keyword is used as a substring match on release tagging, and <feat>-devnet keeps its friendly name.
  • release_fixture_feature.yaml is replaced with release_fixtures.yaml:
    • Triggered via workflow_dispatch with explicit feature + version inputs; the git tag is tests-<feature>@<version> and the release title is <feature>@<version>.
    • Input validation fails fast: version must match vX.Y.Z, feature must be non-empty, *-devnet requires a branch, and an evm override must be a key in evm.yaml.
    • Optional evm / evm_repo / evm_ref inputs override the client impl and the t8n tool repo/ref for a one-off release.
    • Releases are tagged with the gh cli on success only, no tag push, no delete/recreate.
    • Releases are only drafted in EELS (EEST mirror is removed, EEST is now archived).
    • lllc and solc dependencies are removed.
  • Multi-runner split: BPO forks are filled within the osaka range (no standalone bpo split).
  • Benchmark: generates blockchain + blockchain_engine_x fixtures only (the default blockchain_engine fixture is dropped). The auto benchmark_fast push artifact is removed from benchmark.yaml.
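
The fail-fast input validation and the tag/title naming described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the workflow's actual implementation: the function names `validate_inputs` and `release_names` and the `known_evms` parameter (standing in for the keys of evm.yaml) are hypothetical.

```python
import re


def validate_inputs(feature, version, branch="", evm="", known_evms=()):
    """Return a list of error messages; an empty list means the inputs pass.

    `known_evms` stands in for the keys of evm.yaml (hypothetical parameter).
    """
    errors = []
    if not feature:
        errors.append("feature must be non-empty")
    # The version must be exactly vX.Y.Z with numeric components.
    if not re.fullmatch(r"v\d+\.\d+\.\d+", version):
        errors.append(f"version {version!r} must match vX.Y.Z")
    # Any feature ending in -devnet must name the branch to release from.
    if feature.endswith("-devnet") and not branch:
        errors.append("*-devnet features require a branch")
    # An evm override is optional, but must be a known key when given.
    if evm and evm not in known_evms:
        errors.append(f"evm override {evm!r} is not a key in evm.yaml")
    return errors


def release_names(feature, version):
    """Return (git tag, release title) for a feature and version."""
    return f"tests-{feature}@{version}", f"{feature}@{version}"


print(validate_inputs("bal-devnet", "v7.0.0"))
print(release_names("bal-devnet", "v7.0.0"))
```

Collecting every error before failing, rather than stopping at the first, is a design choice of this sketch; the real workflow may behave differently.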

Tagging A Release

To tag releases we must now use the GitHub CLI. This has the benefit of only creating tags in EELS if the fixture building process is successful. No more tag deletion and recreation, only workflow triggers.

The following can be run locally or optionally triggered from the GitHub Actions web UI.

gh workflow run release_fixtures.yaml -f feature=consensus -f version=v20.3.1
# devnet releases additionally require the branch to release from:
gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v7.0.0 -f branch=bal-devnet-7
# optional: override the client / t8n repo+ref for a one-off release
gh workflow run release_fixtures.yaml -f feature=consensus -f version=v20.3.1 \
  -f evm=geth -f evm_repo=ethereum/go-ethereum -f evm_ref=master

Downloading A Release

Fixtures can be downloaded by two separate methods: gh release download for the raw tarball, or consume cache if you want tag resolution (@latest), local caching, and --input integration with consume subcommands.

# via gh cli
gh release download tests-consensus@v20.3.1 --repo ethereum/execution-specs
gh release download tests-bal-devnet@v7.0.0 --repo ethereum/execution-specs

# latest consensus by publish time:
LATEST=$(gh release list --repo ethereum/execution-specs --limit 100 \
  --json tagName --jq '[.[] | select(.tagName | startswith("tests-consensus@v")) | .tagName][0]')
gh release download "$LATEST" --repo ethereum/execution-specs

# via consume cache
uv run consume cache --input=consensus@v20.3.1
uv run consume cache --input=consensus@latest
uv run consume cache --input=bal-devnet@v7.0.0
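
The gh snippet above takes the first matching tag in the order `gh release list` returns. As an alternative, a minimal Python sketch that picks the highest version for a feature from a list of tag names is shown here. The helper name `latest_tag` is hypothetical, and this is not a description of how `consume cache` resolves `@latest`.

```python
import re

# Matches tags of the form tests-<feature>@vX.Y.Z.
TAG_RE = re.compile(r"tests-(?P<feature>.+)@v(?P<x>\d+)\.(?P<y>\d+)\.(?P<z>\d+)")


def latest_tag(tags, feature):
    """Return the tag with the highest (X, Y, Z) for `feature`, or None."""
    candidates = []
    for tag in tags:
        match = TAG_RE.fullmatch(tag)
        if match and match["feature"] == feature:
            version = (int(match["x"]), int(match["y"]), int(match["z"]))
            candidates.append((version, tag))
    # Compare numerically so that v20.10.0 sorts above v20.3.1.
    return max(candidates)[1] if candidates else None


tags = ["tests-consensus@v20.3.1", "tests-consensus@v20.10.0", "tests-bal-devnet@v7.0.0"]
print(latest_tag(tags, "consensus"))
```

Sorting by parsed version rather than by publish time matters when an older line (for example an earlier devnet) receives a later patch release.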

Fixed Release Types

tests-consensus@vX.Y.Z

In the past (pre-Weld) we had the following release features: stable & develop, where stable was a subset of develop. To converge these two features we now define the consensus feature. This is the invariant to the benchmark feature.

The consensus feature acts as our mainnet set of tests. For now that is fill --until BPO4: all tests for all forks up to the last mainnet fork. This will be released weekly, on any change or addition to the consensus tests, always from the latest development branch (currently forks/amsterdam). Clients will use this release on their main/master branches in CI, and eventually this release will contain all tests from ethereum/tests & ethereum/legacy-tests (allowing us to archive both of these repos). TL;DR: one tag type to verify you will not break mainnet. Additionally, this will be run in our Hive CI (under what is currently labelled generic).

Here we define a new semantic versioning type for our consensus test releases:

  • X is the fork number, BPO2 is at index 20,
    • This makes it clear what fork the release is up to date with.
  • Y is used to determine any changes in behaviour, which translates to spec/test changes. Should rarely occur.
  • Z is simply a bump due to new test additions.
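
A minimal sketch of how the Y and Z bumps could be computed from the current version. The function name `next_version` and the change labels are hypothetical, and resetting Z to 0 on a Y bump is an assumption borrowed from ordinary semver practice.

```python
def next_version(current, change):
    """Return the next vX.Y.Z for a 'behaviour' (Y) or 'tests' (Z) change."""
    x, y, z = (int(part) for part in current.removeprefix("v").split("."))
    if change == "behaviour":
        # Spec/test behaviour changed: bump Y, reset Z (assumed convention).
        return f"v{x}.{y + 1}.0"
    if change == "tests":
        # Only new tests were added: bump Z.
        return f"v{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(next_version("v20.3.0", "tests"))
print(next_version("v20.3.0", "behaviour"))
```

X is deliberately not bumped here, since in this scheme it is set from the fork index rather than incremented.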

The first release of this type in EELS will be tests-consensus@v20.0.0, to catch up from the last fork. For the fork under development (Amsterdam) the first consensus release will be tagged once all CFI'd EIPs are deemed successful in a devnet (purposely ambiguous here, things change in Ethereum). Typically this will resolve to the last devnet before the first testnet; this first release can be viewed as the testnet release. For Amsterdam we will likely tag tests-consensus@v24.0.0 after glamsterdam-devnet-6 is deemed successful. Spec changes can still occur, and that is why we have Y in the new fixture versioning scheme.

tests-<feat>-devnet@vX.Y.Z

This release type will follow on from the current devnet release process but more explicitly. Today <feat> is bal and soon it will be glamsterdam. <feat> can essentially be any keyword but typically is the fork name or the headliner feature.

The devnet feature is entirely for test releases during the fork development process for the upcoming fork. Here we still fill for all forks so clients can make sure they do not introduce any regressions: currently fill --until Amsterdam, i.e. all tests for all forks up to the development fork. Clients will use this release on their devnet branches in CI; they must pass all of these tests before being included in the devnet that the release is tagged for. This release will be run in Hive CI under the same naming scheme as we use currently.

For the bal devnets today we use the tag tests-bal@vX.Y.Z. This PR will change the process to tests-bal-devnet@vX.Y.Z. As bal-devnet-7 is the last of the bal devnets, we will start the new tagging scheme for glamsterdam-devnet-5. The devnet releases will additionally follow a new versioning scheme:

  • X is the devnet number, so for glamsterdam-devnet-5 this would be 5,
    • This makes it clear what devnet the release targets.
  • Y/Z follow the same semantics as the consensus test releases.

Following the latter, the first glamsterdam-devnet-5 release will be tagged as tests-glamsterdam-devnet@v5.0.0. All devnet releases must be tagged from a devnet branch in EELS, in this case glamsterdam-devnet-5. Here we are specifically choosing to diverge from the EELS / branch naming scheme to align every repo under the same devnet name.

tests-benchmark@vX.Y.Z / tests-zkevm@vX.Y.Z

Benchmark/zkevm releases are self-explanatory here, and are released only when required.

Here we only choose to change the versioning scheme to align with that of the consensus feature:

  • X is the fork number, Amsterdam is at index 24,
  • Y is used to determine any changes in behaviour, which translates to spec/test changes; this can occur more often if kept up to date with devnet related changes.
  • Z is simply a bump due to new test additions.

The benchmark/zkevm feature can be tagged from the related forks/amsterdam or the latest devnet branch in EELS. The next benchmark/zkevm release shall be tagged as tests-benchmark@v24.0.0/tests-zkevm@v24.0.0, cc @LouisTsai-Csie / @jsign.

🔗 Related Issues or PRs

✅ Checklist

  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

Cute Animal Picture


@spencer-tb spencer-tb added C-feat Category: an improvement or new feature P-medium A-ci Area: Continuous Integration labels May 20, 2026
@spencer-tb spencer-tb force-pushed the ci/overhaul-fixture-releases branch from abb005f to 8bd6b77 Compare May 20, 2026 14:00
@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.44%. Comparing base (62b914c) to head (b2ac847).
⚠️ Report is 11 commits behind head on forks/amsterdam.

Additional details and impacted files
@@                 Coverage Diff                 @@
##           forks/amsterdam    #2888      +/-   ##
===================================================
+ Coverage            87.16%   90.44%   +3.27%     
===================================================
  Files                  586      535      -51     
  Lines                35791    32439    -3352     
  Branches              3364     3012     -352     
===================================================
- Hits                 31198    29338    -1860     
+ Misses                3943     2573    -1370     
+ Partials               650      528     -122     
Flag Coverage Δ
unittests 90.44% <ø> (+3.27%) ⬆️

@spencer-tb spencer-tb force-pushed the ci/overhaul-fixture-releases branch 2 times, most recently from 7eac7b3 to d558bc7 Compare May 22, 2026 12:33
@spencer-tb
Contributor Author

This PR needs to be merged to fully test out the new workflow, but I did some smaller smoke tests on my fork:

To keep the workflow runnable in minutes on a fork (no gigachungus runners), I scoped both features to a small Cancun test module and pointed build at ubuntu-latest. The split/combine/release wiring is identical to production; the only differences are the fill scope and the runner.

Smoke test releases can be found here: https://github.com/spencer-tb/execution-specs/releases

@spencer-tb spencer-tb force-pushed the ci/overhaul-fixture-releases branch from d558bc7 to b2ac847 Compare May 25, 2026 13:40
@spencer-tb spencer-tb marked this pull request as ready for review May 25, 2026 13:40
@chfast
Member

chfast commented May 25, 2026

  1. The "consensus" name is not very good. I skipped the section on the first read because I thought it was for CL. I don't want to be very picky, but some of my suggestions: just "tests", "tests-main", "tests-stable" (in the sense of a stable spec).
  2. I don't really care about the version number, so you can choose any versioning. However, you are adding an implicit matching rule: bal-devnet-7 → tests-bal-devnet@v7.x.y. This breaks semver (which I don't care about, as mentioned). Make sure you really want this.
  3. Having a weekly / bi-weekly cadence would be a big improvement for me. I often contribute new test cases for "tests-consensus" and want to integrate them into my CI as soon as possible. Do releases often. Skip if nothing new to release.

evm-type: benchmark
fill-params: --fork=Osaka --generate-all-formats --gas-benchmark-values 100 ./tests/benchmark/compute
feature_only: true
fill-params: --fork=Amsterdam --gas-benchmark-values 1,10,30,60,100,150 ./tests/benchmark
Collaborator

We're using geth to fill the tests, but as we discussed, I don't think it has t8n tool support yet.

I did a partial review focusing mostly on benchmarking, and I don't see any other issues from my side.

Contributor

Geth should have t8n support after this was merged, curious if there are any issues.

Collaborator

Geth should have t8n support after this was merged, curious if there are any issues.

Oh thanks, that fixes something I pointed out in ethereum/go-ethereum#34972 (comment) :)

@marioevz
Member

  1. The "consensus" name is not very good. I skipped the section on the first read because I thought it was for CL. I don't want to be very picky, but some of my suggestions: just "tests", "tests-main", "tests-stable" (in the sense of a stable spec).
  2. I don't really care about the version number, so you can choose any versioning. However, you are adding an implicit matching rule: bal-devnet-7 → tests-bal-devnet@v7.x.y. This breaks semver (which I don't care about, as mentioned). Make sure you really want this.
  3. Having a weekly / bi-weekly cadence would be a big improvement for me. I often contribute new test cases for "tests-consensus" and want to integrate them into my CI as soon as possible. Do releases often. Skip if nothing new to release.

  1. I had not thought about this, and I think it's important. Maybe we should roll back to "stable" or similar?
  2. I agree that it breaks semver but also think that it should not be a dealbreaker. I see these releases as more "ephemeral" in the sense that they are probably not going to be in clients' CI workflows for more than a month or two, so this versioning scheme is ok for now IMO.
  3. I think this could be the next step: Once we have a good release process, we can start looking for automation for a certain release cadence (run a workflow that lists the stable tests for the current commit, compares against the list from the previous release, if any addition, release and list the changes).

@taratorio
Contributor

This is great, thanks. Just one question about benchmark releases. Does it make sense for those to also have benchmark-consensus and benchmark-devnet variants? For example, recently ive been hitting a slight complication which ive had to work around - the benchmark fixtures dont have BALs in them so I couldnt use them locally for optimisation work on top of the devnet changes. I had to synthesise my own fixtures essentially. If we had benchmark-devnet variant for those I wouldve not had to do this workaround

@LouisTsai-Csie
Collaborator

I support @taratorio 's idea if this does not complicate the workflow too much

Member

@danceratopz danceratopz left a comment

Thanks so much for this effort! Looking forward to getting all releases in execution-specs!

I feel a little bit weird about hijacking the MAJOR version for fork/devnet version (fork version is MINOR in EELS releases, so another slight inconsistency). We do have to hope PandaOps never launches feat-devnet-alphaone or similar to avoid invalid versions 😆

On face value, it seems to fit well. And I think one of your major motivations is to avoid hard-coding releases in the hive-tests runner config repos...
https://github.com/ethpandaops/hive-tests/blob/52cf2fcfa6bb698c11344994d8e223e69dd3a969/.github/workflows/hive-devnet-7.yaml#L161-L162

But it might be worth thinking through a little more. If we have two devnet versions (bal-devnet-3,bal-devnet-7) running at the same time, does it still simplify artifact download? I guess we never have hive runner configs for two devnets simultaneously?

As-is we could already do:

consume cache --input=bal-devnet-7@latest

and if we aim to deprecate consume cache then both versions are equally convenient with gh release download, I think?

Versioning scheme in this PR:

#!/usr/bin/env bash

TAG=$(gh release list --repo ethereum/execution-specs --limit 100 --json tagName \
    --jq 'map(select(.tagName | startswith("tests-bal-devnet@v7."))) | .[0].tagName')
gh release download "$TAG" --repo ethereum/execution-specs --pattern '*.tar.gz'

Versioning scheme currently:

#!/usr/bin/env bash
TAG=$(gh release list --repo ethereum/execution-specs --limit 100 --json tagName \
    --jq 'map(select(.tagName | startswith("tests-bal-devnet-7@v"))) | .[0].tagName')
gh release download "$TAG" --repo ethereum/execution-specs --pattern '*.tar.gz'

If this does not greatly help downstream convenience I would opt for explicit hard-coded release names here (bal-devnet-7) and not add our own custom convention to versioning.

If we keep it, I wonder whether, at the risk of complicating the workflows, we should automate/hard-code the major version to avoid user error (see the docs corrections below) as suggested here. This would avoid duplicating this in the case of the devnet branch and put it close to the fill --until=<fork> config in the case of a fork branch. The comments on the invalid tag/version highlight that this is error prone.

Could you restructure the docs to keep all (or most of) the "Formats and Release Layout" section, but move it below the "Release Tracks" (or "Test Release Types" if we rename)?

I think removing blockchain_test_engine from the benchmark spec deserves its own PR as it gets a bit lost here.

consensus:
evm-type: eels
fill-params: --until=BPO2 --generate-all-formats
fill-params: --until=BPO4
Member

Can we keep --generate-all-formats please? :)

evm-type: benchmark
fill-params: --fork=Osaka --generate-all-formats --gas-benchmark-values 100 ./tests/benchmark/compute
feature_only: true
fill-params: --fork=Amsterdam --gas-benchmark-values 1,10,30,60,100,150 ./tests/benchmark
Member

Should we append 200M here to reflect the minimum glammie target value from the interop? Or other values @LouisTsai-Csie

Can also be a follow-up!

Comment thread .github/configs/evm.yaml
Member

The fact that we configure a fill flag here is initially surprising; a boolean supports-xdist instead of x-dist would show clearer intent. But probably not worth the overhead.

Comment thread .github/configs/evm.yaml
Member

Nit: Can we rename x-dist to xdist? I've never seen it hyphenated anywhere else :)

Comment thread .github/configs/evm.yaml
Comment on lines +1 to +14
# Client aliases
benchmark:
  impl: geth
  repo: ethereum/go-ethereum
  ref: master
  evm-bin: evm
  x-dist: auto
consensus:
  impl: eels
  repo: null
  ref: null
  evm-bin: null
  x-dist: auto

Member

For consensus this seems like an unnecessary redirect? I'd skip this obfuscation and just use eels, respectively geth, in .github/configs/feature.yaml.

And, same for benchmark, assuming we only have one type of benchmark feature (as is the case in this PR - it removes benchmark_fast).

| Example dispatch | Git tag | Release title | Artifact |
| ---------------- | ------- | ------------- | -------- |
| `feature=consensus version=v1.2.3` | `tests-consensus@v1.2.3` | `consensus@v1.2.3` | `fixtures_consensus.tar.gz` |
| `feature=bal-devnet version=v1.0.0 branch=bal-devnet-7` | `tests-bal-devnet@v1.0.0` | `bal-devnet@v1.0.0` | `fixtures_bal-devnet.tar.gz` |
Member

Invalid version/tag I believe? It must be compatible with the target devnet version.

Suggested change
| `feature=bal-devnet version=v1.0.0 branch=bal-devnet-7` | `tests-bal-devnet@v1.0.0` | `bal-devnet@v1.0.0` | `fixtures_bal-devnet.tar.gz` |
| `feature=bal-devnet version=v7.0.0 branch=bal-devnet-7` | `tests-bal-devnet@v7.0.0` | `bal-devnet@v7.0.0` | `fixtures_bal-devnet.tar.gz` |

To cut a new release:

These releases are tagged using the format `<pre_release_name>@vX.Y.Z`.
1. **Pick the next version** per the [Versioning Scheme](#versioning-scheme) for the track you're releasing on (e.g. the next consensus release after `consensus@v20.3.0` is `consensus@v20.3.1` for a tests-only bump, or `consensus@v20.4.0` for a behaviour change).
Member

I think this could be less ambiguous:

  • tests-only = new tests?
  • behavior change = fixed tests/specs?

```bash
gh workflow run release_fixtures.yaml -f feature=consensus -f version=v20.3.1
# devnet releases additionally require the branch to release from:
gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v1.0.0 -f branch=bal-devnet-7
Member

Perhaps this should also be tested by the workflow that the branch "suffix" matches the target release major version?

Suggested change
gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v1.0.0 -f branch=bal-devnet-7
gh workflow run release_fixtures.yaml -f feature=bal-devnet -f version=v7.0.0 -f branch=bal-devnet-7
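
The check suggested here could look roughly like the following Python sketch, which compares the trailing number of the branch name with the MAJOR component of the version. The function name `devnet_major_matches` is hypothetical and the workflow in this PR is not known to contain such a check.

```python
import re


def devnet_major_matches(branch, version):
    """True when the branch's trailing number equals the version's MAJOR."""
    # Accepts both bal-devnet-7 and devnets/bal/7 style branch names.
    branch_match = re.search(r"(\d+)$", branch)
    version_match = re.fullmatch(r"v(\d+)\.\d+\.\d+", version)
    if not branch_match or not version_match:
        # No numeric suffix, or a malformed version: treat as a mismatch.
        return False
    return int(branch_match.group(1)) == int(version_match.group(1))


print(devnet_major_matches("bal-devnet-7", "v7.0.0"))
print(devnet_major_matches("bal-devnet-7", "v1.0.0"))
```

A branch without a numeric suffix is rejected rather than guessed at, which also covers a devnet named with a non-numeric suffix.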

Comment on lines +10 to +13
version:
  description: "Release version, e.g. v20.0.0"
  required: true
  type: string
Member

How about making this Y.Z and:

  1. Devnet release type: Extracting X from devnets/bal/X (or potentially, in the future, devnets-bal-X).
  2. Consensus release type: Hard-coding in .github/configs/feature.yaml (close to where --until=<fork> is defined).
  3. Benchmark release type: Hard-code in .github/configs/feature.yaml.

env:
  INPUT_FEATURE: ${{ inputs.feature }}
  INPUT_VERSION: ${{ inputs.version }}
  INPUT_BRANCH: ${{ inputs.branch }}
Member

This is somewhat of an edge case, but it's valid and there's an easy fix that's probably worth adding.

From codex:

There's a potential race condition when INPUT_BRANCH is set (e.g. devnets/bal/7):

  • setup, build (matrix), combine, release each call actions/checkout with: ref: ${{ inputs.branch }} → resolves the current branch tip independently at each checkout (lines 42, 97, 118, 174).
  • Release step: TARGET="${INPUT_BRANCH:-$GITHUB_SHA}" → INPUT_BRANCH (a branch name string) → gh release create --target bal-devnet-7 → resolves the current branch tip again at tag creation time (line 221).
  • That's 5+ independent snapshots of the branch tip, any of which can disagree if the branch advances during the (potentially 24h = job timeout) build.
  • GITHUB_SHA doesn't enter into this path at all — it's whatever the dispatch ref's tip was at dispatch time, which is typically a different ref (e.g. forks/amsterdam) and never gets consulted in this path.

No problem when INPUT_BRANCH is empty:

  • Every actions/checkout is called with ref: ${{ inputs.branch }} → empty → checkout falls back to GITHUB_SHA.
  • Release step: TARGET="${INPUT_BRANCH:-$GITHUB_SHA}" → GITHUB_SHA.
  • GITHUB_SHA is immutable for the duration of the run.
  • Race-free. All jobs see the same SHA, tag lands on the same SHA. Done.

Suggested fix:

  # in `setup` job, after checkout:
  - name: Resolve target SHA
    id: target_sha
    run: echo "sha=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"

Add target_sha to setup.outputs, then replace every ref: ${{ inputs.branch }} with ref: ${{ needs.setup.outputs.target_sha }} (in build, combine, release), and --target "$TARGET" with --target "${{ needs.setup.outputs.target_sha }}". Now there's exactly one tip resolution; everything downstream is SHA-pinned.

@danceratopz
Member

This is great, thanks. Just one question about benchmark releases. Does it make sense for those to also have benchmark-consensus and benchmark-devnet variants?

Hey @taratorio, thanks for asking and the feedback! Yes, this def makes sense in general and we can add these as required.

For example, recently ive been hitting a slight complication which ive had to work around - the benchmark fixtures dont have BALs in them so I couldnt use them locally for optimisation work on top of the devnet changes. I had to synthesise my own fixtures essentially. If we had benchmark-devnet variant for those I wouldve not had to do this workaround

Were you filling these fixtures yourself? The latest release https://github.com/ethereum/execution-specs/releases/tag/tests-benchmark%40v0.0.9 is only filled for Osaka. There might have been a bit of flux here due to incompatible/incomplete t8n tools for EELS and geth (benchmark releases have been using geth), but as-is today 😆 on forks/amsterdam with ethereum/go-ethereum#35025 both regular and benchmark test fixtures have blockAccessList. They should be in the next benchmark release, which will target Amsterdam!

@taratorio
Contributor

This is great, thanks. Just one question about benchmark releases. Does it make sense for those to also have benchmark-consensus and benchmark-devnet variants?

Hey @taratorio, thanks for asking and the feedback! Yes, this def makes sense in general and we can add these as required.

For example, recently ive been hitting a slight complication which ive had to work around - the benchmark fixtures dont have BALs in them so I couldnt use them locally for optimisation work on top of the devnet changes. I had to synthesise my own fixtures essentially. If we had benchmark-devnet variant for those I wouldve not had to do this workaround

Were you filling these fixtures yourself? The latest release https://github.com/ethereum/execution-specs/releases/tag/tests-benchmark%40v0.0.9 is only filled for Osaka. There might have been a bit of flux here due to incompatible/incomplete t8n tools for EELS and geth (benchmark releases have been using geth), but as-is today 😆 on forks/amsterdam with ethereum/go-ethereum#35025 both regular and benchmark test fixtures have blockAccessList. They should be in the next benchmark release, which will target Amsterdam!

yes, I filled a few that I was interested in with erigon but not via t8n but my own quick and hacky way

@danceratopz
Member

danceratopz commented May 27, 2026

Just discussed with @spencer-tb and @LouisTsai-Csie, this is our suggestion going forward:

  1. Instead of changing what is now "mainnet" to "consensus" (in this PR), we just tag these releases as tests@vX.Y.Z. No stable or other labelling required, they're just the tests. We aim for an on-demand release schedule, which could be as frequent as weekly if tests are added/fixed. The forks covered/filled-for in these releases should be slightly ahead of clients' testnet/mainnet release schedules. I.e., we include Amsterdam in good time before the first testnet releases.
  2. The benchmark changes will move to another PR. Benchmark devnet releases:
    • A: For devnet 7, merge 8037 PR, bump Osaka->Amsterdam, pick/decide on an EVM :), then create the release from forks/amsterdam.
    • B: In general, for now, if necessary just add a new entry to feature.yaml (Move benchmark changes to another PR). I.e., if we need a benchmark-glamsterdam-devnet-5@v1.0.0, we can create a new entry for that.
  3. We simplify the release versioning scheme, which can be applied to any test fixture release, to:
    X: fix spec (spec change)
    Y: fix test
    Z: new test
  4. The next/new release versions start at:
    • tests@v1.0.0
    • benchmark@v1.0.0 maybe from the devnet-7 release on / maybe the next one TBD ??
    • next devnet release - <feat>-devnet-<N>@v1.0.0

@marioevz
Member

Just discussed with @spencer-tb and @LouisTsai-Csie, this is our suggestion going forward:

1. Instead of changing what is now "mainnet" to "consensus" (in this PR), we just tag these releases as `tests@vX.Y.Z`. No stable or other labelling required, they're just the tests. We aim for an on-demand release schedule, which could be as frequent as weekly if tests are added/fixed. The forks covered/filled-for in these releases should be slightly ahead of clients' testnet/mainnet release schedules. I.e., we include Amsterdam in good time before the first testnet releases.

2. The benchmark changes will move to another PR. Benchmark devnet releases:
   
   * A: For devnet 7, merge 8037 PR, bump Osaka->Amsterdam, pick/decide on an EVM :), then create the release from forks/amsterdam.
   * B: In general, for now, if necessary just add a new entry to feature.yaml (Move benchmark changes to another PR). I.e., if we need a benchmark-glamsterdam-devnet-5@v1.0.0, we can create a new entry for that.

3. We simplify the release versioning scheme, which can be applied to any test fixture release, to:
   X: fix spec (spec change)
   Y: fix test
   Z: new test

4. The next/new release versions start at:
   
   * `tests@v1.0.0`
   * `benchmark@v1.0.0` maybe from the devnet-7 release on / maybe the next one TBD ??
   * next devnet release - `<feat>-devnet-<N>@v1.0.0`

I mostly agree with this comment except for points 3 and 4, since I think the MAJOR should refer to the devnet/fork:

My suggestion is as follows:

For tests@vX.Y.Z:

  • X: Fork-based number
  • Y: Consensus-breaking spec change targeting fork X
  • Z: Non-breaking spec change (refactoring), new tests, modified tests

For <feat>-devnet@vX.Y.Z:

  • X: Devnet number
  • Y: Consensus-breaking spec change targeting devnet X
  • Z: Non-breaking spec change (refactoring), new tests, modified tests

I.e. if a client is targeting to join fork/devnet X, it should target MAJOR equal to X, must take the latest MINOR, and should ideally take the latest PATCH.

With fork/devnet as MAJOR, the version number alone tells a client whether a release is relevant to them. Under the alternative (MAJOR tracks any spec change), you'd have to combine the version and the -N suffix in the release name to figure out whether a spec change targets your devnet or a different one. Devnet compatibility shouldn't require parsing two fields IMO.

Concretely, the mindset just from looking at the version (ignoring the name) should be: I'm a client dev targeting devnet 7, and I was passing the tests contained in v7.0.0, but now v7.1.0 has been released, which means there was a spec change in the devnet my client was targeting, hence I should read the release notes to figure out whether the spec change affects my client's ability to join the devnet.
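
A minimal Python sketch of that reading, assuming the MAJOR = fork/devnet, MINOR = consensus-breaking spec change, PATCH = everything else scheme proposed in this comment. The function names `parse` and `classify_update` and the returned labels are hypothetical.

```python
def parse(version):
    """Split vX.Y.Z into a tuple of three integers."""
    x, y, z = (int(part) for part in version.removeprefix("v").split("."))
    return x, y, z


def classify_update(passing, released):
    """Describe what a new release means for a client passing `passing`."""
    old, new = parse(passing), parse(released)
    if new[0] != old[0]:
        # A different MAJOR targets another fork/devnet entirely.
        return "different fork/devnet"
    if new[1] != old[1]:
        # A MINOR change signals a spec change within the targeted devnet.
        return "consensus-breaking spec change"
    if new[2] != old[2]:
        return "non-breaking change or new/modified tests"
    return "same release"


print(classify_update("v7.0.0", "v7.1.0"))
```

Only the version string is inspected, which is the point being argued: relevance to a devnet can be read from one field.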

On the topic of parallel maintenance of two different devnets, this scheme handles this naturally IMO. E.g. if we are maintaining devnets 3 & 7 for the same feature, releasing v3.0.1 after or alongside v7.0.0 is not a problem (think Python 2.x vs 3.x). It is a well-known Semver pattern, and Semver is good at this. We should simply make this rule clear and follow it so we are predictable.

On consume cache, it's a solvable problem. We will need to update our tooling, but it's not a big issue.

On benchmarking, major and minor should mirror the feature they are targeting, while the patch moves freely at its own pace.


Labels

A-ci Area: Continuous Integration C-feat Category: an improvement or new feature P-medium


8 participants