Skip to content

feat(engine): introduce REST based + SSZ serialized new-payload-with-witness#11623

Open
Dyslex7c wants to merge 62 commits into
NethermindEth:masterfrom
Dyslex7c:new-payload-with-witness-ssz
Open

feat(engine): introduce REST based + SSZ serialized new-payload-with-witness#11623
Dyslex7c wants to merge 62 commits into
NethermindEth:masterfrom
Dyslex7c:new-payload-with-witness-ssz

Conversation

@Dyslex7c
Copy link
Copy Markdown
Contributor

@Dyslex7c Dyslex7c commented May 15, 2026

Summary

Implements the POST /new-payload-with-witness Engine API REST endpoint as specified in ethereum/execution-apis#773. This endpoint combines payload execution and witness generation into a single HTTP+SSZ call, replacing the previous two-step engine_newPayload + debug_executionWitness flow.

Also adds a engine_newPayloadWithWitness JSON-RPC method that exposes the same semantics over the standard JSON-RPC transport.

Changes

  • Added NewPayloadWithWitnessSszHandler — handles POST /new-payload-with-witness, accepts the same JSON params as engine_newPayloadV5, delegates to the engine_newPayloadWithWitness JSON-RPC handler, and returns an SSZ-encoded NewPayloadWithWitnessResponseV1
  • Added engine_newPayloadWithWitness JSON-RPC method on EngineRpcModule backed by NewPayloadWithWitnessHandler — same execution + witness generation logic available over JSON-RPC
  • Encoding/decoding of NewPayloadWithWitnessResponseV1 is hand-rolled in SszCodec (not auto-generated) to correctly implement the spec's Union[None, T] layout with explicit selector bytes (0x00 = None, 0x01 ++ T = Some). The generator's [SszCompatibleUnion] does not support selector byte 0x00, so this cannot go through the standard path; a comment in SszWireTypes.cs explains the constraint.
  • Added SszCodec.EncodeNewPayloadWithWitnessResponse and DecodeNewPayloadWithWitnessResponse — encode/decode the combined status + witness response. Non-VALID statuses encode witness as Union None (selector byte 0x00), not an empty list.
  • Added a dedicated fast-path in SszMiddleware for the non-versioned /new-payload-with-witness path; the versioned /engine/v{N}/new-payload-with-witness path correctly returns 404
  • Separated SszRestCapabilities from SszRestPaths; registered rest_engine_newPayloadWithWitness capability in EngineRpcCapabilitiesProvider, gated on IsEip7928Enabled

Behaviour changes affecting all SSZ-REST callers

SszEndpointHandlerBase.WriteErrorAsync now emits application/json with {"code": …, "message": "…"} for all SSZ-REST error responses — auth failures on /engine/v*/…, 404s, 413s, malformed-SSZ 400s, and 500s. Previously these were text/plain with a raw message string. Any tooling that parses the raw text body of error responses from these endpoints will need to be updated.

PayloadStatusWire.ValidationError SSZ list cap bumped from 1024 → 8192 bytes (spec correctness; affects all PayloadStatusV1 SSZ responses, not just the witness endpoint). This changes the merkleization parameters for PayloadStatusV1. Downstream tooling with hardcoded golden-byte test fixtures against the old 1024 cap will need updating.

Constructor signature change for EngineRpcModule

EngineRpcModule now requires one additional constructor parameter: INewPayloadWithWitnessHandler. It is needed unconditionally (the JSON-RPC method is registered on every network); the capability advertisement is still gated on IsEip7928Enabled.

In-tree subclasses affected: TaikoEngineRpcModule (updated in this PR). OptimismEngineRpcModule uses composition (IEngineRpcModule) and is unaffected. Downstream forks that subclass EngineRpcModule directly will need to update their constructor call sites.

Types of changes

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Unit tests cover:

  • VALIDATION_ERROR_MAX = 8192 acceptance and truncation
  • UTF-8 truncation correctness (no split multi-byte code points)
  • EncodeNewPayloadWithWitnessResponse for all non-VALID statuses (witness field is Union None, selector byte 0x00)
  • EncodeNewPayloadWithWitnessResponse for VALID with witness (selector byte 0x01 + encoded body) and VALID with null witness (Union None)
  • Middleware: 200 for VALID and SYNCING statuses, 400 for malformed JSON, 405 for non-POST, 400 + correct error code for unsupported fork
  • Error responses use application/json with code and message fields
  • Container layout: byte-by-byte verification that the fixed header is 13 bytes and all three offsets are correct

JSON-RPC method (EngineModuleTests.Amsterdam):

  • VALID status → executionWitness populated
  • VALID with null parent header → executionWitness null, status still VALID
  • VALID with collector exception → executionWitness null, status still VALID
  • SYNCINGexecutionWitness null, witness factory not invoked
  • INVALID → status, LatestValidHash, and ValidationError round-trip correctly, witness factory not invoked
  • engine_newPayloadV5 RPC failure → error code and message propagated correctly

@LukaszRozmej
Copy link
Copy Markdown
Member

Deep review against the spec (execution-apis#773 / src/engine/rest.md) and the surrounding code. CI Build/Lint/Format pass; the two Hoodi sync-gate failures look unrelated.

High-priority

1. Wire format does not match the spec's Union type

Spec defines:

  • latest_valid_hash: Union[None, ByteVector[32]]
  • validation_error: Union[None, List[uint8, VALIDATION_ERROR_MAX]]
  • witness: Union[None, ExecutionWitnessV1]

NewPayloadWithWitnessResponseV1Wire encodes all three as [SszList(1)] T[]? (a max-1 list):

public byte Status { get; set; }
[SszList(1)] public Hash256[]? LatestValidHash { get; set; }
[SszList(8192)] public byte[]? ValidationError { get; set; }
[SszList(1)] public ExecutionWitnessV1Wire[]? Witness { get; set; }

SSZ Union[None, T] is selector_byte ++ variant (1 byte for None, 0x01 ++ T for T). List[T, 1] with 0/1 elements is 0 bytes / T bytes. These differ on the wire by the selector byte. The codebase already has [SszCompatibleUnion] in Nethermind.Serialization.Ssz / the generator — this PR doesn't use it.

This is the existing convention for PayloadStatusWire / NullableBlobAndProofV2Wire, so the PR is internally consistent — but #773 is brand-new so there's no installed-base reason to perpetuate the deviation. Please verify interop against another implementation of #773 (e.g. ethrex) before merging; a CL written against the spec text will reject this body.

2. Block is executed twice on every VALID payload

engine_newPayloadV5 runs BlockProcessor.ProcessOne once. TryGenerateWitness then calls WitnessCollector.GetWitnessForExistingBlock(parent, block) which does another blockProcessor.ProcessOne(block, ProcessingOptions.ReadOnlyChain, …) (WitnessCollector.cs:22-27). That's a full re-execution — exactly the cost #773 was created to eliminate. Its benchmark target is witness collection as part of the primary execution (~700ms total); a double-execution approach blows past the 8s engine_newPayload budget on heavy blocks. Correctness is fine; the perf claim of the parent spec is undelivered. Either call this out as a follow-up perf issue or wire witness collection into the primary processing path.

3. Witness-generation failure after VALID is unrecoverable for the CL

if (status.Status == PayloadStatus.Valid)
{
    witness = TryGenerateWitness(request.ExecutionPayload);
    if (witness is null)
    {
        await WriteErrorAsync(ctx, 500, "Payload executed with VALID status but … the block has been accepted.",);
        return;
    }
}

The block is already in the chain when we get here. If witness generation fails (parent missing, exception swallowed below), the CL sees HTTP 500 and can't distinguish "block accepted, witness failed" from "block rejected" — it may retry. Given the spec defines witness as Union[None, ExecutionWitnessV1] and explicitly allows None "when … no witness was produced", returning 200 + status=VALID + witness=None is more faithful than 500 — even though the spec prose then says VALID "MUST include the execution witness", the Union arm exists precisely for this case. At minimum, log the underlying cause.

4. The "happy-path" middleware test does not exercise the happy path

NewPayloadWithWitness_returns_200_for_valid_status:

await _middleware.InvokeAsync(ctx);ctx.Response.StatusCode.Should().NotBe(StatusCodes.Status404NotFound, "…");

_witnessEnvFactory is Substitute.For<…>() with no return configured, so CreateScope() returns null → NRE → caught in TryGenerateWitness → 500. The only assertion is "not 404", so the test passes with 500. The test name claims 200 for VALID but never asserts Be(200).

Configure the factory mock so CreateScope().Env.CreateExistingBlockWitnessCollector().GetWitnessForExistingBlock(...) returns a (stub) Witness, then assert Be(200), ContentType.Should().Contain(OctetStream), and that the body decodes via NewPayloadWithWitnessResponseV1Wire.Decode with Status == 0 and non-empty Witness. This is the single most important missing test in the PR.

5. TryGenerateWitness swallows all exceptions silently

Bare catch { return null; } with no logging — any internal bug becomes a generic 500 with no breadcrumb. Inject ILogManager, log via _logger.Error("…", ex), and prefer catching specific exception types.

Medium-priority

6. WriteErrorAsync change is a quiet wire change for all SSZ-REST error responses

SszEndpointHandlerBase.WriteErrorAsync now writes application/json with {"code":…,"message":…} for every error response across the middleware (auth failures on /engine/v1/payloads, 404s, 413s, malformed-SSZ 400s, 500s). Previously: text/plain with raw message. Tooling parsing text bodies on those endpoints will break. The new shape is correct per spec — but please call it out in the PR description as a behavior change for all SSZ-REST callers, not just witness. Minor: the code value isn't escaped via JsonSerializer; safe today because it's int, but brittle.

7. Wrong Content-Type POSTs to the witness path fall through to the next middleware

if (path.Equals(WitnessPath,))
{
    if (!POST) return true;
    string? ct = ctx.Request.ContentType;
    return ct is not null && ct.Contains("application/json",);
}

A POST to /new-payload-with-witness with Content-Type: text/plain (or missing) makes IsSszRequest return false, the request escapes to _next, and the user gets whatever comes next — probably a 404. Spec mandates application/json request and JSON-shaped error body. Natural response is 415 or 400, not a fall-through. Symmetrical with the 405 rationale in DispatchWitnessAsync.

8. Capability-name constant is filed under "paths"

SszRestPaths.RestEngineNewPayloadWithWitness = "rest_engine_newPayloadWithWitness" lives alongside path constants like PostV4Forkchoice = "POST /engine/v4/forkchoice". The capability advertisement string isn't a path. Either rename the file or hold the capability string in EngineRpcCapabilitiesProvider directly. Also NewPayloadWithWitness = "new-payload-with-witness" is a handler Resource identifier (no leading slash) while the URL is /new-payload-with-witness — workable but worth a comment.

9. PayloadStatusWire ValidationError cap bumped 1024 → 8192

This is a spec-correctness fix for all PayloadStatusV1 SSZ responses, not just the witness endpoint. Worth calling out separately in the PR description because (a) anyone relying on the old 1024 cap (test fixtures with golden bytes, downstream tooling) sees a change, and (b) it changes merkleization parameters. The new TruncateUtf8 is correct (verified by tracing the 8190×'a' + € case).

10. Witness.Keys is silently dropped

Nethermind's Witness has four collections (State, Codes, Keys, Headers); spec ExecutionWitnessV1 has three. BuildExecutionWitnessV1Wire correctly omits Keys per spec — fine — but Keys is still built and held in Witness.Keys until disposal. Consider a collector mode that skips Keys accumulation when only spec-shaped output is needed.

Low-priority / nits

  • Allocation footprint: BuildExecutionWitnessV1Wire allocates new SszWitnessItem[items.Count] and WriteSszNewPayloadWithWitnessAsync uses a fresh new ArrayBufferWriter<byte>(). For 300 MB witnesses (spec's benchmark scale) the LOH pressure is real. Existing endpoints use the same pattern, so not a regression — follow-up.
  • Dead 204 branch: minimum response is 13 bytes (status + 3 offsets); length == 0 cannot happen. Harmless, remove for clarity.
  • Auth 401 reports code: -32603 InternalError: spec just says "same code as JSON-RPC for Engine API errors" — there's no canonical JSON-RPC code for auth failure, but InternalError is misleading.
  • Resource registered into _postRoutes even though dispatched via the special _witnessHandler fast path: worth a unit test that /engine/v1/new-payload-with-witness (versioned variant) returns 404, not a stealth dispatch.
  • No symmetric test for the non-UnsupportedFork → 500 branch of the error switch (lines 57-66).

What's good

  • SSZ types cleanly mirror the spec's structure modulo the Union issue.
  • TruncateUtf8 is correct, properly tested for multi-byte truncation.
  • 405 handling for non-POST is thoughtful and well-commented.
  • EngineApiJsonContext correctly extended for ExecutionPayloadV4.
  • InvalidBlockHash = 4 mapping added in both PayloadStatus and EngineStatusToSsz (forward-compatible).
  • Capability registration in the right place; SszRestPathsAreAdvertised updated.
  • 8192 cap is a quiet spec fix bundled in correctly.

Recommendation

Before approval:

Acceptable as follow-ups: #2 (single-execution witness), #7 (wrong-CT handling), #10/#11 (allocations / key collection).

@github-actions github-actions Bot added the taiko related to the taiko alethia rollup label May 15, 2026
@LukaszRozmej
Copy link
Copy Markdown
Member

Re-review (commits c4028c5334 + 1e31a1f4f1)

Nice work on the responses. The big one (#1) is properly fixed; rounding out the rest.

Resolved

  • Feature/jsonrpc #1 Wire formatNewPayloadWithWitnessResponseV1Wire is gone; EncodeNewPayloadWithWitnessResponse now hand-emits proper SSZ Union[None, T] with explicit selector bytes and offset math. The new DecodeNewPayloadWithWitnessResponse and the byte-by-byte container test (…_container_header_is_13_bytes_and_offsets_are_correct) lock the layout down. The comment block in SszWireTypes.cs explaining why this can't go through the generator (selector 0x00 unsupported by [SszCompatibleUnion]) is exactly what a future reader needs. One nit: SSZ offsets are technically uint32; you're using WriteInt32LittleEndian/ReadInt32LittleEndian. Identical bytes for values < 2³¹, which they always will be under the 16 MiB body cap — fine, but a one-line comment noting the intentional choice would help.
  • Netcore #3 / Patricia alt 2 #5 — VALID-with-no-witness now returns 200 + witness=None per the Union's None arm, with a logged error including the block hash. Specific exception catches with messages. NewPayloadWithWitness_valid_status_but_witness_generation_fails_returns_200_with_null_witness covers the new behavior.
  • Hive #4 — The happy-path test now actually exercises the happy path: stubs the factory chain, asserts 200 + application/octet-stream, decodes the SSZ body, checks status byte, latest_valid_hash round-trip, and witness presence. 👍
  • Rlpstreams #7 — Wrong Content-Type → 415 with Accept: application/json header. Belt-and-braces (both in middleware DispatchWitnessAsync and inside the handler), which is fine.
  • Vmspans #8SszRestCapabilities separated from SszRestPaths. Clean.
  • EIP-145 #13 (auth code)401 now reports code: InvalidRequest instead of InternalError. Better.
  • EIP-210 #14 (versioned path)BuildRoutes now skips the witness handler, so _postRoutes/_getRoutes can't accidentally dispatch it. NewPayloadWithWitness_via_versioned_engine_path_returns_404 locks this. Also explicit test for the non-UnsupportedFork → 500 branch.
  • 204 dead branch — removed.

Still open from previous review

  • Feature/networking #2 (double execution) — unchanged. The REST handler and the new JSON-RPC method both call engine_newPayloadV5 then re-execute via WitnessCollector.GetWitnessForExistingBlock. Fine as a correctness-first impl, but the parent spec (NDM consumer service refactoring & testing #773) was created specifically to eliminate this. Please open a follow-up issue so it isn't lost.
  • Perf0602 #6 / Lazynodes #9 — PR description still doesn't call out (a) the SSZ-REST-wide text/plain → application/json error body change and (b) the PayloadStatusWire.ValidationError cap bump 1024 → 8192. Both touch consumers outside the witness endpoint; worth flagging in the description so reviewers don't have to find them in the diff.
  • committed on a wrong branch #10 (Witness.Keys) — still built and immediately dropped. Minor; follow-up.

New issues introduced by the JSON-RPC additions in 1e31a1f4f1

1. TryGenerateWitnessForBlock in EngineRpcModule.Amsterdam.cs still has the bare catch that #5 fixed in the SSZ handler.

try {return collector.GetWitnessForExistingBlock(parent, block); }
catch { return null; }

Same anti-pattern: any failure is silently converted to witness=null with no breadcrumb. Mirror the SSZ handler's split into OperationCanceledException (warn) and Exception (error with ex argument), and log the block hash for the null-decode/null-parent early returns too.

2. The "VALID but no witness" log on the JSON-RPC path lacks the block hash.

if (witness is null && _logger.IsWarn)
    _logger.Warn("engine_newPayloadWithWitness: payload is VALID but execution witness could not be generated.");

The SSZ handler includes request.ExecutionPayload.BlockHash in its log. Match it here — without the hash this line is useless for postmortem.

3. No test for the new JSON-RPC method.
Grep shows engine_newPayloadWithWitness referenced only in SszRestPaths.cs and the capabilities test in EngineModuleTests.V1.cs (capability advertisement). Nothing actually invokes the JSON-RPC method end-to-end. Add at minimum:

  • VALID → response has executionWitness populated.
  • VALID with witness failure → response has executionWitness omitted (WhenWritingNull) or null, status is still VALID.
  • Non-VALID statuses (SYNCING, INVALID) → executionWitness omitted, status correct.
  • Error result from engine_newPayloadV5 (e.g. unsupported fork) → propagates code/message correctly.

4. EngineRpcModule constructor gained two required parameters (IBlockTree, IWitnessGeneratingBlockProcessingEnvFactory).
Breaking constructor signature. I confirmed:

  • TaikoEngineRpcModule — updated in this PR ✓
  • OptimismEngineRpcModule — uses composition (IEngineRpcModule engineRpcModule), unaffected ✓
  • No other new EngineRpcModule(...) sites in the codebase (only EngineModuleTests.V3.cs, updated) ✓

So nothing actually breaks, but worth a note in the PR description for downstream forks. Also: IWitnessGeneratingBlockProcessingEnvFactory is now required for all networks (pre-Amsterdam too). Confirm the DI registration is unconditional, otherwise pre-Amsterdam test fixtures may fail to resolve EngineRpcModule.

5. NewPayloadWithWitnessV1Result — double blank line between LatestValidHash and ValidationError. Cosmetic.

6. IEnumerable<ISszEndpointHandler> handlers is iterated twice in the constructor (once in BuildRoutes, once to find _witnessHandler). In ASP.NET Core DI this is backed by a List so re-iteration is safe in practice, but a defensive IReadOnlyList<…> hs = handlers as IReadOnlyList<…> ?? [.. handlers] removes the implicit assumption. Or merge the two passes into one.

Recommendation

Address #1#3 above (logging parity on the JSON-RPC path + tests for the new method) before merge. The rest of the new findings are nits / PR-description hygiene. The original concerns I raised in the first review are properly resolved.

@LukaszRozmej
Copy link
Copy Markdown
Member

LukaszRozmej commented May 17, 2026

Round 3 — commits 6671f99979 + 1d382f43a4

All blocking concerns from the prior rounds are addressed. CI is fully green (541 success, 0 failures). Just a few residuals before LGTM.

Resolved this round

  • JSON-RPC catch loggingTryGenerateWitnessForBlock now mirrors the SSZ handler: OperationCanceledException → warn, Exception → error with the exception captured, plus block-hash breadcrumbs on the null-decode/null-parent early returns.
  • JSON-RPC log message — the "VALID but no witness" log now includes executionPayload.BlockHash and the full spec rationale.
  • engine_newPayloadWithWitness JSON-RPC test coverageEngineModuleTests.Amsterdam.cs adds six tests covering: VALID with witness, VALID with null-parent → null witness, VALID with thrown exception → null witness, SYNCING (no factory invocation), INVALID (status + LVH + ValidationError round-trip, no factory invocation), and UnsupportedFork error propagation. Coverage is exactly what I asked for; the DidNotReceive().CreateScope() assertions on non-VALID paths are a nice touch.
  • Constructor double iteration — handlers materialized to IReadOnlyList<…> once; BuildRoutes signature tightened to match.
  • NewPayloadWithWitnessV1Result double blank line removed.
  • Taiko test fixtures — both CertainBatchLookupTests and TxPoolContentListsTests updated for the new constructor signature.

Still worth fixing before merge

1. PR description is stale on several points. It still describes the original implementation and doesn't reflect the changes that landed:

  • Mentions NewPayloadWithWitnessResponseV1Wire — that struct was deleted; encoding/decoding is hand-rolled in SszCodec against the spec's Union[None, T] layout.
  • Says "witness field is set to [] for any non-VALID status" — it's now Union None (selector byte 0x00), not an empty list.
  • Doesn't mention the new engine_newPayloadWithWitness JSON-RPC method (separate from the REST endpoint).
  • Doesn't mention that SszEndpointHandlerBase.WriteErrorAsync was changed to emit application/json with {code, message} for all SSZ-REST error responses, not just the witness endpoint. This is consumer-visible behavior change for /engine/v*/... callers.
  • Doesn't mention PayloadStatusWire.ValidationError SSZ list cap bumped 1024 → 8192 (spec correctness; affects all PayloadStatusV1 responses, not just witness).
  • Doesn't mention that EngineRpcModule's constructor now requires IBlockTree and IWitnessGeneratingBlockProcessingEnvFactory. Confirmed in-tree: only Taiko subclasses, all updated. Worth a note for downstream forks.

These don't block merge but the changelog/release notes will be wrong without an update.

2. Test fixture duplication in EngineModuleTests.Amsterdam.cs. StubbedEngineRpcModule and FailingNewPayloadEngineRpcModule differ only in their override of engine_newPayloadV5 — yet each repeats the same ~20-argument constructor base call. Per the project's own AGENTS.md test guidelines:

"When parts of tests are similar (shared setup, common assertions, recurring scenarios), factor those parts into helper methods or helper types …"

Suggest a single helper that takes a Func<ExecutionPayloadV4, byte[]?[], Hash256?, byte[][]?, Task<ResultWrapper<PayloadStatusV1>>> and use it for both success and failure cases. The witness factory/scope/env/collector stubbing chain is also copy-pasted across tests — a small WitnessFactoryBuilder would let each test read in 3 lines instead of 15. Strictly a refactor for maintainability; the coverage is correct.

3. engine_newPayloadV5 made virtual purely for test overriding. This widens a production type's contract to support test subclassing. Acceptable, but if the V5 handler were injectable as a delegate or strategy on the module, you'd avoid the virtual modifier and the giant constructor-arg duplication in the test stubs. Follow-up.

Carried over from round 1 (unchanged)

Nothing here blocks merge once the PR description is refreshed. Good iteration.

@LukaszRozmej
Copy link
Copy Markdown
Member

LukaszRozmej commented May 17, 2026

Don't make things virtual just to override them in test, use decorator pattern instead, tests shouldn't shape production code API's

@Dyslex7c
Copy link
Copy Markdown
Contributor Author

addressed all the reviews, opened a follow-up issue for double execution and updated PR description, @LukaszRozmej could you please re-review?

@LukaszRozmej
Copy link
Copy Markdown
Member

LukaszRozmej commented May 18, 2026

Isn't avoiding double-execution sole purpose of this endpoint? Why not do that here? Why create issues?

Copy link
Copy Markdown
Member

@LukaszRozmej LukaszRozmej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview

Implements POST /new-payload-with-witness per execution-apis#773, plus a parallel engine_newPayloadWithWitness JSON-RPC method. Core design correctly decouples the V5 execution step via a delegate and hand-rolls the Union[None, T] SSZ encoding the spec requires.

Spec compliance — ✅ matches the spec

Verified against src/engine/rest.md from execution-apis#773:

  • NewPayloadWithWitnessResponseV1 container: 1 byte status + 3 × 4-byte offsets for the three Union fields → 13-byte fixed header. Implementation matches (FixedHeaderBytes = 1 + 4 + 4 + 4).
  • Status byte mapping: VALID=0, INVALID=1, SYNCING=2, ACCEPTED=3, INVALID_BLOCK_HASH=4 — all present in EngineStatusToSsz, and PayloadStatus.InvalidBlockHash was added correctly.
  • VALIDATION_ERROR_MAX = 8192: PayloadStatusWire.ValidationError cap bumped, and EncodeNewPayloadWithWitnessResponse uses the same constant.
  • UTF-8 truncation: TruncateUtf8 correctly walks backward over continuation bytes (0x80..0xBF) and drops the partial codepoint. Covered by EncodePayloadStatus_truncation_does_not_split_multibyte_utf8_codepoint.
  • Union[None, T] hand-rolled: 0x00 for None, 0x01 + variant body for Some. The [SszCompatibleUnion] generator's inability to encode the 0x00 selector is the right reason to hand-roll, and the comment in SszWireTypes.cs documents it well.
  • Witness emptiness rule: hasWitness = witness is not null && ps.Status == PayloadStatus.Valid — covers both spec conditions ("None when status is not VALID or no witness was produced").
  • ExecutionWitnessV1 wire: state/codes/headers (3 fields). Correctly drops Nethermind's internal Witness.Keys from the wire — spec defines only 3 fields.
  • HTTP semantics: 200+octet-stream on VALID/INVALID/SYNCING, 400 for malformed JSON / UnsupportedFork (-38005), 401 auth failure, 405 non-POST, 500 for other engine errors, 415 for wrong content type. All errors use application/json.
  • Capability rest_engine_newPayloadWithWitness registered in EngineRpcCapabilitiesProvider.

Spec interpretation note: spec text in §Specification point 3 says "the witness field MUST be empty ([])" for non-VALID, which is awkward given the field's type is Union[None, ExecutionWitnessV1]. The PR treats this as the None variant, which is the only sensible reading consistent with the structure definition higher in the same spec.

Issues

1. Dead code: IBlockTree constructor param on EngineRpcModule

public partial class EngineRpcModule(... IBlockTree blockTree, ILogManager logManager)
{
    protected readonly IBlockTree _blockTree = blockTree ?? throw new ArgumentNullException(nameof(blockTree));

_blockTree is declared protected readonly but never referenced anywhere in EngineRpcModule.cs or any partial (.Paris/Cancun/Prague/Osaka/Amsterdam). The witness handler already takes its own IBlockTree directly via the DI factory in MergePlugin.cs. Removing this would:

  • Drop one unused public constructor parameter (still a breaking change for downstream forks subclassing EngineRpcModule, but a smaller one).
  • Revert the noise added to TaikoEngineRpcModule.cs, CertainBatchLookupTests.cs, TxPoolContentListsTests.cs, and the manual-construction test in EngineModuleTests.V3.cs.

The PR description claims EngineRpcModule gains two params (IBlockTree and IWitnessGeneratingBlockProcessingEnvFactory), but only IBlockTree is added — and it isn't used. Either wire the field in for some forthcoming purpose or remove it.

2. Witness-generation logic duplicated across the two handlers

NewPayloadWithWitnessSszHandler.TryGenerateWitness and NewPayloadWithWitnessHandler.TryGenerateWitnessForBlock are byte-for-byte the same:

  1. TryGetBlock() + log on failure
  2. BlockTree.FindHeader(parentHash, DoNotCreateLevelIfMissing) + log on null
  3. using scope; collector.GetWitnessForExistingBlock(parent, block) with identical OperationCanceledException / Exception handling and identical log wording (the only diff is a prefix string).

Cleanest fix: have NewPayloadWithWitnessSszHandler call _engineModule.engine_newPayloadWithWitness(...) instead of engine_newPayloadV5(...), then SSZ-encode the resulting NewPayloadWithWitnessV1Result. That collapses both handlers' witness paths into one, and removes the SSZ handler's dependencies on IBlockTree and IWitnessGeneratingBlockProcessingEnvFactory — simplifying SszMiddlewareConfigurer (two Bridge<> calls go away) and SszMiddlewareTests setup.

3. Duplicate / inconsistent log severities for witness-null on VALID

In NewPayloadWithWitnessHandler.HandleAsync (and mirrored in the SSZ handler): when witness generation returns null, TryGenerate* already logs the specific reason at Warn ("could not decode block", "parent header not found") or Error ("witness generation failed: …"). Then the outer code logs again at Error with a generic "could not be generated" message.

Result: a benign re-org (missing parent) emits Warn + Error for the same event, which will trip alerting on noise. Drop the outer log — the inner sites already cover all branches with appropriate severities.

4. Hand-rolled JSON for error bodies (SszEndpointHandlerBase.WriteErrorAsync)

string json = $"{{\"code\":{jsonRpcCode},\"message\":{System.Text.Json.JsonSerializer.Serialize(message)}}}";

JsonSerializer.Serialize(message) correctly escapes the message, but the outer object is string-interpolated. Negligible but inconsistent — JsonSerializer.Serialize(new { code = jsonRpcCode, message }) is one allocation, no manual brace matching, and reads better.

5. WriteErrorAsync content-type change applies to all SSZ-REST endpoints (heads-up)

The PR description acknowledges this but it's worth re-confirming with CL teams: every previously-text/plain error response (auth 401s on /engine/v*/…, 404s for unknown methods, 413s, 400 malformed-SSZ, 500s) now emits application/json. The new REST spec mandates this format for /new-payload-with-witness only; the older /engine/v{N}/… paths aren't governed by this spec.

6. Minor: ownership of Witness? in WriteSszNewPayloadWithWitnessAsync is non-obvious

private static async Task WriteSszNewPayloadWithWitnessAsync(HttpContext ctx, PayloadStatusV1 status, Witness? witness)
{
    using Witness? w = witness;
    ...

Not a bug — the witness is created in TryGenerateWitness and not owned by the outer result's using — but a one-line comment on TryGenerateWitness saying "caller takes ownership" would help the reader.

Tests

  • EngineModuleTests.Amsterdam.cs: solid coverage — VALID with witness, VALID with witness-gen failure (null parent), VALID with collector throw, SYNCING/INVALID skipping witness, engine-error propagation. The WitnessHandlerBuilder is a nice testable seam.
  • SszMiddlewareTests: covers 200/VALID, 200/SYNCING, 200/null-witness-on-VALID, 400 malformed JSON, 405 non-POST, 415 wrong content-type, 404 versioned path, 400 UnsupportedFork, 500 internal error. Good breadth.
  • SszCodecTests: bytewise verification of the 13-byte header and offsets is exactly the right test for hand-rolled encoding — would catch any regression to the offset layout.

Risks

  • Double execution on VALID payloads — well-documented (PR body + TODO + #11636). Eliminates real-time attestation benefits the spec promised, but tracked as follow-up.
  • engine_newPayloadWithWitness JSON-RPC is Nethermind-only — spec defines only the REST endpoint. Adding the JSON-RPC method is fine (matches the spec author's earlier prototype phase) but is not portable across other clients.
  • PayloadStatusWire ValidationError cap 1024→8192 affects all PayloadStatusV1 SSZ responses (not just witness). Spec-correct, but changes merkleization of PayloadStatusV1 if any tooling has golden-byte fixtures.

Summary

Spec compliance is correct and the SSZ encoding has the right tests. Main asks: drop the unused IBlockTree from EngineRpcModule, collapse the duplicate witness-generation logic by having the SSZ handler delegate to engine_newPayloadWithWitness, and fix the duplicate Warn+Error logging when witness generation legitimately fails.

@Dyslex7c
Copy link
Copy Markdown
Contributor Author

Isn't avoiding double-execution sole purpose of this endpoint? Why not do that here? Why create issues?

yes actually this needs a bigger refactor so in your review it asked to create a follow-up issue. Will implement this here and close the issue

Double execution — still re-executes via WitnessCollector.GetWitnessForExistingBlock after engine_newPayloadV5. The parent spec (ethereum/execution-apis#773) was designed to eliminate this; please open a follow-up issue so the perf goal isn't forgotten.

LukaszRozmej and others added 4 commits May 22, 2026 21:17
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Snapshot and IWorldStateScopeProvider live in Nethermind.Evm.State, already
imported. CI's IDE0005 caught this; full-solution lint reproduction now clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the evm label May 26, 2026
Comment thread src/Nethermind/Ethereum.Blockchain.Pyspec.Zkevm.Test/Tests.cs Outdated
@Dyslex7c Dyslex7c requested a review from rubo as a code owner May 27, 2026 09:25
Dyslex7c and others added 2 commits May 27, 2026 15:03
…tness-ssz

# Conflicts:
#	src/Nethermind/Ethereum.Test.Base/BlockchainTestBase.cs
#	src/Nethermind/Nethermind.Merge.Plugin/Data/EngineApiJsonContext.cs
#	src/Nethermind/Nethermind.Merge.Plugin/EngineRpcModule.Amsterdam.cs
#	src/Nethermind/Nethermind.Merge.Plugin/SszRest/SszMiddleware.cs
@hudem1
Copy link
Copy Markdown
Contributor

hudem1 commented May 27, 2026

I've got a few first questions:

  1. I expected we could reuse the reuse the env-factory pattern (like in WitnessGeneratingBlockProcessingEnvFactory) and adapt it / build an extension of it, instead of the proxy/rendezvous machinery. Did you maybe look into that ? I was wondering because this PR's solution introduces several new witness capturing files that seem a bit redundant with existing recording mechanisms, but i guess your proxy solution might still be necessary given that we wanna use the main processing pipeline/context with process-lifetime singletons and not a spawned env.

  2. This PR also brings in the zkEVM / EEST test integration (new EEST fixtures, WitnessBlockchainTestBase, pyspec wiring, known-failing lists). That's independent of the endpoint, so, i was wondering if we could divide this PR in 2 ideally, one for the endpoint and one for adding the tests ? Also because i've been working on the witness comparison for EELS tests and i've used some different approaches for fixing missing/excess preimages recorded in witness that seem to require less changes. If dividing this PR in 2 is too difficult, i can also push my fixes to a draft PR and we can somehow merge it into yours and keep the best of both PRs.

  3. Why doesn't WitnessGeneratingWorldState use TouchedNodesRlp from the witness capturing trie store ?

@LukaszRozmej
Copy link
Copy Markdown
Member

  1. I expected we could reuse the reuse the env-factory pattern (like in WitnessGeneratingBlockProcessingEnvFactory) and adapt it / build an extension of it, instead of the proxy/rendezvous machinery. Did you maybe look into that ? I was wondering because this PR's solution introduces several new witness capturing files that seem a bit redundant with existing recording mechanisms, but i guess your proxy solution might still be necessary given that we wanna use the main processing pipeline/context with process-lifetime singletons and not a spawned env.

I'm all for reusing, but we want single processing. Can you propose/push required changes?

  1. This PR also brings in the zkEVM / EEST test integration (new EEST fixtures, WitnessBlockchainTestBase, pyspec wiring, known-failing lists). That's independent of the endpoint, so, i was wondering if we could divide this PR in 2 ideally, one for the endpoint and one for adding the tests ? Also because i've been working on the witness comparison for EELS tests and i've used some different approaches for fixing missing/excess preimages recorded in witness that seem to require less changes. If dividing this PR in 2 is too difficult, i can also push my fixes to a draft PR and we can somehow merge it into yours and keep the best of both PRs.

Let's merge PR's.

LukaszRozmej and others added 3 commits May 27, 2026 16:16
@Dyslex7c
Copy link
Copy Markdown
Contributor Author

I've got a few first questions:

  1. I expected we could reuse the reuse the env-factory pattern (like in WitnessGeneratingBlockProcessingEnvFactory) and adapt it / build an extension of it, instead of the proxy/rendezvous machinery. Did you maybe look into that ? I was wondering because this PR's solution introduces several new witness capturing files that seem a bit redundant with existing recording mechanisms, but i guess your proxy solution might still be necessary given that we wanna use the main processing pipeline/context with process-lifetime singletons and not a spawned env.

Haven't looked into that, will do surely. I think it is necessary for single block execution.

  1. This PR also brings in the zkEVM / EEST test integration (new EEST fixtures, WitnessBlockchainTestBase, pyspec wiring, known-failing lists). That's independent of the endpoint, so, i was wondering if we could divide this PR in 2 ideally, one for the endpoint and one for adding the tests ? Also because i've been working on the witness comparison for EELS tests and i've used some different approaches for fixing missing/excess preimages recorded in witness that seem to require less changes. If dividing this PR in 2 is too difficult, i can also push my fixes to a draft PR and we can somehow merge it into yours and keep the best of both PRs.

Let's merge both PRs then

  1. Why doesn't WitnessGeneratingWorldState use TouchedNodesRlp from the witness capturing trie store ?

It is used during re-execution but not for our endpoint new-payload-with-witness where we are doing a single block execution (including witness capturing together). TouchedNodesRlp slows down our process so we are using a lightweight decorator instead

@hudem1
Copy link
Copy Markdown
Contributor

hudem1 commented May 28, 2026

I've got a few first questions:

  1. I expected we could reuse the reuse the env-factory pattern (like in WitnessGeneratingBlockProcessingEnvFactory) and adapt it / build an extension of it, instead of the proxy/rendezvous machinery. Did you maybe look into that ? I was wondering because this PR's solution introduces several new witness capturing files that seem a bit redundant with existing recording mechanisms, but i guess your proxy solution might still be necessary given that we wanna use the main processing pipeline/context with process-lifetime singletons and not a spawned env.

Haven't looked into that, will do surely. I think it is necessary for single block execution.

My initial idea was to do something similar to WitnessGeneratingBlockProcessingEnvFactory where we could have another function CreateScope or potential a separate class like WitnessGeneratingBlockProducingEnvFactory, in which we could register:

  1. the foundational witness-specific components:
  • WitnessGeneratingWorldState
  • WitnessCapturingCodeDb (from my draft PR for example)
  • WitnessCapturingTrieStore
  • WitnessGeneratingHeaderFinder
  1. and then re-register the components using those "basic" components such as:
  • StateReader taking a codeDb in its constructor
  • CodeInfoRepository taking worldState in its constructor
  • etc
    But if those are already declared as AddScoped where they are registered in DI, not necessarily needed to redeclare them all, as a new scoped instance should be redeclared and use our "foundational" registered components.
  1. and forcing some components that don't use caching:
  • CodeInfoRepository instead of CacheCodeInfoRepository (by the way, i don't think we need a WitnessCapturingCodeInfoRepository)
  • etc

I think that would:

  • mimic the existing witness-specific implementation WitnessGeneratingBlockProcessingEnvFactory
  • concentrate all witness-specific components registration inside that new WitnessGeneratingBlockProducingEnvFactory, ie in 1 single place, avoid as much as possible modifying core components and putting witness-registration logic here and there
  • and here specifically, i think we could remove WitnessCapturingCodeInfoRepository (CodeInfoRepository should be enough), WitnessCapturingWorldStateProxy (as now world state witness-generating or not will be decided from DI), WitnessCapturingMainProcessingModule, and WitnessCapturingBlockProcessor. The WitnessCapturingBlockProcessor logic would go into a file similar to WitnessCollector where we would call the blockProcessor.ProcessOne and gather the witness there with rendezvous logic etc.

So, as an overview, when new_payload_with_witness is called, we would spawn a scope using our witness-specific components and execution would happen there. That does not mean the scope would be readonly like current WitnessGeneratingBlockProcessingEnvFactory (because there i just explicitly registered read only trie store, readOnlyDbProvider.CodeDb etc). It would just be a scope where witness-specific components are registered and execution happens still modifying main chain. But that being said, such scope would need to be spawned for every call to new_payload_with_witness, so, not sure whether that endpoint will get called a lot and whether that impacts performances also.

Also i am not fully familiar with the constraints associated to engine API and main chain processing. I mean, main chain registers for example BlockProcessor in BlockProcessingModule as scoped. So, would having another BlockProcessor instance executing the new_payload_with_witness endpoint cause some issue for the next engine API calls ? Like, will main chain processing effectively wait for the new_payload_with_witness 's block processor finish processing the block and add it to the main chain ? I guess so, but i'm not sure. This kind of constraints i mean.

@LukaszRozmej @Dyslex7c Let me know whether my idea makes sense and is preferrable or not, and if so, i can implement it. It can also be as a refactor if we wanna merge this PR quickly !


  1. This PR also brings in the zkEVM / EEST test integration (new EEST fixtures, WitnessBlockchainTestBase, pyspec wiring, known-failing lists). That's independent of the endpoint, so, i was wondering if we could divide this PR in 2 ideally, one for the endpoint and one for adding the tests ? Also because i've been working on the witness comparison for EELS tests and i've used some different approaches for fixing missing/excess preimages recorded in witness that seem to require less changes. If dividing this PR in 2 is too difficult, i can also push my fixes to a draft PR and we can somehow merge it into yours and keep the best of both PRs.

Let's merge both PRs then

I opened my PR as draft. This PR fixes most of the tests in witness comparison for in-process block processing (i mean, the non-engine API path, but from what i saw in fixtures, tests are almost all equal for both paths), some of my tests are still producing a few excess state nodes preimages for some tests (looking into that).
So, in this draft PR, I created a WitnessCapturingCodeDb, i thought it was conceptually cleaner and similar to WitnessCapturingTrieStore and WitnessGeneratingHeaderFinder (used for recording trie nodes and headers). Also having a wrapper over the code Db helped fixed some excess bytecodes in witness issues (such as test scenario test_witness_codes_reverted_create_same_hash_then_read) but i had to slightly modify StateProvider to have a block-level cache for deployed bytecodes where bytecodes get removed during restore (tx revert), like EELS python client. Because the current existing block-level cache for bytecodes does not remove bytecodes on restore, and that might create missing/excess bytecode for test scenarios such as test_witness_codes_reverted_create_same_hash_then_read. So, i thought this solution was conceptually cleaner and made sense also to remove from the block-level bytecode cache on tx revert, and that avoids any complex-logic inside WitnessGeneratingWorldState to keep track of bytecodes deployed then reverted then read etc. But that touches a bit StateProvider, so, maybe doing that logic fully inside WitnessGeneratingWorldState is preferrable even if a bit not clean i feel like conceptually and maintenance-wise, but thats my personal preference.


  1. Why doesn't WitnessGeneratingWorldState use TouchedNodesRlp from the witness capturing trie store ?

It is used during re-execution but not for our endpoint new-payload-with-witness where we are doing a single block execution (including witness capturing together). TouchedNodesRlp slows down our process so we are using a lightweight decorator instead

Okay thats strange because this was useful during state root recomputation after execution where some trie nodes could get deleted and the branch node would need to read some sibling node to be able collapse etc. And those cases could not get caught at the world state level through the WitnessGeneratingWorldState alone and thats why i added this trie store wrapper originally. Didn't you get any issue with that if you are not using it for the new_payload_with_witness path ?

Comment thread src/Nethermind/Nethermind.State/TracedAccessWorldState.cs Outdated
@hudem1
Copy link
Copy Markdown
Contributor

hudem1 commented May 28, 2026

Also another question: there are some tests especially useful for stateless validation which have a field executionWitnessMutated such as test_validation_codes_missing_current_frame_code among others. For example in that test, they explicitly removed a bytecode from the witness. So, they expect stateless execution to fail. But for us, not running stateless execution but witness assertion, we cannot really assert witness because our witness should contain that explicit-missing bytecode. But you mentioned only like 5 tests are known failures, so, what about those executionWitnessMutated tests ? Did you ignore them somewhere (didnt find that in the PR) ?

So, personally, locally i ignored those few tests for now, i'll ask in the discord what to do with them for the witness assertion, whether we should just ignore them or smth else

@Dyslex7c
Copy link
Copy Markdown
Contributor Author

Also i am not fully familiar with the constraints associated to engine API and main chain processing. I mean, main chain registers for example BlockProcessor in BlockProcessingModule as scoped. So, would having another BlockProcessor instance executing the new_payload_with_witness endpoint cause some issue for the next engine API calls ? Like, will main chain processing effectively wait for the new_payload_with_witness 's block processor finish processing the block and add it to the main chain ? I guess so, but i'm not sure.

In the spawning env approach, both the env we create and the main block-processing pipeline would be able to write to the same DB simultaneously so we have to avoid concurrent DB writes and any form of out-of-sync problems. (because new_payload exists here which can mutate the state). The env should be in sync with the main chain and we would also have to implement thread-safety and lock serialization. If all these can be taken care of I guess we could implement this refactor

I opened my #11812. This PR fixes most of the tests in witness comparison for in-process block processing (i mean, the non-engine API path, but from what i saw in fixtures, tests are almost all equal for both paths), some of my tests are still producing a few excess state nodes preimages for some tests (looking into that).
So, in this draft PR, I created a WitnessCapturingCodeDb, i thought it was conceptually cleaner and similar to WitnessCapturingTrieStore and WitnessGeneratingHeaderFinder (used for recording trie nodes and headers). Also having a wrapper over the code Db helped fixed some excess bytecodes in witness issues (such as test scenario test_witness_codes_reverted_create_same_hash_then_read) but i had to slightly modify StateProvider to have a block-level cache for deployed bytecodes where bytecodes get removed during restore (tx revert), like EELS python client. Because the current existing block-level cache for bytecodes does not remove bytecodes on restore, and that might create missing/excess bytecode for test scenarios such as test_witness_codes_reverted_create_same_hash_then_read. So, i thought this solution was conceptually cleaner and made sense also to remove from the block-level bytecode cache on tx revert, and that avoids any complex-logic inside WitnessGeneratingWorldState to keep track of bytecodes deployed then reverted then read etc. But that touches a bit StateProvider, so, maybe doing that logic fully inside WitnessGeneratingWorldState is preferrable even if a bit not clean i feel like conceptually and maintenance-wise, but thats my personal preference.

Sounds good to me. you can check my PR for the tests I was also getting similar errors before. could you share the error logs?

Okay thats strange because this was useful during state root recomputation after execution where some trie nodes could get deleted and the branch node would need to read some sibling node to be able collapse etc. And those cases could not get caught at the world state level through the WitnessGeneratingWorldState alone and thats why i added this trie store wrapper originally. Didn't you get any issue with that if you are not using it for the new_payload_with_witness path ?

I'll add this now and check if it interferes with our goal and get back to you

@Dyslex7c
Copy link
Copy Markdown
Contributor Author

Also another question: there are some tests especially useful for stateless validation which have a field executionWitnessMutated such as test_validation_codes_missing_current_frame_code among others. For example in that test, they explicitly removed a bytecode from the witness. So, they expect stateless execution to fail. But for us, not running stateless execution but witness assertion, we cannot really assert witness because our witness should contain that explicit-missing bytecode. But you mentioned only like 5 tests are known failures, so, what about those executionWitnessMutated tests ? Did you ignore them somewhere (didnt find that in the PR) ?

So, personally, locally i ignored those few tests for now, i'll ask in the discord what to do with them for the witness assertion, whether we should just ignore them or smth else

We aren't ignoring those tests. The executionWitnessMutated tests run and pass because of the check I have pasted below. When a payload has executionWitnessMutated=true, it falls back to the plain endpoint (engine_newPayloadV{N}) instead of engine_newPayloadWithWitness, and skips witness comparison.

// Payloads without an executionWitness field fall back to the standard path.
if (enginePayload.ExecutionWitness is null || enginePayload.ExecutionWitnessMutated)
{
    return await base.SendPayloadAsync(rpcService, rpcContext, enginePayload, newPayloadVersion, paramsJson);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

evm new feature rpc state+storage taiko related to the taiko alethia rollup

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants