sanity(payload-builder): atomic cancel + flashblock publish by julio4 · Pull Request #513 · flashbots/op-rbuilder

julio4 · 2026-05-14T07:10:22Z

Wraps payload cancellation (`cancel_with`) and the per-flashblock publish side effects (`ws_pub.publish`, `built_fb_payload_tx.try_send`, `built_payload_tx.try_send`) in a shared `publish_lock`. This is a preventive fix for a suspected race between the cancel check and publish, where a "cancelled" flashblock could still be emitted and insert an orphan sibling into the engine tree via `Events::BuiltPayload`.
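The locking pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PublishGuard`, `cancel`, and `publish_if_live` are hypothetical names, and a plain `AtomicBool` stands in for the real cancellation token. The only point it shows is that the cancel check and the publish side effects run under the same lock, so a cancel cannot land between them.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the job's cancellation token plus the shared lock.
#[derive(Clone, Default)]
pub struct PublishGuard {
    /// Held by both `cancel` and `publish_if_live` (the "publish_lock" idea).
    publish_lock: Arc<Mutex<()>>,
    /// Assumption: a simple flag replaces the real cancellation token here.
    cancelled: Arc<AtomicBool>,
}

impl PublishGuard {
    /// Cancels the job. If a publish is in flight, this waits for it to finish,
    /// which is the cancellation delay discussed in the review.
    pub fn cancel(&self) {
        let _held = self.publish_lock.lock().unwrap();
        self.cancelled.store(true, Ordering::SeqCst);
    }

    /// Reports whether the job has been cancelled.
    pub fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }

    /// Runs `publish` only if the job is still live. The cancel check and the
    /// side effects share one critical section, so either every side effect
    /// runs or none does. Returns `true` if `publish` was executed.
    pub fn publish_if_live<F: FnOnce()>(&self, publish: F) -> bool {
        let _held = self.publish_lock.lock().unwrap();
        if self.cancelled.load(Ordering::SeqCst) {
            return false;
        }
        publish();
        true
    }
}
```

In this sketch the closure passed to `publish_if_live` would contain the websocket publish and both channel sends. A cancel that arrives while the closure runs blocks until it returns, and any later publish attempt is skipped.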

avalonche · 2026-05-14T15:29:48Z

Will this mutex block cancellations (`cancel_resolved`, `cancel_new_fcu`, or `cancel_deadline`)? Why not move the side effects to the outer async loop and make the operations async?
We can also introduce retries on error to improve reliability, but it shouldn't block the main loop.

julio4 · 2026-05-15T09:29:31Z

> Will this mutex block cancellations (`cancel_resolved`, `cancel_new_fcu`, or `cancel_deadline`)?

Yes, it will block, so it can delay cancellation. The point is to mark publish as a critical section: if we send to the websocket, then we must send to the internal channels as well. Publish is already the "last" step of building, so this just enforces that the job is uncancellable during publish.

> Why not move the side effects to the outer async loop and make the operations async?

I have an idea for this and will open a PR shortly. But moving the side effects into async ops doesn't enforce that publish is a critical section.

> We can also introduce retries on error to improve reliability, but it shouldn't block the main loop.

On which error?

avalonche · 2026-05-15T17:53:15Z

> if we send to the websocket, then we must send to the internal channels

Why is this strictly necessary? I think publishing to the websocket and failing to send to the local engine will increase latency (more cache misses for storage slots), but we'll receive the block either way via `newPayload`, so an error on sending to the internal channels is acceptable. The two do not need to be atomic; improving reliability via retries will be good enough.

> On which error?

We'll have to inspect whether the error is recoverable or not (e.g. just a disconnect), but generally, are there downsides to retrying the publish on any error? rollup-boost should be able to handle it.
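For illustration, the retry idea could look something like the sketch below. It is an assumption-heavy example rather than anything proposed in this PR: `send_with_retry` and `SendOutcome` are hypothetical names, and the standard library's bounded `SyncSender` stands in for whatever channel type the builder actually uses. A full channel is treated as recoverable and retried a bounded number of times, while a disconnected receiver is treated as unrecoverable.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};
use std::thread;
use std::time::Duration;

/// Outcome of a bounded-retry send (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum SendOutcome {
    /// The item was accepted on the given attempt (1-based).
    Sent { attempts: u32 },
    /// The channel was still full after every attempt.
    GaveUpFull,
    /// The receiver is gone, so retrying cannot help.
    Disconnected,
}

/// Tries a non-blocking send up to `max_attempts` times, sleeping `backoff`
/// between attempts while the channel is full. With `max_attempts == 0` no
/// send is attempted and `GaveUpFull` is returned.
pub fn send_with_retry<T>(
    tx: &SyncSender<T>,
    mut item: T,
    max_attempts: u32,
    backoff: Duration,
) -> SendOutcome {
    for attempt in 1..=max_attempts {
        match tx.try_send(item) {
            Ok(()) => return SendOutcome::Sent { attempts: attempt },
            Err(TrySendError::Full(returned)) => {
                // Recoverable: take the item back and try again later.
                item = returned;
                if attempt < max_attempts {
                    thread::sleep(backoff);
                }
            }
            // Unrecoverable: nobody is listening any more.
            Err(TrySendError::Disconnected(_)) => return SendOutcome::Disconnected,
        }
    }
    SendOutcome::GaveUpFull
}
```

Note that `thread::sleep` blocks the calling thread, so in an async builder loop the wait would need to be an async sleep or be moved to a separate task to avoid stalling the main loop, which is the concern raised above.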

julio4 requested review from 0x416e746f6e, SozinM and avalonche as code owners May 14, 2026 07:10

julio4 wants to merge 1 commit into main from fix/flashblock-publish-cancel-race
