Skip to content

sanity(payload-builder): atomic cancel + flashblock publish#513

Open
julio4 wants to merge 1 commit into
mainfrom
fix/flashblock-publish-cancel-race
Open

sanity(payload-builder): atomic cancel + flashblock publish#513
julio4 wants to merge 1 commit into
mainfrom
fix/flashblock-publish-cancel-race

Conversation

@julio4
Copy link
Copy Markdown
Member

@julio4 julio4 commented May 14, 2026

Wraps per-flashblock publish side effects (ws_pub.publish, built_fb_payload_tx.try_send, built_payload_tx.try_send) and payload cancellation token in a shared publish_lock.
This is preventive against a suspected race between cancel check and publish where a "cancelled" flashblock could still be emitted, inserting an orphan sibling into the engine tree via Events::BuiltPayload.

Wraps `cancel_with` and the per-flashblock publish side effects
(`ws_pub.publish`, `built_fb_payload_tx.try_send`, `built_payload_tx.try_send`)
in a shared `publish_lock`. Closes the race window between the cancel check
and publish where a "cancelled" flashblock could still be emitted, inserting
an orphan sibling into the engine tree via `Events::BuiltPayload`.
@avalonche
Copy link
Copy Markdown
Collaborator

will this mutex be blocking cancellations (cancel_resolved, cancel_new_fcu, or cancel_deadline)? Why not move the side effects to the outer async loop and make the operations async?
we can introduce retries on error also to improve reliability, but it shouldn't block the main loop

@julio4
Copy link
Copy Markdown
Member Author

julio4 commented May 15, 2026

will this mutex be blocking cancellations (cancel_resolved, cancel_new_fcu, or cancel_deadline)?

Yes it will be blocking, so it can delay cancellation; but this is to mark the publish section as a critical section so that if we send to websocket then we must send in the internal channels as well. Publish is already the "last" step of building so this just enforce that during publish the job is uncancellable.

Why not move the side effects to the outer async loop and make the operations async?

I have an idea for this, will open a pr shortly. But moving side effects as async ops don't enforce that publish must be a critical section.

we can introduce retries on error also to improve reliability, but it shouldn't block the main loop

on which error?

@avalonche
Copy link
Copy Markdown
Collaborator

if we send to websocket then we must send in the internal channels

Why is this strictly necessary? I think publishing to the websocket and failing to send to the local engine will increase latency (increases cache miss for storage slots) but we'll receive the block either way via newPayload so an error on sending to internal channels is acceptable. They do not need to be atomic, but improving reliability via retries will be good enough.

on which error?

We'll have to inspect whether the error is recoverable or not (e.g. just disconnect etc) but generally are there downsides to retry publishing on any error? rollup-boost should be able to handle it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants