Dashboard rework, GLOAS beacon-API compatibility playbook & cron scheduling #179
Merged
Generic per-client beacon-API probe used as the building block for
cross-client API-compatibility playbooks. The task hits a single HTTP
endpoint (or subscribes to one SSE topic) on every connected consensus
client, classifies each response (pass / partial / fail / skipped), and
emits both per-client results and a 'matrixRow' output collapsed by
client-type for downstream aggregation.
Key behaviours:
- Path placeholders ({slot}, {epoch}, {block_id}, {beacon_block_root},
{builder_index}, {validator_index}) are auto-resolved from chain state;
offsets like {slot+5} / {epoch-1} are supported.
- HTTP classification splits expectStatuses into successStatuses (run
responseSchema) and errorStatuses (run errorSchema = ErrorMessage
shape). Schema-valid 4xx still counts as 'pass' because it proves the
endpoint exists and parses the request.
- SSE mode subscribes to /eth/v1/events?topics=..., waits a configurable
window for at least minEvents matching events, optionally validates
each event payload against eventSchema.
- requireForkActive skips clients whose head hasn't reached the named
fork's activation epoch.
- Per-client probes run concurrently up to the configured concurrency
cap; the task aggregates pass/partial/fail counts and either succeeds,
fails-on-any, or fails-on-all-error depending on config.
Adds github.com/santhosh-tekuri/jsonschema/v5 for inline JSON-Schema
validation of response / event bodies.
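The HTTP classification rule above can be sketched roughly as follows. This is a minimal illustration, not the task's actual code: the `validator` type and `isJSONObject` check are hypothetical stand-ins for a compiled JSON Schema, and treating a status outside both lists as 'fail' is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is one of the per-client outcomes named in the description.
type Result string

const (
	Pass    Result = "pass"
	Partial Result = "partial"
	Fail    Result = "fail"
)

// validator stands in for a compiled JSON Schema (hypothetical type; the
// real task validates with santhosh-tekuri/jsonschema).
type validator func(body []byte) bool

// isJSONObject is a toy schema check: the body must parse as a JSON object.
func isJSONObject(body []byte) bool {
	var obj map[string]json.RawMessage
	return json.Unmarshal(body, &obj) == nil && obj != nil
}

func contains(list []int, v int) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// classify picks the schema by status class. Treating a status outside both
// lists as "fail" is an assumption of this sketch.
func classify(status int, body []byte, successStatuses, errorStatuses []int,
	responseSchema, errorSchema validator) Result {
	var schema validator
	switch {
	case contains(successStatuses, status):
		schema = responseSchema
	case contains(errorStatuses, status):
		schema = errorSchema
	default:
		return Fail
	}
	if schema(body) {
		return Pass // a schema-valid 4xx still proves the endpoint exists
	}
	return Partial
}

func main() {
	ok, errs := []int{200}, []int{400, 404}
	body := []byte(`{"code":404,"message":"block not found"}`)
	fmt.Println(classify(404, body, ok, errs, isJSONObject, isJSONObject))
	fmt.Println(classify(200, []byte(`not json`), ok, errs, isJSONObject, isJSONObject))
	fmt.Println(classify(502, nil, ok, errs, isJSONObject, isJSONObject))
}
```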
New aggregator task that walks the test run, picks up every check_consensus_api task's matrixRow output, and renders a markdown compatibility table. The same markdown is stored both as the task's 'summary' result (so it renders inline at the top of the task pane in the UI) and as a downloadable matrix.md result file, alongside a matrix.json dump for machine consumption.
Columns are ordered by client-type (Lighthouse, Teku, Prysm, Grandine, Nimbus, Lodestar, Caplin) with two-letter labels matching the human-readable style; unused client-type columns are dropped by default (toggle with showAllClientTypes). Notes on partial/fail cells are emitted as numbered footnotes with deduplication.
Also extends the task UI to render summary files and any .md result file inline using ReactMarkdown + remark-gfm, so the matrix shows up in the UI without manual download.
New playbooks/api-compatibility/ folder for cross-client beacon-API
compatibility checks. The folder is intentionally fork-agnostic so
future forks can drop in a new file alongside.
gloas-api-check.yaml exercises every beacon-API endpoint / SSE topic
introduced or changed by the GLOAS / ePBS spec PRs (#552, #580, #586,
#587, #588, #592) against every connected consensus client:
- 11 HTTP endpoints (publish/produce/retrieve blocks, bids, envelopes,
PTC duties, payload attestation data + pool ops).
- 5 SSE topics (execution_payload_bid, execution_payload_available,
payload_attestation_message, execution_payload,
execution_payload_gossip), subscribed concurrently to share the
same wait window.
Each row uses an inline JSON Schema derived from the spec to validate
either the success body or the documented ErrorMessage body. The final
step calls generate_api_compatibility_matrix to render the markdown
matrix as a result artifact.
Playbook index is regenerated to include the new folder.
…accepts them
run_shell's ::set-output stores values as plain strings; downstream tasks that consume the value as uint64 (minSlotNumber, etc.) then fail during YAML unmarshalling. Switch to ::set-output-json so the integers are stored typed.
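The string-versus-typed mismatch can be reproduced in miniature. The sketch below uses the standard library's encoding/json rather than the YAML path mentioned above (an assumption made to keep it self-contained), and `consumerConfig` is a hypothetical consumer struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// consumerConfig is a hypothetical downstream task config with a typed field.
type consumerConfig struct {
	MinSlotNumber uint64 `json:"minSlotNumber"`
}

// decode tries to load the stored value into the typed config.
func decode(raw string) (uint64, error) {
	var cfg consumerConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg.MinSlotNumber, err
}

func main() {
	// Value stored as a plain string: decoding into uint64 is rejected.
	_, err := decode(`{"minSlotNumber": "128"}`)
	fmt.Println("string value error:", err != nil)

	// Value stored as a typed number: decoding succeeds.
	v, err := decode(`{"minSlotNumber": 128}`)
	fmt.Println("typed value:", v, err)
}
```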
End-to-end polish surfaced by the first kurtosis test run:
- matrixCell now has explicit JSON tags so matrix.json uses camelCase
('result' / 'note' / 'httpStatus') instead of the Go-default
capitalised field names.
- Row titles in the rendered table wrap only the HTTP method+path (or
SSE topic) in code-ticks; the optional trailing descriptor (e.g.
'(Gloas)') stays outside the code span, matching the requested
matrix layout.
- /api/v1/test_run/{runId}/details now includes result_files for each
task, so the UI's TaskList -> TaskDetails Overview pane has the
metadata it needs to render the inline markdown summary. Previously
only /api/v1/test_run/{runId} carried the result-file headers and
the UI's per-task fetch path was missing them.
New utility task that runs a JavaScript snippet via Node.js. Modeled
on run_shell with the same stdout-marker protocol (::set-output,
::set-output-json, ::set-var, ::set-json) and the same
$ASSERTOOR_SUMMARY / $ASSERTOOR_RESULT_DIR semantics, but built for
shapes that are awkward in shell/jq — collecting and rendering tabular
data from sibling task outputs, generating JSON configs from chain
state, etc.
Highlights:
- envVars are JSON-decoded by a preamble and exposed as a single 'env'
object so user code does not have to JSON.parse(process.env.X)
manually.
- Helpers: setOutput[JSON], setVar[JSON], writeResultFile,
writeSummary.
- The user script is wrapped in '(async () => { ... })()' so top-level
'await' works.
- The wrapped source is written to a file before invoking node, so
stack traces refer to the user's line numbers.
Dockerfile / Dockerfile-local install Node.js 20 from NodeSource so
the task works in the published image.
…ript
The matrix-rendering aggregator is no longer a Go task — the playbook
now produces the markdown directly with a run_javascript step that
walks every sibling task's matrixRow output. ~600 lines of Go +
README disappear in favor of ~80 lines of self-contained JavaScript
inside the playbook.
In passing:
- Rename check_consensus_api outputs/config 'checkId'/'checkTitle' to
generic 'rowId'/'rowTitle'. The matrix consumer can target any task
conforming to the {rowId, rowTitle, referenceUrl, matrixRow}
convention — not just check_* tasks.
- Tighten lint pass on check_consensus_api: extract methodGet /
clientTypeUnk / maxResponseLen / defaultSSETimeoutSeconds constants,
pass Config by pointer to classifyHTTPResult, name resolvePath
results, and add the wsl_v5 blank-line padding the project requires.
ResolveQuery prepends '.' before parsing, so '| tasks | to_entries[]' expands to '. | tasks | to_entries[]' — and gojq has no global 'tasks' function so the iter yielded nothing, env.ROWS arrived undefined, and the renderer threw 'filter is not a function'. Existing playbooks (builder-lifecycle.yaml etc.) consistently use the '.tasks.<id>.outputs.<field>' form. Match it.
…urrent are collected
Top-level '.tasks' only contains direct children of the test; child tasks of run_tasks_concurrent (etc.) register their status vars in the wrapper's scope. Use jq's recursive '..' to walk every nested map and pick out matrixRow-bearing objects, then unique_by(rowId) to drop any duplicates from the walk.
…ics on time.Time values
The 'tasks | to_entries[]' query crashed gojq with a panic on time.Time values nested in get_consensus_specs' spec map (MIN_GENESIS_TIME etc.). yaml roundtrip in ResolveQuery turns the iso8601 strings back into time.Time, and gojq's 'type' builtin (and friends) panic on that type. Switch to explicit per-row env vars — each query is now a simple path expression that never iterates a heterogeneous map. The JS side glues them back together. More verbose but robust.
Children of run_tasks_concurrent register their ids under the wrapper's variable scope, not the root — so the matrix renderer (running at root) couldn't resolve 'tasks.row12_sse_*.outputs'. Flat layout: sequential SSE subscription per topic. Adds ~4 min to the playbook (5 × 48s) but lets the matrix collector find every row by id. Also drops the temporary DEBUG dump task.
If the SSE subscription was accepted (route exists, topic name recognized) but no events arrive in the window, we've still proved the endpoint syntax — that's what 'api-compatibility' is about. Whether or not events actually fire during the window is a chain-state question, not a client compatibility one. Status moves from partial -> pass; note rewrites to drop the 'but'.
Previously the only way for a playbook to surface a top-level
artefact like a compatibility matrix was to drop it under a specific
task's result files — three clicks deep, hard to find unless you
already know which task to look in.
This patch adds a centralised run-level Result:
- New env var $ASSERTOOR_TEST_RESULT exposed by run_shell and
run_javascript. Points at a shared markdown file owned by the
test-run's scheduler; every task in the run sees the same path.
Multiple tasks can write to / append to it.
- TaskScheduler owns the file (lazy-created in a per-run temp dir,
cleaned up at end of RunTasks). After every task finishes, the
current file contents are upserted into the existing task_results
table under sentinel TaskID=0 / Type="test_result".
- run_javascript preamble grows two helpers: writeTestResult(s) and
appendTestResult(s) so user scripts don't have to fs.writeFileSync
by hand. (Execute is split into Execute + buildCommandEnv to keep
cyclomatic complexity in line.)
- New API endpoint GET /api/v1/test_run/{runId}/result returns the
markdown (text/markdown) or 204 when nothing was written.
- New web-ui component RunResultPanel renders the markdown inline on
the test-run page, just below the timeline. Polls every 5s while
the run is in flight, stops once it ends. A "view raw" link points
at the API for direct download.
- get_test_run_details_api.go already excludes TaskID=0 from the
result_file headers (the task-level pane), so the only change there
is a wsl_v5 whitespace fix.
Keeps the per-task matrix.md too (still downloadable + inline-rendered in the task pane), but also publishes it to the run-level Result panel — one click from the test-run page.
Walks recent runs of the test (newest first) and returns the first $ASSERTOOR_TEST_RESULT markdown blob found, so the dashboard can surface the latest meaningful result with one round trip instead of fanning out to fetch every run individually. Two response modes: raw markdown (default) plus metadata in X-Run-* headers, or a JSON envelope with run_id/status/start_time/markdown when ?meta=1 is set.
Adds LatestResultResponse type, getLatestTestResult() client call, and useLatestTestResult() react-query hook. The hook is used by the new dashboard's latest-result tile.
Adds a fully configurable tile grid that backs the new dashboard:
- types.ts: tile model (4 types: success_rate, latest_result, recent_runs, text), persisted config schema (versioned, localStorage)
- useDashboardConfig: hook owning the config + imperative API (add/remove/move/resize/reset)
- TileGrid: 12-col responsive layout, edit-mode action strip
- SuccessRateTile: rate ring + per-run swatches over last N runs
- LatestResultTile: renders $ASSERTOOR_TEST_RESULT markdown of the most-recent run that produced one
- RecentRunsTile: compact live-updating list of runs
- TextTile: free-form markdown
- AddTileModal: type-picker for new tiles
- TileEditorModal: per-tile configuration (test picker, window size, markdown body, etc.)
The old Dashboard was 500+ lines that did one thing: paginate the test-runs table. That table now lives on the new /runs page; the home page becomes a real overview. The new Dashboard wires up the tile system: an edit-mode toggle, an Add-tile modal, a per-tile editor modal, and grid mutations (add/remove/move/resize). Out of edit mode it is pure read-only output.
New page hosts the test-runs table that used to live on /. It's a split pane:
- left: registered tests with search; clicking a test scopes the right pane to its runs
- right: paginated runs table (status, started, duration, actions), with bulk delete + 'Start test' from the active scope
The selected test is reflected in the URL (?testId=…) so deep links and reloads work. Routing & nav are updated: 'Runs' sits between Dashboard and Library.
The 'SUCCESS RATE · DEMO · RUN-LEVEL RESULT · width-picker · ↑↓✎✕'
strip was wrapping awkwardly on small (3/12) tiles, occluding the
tile body. Two changes fix it:
- drop the title from the strip; it's already shown inside the tile,
and the type label alone is enough context
- replace the wordy width-picker ('small (3/12)') with a single-letter
variant (S / M / L / XL); full label is in the title attribute
Now the entire strip fits on one line at every supported width.
- All three Dockerfiles now pull node 24 from NodeSource (20 is EOL).
- Dockerfile-stub (the CI stub image that copies a pre-built binary) was missing node entirely; it is now added the same way as in the real multi-stage images, so playbooks using the run_javascript task can run inside the stub image too.
- Removed the now-redundant 'nodejs is required by run_javascript' comments from the Dockerfiles; the dependency is self-evident.
Adds two new endpoints powering the reworked dashboard:
- GET /api/v1/network_status
Aggregated chain & orchestrator snapshot (head slot/epoch, finalized + justified checkpoints, EL head, per-layer client readiness counts, queue depth). Cheap; served entirely from the in-process pool.
- GET /api/v1/dashboard_config (public)
PUT /api/v1/dashboard_config (auth-required)
The dashboard layout now lives on the server in a small key/value table (sqlite + pgsql migrations), so it can be edited only by authenticated users. Anonymous viewers see the same dashboard but can't mutate it. The body is treated as an opaque JSON blob; schema validation lives in the client.
Major rework of the dashboard editing experience:
Data model
- Tiles now live inside named **rows** (horizontal containers). Rows visually segment the dashboard and can be reordered as units.
- Config schema bumped to version 2. There is no migration from v1 (per user request); invalid blobs reset to the default dashboard.
- Config is server-backed (replaces localStorage). The hook is a react-query mutation that PUTs after a 250ms debounce so rapid drag-and-drop edits coalesce into one round-trip.
Edit mode
- Each tile is now visually framed in edit mode (chip header + bordered body), so the type label can't be confused with the tile above it, which was the bug in the previous edit-mode screenshot.
- Each row gets its own dashed-border frame with row-level controls (rename, move up/down, remove).
- Add-tile modal is gone. Replaced with a sidebar palette (like the builder) of draggable tile types: drop them on any row's drop zone, or click to append to the last row.
- Tiles are draggable too: rearrange within a row, move across rows. Powered by @dnd-kit/core.
- Export / Import buttons round-trip the config as JSON (download + file picker).
- Edit-mode controls are hidden when the user isn't authenticated; the PUT endpoint requires auth, so anonymous editing would silently fail to persist.
New tile types
- client_status: per-client EL/CL liveness dots + heads, polls the existing /clients endpoint.
- network_status: head slot/epoch, finalized + justified checkpoints, EL head, queue depth, ready-client counts; polls the new /api/v1/network_status endpoint.
Edits during edit-mode now stage to a local 'draft' state. The react-query cache still mirrors the server-side baseline; the draft only flushes to the server when the user clicks 'Save changes'. This fixes the regression where rapid mutations (add row, drag tile from palette, etc.) would appear to disappear: the previous debounced PUT raced with re-renders and was overwriting optimistic updates on some auth states. With an explicit save we never touch the server during edits, so there's no race.
UX changes:
- 'UNSAVED' badge in the title when the draft differs from the server config.
- New toolbar buttons: 'Save changes' (primary, enabled when dirty) and 'Discard' (only shown when dirty).
- 'Done' confirms before exiting with unsaved changes.
- beforeunload guard so tab close / refresh prompts when dirty.
- Reset and Import both stage into the draft; neither hits the server until the user saves.
The latest_result, recent_runs, client_status and text tiles can all grow with their content. On a dashboard with a long matrix or a 30-client fleet, that pushes everything else off-screen. Add an optional heightPx field on those four tile configs. When set, the tile's outer card gets max-height: <px>px and the inner body becomes scrollable (the flex-col layout was already in place for three of them; TextTile got refactored to match). The tile editor exposes heightPx as a 'Max height (px, optional)' input — leave it blank for the previous behaviour (size to content). Also tidies the editor: client_status now has its own editor body (showExecution checkbox + height), and network_status renders a short 'no extra settings' note instead of an empty form.
Three new HTTP endpoints powering the schedule rework, plus an
extended schedule API that preserves the legacy shape:
- PUT /api/v1/test/{testId}/schedule (auth)
Replace a registered test's schedule (startup, cron[], skipQueue).
Cron expressions are validated up-front; invalid input rejects
the whole request.
- GET /api/v1/test/{testId}/next_run
Walks each cron expression and returns the next firing time per
expression plus the overall earliest. Drives the 'next in 7m'
hint shown on the runs page banner.
- GET /api/v1/test_queue
Returns the runner's running + pending queue in execution order.
Feeds the StartTestModal's QueuePicker.
POST /api/v1/test_runs/schedule gains a new optional 'queue'
field (mode: immediate|end|after, after_run_id). The legacy
'skip_queue' boolean stays as a deprecated fallback — when both
are supplied 'queue' wins.
Internally, types.Coordinator gains ScheduleTestWithOptions; the
4-arg ScheduleTest is preserved and now forwards to it. The
testrunner can splice into c.testQueue at an arbitrary position
(after the named RunID); falls back to 'append' silently if the
target has already started/finished.
TestRegistry.UpdateTestSchedule mutates the descriptor under a
schedule mutex and re-upserts the test_configs row so changes
survive restarts.
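The 'after' queue placement with its silent append fallback might look roughly like this. It is a sketch under assumptions: `queuedRun` and `insertAfter` are hypothetical names, and the real testrunner operates on c.testQueue under its own locking, which is omitted here.

```go
package main

import "fmt"

// queuedRun is a hypothetical stand-in for a pending test run.
type queuedRun struct {
	RunID uint64
	Name  string
}

// insertAfter returns a new queue with run placed directly after the entry
// whose RunID equals afterRunID. If that entry is no longer pending (for
// example because it already started or finished), run is appended instead.
func insertAfter(queue []queuedRun, run queuedRun, afterRunID uint64) []queuedRun {
	out := make([]queuedRun, 0, len(queue)+1)
	for i, q := range queue {
		if q.RunID == afterRunID {
			out = append(out, queue[:i+1]...)
			out = append(out, run)
			return append(out, queue[i+1:]...)
		}
	}
	// Target not found: fall back to plain append.
	out = append(out, queue...)
	return append(out, run)
}

func main() {
	queue := []queuedRun{{1, "a"}, {2, "b"}, {3, "c"}}
	fmt.Println(insertAfter(queue, queuedRun{9, "new"}, 2))
	fmt.Println(insertAfter(queue, queuedRun{9, "new"}, 42))
}
```

Returning a fresh slice keeps the caller's queue untouched, which makes the fallback path easy to reason about.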
Adds a CronEditor and a ScheduleCard component that surface a
test's schedule wherever it matters:
- Library → Local Playbooks → expanded test row: card variant
with the full breakdown (Startup / Skip-queue toggles, each
cron expression + cronstrue-rendered description).
- Runs page → above the runs table when a test is selected:
banner variant — one-line summary + 'next in 7m (HH:MM)'.
The editor itself (opened from either surface) supports:
- 7 one-click presets (every minute, hourly, daily, weekly, …)
- Free-form crontab input validated by cron-parser
- cronstrue translation under each expression
- The next 3 firings rendered inline
- Per-row remove + 'add another expression'
- 'Clear schedule' to wipe everything
Edits are auth-gated end-to-end: the modal's Save button calls the
new auth-required PUT endpoint; anonymous users see the card in
read-only mode (no Edit button).
The Run dialog gets two clearly distinct sections:
- 'Test configuration' — the per-test knobs (now in a bordered
card with an uppercase header + config icon).
- 'Run options' — bordered card with a settings icon, holding:
- 'Queue placement': the new QueuePicker dropdown
- 'Allow duplicate' checkbox
QueuePicker is a rich, 2-level nested dropdown:
Level 1:
- Run immediately (mode='immediate'; parallel slot)
- Add to queue · End (mode='end'; append)
- Add to queue · After… (opens level 2)
Level 2:
- One entry per currently running or queued test, each with a
status dot + run id + name. Selecting one sets mode='after',
after_run_id=<run id>.
The dropdown renders in-flow (not absolutely positioned) so the
containing modal scrolls naturally when the menu overflows.
Also fixes an unrelated 'renders 0' bug — when a test's default
timeout was exactly 0, React rendered the falsy 0 inside the
test-info card.
No need for a dedicated table + migration just to hold one row.
The existing assertoor_state KV store (key TEXT primary, value TEXT)
already serves exactly this purpose for other singletons in the
codebase, so reuse it.
Changes:
- Drop pkg/db/dashboard_config.go and the two 20260518150000_*.sql
migrations (sqlite + pgsql).
- Rewrite GetDashboardConfig / PutDashboardConfig to round-trip the
JSON blob through GetAssertoorState / SetAssertoorState under the
fixed key 'dashboard_config'. json.RawMessage is the carrier type
so the bytes pass through verbatim (RawMessage.MarshalJSON is
identity), avoiding the double-encoding the KV's generic
interface{} path would otherwise do.
Also gofmt-fixes a couple of struct-tag tables in the new API files
and tidies an exhaustive-switch warning by listing the terminal
TestStatuses explicitly.
Verified end-to-end:
- GET (no config) → 204
- PUT valid JSON → 200, persisted
- GET → returns the same JSON verbatim (not double-encoded)
- PUT invalid JSON → 400 with a clear error
- migration log shows only the 4 pre-existing migrations
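The double-encoding concern is easy to see with the standard library alone. In this sketch, `encodeAsString` and `encodeAsRaw` are hypothetical helpers; they only contrast marshalling the blob as a Go string with marshalling it as json.RawMessage, and do not reproduce the GetAssertoorState / SetAssertoorState calls.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeAsString marshals the blob as a Go string, which wraps it in quotes
// and escapes the inner quotes (the double-encoding case).
func encodeAsString(blob []byte) string {
	out, err := json.Marshal(string(blob))
	if err != nil {
		return ""
	}
	return string(out)
}

// encodeAsRaw marshals the blob as json.RawMessage, so already-valid JSON
// is emitted as JSON rather than as a quoted string.
func encodeAsRaw(blob []byte) string {
	out, err := json.Marshal(json.RawMessage(blob))
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	blob := []byte(`{"version":2}`)
	fmt.Println(encodeAsString(blob))
	fmt.Println(encodeAsRaw(blob))
}
```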
`make lint` now passes with 0 issues. None of the touched code paths change behaviour; every fix is annotation-only. Three buckets:
- **errcheck on crypto/rand.Read** in the three spam-transaction generators (random-target address paths). crypto/rand never fails on Linux; the zero fallback would be a harmless spam target if it ever did. Now explicitly discarded with `_, _ = rand.Read(...)` and a one-line note.
- **gosec G115 integer-overflow conversions** (12 sites across pkg/clients, pkg/tasks/generate_*, pkg/txmgr). Each is suppressed with `//nolint:gosec` and a short reason; most are guarded by a `> 0` check on the line above, or operate on non-negative slice indices.
- **nolintlint 'unused directive'** in pkg/web/api/get_test_yaml_api.go. Three stale `//nolint:gosec` comments were left over from a refactor; the underlying gosec rules no longer fire here, so the directives themselves are now lint findings. Removed.
My earlier 'pre-existing lint cleanup' commit was run against local
golangci-lint v2.6.0, which flags G115 integer-overflow conversions.
CI runs v2.11.3, which does NOT flag those — so every G115 nolint
became an 'unused directive' nolintlint error in CI.
Two corrections:
- Restore the three //nolint:gosec annotations on
pkg/web/api/get_test_yaml_api.go that v2.11.3 still flags (G704
SSRF on the HTTP request paths, G703 path-traversal on the local
file read). Both inputs come from configured test sources, never
from request bodies.
- Drop the 13 //nolint:gosec annotations on G115 integer-overflow
conversions across pkg/clients/, pkg/tasks/generate_*, and
pkg/txmgr/spamoor.go. v2.11.3 doesn't fire on these in the first
place, so the directives are unused.
Verified clean under golangci-lint v2.11.3 (the version pinned in
.github/workflows/_shared-check.yaml):
$ golangci-lint run --timeout 5m ./...
0 issues.
Resolves to a stable point a few slots back from head (default 4 slots
≈ 48s) plus the canonical block root at that slot. This lets the
GLOAS API playbook target a slot where derived state like
execution-payload envelopes and PTC committee data is reliably
available across every client, instead of racing against head.
New placeholders, all offset-arithmetic capable:
- {recent_slot}, {recent_slot+N}, {recent_slot-N}
- {recent_epoch}, {recent_epoch+N}, {recent_epoch-N}
- {recent_block_root} (root fetched via GetBlockHeaderBySlot)
Implementation details:
- resolvePath() builds a pathContext once per task execution, doing
a single best-effort GetBlockHeaderBySlot RPC for the recent root.
- resolvePlaceholder() now takes the pathContext value-struct
instead of an ever-growing tuple of head{Slot,Epoch,Root}.
- The signed-offset parser (+N/-N) is unchanged, so '_' in
'recent_slot' is correctly part of the keyword.
- On lookup failure the recent root falls back to head, so the
task still produces something usable even if /beacon/headers
is briefly unavailable.
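A rough sketch of how placeholders with signed offsets could be expanded is shown below. It is not the task's resolvePath(): `sketchContext` and `expandPlaceholders` are hypothetical names, the regular expression is one possible parser, and clamping negative results to 0 is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// placeholderRe matches {keyword} with an optional signed offset such as
// {recent_slot-4}. The '_' is part of the keyword, not a separator.
var placeholderRe = regexp.MustCompile(`\{([a-z_]+)([+-]\d+)?\}`)

// sketchContext is a hypothetical pre-resolved context: numeric values
// accept offsets, roots are substituted verbatim.
type sketchContext struct {
	numbers map[string]uint64
	roots   map[string]string
}

func expandPlaceholders(path string, ctx sketchContext) string {
	return placeholderRe.ReplaceAllStringFunc(path, func(m string) string {
		parts := placeholderRe.FindStringSubmatch(m)
		name, offset := parts[1], parts[2]
		if root, ok := ctx.roots[name]; ok && offset == "" {
			return root
		}
		base, ok := ctx.numbers[name]
		if !ok {
			return m // unknown placeholder: leave untouched
		}
		if offset == "" {
			return strconv.FormatUint(base, 10)
		}
		n, err := strconv.ParseInt(offset, 10, 64) // accepts "+5" and "-1"
		if err != nil {
			return m
		}
		v := int64(base) + n
		if v < 0 {
			v = 0 // assumption: clamp instead of underflowing
		}
		return strconv.FormatInt(v, 10)
	})
}

func main() {
	ctx := sketchContext{
		numbers: map[string]uint64{"recent_slot": 100, "recent_epoch": 3},
		roots:   map[string]string{"recent_block_root": "0xabc"},
	}
	fmt.Println(expandPlaceholders("/envelope/{recent_slot-4}/{recent_block_root}", ctx))
	fmt.Println(expandPlaceholders("/duties/{recent_epoch+1}", ctx))
}
```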
The original matrix was dominated by 🟡 footnotes saying 'error status
404 but body does not match ErrorMessage schema'. Those reflected our
strict OpenAPI ErrorMessage requirement (`required: [code, message]`)
more than they reflected anything about endpoint compatibility. CL
implementations diverge wildly on error-body shape (some omit `code`,
Lodestar wraps differently, Nimbus/Grandine return extra fields,
etc.).
Four changes:
- **Permissive errorSchema everywhere**: `errorSchema: { type: object }`.
We only require that the body parses as a JSON object — which is
enough to distinguish 'route registered, returned a structured
error' from 'connection refused / HTML 404 / random text'. The
spec-correct `{code, message}` schema would still flag legitimate
outliers, but for a cross-client compatibility matrix it produced
more noise than signal.
- **Stable inputs via {recent_*}**: row 7 (envelope/{slot}/{root})
was using head-slot/head-root, but the envelope for the very head
may not be produced yet on every client. Now it uses
{recent_slot}/{recent_block_root} (head - 4 with the canonical
root fetched via /beacon/headers), so each client has a stable,
agreed-upon target. Same idea for rows 2, 4, 8, 9.
- **Expanded expectStatuses**: rows 2/4 now accept 500 and 501. A
500 from a stub handler on synthetic input still tells us the
route is registered (which is the question we're answering); 501
with a structured body is a partial-implementation signal worth
capturing as 🟡-or-better, not ❌.
- **Loosened success schema on row 6**: clients return the envelope
in slightly different wrappings (some {version,data}, some bare).
Accept any JSON object — the structural validation belongs in
spec-conformance tests, not the compatibility matrix.
Description, version (1.0.0 → 1.1.0) and emoji-legend updated to
match the new contract.
…research/dashboard-rework
The eventstream library's Stream.Ready is an unbuffered channel that the stream goroutine writes to before entering its read loop (see eventstream.go:169 — `stream.Ready <- true`). If no consumer drains it that send blocks forever, the receive goroutine never starts, and every SSE topic appears to produce zero events even when the server is actively pushing them. That was the root cause of the SSE matrix rows always showing 'subscription opened (no events within window)' — including for high-volume topics like execution_payload where events are guaranteed under a healthy GLOAS chain. Drain Ready immediately after Subscribe returns, gated on the same context the rest of the SSE loop uses so a stalled subscription still terminates the task cleanly.
Two focused, generic tasks playbooks can compose with configVars to feed realistic inputs into other tasks:
- get_consensus_block_header: fetch one beacon-block header by slot, root, or 'head - headOffset' (walking past missed slots). Outputs slot, root, proposerIndex, parentRoot, stateRoot.
- get_consensus_proposer_duties: fetch the proposer schedule for one epoch (absolute, or current + epochOffset). Outputs the duties array plus convenience fields: the first duty strictly after the current head (proposerSlot + validator index for produceBlockV4 inputs) and up to N deduped real validator indices for body-payloaded endpoints.
These replace the earlier monolithic gather_gloas_context concept: they're small and orthogonal, can be used independently in any playbook, and don't carry domain assumptions about which fields a particular check needs.
…l SSE
The matrix now actually verifies the response shape: each GET row
declares a responseSchema listing every required field from the
spec PR (#552, #580, etc.), so a client returning the wrong wrapper
or missing a GLOAS-new field (execution_payload_bid, payload_attestations,
the envelope's payload sub-fields, etc.) earns a 🟡 instead of a
false ✅.
Inputs come from two new generic tasks rather than synthetic
placeholders:
- tasks.recent -- get_consensus_block_header(headOffset=4): a
canonical slot/root pair guaranteed to have a
committed envelope on every client.
- tasks.duties -- get_consensus_proposer_duties(epochOffset=1):
the next-epoch proposer schedule, which gives
us a real future-slot proposer for
produceBlockV4 and a list of live validator
indices for PTC duties.
Per-row wiring via configVars/pathParams:
- row02 (produceBlockV4): slot = first future proposer slot
- row04 (get_bid): slot + builder_index = future proposer pair
- row06 (envelope/{id}): block_id = recent root
- row07 (envelope/{s}/{r}): slot + beacon_block_root = recent pair
- row08 (ptc duties): epoch = duties.epoch, body = real validator indices
- row09 (payload_attestation_data): slot = recent canonical slot
SSE rows (12-16) now run concurrently inside one run_tasks_concurrent
wrapper with newVariableScope:false — children still register at the
root tasks scope so the matrix renderer can find them by id. That
cuts the SSE wallclock from ~4 minutes (5 × 48s sequential) to ~48s.
Permissive errorSchema everywhere ({type:object}) is unchanged from
the previous pass — error-body shapes diverge legitimately and that
divergence isn't what the matrix is trying to measure.
vars.Variables.ConsumeVars prefixes every query with '.' unless it
already starts with one, so '{ key: val }' becomes '.{ key: val }'
which is a syntax error.
Three-fold fix:
- Lead each query with a '|' so the auto-prefix produces '.| { ... }'
(identity | object constructor — valid jq).
- Inside the object constructor, field accesses need a leading '.'
(jq's bare 'foo.bar' shorthand only works at the very start of an
expression). 'tasks.X.Y' becomes '.tasks.X.Y'.
- 'body' uses no constructor so just keeps the canonical
'tasks.duties.outputs.validatorIndices | map(tostring)' form
(which the auto-prefix turns into a valid '.tasks…').
Every query parsed with gojq before commit; all seven now compile.
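The effect of the auto-prefix on the three query shapes can be illustrated with plain string handling. `prefixQuery` is a hypothetical reduction of the rule described above (prepend '.' unless the query already starts with one); whether the real ConsumeVars trims whitespace first is not stated here and is not modelled.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixQuery mimics the described rule: a '.' is prepended unless the
// query already starts with one.
func prefixQuery(q string) string {
	if strings.HasPrefix(q, ".") {
		return q
	}
	return "." + q
}

func main() {
	queries := []string{
		"{ slot: .tasks.recent.outputs.slot }",   // becomes '.{ ... }': jq syntax error
		"| { slot: .tasks.recent.outputs.slot }", // becomes '.| { ... }': identity piped into a constructor
		"tasks.duties.outputs.validatorIndices | map(tostring)", // becomes '.tasks…'
	}
	for _, q := range queries {
		fmt.Println(prefixQuery(q))
	}
}
```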
…assify empty body as missing
Two bugs the live e2e against glamsterdam-devnet-3 surfaced:
- resolvePath had an early return when every placeholder came from
explicit pathParams. The substitution loop was only hit when the
function had to walk chain state, so requests like
'GET /eth/v1/beacon/execution_payload_envelope/{block_id}' with
block_id supplied via configVars went out with the literal
{block_id} URL-encoded as %7Bblock_id%7D and clients responded
with 400 'invalid block ID'. The matrix then graded the test as
'well-formed error' (✅) — a false positive across the board.
- Empty (or non-JSON) bodies on a documented error status were
silently being scored as schema-mismatch partial (🟡). Empty 4xx
bodies are how most web frameworks signal a fully-missing route
(the global not-found handler), so classify them as ❌ 'route
likely missing' instead. Same for non-JSON bodies — every client
that actually implements an endpoint sends a structured error
body.
Also rewrites every responseSchema and eventSchema in the playbook
to match the canonical beacon-APIs types (~/repos/beacon-APIs):
- Row 2 (produceBlockV4): body uses 'signed_execution_payload_bid'
(NOT 'execution_payload_bid'); 'blob_kzg_commitments' and
'execution_requests' are NOT body fields in gloas.
- Row 4 / SSE 12 (bid): full ExecutionPayloadBid required-field list
(parent_block_hash, parent_block_root, block_hash, prev_randao,
fee_recipient, gas_limit, builder_index, slot, value,
execution_payment, blob_kzg_commitments). The previous
'blob_kzg_commitments_root' field doesn't exist in the spec.
- Row 6/7 (envelope GETs): wrapper is {version, execution_optimistic,
finalized, data}; ExecutionPayloadEnvelope required fields are
exactly 5: payload, execution_requests, builder_index,
beacon_block_root, parent_beacon_block_root (no slot,
state_root, blob_kzg_commitments — those live elsewhere).
- Rows 9/11 (payload attestation data, pool/payload_attestations):
PayloadAttestationData uses 'payload_present' + 'blob_data_available'
booleans (not 'payload_status'); pool response is wrapped {version,
data}.
- SSE 13 (execution_payload_available): field is 'block_root' (not
'beacon_block_root').
- SSE 14 (payload_attestation_message): wrapped {version, data},
inner data carries 'payload_present'/'blob_data_available'.
- SSE 15 (execution_payload): flat {slot, builder_index, block_hash,
block_root, execution_optimistic}.
- SSE 16 (execution_payload_gossip): same as 15 minus
execution_optimistic.
Also fixes row 10's POST body to use the correct PayloadAttestationMessage
shape (validator_index outside, nested data with the four
PayloadAttestationData fields).
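As an illustration of the envelope shape described for rows 6/7, here is a hedged sketch of a required-field check using only the standard library. The playbook itself uses inline JSON-Schema; `missingKeys` and `checkEnvelopeResponse` are hypothetical helpers, and only the key names come from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Required keys as listed in the commit message for the envelope GETs.
var (
	wrapperKeys  = []string{"version", "execution_optimistic", "finalized", "data"}
	envelopeKeys = []string{
		"payload", "execution_requests", "builder_index",
		"beacon_block_root", "parent_beacon_block_root",
	}
)

// missingKeys returns the required keys that are absent from a JSON object.
func missingKeys(raw []byte, required []string) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(raw, &obj); err != nil {
		return nil, err
	}
	var missing []string
	for _, k := range required {
		if _, ok := obj[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing, nil
}

// checkEnvelopeResponse checks the wrapper first, then the nested data object.
func checkEnvelopeResponse(body []byte) ([]string, error) {
	missing, err := missingKeys(body, wrapperKeys)
	if err != nil || len(missing) > 0 {
		return missing, err
	}
	var wrapper struct {
		Data json.RawMessage `json:"data"`
	}
	if err := json.Unmarshal(body, &wrapper); err != nil {
		return nil, err
	}
	return missingKeys(wrapper.Data, envelopeKeys)
}

func main() {
	// A response that omits the wrapper's "version" field.
	body := []byte(`{"execution_optimistic":false,"finalized":false,"data":{}}`)
	missing, err := checkEnvelopeResponse(body)
	fmt.Println("missing:", missing, "err:", err)
}
```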
Previous run revealed false-positive ✅s on routes that DON'T exist
but the framework returns a structured-JSON 404. Lighthouse's global
not-found handler returns `{code,message:NOT_FOUND}`; Lodestar's
returns `{code,message:"Route POST:/eth/v1/foo not found"}` — both
are technically valid JSON objects that pass an open schema, but
neither indicates that the route is registered.
Add a tiny content-aware classifier that scans the error `message`
for generic framework-level phrases and treats those as 'route
likely missing' instead of 'well-formed error'. Patterns covered:
NOT_FOUND, route X not found, unsupported endpoint version,
endpoint not found, unknown endpoint, unknown route,
no matching route, method not allowed, no handler
The patterns are tight enough to ignore domain-specific 404s
('Execution payload envelope not found', 'block not found at slot
100', 'Currently syncing', etc.). Verified on 9 sample messages —
all classify correctly.
…work vs domain 404s
Previous run flipped Lighthouse's row 6 (envelope/{block_id}) to a
false ❌ because its body was 'NOT_FOUND: execution payload envelope
for block root 0xe77…' — the route IS registered, the specific
resource just isn't available. LH happens to prefix every error
message with 'NOT_FOUND:' (its global error format), so the
substring match was overreaching.
Tighten: only treat the literal 'NOT_FOUND' / 'not found' message
(no detail) as a framework miss. Anything with a colon and detail
('NOT_FOUND: execution payload envelope for block …') stays as a
real domain 404 and counts as ✅ for endpoint existence.
Lodestar's 'Route POST:/eth/v1/foo not found' continues to match
via the dedicated 'route … not found' branch; the other patterns
(unsupported endpoint, unknown route, method not allowed, …) are
unchanged.
Verified on 11 sample messages — all classify correctly including
the new LH domain case.
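The tightened rule can be sketched roughly as below. This is an assumption-laden illustration rather than the merged classifier: `isFrameworkMiss` is a hypothetical name, the regular expression is one possible reading of the 'route … not found' branch, and matching is done case-insensitively, which the commit message does not state.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// routeNotFound approximates the "route X not found" branch, e.g. Lodestar's
// "Route POST:/eth/v1/foo not found". The exact pattern is an assumption.
var routeNotFound = regexp.MustCompile(`\broute\s+\S+\s+not found\b`)

// genericPhrases are the framework-level phrases named in the commit messages.
var genericPhrases = []string{
	"unsupported endpoint version", "endpoint not found", "unknown endpoint",
	"unknown route", "no matching route", "method not allowed", "no handler",
}

// isFrameworkMiss reports whether an error message looks like a global
// not-found handler rather than a domain-specific 404.
func isFrameworkMiss(message string) bool {
	m := strings.ToLower(strings.TrimSpace(message))
	// Only the bare literal counts; "NOT_FOUND: <detail>" is a domain 404.
	if m == "not_found" || m == "not found" {
		return true
	}
	if routeNotFound.MatchString(m) {
		return true
	}
	for _, phrase := range genericPhrases {
		if strings.Contains(m, phrase) {
			return true
		}
	}
	return false
}

func main() {
	samples := []string{
		"NOT_FOUND",
		"NOT_FOUND: execution payload envelope for block root 0xe77…",
		"Route POST:/eth/v1/foo not found",
		"Execution payload envelope not found",
		"block not found at slot 100",
		"Currently syncing",
	}
	for _, s := range samples {
		fmt.Printf("%-62q frameworkMiss=%v\n", s, isFrameworkMiss(s))
	}
}
```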
…ents)
beacon-APIs PR #580 was revised mid-flight: the response shape now uses
`execution_payload_included` (boolean flag) instead of the originally
proposed `execution_payload_value` (string), and `data` is anyOf
{BeaconBlock, BlockContents} where BlockContents = {block,
execution_payload_envelope, kzg_proofs, blobs}. Top-level fields are
now { version, consensus_block_value, execution_payload_included,
data }.
Prysm already implements the revised shape (returns
{version, consensus_block_value, execution_payload_included, data:
{block,...}} for self-built blocks). The old schema required
`execution_payload_value` and put block fields directly under `data`,
which incorrectly flagged Prysm's spec-conformant response as 🟡
"response body does not match success schema".
Rewrite the schema with $defs + anyOf to accept either shape — clients
that diverge from BOTH variants still surface mismatches.
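The "accept either shape" idea can be illustrated without a schema library. The sketch below is not the playbook's schema: `matchesProduceBlockV4` and `hasKeys` are hypothetical, the BlockContents keys come from the text above, and the BeaconBlock keys (slot, proposer_index, parent_root, state_root, body) are assumed from the consensus spec rather than from this commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasKeys reports whether every key is present in the object.
func hasKeys(obj map[string]json.RawMessage, keys ...string) bool {
	for _, k := range keys {
		if _, ok := obj[k]; !ok {
			return false
		}
	}
	return true
}

// matchesProduceBlockV4 returns the name of the `data` variant the response
// matches, or "" when the top level or both variants fail to match.
func matchesProduceBlockV4(body []byte) string {
	var top struct {
		Version                  json.RawMessage            `json:"version"`
		ConsensusBlockValue      json.RawMessage            `json:"consensus_block_value"`
		ExecutionPayloadIncluded *bool                      `json:"execution_payload_included"`
		Data                     map[string]json.RawMessage `json:"data"`
	}
	if err := json.Unmarshal(body, &top); err != nil {
		return ""
	}
	if top.Version == nil || top.ConsensusBlockValue == nil || top.ExecutionPayloadIncluded == nil {
		return ""
	}
	switch {
	case hasKeys(top.Data, "block", "execution_payload_envelope", "kzg_proofs", "blobs"):
		return "BlockContents"
	case hasKeys(top.Data, "slot", "proposer_index", "parent_root", "state_root", "body"):
		return "BeaconBlock"
	}
	return ""
}

func main() {
	revised := `{"version":"gloas","consensus_block_value":"1","execution_payload_included":true,
	  "data":{"block":{},"execution_payload_envelope":{},"kzg_proofs":[],"blobs":[]}}`
	old := `{"version":"gloas","consensus_block_value":"1","execution_payload_value":"0","data":{"slot":"1"}}`
	fmt.Printf("revised: %q\n", matchesProduceBlockV4([]byte(revised)))
	fmt.Printf("old:     %q\n", matchesProduceBlockV4([]byte(old)))
}
```

A response that matches neither variant yields the empty string, which corresponds to the mismatch the matrix still needs to surface.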
Lighthouse strictly requires `skip_randao_verification` to be a
presence-only flag — the underlying SkipRandaoVerification type
accepts None (= No) and Some("") (= Yes) but rejects any other value
with `Invalid query string`. Sending `skip_randao_verification=true`
made LH return 400 at the query-parse stage, so we never reached the
handler and never validated the success response body.
Switching to the spec-conformant empty-value form (`?skip_randao_verification=`)
keeps Prysm and Lodestar happy (both accept either form) and lets LH
actually produce a block, so its response is validated against the
PR #580 schema instead of being credited as a "well-formed 400".
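For illustration, the empty-value form can be produced with Go's standard `net/url` package, whose `Values.Encode` emits `key=` for an empty string. The helper name is hypothetical, and the `randao_reveal` parameter with its placeholder value is an assumption added only to show the flag next to another parameter.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildProduceBlockQuery encodes skip_randao_verification as a presence-only
// flag (empty value) next to a randao_reveal parameter.
func buildProduceBlockQuery(randaoReveal string) string {
	q := url.Values{}
	q.Set("randao_reveal", randaoReveal)
	q.Set("skip_randao_verification", "") // encoded with an empty value
	return q.Encode()                     // keys are emitted in sorted order
}

func main() {
	fmt.Println(buildProduceBlockQuery("0xc0"))
}
```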
…it rate
The envelope-by-block-id row (row 6) probes each client with a single
root picked from the first online consensus client. With headOffset=4
the chosen slot was often past Nimbus's envelope retention window, so
Nimbus returned 404 instead of 200 and we never validated its actual
response schema (which lacks the spec-required `version` field).
Drop the offset to 1 so the root we probe is the most-recent slot
that's almost guaranteed to be canonical on every client AND still
have its envelope cached. Keeps the existing maxLookback fallback
(now 4) for the rare case where head-1 was a missed slot.
A stability run failed when slots 190..193 on the devnet were all
missed in quick succession (the devnet briefly desynced and recovered
through a small run of empty slots). With maxLookback=4 the recent
block resolver only looked at slots N..N-3 and bailed, taking the
whole playbook down before it ever reached the matrix.
Raise maxLookback to 16. 16 slots is roughly half an epoch, enough to
absorb any realistic run of misses on a healthy devnet while still
picking a recent root.
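The lookback behaviour can be sketched as a small walk-back loop. This is a simplified model under stated assumptions, not the actual resolver: `resolveRecentSlot` and the `hasBlock` callback are hypothetical, and the head slot of 194 is chosen only so that head-1 lands on the missed range.

```go
package main

import "fmt"

// resolveRecentSlot starts at head-headOffset and walks backwards, returning
// the first slot that has a block. It inspects at most maxLookback slots.
func resolveRecentSlot(head, headOffset, maxLookback uint64, hasBlock func(uint64) bool) (uint64, bool) {
	if headOffset > head {
		return 0, false
	}
	start := head - headOffset
	// The i <= start guard prevents unsigned underflow near genesis.
	for i := uint64(0); i < maxLookback && i <= start; i++ {
		if slot := start - i; hasBlock(slot) {
			return slot, true
		}
	}
	return 0, false
}

func main() {
	// Slots 190..193 were missed, as in the failed stability run.
	hasBlock := func(slot uint64) bool { return slot < 190 || slot > 193 }
	for _, lookback := range []uint64{4, 16} {
		slot, ok := resolveRecentSlot(194, 1, lookback, hasBlock)
		fmt.Printf("maxLookback=%d slot=%d found=%v\n", lookback, slot, ok)
	}
}
```

With a window of 4 the loop exhausts slots 193..190 and gives up; with 16 it reaches slot 189.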
…y param
Per beacon-APIs PR #580, the canonical endpoint is
GET /eth/v1/validator/execution_payload_envelope/{slot}
with `beacon_block_root` as a *query parameter* (re-org resistance:
the BN returns 404 when its cached envelope is for a different
block root). Response data is the unwrapped ExecutionPayloadEnvelope
struct — no `message`/`signature` wrap; the VC signs the envelope.
The previous test queried `{slot}/{beacon_block_root}` as path
segments and validated a `{message, signature}` shape inside `data`.
That layout matches Lodestar's (non-spec) implementation but missed
on Lighthouse and Prysm, both of which correctly register the
spec-conformant 1-segment path. Net effect: the matrix flipped the
verdicts — Lodestar showed ✅, LH/Prysm showed ❌ — exactly opposite
of reality.
Fix: probe the spec-conformant URL with query param and validate the
spec-conformant response shape. LH/Prysm should now correctly resolve
the route (status depends on cache state), Lodestar should 404 (no
route for the 1-segment form on Lodestar).
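A hedged sketch of how the spec-conformant probe URL might be assembled, with the slot as the only path segment and `beacon_block_root` as a query parameter. `envelopeURL`, the base URL, and the sample root are illustrative assumptions; only the path and parameter name come from the text above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// envelopeURL builds GET /eth/v1/validator/execution_payload_envelope/{slot}
// with beacon_block_root passed as a query parameter.
func envelopeURL(base string, slot uint64, beaconBlockRoot string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") +
		fmt.Sprintf("/eth/v1/validator/execution_payload_envelope/%d", slot)
	q := url.Values{}
	q.Set("beacon_block_root", beaconBlockRoot)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// The base URL and the root are placeholders for illustration.
	probe, err := envelopeURL("http://localhost:5052", 100, "0xabc")
	if err != nil {
		panic(err)
	}
	fmt.Println(probe)
}
```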
This branch is a chunky rework of the assertoor UX. It bundles three independent threads of work that ended up touching the same surfaces (test runs, results, scheduling), so they ship as one PR.

TL;DR
- GLOAS beacon-API compatibility playbook: a generic `check_consensus_api` task + a 16-row compatibility matrix playbook that probes every endpoint introduced or changed by the GLOAS / ePBS spec PRs.
- Configurable dashboard: the static "latest test runs" home page is replaced with a fully configurable tile grid (rows + drag-and-drop + 6 tile types). The runs list moves to its own `/runs` page with a split-pane Tests | Runs view.
- Cron scheduling, surfaced: schedules are now first-class, editable inline in the library page and on the runs page, powered by a cron editor (presets, validation, next-3-firings preview). The Run dialog gets a rich two-level Queue picker.

Every existing public API endpoint keeps working unchanged; new shapes are added alongside the old ones as additive fields.

1. GLOAS beacon-API compatibility playbook
A reusable per-client API probe + the playbook that wires 16 of them into a compatibility matrix.
- `check_consensus_api`: generic per-client API probe. Resolves placeholders, makes the HTTP call (or opens an SSE topic), validates the response shape against an inline JSON-Schema, classifies the result (pass/partial/fail/skipped), and emits a structured `matrixRow` output.
- `run_javascript`: runs a JS snippet via system Node.js with a small preamble exposing `env`, `setOutput`, `setVar`, `writeResultFile`, `writeSummary` and the new `writeTestResult` / `appendTestResult` helpers. Replaces the shell+jq approach that turned out to be the wrong language for collating structured task outputs.
- `$ASSERTOOR_TEST_RESULT` env var exposed by `run_shell` and `run_javascript` points at a shared markdown file. After every task the scheduler syncs the file to the DB (sentinel `task_id=0`, `type=test_result`). The result is served as text/markdown via the new `GET /api/v1/test_run/{runId}/result` endpoint and rendered prominently at the top of the test-run page.
- `playbooks/api-compatibility/gloas-api-check.yaml`: 11 HTTP + 5 SSE probes, each carrying its own schema. A final `run_javascript` step formats the results into the matrix and writes it to `$ASSERTOOR_TEST_RESULT`.
- `Dockerfile-stub` (the CI stub image) gains Node too so `run_javascript`-using playbooks work in CI.

2. Configurable dashboard
The old home page was a single paginated test-runs table. It is now:
- `GET` / `PUT /api/v1/dashboard_config`; auth-gated `PUT`.
- `success_rate`: per-test success ring + per-run swatches.
- `latest_result`: markdown blob of the most recent run that produced one (uses `$ASSERTOOR_TEST_RESULT`).
- `recent_runs`: live-updating list (all tests or scoped).
- `client_status`: per-client EL/CL liveness + heads.
- `network_status`: head/finalized/justified checkpoints, queue depth, ready-client counts (via new `GET /api/v1/network_status`).
- `text`: free-form markdown.
- The `latest_result`, `recent_runs`, `client_status`, and `text` tiles accept an optional `heightPx`; the body scrolls when capped.
- Drag-and-drop (`@dnd-kit`), with a sidebar palette (replaces an earlier modal), explicit Save changes / Discard flow (no live PUT), JSON import/export, beforeunload guard.

3. Split-view Runs page
The runs list moves off the home page to its own `/runs` route:
- `?testId=…` so deep links and reloads work.

4. Cron scheduling, surfaced
Schedules used to be hidden inside YAML and never editable from the UI. Now:
- `PUT /api/v1/test/{testId}/schedule` (auth): set/replace a test's schedule. Cron expressions are validated up-front.
- `GET /api/v1/test/{testId}/next_run`: next firing per expression + the overall earliest.
- `GET /api/v1/test_queue`: live runner queue (running + pending) used by the Queue picker.
- `cron-parser`, cronstrue translations, next-3-firings preview.
- `ScheduleTestWithOptions` adds support for splicing into the pending queue at an arbitrary position (`after_run_id`). Schedule updates mutate the descriptor under a mutex and re-upsert the `test_configs` row so they survive restarts.

Reworked Run dialog
- The `skip_queue` checkbox is replaced with a QueuePicker, a rich two-level nested dropdown.

API compatibility
- `POST /api/v1/test_runs/schedule` keeps `skip_queue` working: the new `queue` field is additive and wins when both are supplied. The 4-arg `Coordinator.ScheduleTest` Go signature is kept and now forwards to `ScheduleTestWithOptions`.

Plumbing changes
- The dashboard config is stored in the `assertoor_state` KV store (key `dashboard_config`), so no new DB migrations.
- `Coordinator` interface gains `ScheduleTestWithOptions` and `TestRegistry.UpdateTestSchedule`.
- `Descriptor` gains a mutex-guarded `GetSchedule` / `SetSchedule` pair so cron mutations are race-free.

Testing
Verified end-to-end against a live kurtosis enclave with 5 CL clients (lighthouse-supernode, prysm, grandine, nimbus, lodestar):
- `/next_run` updates the banner `next in 7m (5:45 PM)` text on the runs page.
- `POST .../schedule` with `{"queue":{"mode":"after","after_run_id":N}}` splices a new run directly behind `#N` in the pending queue; legacy `{"skip_queue":true}` callers still work.