Dashboard rework, GLOAS beacon-API compatibility playbook & cron scheduling by pk910 · Pull Request #179 · ethpandaops/assertoor

pk910 · 2026-05-18T23:10:50Z

This branch is a chunky rework of the assertoor UX. It bundles three independent threads of work that ended up touching the same surfaces (test runs, results, scheduling), so they ship as one PR.

TL;DR

GLOAS beacon-API compatibility playbook — a generic check_consensus_api task + a 16-row compatibility matrix playbook that probes every endpoint introduced or changed by the GLOAS / ePBS spec PRs.
Configurable dashboard — the static "latest test runs" home page is replaced with a fully configurable tile grid (rows + drag-and-drop + 6 tile types). The runs list moves to its own /runs page with a split-pane Tests | Runs view.
Cron scheduling, surfaced — schedules are now first-class: editable inline in the library page and on the runs page, powered by a nice cron editor (presets, validation, next-3-firings preview). The Run dialog gets a rich two-level Queue picker.

Every existing public API endpoint keeps working unchanged; new shapes are added alongside the old ones as additive fields.

1. GLOAS beacon-API compatibility playbook

A reusable per-client API probe + the playbook that wires 16 of them into a compatibility matrix.

New task: check_consensus_api — generic per-client API probe. Resolves placeholders, makes the HTTP call (or opens an SSE topic), validates the response shape against an inline JSON-Schema, classifies the result (pass / partial / fail / skipped), and emits a structured matrixRow output.
New task: run_javascript — runs a JS snippet via system Node.js with a small preamble exposing env, setOutput, setVar, writeResultFile, writeSummary and the new writeTestResult/appendTestResult helpers. Replaces the shell+jq approach that turned out to be the wrong language for collating structured task outputs.
Test-run-level Result mechanism — new $ASSERTOOR_TEST_RESULT env var exposed by run_shell and run_javascript points at a shared markdown file. After every task the scheduler syncs the file to the DB (sentinel task_id=0, type=test_result). The result is served as text/markdown via the new GET /api/v1/test_run/{runId}/result endpoint and rendered prominently at the top of the test-run page.
New playbook: playbooks/api-compatibility/gloas-api-check.yaml — 11 HTTP + 5 SSE probes, each carrying its own schema. A final run_javascript step formats the results into the matrix and writes it to $ASSERTOOR_TEST_RESULT.
Bumped Node to 24 across all Dockerfiles; Dockerfile-stub (the CI stub image) gains Node too so run_javascript-using playbooks work in CI.

2. Configurable dashboard

The old home page was a single paginated test-runs table. It is now:

A configurable tile grid persisted server-side (GET / PUT /api/v1/dashboard_config; auth-gated PUT).
Rows act as horizontal containers; each row holds tiles that share a 12-col responsive grid.
6 tile types:
- success_rate — per-test success ring + per-run swatches.
- latest_result — markdown blob of the most recent run that produced one (uses $ASSERTOOR_TEST_RESULT).
- recent_runs — live-updating list (all tests or scoped).
- client_status — per-client EL/CL liveness + heads.
- network_status — head/finalized/justified checkpoints, queue depth, ready-client counts (via new GET /api/v1/network_status).
- text — free-form markdown.
Tiles with potentially unbounded content (latest_result, recent_runs, client_status, text) accept an optional heightPx; the body scrolls when capped.
Edit mode is auth-gated, drag-and-drop powered by @dnd-kit, with a sidebar palette (replaces an earlier modal), explicit Save changes / Discard flow (no live PUT), JSON import/export, beforeunload guard.

3. Split-view Runs page

The runs list moves off the home page to its own /runs route:

Left pane: registered tests with search, click to scope the right pane.
Right pane: paginated runs table (status, started, duration, bulk actions) plus, when a test is selected, a schedule banner at the top (next-firing hint + Edit button).
The selected test is reflected as ?testId=… so deep links and reloads work.

4. Cron scheduling, surfaced

Schedules used to be hidden inside YAML and never editable from the UI. Now:

New endpoints:
- PUT /api/v1/test/{testId}/schedule (auth) — set/replace a test's schedule. Cron expressions are validated up-front.
- GET /api/v1/test/{testId}/next_run — next firing per expression + the overall earliest.
- GET /api/v1/test_queue — live runner queue (running + pending) used by the Queue picker.
CronEditor — preset chips (every minute / hour / day / week / …), free-form crontab input validated by cron-parser, cronstrue translations, next-3-firings preview.
ScheduleCard — surfaces the schedule + next-firing hint prominently:
- In Library → Local Playbooks → expanded test as a full card.
- On the Runs page as a banner above the runs table when a test is selected.
Test runner: ScheduleTestWithOptions adds support for splicing into the pending queue at an arbitrary position (after_run_id). Schedule updates mutate the descriptor under a mutex and re-upsert the test_configs row so they survive restarts.

Reworked Run dialog

Visually separates Test configuration (the per-test knobs) from Run options (how/when the run lands on the runner).
The skip_queue checkbox is replaced with a QueuePicker — a rich two-level nested dropdown:
- Level 1: three options, "Run immediately", "Add to queue · End" and "Add to queue · After…"
- Level 2: live list of running/queued tests with status dots, run IDs and names — pick one to insert directly behind it.
The dropdown renders in-flow so the modal scrolls naturally when the menu overflows.

API compatibility

POST /api/v1/test_runs/schedule keeps skip_queue working — the new queue field is additive and wins when both are supplied. The 4-arg Coordinator.ScheduleTest Go signature is kept and now forwards to ScheduleTestWithOptions.
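A minimal Go sketch of that precedence rule. It assumes a hypothetical request struct carrying only the `queue` / `skip_queue` fields named above (the real handler's types are not shown in this PR), and it assumes the legacy `skip_queue: true` maps to the `immediate` mode:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queueOptions mirrors the additive "queue" object (mode: immediate|end|after).
// Type and field names are illustrative assumptions.
type queueOptions struct {
	Mode       string `json:"mode"`
	AfterRunID uint64 `json:"after_run_id,omitempty"`
}

// scheduleRequest is a hypothetical subset of the POST /api/v1/test_runs/schedule body.
type scheduleRequest struct {
	SkipQueue bool          `json:"skip_queue"`
	Queue     *queueOptions `json:"queue,omitempty"`
}

// effectiveQueue applies the precedence: an explicit "queue" object wins;
// otherwise the legacy skip_queue flag maps to "immediate" (assumed) and its
// absence to "end".
func effectiveQueue(req scheduleRequest) queueOptions {
	if req.Queue != nil && req.Queue.Mode != "" {
		return *req.Queue
	}
	if req.SkipQueue {
		return queueOptions{Mode: "immediate"}
	}
	return queueOptions{Mode: "end"}
}

func main() {
	bodies := []string{
		`{"skip_queue":true}`,
		`{"skip_queue":true,"queue":{"mode":"after","after_run_id":7}}`,
		`{}`,
	}
	for _, body := range bodies {
		var req scheduleRequest
		if err := json.Unmarshal([]byte(body), &req); err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %+v\n", body, effectiveQueue(req))
	}
}
```

Keeping `queue` as a pointer is what lets the decoder distinguish "field absent" from "field present", which is the whole basis of the additive, backwards-compatible shape.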

Plumbing changes

Dashboard config persists via the existing assertoor_state KV store (key dashboard_config), so no new DB migrations.
Internal Coordinator interface gains ScheduleTestWithOptions and TestRegistry.UpdateTestSchedule.
Descriptor gains a mutex-guarded GetSchedule/SetSchedule pair so cron mutations are race-free.

Testing

Verified end-to-end against a live kurtosis enclave with 5 CL clients (lighthouse-supernode, prysm, grandine, nimbus, lodestar):

Full 16-row GLOAS matrix renders, with SSE rows correctly flipping to ✅ where the subscription was syntactically valid.
Dashboard tiles populate from real data; rows + drag-and-drop + add/remove/save/discard exercised via Playwright.
Schedule editor saves a cron entry, and /next_run drives the banner's "next in 7m (5:45 PM)" hint on the runs page.
POST .../schedule with {"queue":{"mode":"after","after_run_id":N}} splices a new run directly behind #N in the pending queue; legacy {"skip_queue":true} callers still work.

Generic per-client beacon-API probe used as the building block for cross-client API-compatibility playbooks. The task hits a single HTTP endpoint (or subscribes to one SSE topic) on every connected consensus client, classifies each response (pass / partial / fail / skipped), and emits both per-client results and a 'matrixRow' output collapsed by client-type for downstream aggregation.

Key behaviours:
- Path placeholders ({slot}, {epoch}, {block_id}, {beacon_block_root}, {builder_index}, {validator_index}) are auto-resolved from chain state; offsets like {slot+5} / {epoch-1} are supported.
- HTTP classification splits expectStatuses into successStatuses (run responseSchema) and errorStatuses (run errorSchema = ErrorMessage shape). Schema-valid 4xx still counts as 'pass' because it proves the endpoint exists and parses the request.
- SSE mode subscribes to /eth/v1/events?topics=..., waits a configurable window for at least minEvents matching events, and optionally validates each event payload against eventSchema.
- requireForkActive skips clients whose head hasn't reached the named fork's activation epoch.
- Per-client probes run concurrently up to the configured concurrency cap; the task aggregates pass/partial/fail counts and either succeeds, fails-on-any, or fails-on-all-error depending on config.

Adds github.com/santhosh-tekuri/jsonschema/v5 for inline JSON-Schema validation of response / event bodies.
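The success/error status split can be sketched roughly as below. This is a simplified stand-in, not the task's classifyHTTPResult: plain Go predicates replace the jsonschema/v5 validators, and mapping a schema mismatch to 'partial' and an unlisted status to 'fail' is an assumption based on the matrix notes described elsewhere in this PR.

```go
package main

import "fmt"

type result string

const (
	resultPass    result = "pass"
	resultPartial result = "partial"
	resultFail    result = "fail"
)

// probeConfig holds the success/error status split. The validator funcs stand
// in for compiled JSON-Schema checks (responseSchema / errorSchema).
type probeConfig struct {
	SuccessStatuses []int
	ErrorStatuses   []int
	ResponseSchema  func(body []byte) bool
	ErrorSchema     func(body []byte) bool
}

func containsStatus(list []int, status int) bool {
	for _, s := range list {
		if s == status {
			return true
		}
	}
	return false
}

// classifyHTTP maps one client's response to a matrix cell result plus a note.
func classifyHTTP(cfg *probeConfig, status int, body []byte) (result, string) {
	switch {
	case containsStatus(cfg.SuccessStatuses, status):
		if cfg.ResponseSchema(body) {
			return resultPass, ""
		}
		return resultPartial, fmt.Sprintf("status %d but body does not match response schema", status)
	case containsStatus(cfg.ErrorStatuses, status):
		// A schema-valid error still proves the route exists and parsed the request.
		if cfg.ErrorSchema(body) {
			return resultPass, ""
		}
		return resultPartial, fmt.Sprintf("error status %d but body does not match error schema", status)
	default:
		return resultFail, fmt.Sprintf("unexpected status %d", status)
	}
}

func main() {
	isObject := func(b []byte) bool { return len(b) > 0 && b[0] == '{' }
	cfg := &probeConfig{
		SuccessStatuses: []int{200},
		ErrorStatuses:   []int{400, 404},
		ResponseSchema:  isObject,
		ErrorSchema:     isObject,
	}
	cases := []struct {
		status int
		body   string
	}{{200, `{"data":{}}`}, {404, `{"code":404}`}, {404, "<html>"}, {500, ""}}
	for _, tc := range cases {
		r, note := classifyHTTP(cfg, tc.status, []byte(tc.body))
		fmt.Println(tc.status, r, note)
	}
}
```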

New aggregator task that walks the test run, picks up every check_consensus_api task's matrixRow output, and renders a markdown compatibility table. The same markdown is stored both as the task's 'summary' result (so it renders inline at the top of the task pane in the UI) and as a downloadable matrix.md result file, alongside a matrix.json dump for machine consumption. Columns are ordered by client-type (Lighthouse, Teku, Prysm, Grandine, Nimbus, Lodestar, Caplin) with two-letter labels matching the human-readable style; unused client-type columns are dropped by default (toggle with showAllClientTypes). Notes on partial/fail cells are emitted as numbered footnotes with deduplication. Also extends the task UI to render summary files and any .md result file inline using ReactMarkdown + remark-gfm, so the matrix shows up in the UI without manual download.

New playbooks/api-compatibility/ folder for cross-client beacon-API compatibility checks. The folder is intentionally fork-agnostic so future forks can drop in a new file alongside.

gloas-api-check.yaml exercises every beacon-API endpoint / SSE topic introduced or changed by the GLOAS / ePBS spec PRs (#552, #580, #586, #587, #588, #592) against every connected consensus client:
- 11 HTTP endpoints (publish/produce/retrieve blocks, bids, envelopes, PTC duties, payload attestation data + pool ops).
- 5 SSE topics (execution_payload_bid, execution_payload_available, payload_attestation_message, execution_payload, execution_payload_gossip), subscribed concurrently to share the same wait window.

Each row uses an inline JSON Schema derived from the spec to validate either the success body or the documented ErrorMessage body. The final step calls generate_api_compatibility_matrix to render the markdown matrix as a result artifact. Playbook index is regenerated to include the new folder.

…accepts them

run_shell's ::set-output stores values as plain strings; downstream tasks that consume the value as uint64 (minSlotNumber, etc.) fail yaml-unmarshal. Switch to ::set-output-json so the integers are stored typed.
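The failure mode is easy to reproduce in isolation. The sketch below uses encoding/json as a stand-in for the YAML decoding the task config actually goes through (an assumption made to keep it stdlib-only); the downstreamConfig struct is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// downstreamConfig is a hypothetical task config with a uint64 field,
// like minSlotNumber.
type downstreamConfig struct {
	MinSlotNumber uint64 `json:"minSlotNumber"`
}

func decode(raw string) (downstreamConfig, error) {
	var cfg downstreamConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	// ::set-output stores the value as a string, so it arrives quoted.
	_, err := decode(`{"minSlotNumber":"42"}`)
	fmt.Println("string output:", err)

	// ::set-output-json stores it typed, so it arrives as a JSON number.
	cfg, err := decode(`{"minSlotNumber":42}`)
	fmt.Println("typed output:", cfg.MinSlotNumber, err)
}
```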

End-to-end polish surfaced by the first kurtosis test run:
- matrixCell now has explicit JSON tags so matrix.json uses camelCase ('result' / 'note' / 'httpStatus') instead of the Go-default capitalised field names.
- Row titles in the rendered table wrap only the HTTP method+path (or SSE topic) in code-ticks; the optional trailing descriptor (e.g. '(Gloas)') stays outside the code span, matching the requested matrix layout.
- /api/v1/test_run/{runId}/details now includes result_files for each task, so the UI's TaskList -> TaskDetails Overview pane has the metadata it needs to render the inline markdown summary. Previously only /api/v1/test_run/{runId} carried the result-file headers and the UI's per-task fetch path was missing them.

New utility task that runs a JavaScript snippet via Node.js. Modeled on run_shell with the same stdout-marker protocol (::set-output, ::set-output-json, ::set-var, ::set-json) and the same $ASSERTOOR_SUMMARY / $ASSERTOOR_RESULT_DIR semantics, but built for shapes that are awkward in shell/jq: collecting and rendering tabular data from sibling task outputs, generating JSON configs from chain state, etc.

Highlights:
- envVars are JSON-decoded by a preamble and exposed as a single 'env' object so user code does not have to JSON.parse(process.env.X) manually.
- Helpers: setOutput[JSON], setVar[JSON], writeResultFile, writeSummary.
- The user script is wrapped in '(async () => { ... })()' so top-level 'await' works.
- The wrapped source is written to a file before invoking node, so stack traces refer to the user's line numbers.

Dockerfile / Dockerfile-local install Node.js 20 from NodeSource so the task works in the published image.

…ript

The matrix-rendering aggregator is no longer a Go task: the playbook now produces the markdown directly with a run_javascript step that walks every sibling task's matrixRow output. ~600 lines of Go + README disappear in favor of ~80 lines of self-contained JavaScript inside the playbook.

In passing:
- Rename check_consensus_api outputs/config 'checkId'/'checkTitle' to generic 'rowId'/'rowTitle'. The matrix consumer can target any task conforming to the {rowId, rowTitle, referenceUrl, matrixRow} convention, not just check_* tasks.
- Tighten lint pass on check_consensus_api: extract methodGet / clientTypeUnk / maxResponseLen / defaultSSETimeoutSeconds constants, pass Config by pointer to classifyHTTPResult, name resolvePath results, and add the wsl_v5 blank-line padding the project requires.

ResolveQuery prepends '.' before parsing, so '| tasks | to_entries[]' expands to '. | tasks | to_entries[]' — and gojq has no global 'tasks' function so the iter yielded nothing, env.ROWS arrived undefined, and the renderer threw 'filter is not a function'. Existing playbooks (builder-lifecycle.yaml etc.) consistently use the '.tasks.<id>.outputs.<field>' form. Match it.

…urrent are collected

Top-level '.tasks' only contains direct children of the test; child tasks of run_tasks_concurrent (etc.) register their status vars in the wrapper's scope. Use jq's recursive '..' to walk every nested map and pick out matrixRow-bearing objects, then unique_by(rowId) to drop any duplicates from the walk.

…ics on time.Time values

The 'tasks | to_entries[]' query crashed gojq with a panic on time.Time values nested in get_consensus_specs' spec map (MIN_GENESIS_TIME etc.). yaml roundtrip in ResolveQuery turns the iso8601 strings back into time.Time, and gojq's 'type' builtin (and friends) panic on that type. Switch to explicit per-row env vars — each query is now a simple path expression that never iterates a heterogeneous map. The JS side glues them back together. More verbose but robust.

Children of run_tasks_concurrent register their ids under the wrapper's variable scope, not the root — so the matrix renderer (running at root) couldn't resolve 'tasks.row12_sse_*.outputs'. Flat layout: sequential SSE subscription per topic. Adds ~4 min to the playbook (5 × 48s) but lets the matrix collector find every row by id. Also drops the temporary DEBUG dump task.

If the SSE subscription was accepted (route exists, topic name recognized) but no events arrive in the window, we've still proved the endpoint syntax — that's what 'api-compatibility' is about. Whether or not events actually fire during the window is a chain-state question, not a client compatibility one. Status moves from partial -> pass; note rewrites to drop the 'but'.
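A sketch of the resulting SSE decision table. The outcome struct and field names are hypothetical, and treating eventSchema failures as 'partial' is an assumption; the commit above only specifies that an accepted-but-silent subscription now counts as 'pass'.

```go
package main

import "fmt"

// sseOutcome summarises one client's SSE probe; field names are illustrative.
type sseOutcome struct {
	Accepted    bool // subscription accepted: route exists, topic recognised
	Events      int  // matching events received within the window
	SchemaFails int  // events whose payload failed eventSchema
}

// classifySSE maps an outcome to a cell result plus a note.
func classifySSE(o sseOutcome, minEvents int) (string, string) {
	switch {
	case !o.Accepted:
		return "fail", "subscription rejected"
	case o.SchemaFails > 0:
		return "partial", fmt.Sprintf("%d event(s) did not match eventSchema", o.SchemaFails)
	case o.Events < minEvents:
		// No events is a chain-state question, not a client-compatibility one.
		return "pass", "subscription opened (no events within window)"
	default:
		return "pass", ""
	}
}

func main() {
	outcomes := []sseOutcome{
		{Accepted: true, Events: 0},
		{Accepted: true, Events: 3},
		{Accepted: true, Events: 3, SchemaFails: 1},
		{Accepted: false},
	}
	for _, o := range outcomes {
		r, note := classifySSE(o, 1)
		fmt.Printf("%+v -> %s %q\n", o, r, note)
	}
}
```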

Previously the only way for a playbook to surface a top-level artefact like a compatibility matrix was to drop it under a specific task's result files: three clicks deep, hard to find unless you already know which task to look in. This patch adds a centralised run-level Result:
- New env var $ASSERTOOR_TEST_RESULT exposed by run_shell and run_javascript. Points at a shared markdown file owned by the test-run's scheduler; every task in the run sees the same path. Multiple tasks can write to / append to it.
- TaskScheduler owns the file (lazy-created in a per-run temp dir, cleaned up at end of RunTasks). After every task finishes, the current file contents are upserted into the existing task_results table under sentinel TaskID=0 / Type="test_result".
- run_javascript preamble grows two helpers: writeTestResult(s) and appendTestResult(s) so user scripts don't have to fs.writeFileSync by hand. (Execute is split into Execute + buildCommandEnv to keep cyclomatic complexity in line.)
- New API endpoint GET /api/v1/test_run/{runId}/result returns the markdown (text/markdown) or 204 when nothing was written.
- New web-ui component RunResultPanel renders the markdown inline on the test-run page, just below the timeline. Polls every 5s while the run is in flight, stops once it ends. A "view raw" link points at the API for direct download.
- get_test_run_details_api.go already excluded TaskID=0 from result_file headers (the task-level pane); fine-tunes a wsl_v5 whitespace nit while there.

Keeps the per-task matrix.md too (still downloadable + inline-rendered in the task pane), but also publishes it to the run-level Result panel — one click from the test-run page.

Walks recent runs of the test (newest first) and returns the first $ASSERTOOR_TEST_RESULT markdown blob found, so the dashboard can surface the latest meaningful result with one round trip instead of fanning out to fetch every run individually. Two response modes: raw markdown (default) plus metadata in X-Run-* headers, or a JSON envelope with run_id/status/start_time/markdown when ?meta=1 is set.

Adds LatestResultResponse type, getLatestTestResult() client call, and useLatestTestResult() react-query hook. The hook is used by the new dashboard's latest-result tile.

Adds a fully configurable tile grid that backs the new dashboard:
- types.ts: tile model (4 types: success_rate, latest_result, recent_runs, text), persisted config schema (versioned, localStorage)
- useDashboardConfig: hook owning the config + imperative API (add/remove/move/resize/reset)
- TileGrid: 12-col responsive layout, edit-mode action strip
- SuccessRateTile: rate ring + per-run swatches over last N runs
- LatestResultTile: renders $ASSERTOOR_TEST_RESULT markdown of the most-recent run that produced one
- RecentRunsTile: compact live-updating list of runs
- TextTile: free-form markdown
- AddTileModal: type-picker for new tiles
- TileEditorModal: per-tile configuration (test picker, window size, markdown body, etc.)

The old Dashboard was 500+ lines that did one thing: paginate the test-runs table. That table now lives on the new /runs page; the home page becomes a real overview. The new Dashboard wires up the tile system: an edit-mode toggle, an Add-tile modal, a per-tile editor modal, and grid mutations (add/remove/move/resize). Out of edit mode it is pure read-only output.

New page hosts the test-runs table that used to live on /. It's a split pane:
- left: registered tests with search; clicking a test scopes the right pane to its runs
- right: paginated runs table (status, started, duration, actions), with bulk delete + 'Start test' from the active scope

The selected test is reflected in the URL (?testId=…) so deep links and reloads work. Routing & nav are updated: 'Runs' sits between Dashboard and Library.

The 'SUCCESS RATE · DEMO · RUN-LEVEL RESULT · width-picker · ↑↓✎✕' strip was wrapping awkwardly on small (3/12) tiles, occluding the tile body. Two changes fix it:
- drop the title from the strip; it's already shown inside the tile, and the type label alone is enough context
- replace the wordy width-picker ('small (3/12)') with a single-letter variant (S / M / L / XL); full label is in the title attribute

Now the entire strip fits on one line at every supported width.

- All three Dockerfiles now pull node 24 from NodeSource (20 is EOL).
- Dockerfile-stub (the CI stub image that copies a pre-built binary) was missing node entirely; added it the same way as the real multi-stage images, so playbooks using the run_javascript task can run inside the stub image too.
- Removed the now-redundant 'nodejs is required by run_javascript' comments from the Dockerfiles; the dependency is self-evident.

Adds two new endpoints powering the reworked dashboard:
- GET /api/v1/network_status: aggregated chain & orchestrator snapshot (head slot/epoch, finalized + justified checkpoints, EL head, per-layer client readiness counts, queue depth). Cheap; served entirely from the in-process pool.
- GET /api/v1/dashboard_config (public) and PUT /api/v1/dashboard_config (auth-required). The dashboard layout now lives on the server in a small key/value table (sqlite + pgsql migrations), so it can be edited only by authenticated users; anonymous viewers see the same dashboard but can't mutate it. The body is treated as an opaque JSON blob; schema validation lives in the client.

Major rework of the dashboard editing experience:

Data model
- Tiles now live inside named **rows** (horizontal containers). Rows visually segment the dashboard and can be reordered as units.
- Config schema bumped to version 2; no migration from v1 (per user request), so invalid blobs reset to the default dashboard.
- Config is server-backed (replaces localStorage). The hook is a react-query mutation that PUTs after a 250ms debounce so rapid drag-and-drop edits coalesce into one round-trip.

Edit mode
- Each tile is now visually framed in edit mode (chip header + bordered body), so the type label can't be confused with the tile above it (the bug from the previous edit-mode screenshot).
- Each row gets its own dashed-border frame with row-level controls (rename, move up/down, remove).
- Add-tile modal is gone. Replaced with a sidebar palette (like the builder) of draggable tile types: drop them on any row's drop zone, or click to append to the last row.
- Tiles are draggable too: rearrange within a row, move across rows. Powered by @dnd-kit/core.
- Export / Import buttons round-trip the config as JSON (download + file picker).
- Edit-mode controls are hidden when the user isn't authenticated; the PUT endpoint requires auth, so anonymous editing would silently fail to persist.

New tile types
- client_status: per-client EL/CL liveness dots + heads, polls the existing /clients endpoint.
- network_status: head slot/epoch, finalized + justified checkpoints, EL head, queue depth, ready-client counts; polls the new /api/v1/network_status endpoint.

Edits during edit-mode now stage to a local 'draft' state. The react-query cache still mirrors the server-side baseline; the draft only flushes to the server when the user clicks 'Save changes'. This fixes the regression where rapid mutations (add row, drag tile from palette, etc.) would appear to disappear: the previous debounced PUT raced with re-renders and was overwriting optimistic updates on some auth states. With an explicit save we never touch the server during edits, so there's no race.

UX changes:
- 'UNSAVED' badge in the title when the draft differs from the server config.
- New toolbar buttons: 'Save changes' (primary, enabled when dirty) and 'Discard' (only shown when dirty).
- 'Done' confirms before exiting with unsaved changes.
- beforeunload guard so tab close / refresh prompts when dirty.
- Reset and Import both stage into the draft; neither hits the server until the user saves.

The latest_result, recent_runs, client_status and text tiles can all grow with their content. On a dashboard with a long matrix or a 30-client fleet, that pushes everything else off-screen. Add an optional heightPx field on those four tile configs. When set, the tile's outer card gets max-height: <px>px and the inner body becomes scrollable (the flex-col layout was already in place for three of them; TextTile got refactored to match). The tile editor exposes heightPx as a 'Max height (px, optional)' input — leave it blank for the previous behaviour (size to content). Also tidies the editor: client_status now has its own editor body (showExecution checkbox + height), and network_status renders a short 'no extra settings' note instead of an empty form.

Three new HTTP endpoints powering the schedule rework, plus an extended schedule API that preserves the legacy shape:
- PUT /api/v1/test/{testId}/schedule (auth): replace a registered test's schedule (startup, cron[], skipQueue). Cron expressions are validated up-front; invalid input rejects the whole request.
- GET /api/v1/test/{testId}/next_run: walks each cron expression and returns the next firing time per expression plus the overall earliest. Drives the 'next in 7m' hint shown on the runs page banner.
- GET /api/v1/test_queue: returns the runner's running + pending queue in execution order. Feeds the StartTestModal's QueuePicker.

POST /api/v1/test_runs/schedule gains a new optional 'queue' field (mode: immediate|end|after, after_run_id). The legacy 'skip_queue' boolean stays as a deprecated fallback; when both are supplied 'queue' wins.

Internally, types.Coordinator gains ScheduleTestWithOptions; the 4-arg ScheduleTest is preserved and now forwards to it. The testrunner can splice into c.testQueue at an arbitrary position (after the named RunID); falls back to 'append' silently if the target has already started/finished. TestRegistry.UpdateTestSchedule mutates the descriptor under a schedule mutex and re-upserts the test_configs row so changes survive restarts.

Adds a CronEditor and a ScheduleCard component that surface a test's schedule wherever it matters:
- Library → Local Playbooks → expanded test row: card variant with the full breakdown (Startup / Skip-queue toggles, each cron expression + cronstrue-rendered description).
- Runs page → above the runs table when a test is selected: banner variant, one-line summary + 'next in 7m (HH:MM)'.

The editor itself (opened from either surface) supports:
- 7 one-click presets (every minute, hourly, daily, weekly, …)
- Free-form crontab input validated by cron-parser
- cronstrue translation under each expression
- The next 3 firings rendered inline
- Per-row remove + 'add another expression'
- 'Clear schedule' to wipe everything

Edits are auth-gated end-to-end: the modal's Save button calls the new auth-required PUT endpoint; anonymous users see the card in read-only mode (no Edit button).

The Run dialog gets two clearly distinct sections:
- 'Test configuration': the per-test knobs (now in a bordered card with an uppercase header + config icon).
- 'Run options': bordered card with a settings icon, holding:
  - 'Queue placement': the new QueuePicker dropdown
  - 'Allow duplicate' checkbox

QueuePicker is a rich, 2-level nested dropdown:
- Level 1:
  - Run immediately (mode='immediate'; parallel slot)
  - Add to queue · End (mode='end'; append)
  - Add to queue · After… (opens level 2)
- Level 2:
  - One entry per currently running or queued test, each with a status dot + run id + name. Selecting one sets mode='after', after_run_id=<run id>.

The dropdown renders in-flow (not absolutely positioned) so the containing modal scrolls naturally when the menu overflows.

Also fixes an unrelated 'renders 0' bug: when a test's default timeout was exactly 0, React rendered the falsy 0 inside the test-info card.

No need for a dedicated table + migration just to hold one row. The existing assertoor_state KV store (key TEXT primary, value TEXT) already serves exactly this purpose for other singletons in the codebase, so reuse it.

Changes:
- Drop pkg/db/dashboard_config.go and the two 20260518150000_*.sql migrations (sqlite + pgsql).
- Rewrite GetDashboardConfig / PutDashboardConfig to round-trip the JSON blob through GetAssertoorState / SetAssertoorState under the fixed key 'dashboard_config'. json.RawMessage is the carrier type so the bytes pass through verbatim (RawMessage.MarshalJSON is identity), avoiding the double-encoding the KV's generic interface{} path would otherwise do.

Also gofmt-fixes a couple of struct-tag tables in the new API files and tidies an exhaustive-switch warning by listing the terminal TestStatuses explicitly.

Verified end-to-end:
- GET (no config) → 204
- PUT valid JSON → 200, persisted
- GET → returns the same JSON verbatim (not double-encoded)
- PUT invalid JSON → 400 with a clear error
- migration log shows only the 4 pre-existing migrations
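The double-encoding pitfall is easy to see with encoding/json alone. This sketch does not reproduce the GetAssertoorState / SetAssertoorState interface; acceptBlob, encodeAsString and encodeAsRaw are hypothetical helpers that isolate the two encoding paths and the up-front validity check behind the 400 case:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// acceptBlob validates a PUT body and wraps it as json.RawMessage so it is
// stored and served without re-encoding.
func acceptBlob(body []byte) (json.RawMessage, error) {
	if !json.Valid(body) {
		return nil, errors.New("invalid JSON") // maps to the 400 case
	}
	return json.RawMessage(body), nil
}

// encodeAsString shows the double-encoding a plain string value produces
// when it passes through a generic interface{} path.
func encodeAsString(blob []byte) string {
	var v interface{} = string(blob)
	out, _ := json.Marshal(v)
	return string(out)
}

// encodeAsRaw shows that json.RawMessage is emitted as JSON, not as a string.
// (encoding/json still validates and compacts the bytes on output.)
func encodeAsRaw(blob json.RawMessage) string {
	var v interface{} = blob
	out, _ := json.Marshal(v)
	return string(out)
}

func main() {
	blob, err := acceptBlob([]byte(`{"version":2,"rows":[]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(encodeAsString(blob)) // "{\"version\":2,\"rows\":[]}"
	fmt.Println(encodeAsRaw(blob))    // {"version":2,"rows":[]}

	if _, err := acceptBlob([]byte(`{"version":`)); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

One nuance: json.Marshal compacts a RawMessage's bytes, so a pretty-printed PUT body comes back without its whitespace. That is still the same JSON, just not byte-identical.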

`make lint` now passes with 0 issues. None of the touched code paths change behaviour; every fix is annotation-only. Three buckets:
- **errcheck on crypto/rand.Read** in the three spam-transaction generators (random-target address paths). crypto/rand never fails on Linux; the zero fallback would be a harmless spam target if it ever did. Now explicitly discarded with `_, _ = rand.Read(...)` and a one-line note.
- **gosec G115 integer-overflow conversions** (12 sites across pkg/clients, pkg/tasks/generate_*, pkg/txmgr). Each is suppressed with `//nolint:gosec` and a short reason; most are guarded by a `> 0` check on the line above, or operate on non-negative slice indices.
- **nolintlint 'unused directive'** in pkg/web/api/get_test_yaml_api.go. Three stale `//nolint:gosec` comments left over from a refactor: the underlying gosec rules no longer fire here, so the directives themselves are now lint findings. Removed.

My earlier 'pre-existing lint cleanup' commit was run against local golangci-lint v2.6.0, which flags G115 integer-overflow conversions. CI runs v2.11.3, which does NOT flag those, so every G115 nolint became an 'unused directive' nolintlint error in CI.

Two corrections:
- Restore the three //nolint:gosec annotations on pkg/web/api/get_test_yaml_api.go that v2.11.3 still flags (G704 SSRF on the HTTP request paths, G703 path-traversal on the local file read). Both inputs come from configured test sources, never from request bodies.
- Drop the 13 //nolint:gosec annotations on G115 integer-overflow conversions across pkg/clients/, pkg/tasks/generate_*, and pkg/txmgr/spamoor.go. v2.11.3 doesn't fire on these in the first place, so the directives are unused.

Verified clean under golangci-lint v2.11.3 (the version pinned in .github/workflows/_shared-check.yaml):

    $ golangci-lint run --timeout 5m ./...
    0 issues.

Resolves to a stable point a few slots back from head (default 4 slots ≈ 48s) plus the canonical block root at that slot. This lets the GLOAS API playbook target a slot where derived state like execution-payload envelopes and PTC committee data is reliably available across every client, instead of racing against head.

New placeholders, all offset-arithmetic capable:
- {recent_slot}, {recent_slot+N}, {recent_slot-N}
- {recent_epoch}, {recent_epoch+N}, {recent_epoch-N}
- {recent_block_root} (root fetched via GetBlockHeaderBySlot)

Implementation details:
- resolvePath() builds a pathContext once per task execution, doing a single best-effort GetBlockHeaderBySlot RPC for the recent root.
- resolvePlaceholder() now takes the pathContext value-struct instead of an ever-growing tuple of head{Slot,Epoch,Root}.
- The signed-offset parser (+N/-N) is unchanged, so '_' in 'recent_slot' is correctly part of the keyword.
- On lookup failure the recent root falls back to head, so the task still produces something usable even if /beacon/headers is briefly unavailable.
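A regexp-based sketch of placeholder expansion with signed offsets. It is a hypothetical stand-in for resolvePath()/resolvePlaceholder(), covering only a subset of the placeholders; the pathContext field names are illustrative, and rejecting offsets on the root placeholder and erroring on underflow are choices of this sketch:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pathContext holds the values resolved once per task execution.
// Field names are illustrative.
type pathContext struct {
	HeadSlot, HeadEpoch     uint64
	RecentSlot, RecentEpoch uint64
	RecentBlockRoot         string
}

// placeholderRe matches {name}, {name+N} and {name-N}; '_' is part of the name.
var placeholderRe = regexp.MustCompile(`\{([a-z_]+)(?:([+-])(\d+))?\}`)

// expandPlaceholders substitutes every placeholder in path, applying offsets.
func expandPlaceholders(path string, ctx pathContext) (string, error) {
	var firstErr error
	fail := func(err error) {
		if firstErr == nil {
			firstErr = err
		}
	}
	out := placeholderRe.ReplaceAllStringFunc(path, func(m string) string {
		sub := placeholderRe.FindStringSubmatch(m)
		name, sign, digits := sub[1], sub[2], sub[3]

		var base uint64
		switch name {
		case "slot":
			base = ctx.HeadSlot
		case "epoch":
			base = ctx.HeadEpoch
		case "recent_slot":
			base = ctx.RecentSlot
		case "recent_epoch":
			base = ctx.RecentEpoch
		case "recent_block_root":
			if sign != "" {
				fail(fmt.Errorf("offsets are not valid on %s", m))
				return m
			}
			return ctx.RecentBlockRoot
		default:
			fail(fmt.Errorf("unknown placeholder %s", m))
			return m
		}

		if sign == "" {
			return strconv.FormatUint(base, 10)
		}
		n, err := strconv.ParseUint(digits, 10, 64)
		if err != nil {
			fail(err)
			return m
		}
		switch {
		case sign == "+":
			base += n
		case n > base:
			fail(fmt.Errorf("%s underflows", m))
			return m
		default:
			base -= n
		}
		return strconv.FormatUint(base, 10)
	})
	return out, firstErr
}

func main() {
	ctx := pathContext{HeadSlot: 100, HeadEpoch: 3, RecentSlot: 96, RecentEpoch: 3, RecentBlockRoot: "0xabc"}
	for _, p := range []string{"/example/{recent_slot}/{recent_block_root}", "/example/{epoch+1}", "/example/{slot-5}"} {
		out, err := expandPlaceholders(p, ctx)
		fmt.Println(out, err)
	}
}
```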

The original matrix was dominated by 🟡 footnotes saying 'error status 404 but body does not match ErrorMessage schema'. Those reflected our strict OpenAPI ErrorMessage requirement (`required: [code, message]`) more than they reflected anything about endpoint compatibility. CL implementations diverge wildly on error-body shape (some omit `code`, Lodestar wraps differently, Nimbus/Grandine return extra fields, etc.).

Four changes:
- **Permissive errorSchema everywhere**: `errorSchema: { type: object }`. We only require that the body parses as a JSON object, which is enough to distinguish 'route registered, returned a structured error' from 'connection refused / HTML 404 / random text'. The spec-correct `{code, message}` schema would still flag legitimate outliers, but for a cross-client compatibility matrix it produced more noise than signal.
- **Stable inputs via {recent_*}**: row 7 (envelope/{slot}/{root}) was using head-slot/head-root, but the envelope for the very head may not be produced yet on every client. Now it uses {recent_slot}/{recent_block_root} (head - 4 with the canonical root fetched via /beacon/headers), so each client has a stable, agreed-upon target. Same idea for rows 2, 4, 8, 9.
- **Expanded expectStatuses**: rows 2/4 now accept 500 and 501. A 500 from a stub handler on synthetic input still tells us the route is registered (which is the question we're answering); 501 with a structured body is a partial-implementation signal worth capturing as 🟡-or-better, not ❌.
- **Loosened success schema on row 6**: clients return the envelope in slightly different wrappings (some {version,data}, some bare). Accept any JSON object; the structural validation belongs in spec-conformance tests, not the compatibility matrix.

Description, version (1.0.0 → 1.1.0) and emoji-legend updated to match the new contract.
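To make the difference between the two error-body contracts concrete, here is a stdlib approximation of both. The playbook itself uses JSON Schema via jsonschema/v5; these two functions are hypothetical stand-ins for `{type: object}` and `required: [code, message]`:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// matchesPermissiveErrorSchema approximates errorSchema {type: object}:
// the body only has to be a JSON object; no fields are required.
func matchesPermissiveErrorSchema(body []byte) bool {
	var obj map[string]json.RawMessage
	// obj stays nil for a literal `null`, which is not an object.
	return json.Unmarshal(body, &obj) == nil && obj != nil
}

// matchesStrictErrorMessage approximates required: [code, message].
func matchesStrictErrorMessage(body []byte) bool {
	var msg struct {
		Code    *float64 `json:"code"`
		Message *string  `json:"message"`
	}
	if json.Unmarshal(body, &msg) != nil {
		return false
	}
	return msg.Code != nil && msg.Message != nil
}

func main() {
	for _, body := range []string{
		`{"code":404,"message":"not found"}`,
		`{"message":"not found"}`,
		`<html>404 Not Found</html>`,
	} {
		b := []byte(body)
		fmt.Printf("%-36s permissive=%v strict=%v\n", body, matchesPermissiveErrorSchema(b), matchesStrictErrorMessage(b))
	}
}
```

The middle case (a client that omits `code`) is exactly the one that used to produce a 🟡 footnote and now passes.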

…research/dashboard-rework

The eventstream library's `Stream.Ready` is an unbuffered channel that the stream goroutine writes to before entering its read loop (see eventstream.go:169, `stream.Ready <- true`). If no consumer drains it, that send blocks forever, the receive goroutine never starts, and every SSE topic appears to produce zero events even when the server is actively pushing them. That was the root cause of the SSE matrix rows always showing 'subscription opened (no events within window)', including for high-volume topics like execution_payload where events are guaranteed on a healthy GLOAS chain.

Drain `Ready` immediately after `Subscribe` returns, gated on the same context the rest of the SSE loop uses, so a stalled subscription still terminates the task cleanly.

Two focused, generic tasks that playbooks can compose with configVars to feed realistic inputs into other tasks:

- get_consensus_block_header: fetches one beacon-block header by slot, by root, or at 'head - headOffset' (walking past missed slots). Outputs slot, root, proposerIndex, parentRoot, stateRoot.
- get_consensus_proposer_duties: fetches the proposer schedule for one epoch (absolute, or current + epochOffset). Outputs the duties array plus convenience fields: the first duty strictly after the current head (proposerSlot + validator index for produceBlockV4 inputs) and up to N deduplicated real validator indices for endpoints that take a request body.

These replace the earlier monolithic gather_gloas_context concept: they're small and orthogonal, can be used independently in any playbook, and don't carry domain assumptions about which fields a particular check needs.
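The two convenience outputs of get_consensus_proposer_duties could be derived roughly as follows. This Go sketch assumes the duties have already been decoded into integers (the beacon API returns them as decimal strings); the type and function names are hypothetical.

```go
package main

import "sort"

// proposerDuty keeps the two fields of a proposer duty used here.
type proposerDuty struct {
	Slot           uint64
	ValidatorIndex uint64
}

// firstDutyAfter returns the earliest duty whose slot is strictly after head.
func firstDutyAfter(duties []proposerDuty, head uint64) (proposerDuty, bool) {
	sorted := append([]proposerDuty(nil), duties...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Slot < sorted[j].Slot })
	for _, d := range sorted {
		if d.Slot > head {
			return d, true
		}
	}
	return proposerDuty{}, false
}

// dedupedValidators returns up to n distinct validator indices, in duty order.
func dedupedValidators(duties []proposerDuty, n int) []uint64 {
	if n <= 0 {
		return nil
	}
	seen := make(map[uint64]bool, n)
	out := make([]uint64, 0, n)
	for _, d := range duties {
		if len(out) == n {
			break
		}
		if !seen[d.ValidatorIndex] {
			seen[d.ValidatorIndex] = true
			out = append(out, d.ValidatorIndex)
		}
	}
	return out
}
```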

…l SSE

The matrix now actually verifies the response shape: each GET row declares a responseSchema listing every required field from the spec PR (#552, #580, etc.), so a client returning the wrong wrapper or missing a GLOAS-new field (execution_payload_bid, payload_attestations, the envelope's payload sub-fields, etc.) earns a 🟡 instead of a false ✅.

Inputs come from two new generic tasks rather than synthetic placeholders:

- tasks.recent: get_consensus_block_header(headOffset=4), a canonical slot/root pair guaranteed to have a committed envelope on every client.
- tasks.duties: get_consensus_proposer_duties(epochOffset=1), the next-epoch proposer schedule, which gives us a real future-slot proposer for produceBlockV4 and a list of live validator indices for PTC duties.

Per-row wiring via configVars/pathParams:

- row02 (produceBlockV4): slot = first future proposer slot
- row04 (get_bid): slot + builder_index = future proposer pair
- row06 (envelope/{id}): block_id = recent root
- row07 (envelope/{s}/{r}): slot + beacon_block_root = recent pair
- row08 (ptc duties): epoch = duties.epoch, body = real validator indices
- row09 (payload_attestation_data): slot = recent canonical slot

SSE rows (12-16) now run concurrently inside one run_tasks_concurrent wrapper with newVariableScope:false, so the children still register at the root tasks scope and the matrix renderer can find them by id. That cuts the SSE wall-clock time from ~4 minutes (5 × 48s sequential) to ~48s.

The permissive errorSchema everywhere ({type:object}) is unchanged from the previous pass: error-body shapes diverge legitimately, and that divergence isn't what the matrix is trying to measure.

vars.Variables.ConsumeVars prefixes every query with '.' unless it already starts with one, so '{ key: val }' becomes '.{ key: val }', which is a syntax error. Three-fold fix:

- Lead each query with a '|' so the auto-prefix produces '.| { ... }' (identity piped into an object constructor, which is valid jq).
- Inside the object constructor, field accesses need a leading '.' (a bare 'foo.bar' only works at the very start of the query, because that is where the auto-prefix adds the '.'). 'tasks.X.Y' becomes '.tasks.X.Y'.
- 'body' uses no constructor, so it keeps the canonical 'tasks.duties.outputs.validatorIndices | map(tostring)' form (which the auto-prefix turns into a valid '.tasks…').

Every query was parsed with gojq before committing; all seven now compile.
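The prefixing rule is simple enough to reproduce. This Go sketch is an illustrative re-implementation based on the description above, not the real `ConsumeVars`; the `.tasks.recent.outputs.slot` query is an example input.

```go
package main

import "strings"

// autoPrefix re-implements the described behaviour for illustration only:
// prepend '.' unless the query already starts with one.
func autoPrefix(query string) string {
	if strings.HasPrefix(query, ".") {
		return query
	}
	return "." + query
}
```

Of the three query styles, only a leading `{` yields invalid jq (`.{ ... }`); a leading `|` yields `.| { ... }`, and a bare path becomes an ordinary `.tasks…` path.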

…assify empty body as missing

Two bugs surfaced by the live e2e run against glamsterdam-devnet-3:

- resolvePath had an early return when every placeholder came from explicit pathParams. The substitution loop only ran when the function had to walk chain state, so requests like 'GET /eth/v1/beacon/execution_payload_envelope/{block_id}' with block_id supplied via configVars went out with the literal {block_id} URL-encoded as %7Bblock_id%7D, and clients responded with 400 'invalid block ID'. The matrix then graded the test as 'well-formed error' (✅), a false positive across the board.
- Empty (or non-JSON) bodies on a documented error status were silently scored as a schema-mismatch partial (🟡). Empty 4xx bodies are how most web frameworks signal a completely missing route (the global not-found handler), so they are now classified as ❌ 'route likely missing'. Same for non-JSON bodies: every client that actually implements an endpoint sends a structured error body.

Also rewrites every responseSchema and eventSchema in the playbook to match the canonical beacon-APIs types (~/repos/beacon-APIs):

- Row 2 (produceBlockV4): body uses 'signed_execution_payload_bid' (NOT 'execution_payload_bid'); 'blob_kzg_commitments' and 'execution_requests' are NOT body fields in gloas.
- Row 4 / SSE 12 (bid): full ExecutionPayloadBid required-field list (parent_block_hash, parent_block_root, block_hash, prev_randao, fee_recipient, gas_limit, builder_index, slot, value, execution_payment, blob_kzg_commitments). The previous 'blob_kzg_commitments_root' field doesn't exist in the spec.
- Row 6/7 (envelope GETs): wrapper is {version, execution_optimistic, finalized, data}; ExecutionPayloadEnvelope has exactly 5 required fields: payload, execution_requests, builder_index, beacon_block_root, parent_beacon_block_root (no slot, state_root or blob_kzg_commitments; those live elsewhere).
- Rows 9/11 (payload attestation data, pool/payload_attestations): PayloadAttestationData uses 'payload_present' + 'blob_data_available' booleans (not 'payload_status'); the pool response is wrapped {version, data}.
- SSE 13 (execution_payload_available): field is 'block_root' (not 'beacon_block_root').
- SSE 14 (payload_attestation_message): wrapped {version, data}; inner data carries 'payload_present'/'blob_data_available'.
- SSE 15 (execution_payload): flat {slot, builder_index, block_hash, block_root, execution_optimistic}.
- SSE 16 (execution_payload_gossip): same as 15 minus execution_optimistic.

Also fixes row 10's POST body to use the correct PayloadAttestationMessage shape (validator_index outside, nested data with the four PayloadAttestationData fields).

The previous run revealed false-positive ✅s on routes that DON'T exist but where the framework returns a structured-JSON 404. Lighthouse's global not-found handler returns `{code,message:NOT_FOUND}`; Lodestar's returns `{code,message:"Route POST:/eth/v1/foo not found"}`. Both are technically valid JSON objects that pass an open schema, but neither indicates that the route is registered.

Add a tiny content-aware classifier that scans the error `message` for generic framework-level phrases and treats those as 'route likely missing' instead of 'well-formed error'. Patterns covered: `NOT_FOUND`, `route X not found`, `unsupported endpoint version`, `endpoint not found`, `unknown endpoint`, `unknown route`, `no matching route`, `method not allowed`, `no handler`.

The patterns are tight enough to ignore domain-specific 404s ('Execution payload envelope not found', 'block not found at slot 100', 'Currently syncing', etc.). Verified on 9 sample messages; all classify correctly.

…work vs domain 404s

The previous run flipped Lighthouse's row 6 (envelope/{block_id}) to a false ❌ because its body was 'NOT_FOUND: execution payload envelope for block root 0xe77…'. The route IS registered; the specific resource just isn't available. LH prefixes every error message with 'NOT_FOUND:' (its global error format), so the substring match was overreaching.

Tighten: only treat the literal 'NOT_FOUND' / 'not found' message (no detail) as a framework miss. Anything with a colon and detail ('NOT_FOUND: execution payload envelope for block …') stays a real domain 404 and counts as ✅ for endpoint existence. Lodestar's 'Route POST:/eth/v1/foo not found' still matches via the dedicated 'route … not found' branch; the other patterns (unsupported endpoint, unknown route, method not allowed, …) are unchanged. Verified on 11 sample messages; all classify correctly, including the new LH domain case.
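A Go sketch of the tightened classifier. The phrase list follows the patterns named in these two commits, but the exact regular expressions, the case-insensitivity and the way the bare-message rule is expressed are assumptions, not the code in this PR.

```go
package main

import (
	"regexp"
	"strings"
)

// frameworkMissPatterns are case-insensitive phrases that suggest a
// framework-level "no such route" reply rather than a domain 404.
var frameworkMissPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)\broute\b.*\bnot found\b`),
	regexp.MustCompile(`(?i)unsupported endpoint version`),
	regexp.MustCompile(`(?i)endpoint not found`),
	regexp.MustCompile(`(?i)unknown (endpoint|route)`),
	regexp.MustCompile(`(?i)no matching route`),
	regexp.MustCompile(`(?i)method not allowed`),
	regexp.MustCompile(`(?i)no handler`),
}

// isFrameworkMiss reports whether an error message looks like a generic
// "route missing" reply. Only a bare NOT_FOUND / "not found" counts on its
// own; "NOT_FOUND: <detail>" is treated as a real domain 404.
func isFrameworkMiss(message string) bool {
	msg := strings.TrimSpace(message)
	switch strings.ToLower(msg) {
	case "not_found", "not found":
		return true
	}
	for _, re := range frameworkMissPatterns {
		if re.MatchString(msg) {
			return true
		}
	}
	return false
}
```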

…ents)

beacon-APIs PR #580 was revised mid-flight: the response shape now uses `execution_payload_included` (boolean flag) instead of the originally proposed `execution_payload_value` (string), and `data` is anyOf {BeaconBlock, BlockContents}, where BlockContents = {block, execution_payload_envelope, kzg_proofs, blobs}. Top-level fields are now { version, consensus_block_value, execution_payload_included, data }.

Prysm already implements the revised shape (it returns {version, consensus_block_value, execution_payload_included, data: {block,...}} for self-built blocks). The old schema required `execution_payload_value` and put block fields directly under `data`, which incorrectly flagged Prysm's spec-conformant response as 🟡 "response body does not match success schema".

Rewrite the schema with $defs + anyOf to accept either shape; clients that diverge from BOTH variants still surface mismatches.
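The anyOf rule can be approximated by hand. This Go sketch only checks key presence (plus the boolean type of `execution_payload_included`), whereas the playbook uses a JSON-Schema with `$defs` + `anyOf`. The BeaconBlock key list used here (`slot`, `proposer_index`, `parent_root`, `state_root`, `body`) is the standard block container and is an assumption, not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasAll reports whether obj contains every key.
func hasAll(obj map[string]json.RawMessage, keys ...string) bool {
	for _, k := range keys {
		if _, ok := obj[k]; !ok {
			return false
		}
	}
	return true
}

// classifyProduceBlockV4 checks the revised top-level shape and reports which
// anyOf branch `data` matches. Key presence only; not a JSON-Schema validator.
func classifyProduceBlockV4(body []byte) (string, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(body, &top); err != nil {
		return "", err
	}
	if !hasAll(top, "version", "consensus_block_value", "execution_payload_included", "data") {
		return "", fmt.Errorf("missing a required top-level field")
	}
	var included bool
	if err := json.Unmarshal(top["execution_payload_included"], &included); err != nil {
		return "", fmt.Errorf("execution_payload_included is not a boolean: %w", err)
	}
	var data map[string]json.RawMessage
	if err := json.Unmarshal(top["data"], &data); err != nil || data == nil {
		return "", fmt.Errorf("data is not an object")
	}
	switch {
	case hasAll(data, "block", "execution_payload_envelope", "kzg_proofs", "blobs"):
		return "BlockContents", nil
	case hasAll(data, "slot", "proposer_index", "parent_root", "state_root", "body"):
		return "BeaconBlock", nil
	default:
		return "", fmt.Errorf("data matches neither BeaconBlock nor BlockContents")
	}
}
```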

Lighthouse strictly requires `skip_randao_verification` to be a presence-only flag — the underlying SkipRandaoVerification type accepts None (= No) and Some("") (= Yes) but rejects any other value with `Invalid query string`. Sending `skip_randao_verification=true` made LH return 400 at the query-parse stage, so we never reached the handler and never validated the success response body. Switching to the spec-conformant empty-value form (`?skip_randao_verification=`) keeps Prysm and Lodestar happy (both accept either form) and lets LH actually produce a block, so its response is validated against the PR #580 schema instead of being credited as a "well-formed 400".
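With Go's `net/url`, an empty value is encoded as `key=`, which is exactly the presence-only form. A sketch with a hypothetical helper name; `randao_reveal` is included as an assumed companion parameter.

```go
package main

import "net/url"

// produceBlockQuery builds a query string for a block-production request.
// skip_randao_verification is sent as a presence-only flag with an empty value.
func produceBlockQuery(randaoReveal string, skipRandao bool) string {
	q := url.Values{}
	q.Set("randao_reveal", randaoReveal)
	if skipRandao {
		q.Set("skip_randao_verification", "") // encodes as "skip_randao_verification="
	}
	return q.Encode() // Encode sorts keys alphabetically
}
```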

…it rate

The envelope-by-block-id row (row 6) probes each client with a single root picked from the first online consensus client. With headOffset=4, the chosen slot was often past Nimbus's envelope retention window, so Nimbus returned 404 instead of 200 and we never validated its actual response schema (which lacks the spec-required `version` field). Drop the offset to 1 so the probed root comes from the most recent slot, which is almost guaranteed to be canonical on every client AND still have its envelope cached. The existing maxLookback fallback (now 4) is kept for the rare case where head-1 was a missed slot.

A stability run failed when slots 190..193 on the devnet were all missed in quick succession (the devnet briefly desynced and recovered through a short run of empty slots). With maxLookback=4, the recent-block resolver only looked at slots N..N-3 and bailed, taking the whole playbook down before it ever reached the matrix. 16 slots is roughly half an epoch: enough to absorb any realistic run of misses on a healthy devnet while still picking a recent root.

…y param

Per beacon-APIs PR #580, the canonical endpoint is GET /eth/v1/validator/execution_payload_envelope/{slot} with `beacon_block_root` as a *query parameter* (for re-org resistance: the BN returns 404 when its cached envelope is for a different block root). The response data is the unwrapped ExecutionPayloadEnvelope struct, with no `message`/`signature` wrap; the VC signs the envelope.

The previous test queried `{slot}/{beacon_block_root}` as path segments and validated a `{message, signature}` shape inside `data`. That layout matches Lodestar's (non-spec) implementation but missed on Lighthouse and Prysm, both of which correctly register the spec-conformant single-segment path. Net effect: the matrix inverted the verdicts, with Lodestar showing ✅ and LH/Prysm showing ❌, exactly the opposite of reality.

Fix: probe the spec-conformant URL with the query param and validate the spec-conformant response shape. LH/Prysm should now correctly resolve the route (status depends on cache state); Lodestar should 404 (no route for the single-segment form on Lodestar).
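Building the spec-conformant request in Go: the slot is the only path segment and the root travels as a query parameter. A sketch with a hypothetical helper name; the path is the one quoted above.

```go
package main

import (
	"net/url"
	"strconv"
)

// envelopePath builds the spec-conformant request path: slot as the single
// path segment, beacon_block_root as a query parameter.
func envelopePath(slot uint64, beaconBlockRoot string) string {
	q := url.Values{}
	q.Set("beacon_block_root", beaconBlockRoot)
	return "/eth/v1/validator/execution_payload_envelope/" + strconv.FormatUint(slot, 10) + "?" + q.Encode()
}
```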

pk910 added 30 commits May 14, 2026 23:37

playbook: revert to flat .tasks walk; recursive '..' triggered jq panics on time.Time values

a925799

DEBUG: dump task ids

1b22d14

playbook: write matrix to ASSERTOOR_TEST_RESULT for run-level visibility

efc8b2d

Keeps the per-task matrix.md too (still downloadable + inline-rendered in the task pane), but also publishes it to the run-level Result panel — one click from the test-run page.

web-ui: api hook for latest test result

723eb11

Adds LatestResultResponse type, getLatestTestResult() client call, and useLatestTestResult() react-query hook. The hook is used by the new dashboard's latest-result tile.

pk910 changed the title ~~dashboard rework~~ dashboard rework & api check workflow May 18, 2026

Merge branch 'master' into research/dashboard-rework

48d762b

pk910 changed the title ~~dashboard rework & api check workflow~~ Dashboard rework, GLOAS beacon-API compatibility playbook & cron scheduling May 18, 2026

pk910 added 20 commits May 18, 2026 23:19

update default dashboard

6f2e316

Merge remote-tracking branch 'origin/research/dashboard-rework' into research/dashboard-rework

657a500

Merge branch 'master' into research/dashboard-rework

3923009

pk910 merged commit edeb601 into master May 19, 2026
9 checks passed

pk910 deleted the research/dashboard-rework branch May 19, 2026 22:03


Dashboard rework, GLOAS beacon-API compatibility playbook & cron scheduling#179

pk910 merged 51 commits into master from research/dashboard-rework

pk910 commented May 18, 2026


TL;DR

1. GLOAS beacon-API compatibility playbook

2. Configurable dashboard

3. Split-view Runs page

4. Cron scheduling, surfaced

Reworked Run dialog

API compatibility

Plumbing changes

Testing
