Dashboard rework, GLOAS beacon-API compatibility playbook & cron scheduling #179

pk910 merged 51 commits into master from research/dashboard-rework on May 19, 2026

pk910 commented on May 18, 2026

This branch is a chunky rework of the assertoor UX. It bundles three independent threads of work that ended up touching the same surfaces (test runs, results, scheduling), so they ship as one PR.

TL;DR

  1. GLOAS beacon-API compatibility playbook — a generic check_consensus_api task + a 16-row compatibility matrix playbook that probes every endpoint introduced or changed by the GLOAS / ePBS spec PRs.

  2. Configurable dashboard — the static "latest test runs" home page is replaced with a fully configurable tile grid (rows + drag-and-drop + 6 tile types). The runs list moves to its own /runs page with a split-pane Tests | Runs view.

  3. Cron scheduling, surfaced — schedules are now first-class: editable inline in the library page and on the runs page, powered by a nice cron editor (presets, validation, next-3-firings preview). The Run dialog gets a rich two-level Queue picker.

Every existing public API endpoint keeps working unchanged; new shapes are added alongside the old ones as additive fields.

1. GLOAS beacon-API compatibility playbook

A reusable per-client API probe + the playbook that wires 16 of them into a compatibility matrix.

  • New task: check_consensus_api — generic per-client API probe. Resolves placeholders, makes the HTTP call (or opens an SSE topic), validates the response shape against an inline JSON-Schema, classifies the result (pass / partial / fail / skipped), and emits a structured matrixRow output.
  • New task: run_javascript — runs a JS snippet via system Node.js with a small preamble exposing env, setOutput, setVar, writeResultFile, writeSummary and the new writeTestResult/appendTestResult helpers. Replaces the shell+jq approach that turned out to be the wrong language for collating structured task outputs.
  • Test-run-level Result mechanism — new $ASSERTOOR_TEST_RESULT env var exposed by run_shell and run_javascript points at a shared markdown file. After every task the scheduler syncs the file to the DB (sentinel task_id=0, type=test_result). The result is served as text/markdown via the new GET /api/v1/test_run/{runId}/result endpoint and rendered prominently at the top of the test-run page.
  • New playbook: playbooks/api-compatibility/gloas-api-check.yaml — 11 HTTP + 5 SSE probes, each carrying its own schema. A final run_javascript step formats the results into the matrix and writes it to $ASSERTOOR_TEST_RESULT.
  • Bumped Node to 24 across all Dockerfiles; Dockerfile-stub (the CI stub image) gains Node too so run_javascript-using playbooks work in CI.

2. Configurable dashboard

The old home page was a single paginated test-runs table. It is now:

  • A configurable tile grid persisted server-side (GET / PUT /api/v1/dashboard_config; auth-gated PUT).
  • Rows act as horizontal containers; each row holds tiles that share a 12-col responsive grid.
  • 6 tile types:
    • success_rate — per-test success ring + per-run swatches.
    • latest_result — markdown blob of the most recent run that produced one (uses $ASSERTOOR_TEST_RESULT).
    • recent_runs — live-updating list (all tests or scoped).
    • client_status — per-client EL/CL liveness + heads.
    • network_status — head/finalized/justified checkpoints, queue depth, ready-client counts (via new GET /api/v1/network_status).
    • text — free-form markdown.
  • Tiles with potentially unbounded content (latest_result, recent_runs, client_status, text) accept an optional heightPx; the body scrolls when capped.
  • Edit mode is auth-gated, drag-and-drop powered by @dnd-kit, with a sidebar palette (replaces an earlier modal), explicit Save changes / Discard flow (no live PUT), JSON import/export, beforeunload guard.

3. Split-view Runs page

The runs list moves off the home page to its own /runs route:

  • Left pane: registered tests with search, click to scope the right pane.
  • Right pane: paginated runs table (status, started, duration, bulk actions) plus, when a test is selected, a schedule banner at the top (next-firing hint + Edit button).
  • The selected test is reflected as ?testId=… so deep links and reloads work.

4. Cron scheduling, surfaced

Schedules used to be hidden inside YAML and never editable from the UI. Now:

  • New endpoints:
    • PUT /api/v1/test/{testId}/schedule (auth) — set/replace a test's schedule. Cron expressions are validated up-front.
    • GET /api/v1/test/{testId}/next_run — next firing per expression + the overall earliest.
    • GET /api/v1/test_queue — live runner queue (running + pending) used by the Queue picker.
  • CronEditor — preset chips (every minute / hour / day / week / …), free-form crontab input validated by cron-parser, cronstrue translations, next-3-firings preview.
  • ScheduleCard — surfaces the schedule + next-firing hint prominently:
    • In Library → Local Playbooks → expanded test as a full card.
    • On the Runs page as a banner above the runs table when a test is selected.
  • Test runner: ScheduleTestWithOptions adds support for splicing into the pending queue at an arbitrary position (after_run_id). Schedule updates mutate the descriptor under a mutex and re-upsert the test_configs row so they survive restarts.

Reworked Run dialog

  • Visually separates Test configuration (the per-test knobs) from Run options (how/when the run lands on the runner).
  • The skip_queue checkbox is replaced with a QueuePicker — a rich two-level nested dropdown:
    • Level 1: "Run immediately", "Add to queue · End", "Add to queue · After…"
    • Level 2: live list of running/queued tests with status dots, run IDs and names — pick one to insert directly behind it.
  • The dropdown renders in-flow so the modal scrolls naturally when the menu overflows.

API compatibility

POST /api/v1/test_runs/schedule keeps skip_queue working — the new queue field is additive and wins when both are supplied. The 4-arg Coordinator.ScheduleTest Go signature is kept and now forwards to ScheduleTestWithOptions.
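The precedence rule between the two fields can be sketched in Go as follows. This is only an illustration: the `QueueSpec` and `ScheduleRequest` types and the `resolvePlacement` helper are hypothetical names, and the mapping of a legacy `skip_queue: true` to mode `immediate` is an assumption rather than something the PR states.

```go
package main

import "fmt"

// QueueSpec mirrors the optional "queue" field. The JSON names follow the
// request body quoted in this PR; the Go type itself is hypothetical.
type QueueSpec struct {
	Mode       string `json:"mode"` // "immediate", "end" or "after"
	AfterRunID uint64 `json:"after_run_id,omitempty"`
}

// ScheduleRequest is a hypothetical request body carrying both shapes.
type ScheduleRequest struct {
	SkipQueue bool       `json:"skip_queue"`
	Queue     *QueueSpec `json:"queue,omitempty"`
}

// resolvePlacement returns the effective placement. The new "queue" field
// wins when both are supplied; otherwise the legacy boolean is translated.
// Assumption: skip_queue=true corresponds to mode "immediate".
func resolvePlacement(req ScheduleRequest) QueueSpec {
	if req.Queue != nil {
		return *req.Queue
	}

	if req.SkipQueue {
		return QueueSpec{Mode: "immediate"}
	}

	return QueueSpec{Mode: "end"}
}

func main() {
	both := ScheduleRequest{SkipQueue: true, Queue: &QueueSpec{Mode: "after", AfterRunID: 42}}
	legacy := ScheduleRequest{SkipQueue: true}

	fmt.Println(resolvePlacement(both).Mode)   // queue field takes precedence
	fmt.Println(resolvePlacement(legacy).Mode) // legacy callers keep working
}
```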

Plumbing changes

  • Dashboard config persists via the existing assertoor_state KV store (key dashboard_config), so no new DB migrations.
  • Internal Coordinator interface gains ScheduleTestWithOptions and TestRegistry.UpdateTestSchedule.
  • Descriptor gains a mutex-guarded GetSchedule/SetSchedule pair so cron mutations are race-free.
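A minimal sketch of what a mutex-guarded accessor pair can look like is shown below. The `Schedule` fields are taken from the schedule shape mentioned in this PR (startup, cron[], skipQueue); the struct layout, the use of `sync.RWMutex`, and the defensive copying are assumptions for illustration, not the actual assertoor code.

```go
package main

import (
	"fmt"
	"sync"
)

// Schedule is a hypothetical stand-in for a test's schedule.
type Schedule struct {
	Startup   bool
	Cron      []string
	SkipQueue bool
}

// Descriptor guards its schedule with a mutex so concurrent API updates
// and scheduler reads never observe a half-written value.
type Descriptor struct {
	mu       sync.RWMutex
	schedule *Schedule
}

// copySchedule deep-copies a schedule, including the cron slice.
func copySchedule(s *Schedule) *Schedule {
	if s == nil {
		return nil
	}

	cp := *s
	cp.Cron = append([]string(nil), s.Cron...)

	return &cp
}

// GetSchedule returns a copy, so callers cannot mutate shared state.
func (d *Descriptor) GetSchedule() *Schedule {
	d.mu.RLock()
	defer d.mu.RUnlock()

	return copySchedule(d.schedule)
}

// SetSchedule stores a copy of the given schedule under the write lock.
func (d *Descriptor) SetSchedule(s *Schedule) {
	d.mu.Lock()
	defer d.mu.Unlock()

	d.schedule = copySchedule(s)
}

func main() {
	d := &Descriptor{}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)

		go func(n int) {
			defer wg.Done()
			d.SetSchedule(&Schedule{Cron: []string{fmt.Sprintf("*/%d * * * *", n+1)}})
			_ = d.GetSchedule()
		}(i)
	}

	wg.Wait()
	fmt.Println(len(d.GetSchedule().Cron))
}
```

Returning copies is one possible design choice: it keeps the lock scope small and means a caller holding an old schedule cannot race with a later update.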

Testing

Verified end-to-end against a live kurtosis enclave with 5 CL clients (lighthouse-supernode, prysm, grandine, nimbus, lodestar):

  • Full 16-row GLOAS matrix renders, with SSE rows correctly flipping to ✅ where the subscription was syntactically valid.
  • Dashboard tiles populate from real data; rows + drag-and-drop + add/remove/save/discard exercised via Playwright.
  • Schedule editor saves a cron entry; /next_run updates the banner's "next in 7m (5:45 PM)" text on the runs page.
  • POST .../schedule with {"queue":{"mode":"after","after_run_id":N}} splices a new run directly behind #N in the pending queue; legacy {"skip_queue":true} callers still work.

pk910 added 30 commits May 14, 2026 23:37
Generic per-client beacon-API probe used as the building block for
cross-client API-compatibility playbooks. The task hits a single HTTP
endpoint (or subscribes to one SSE topic) on every connected consensus
client, classifies each response (pass / partial / fail / skipped), and
emits both per-client results and a 'matrixRow' output collapsed by
client-type for downstream aggregation.

Key behaviours:
- Path placeholders ({slot}, {epoch}, {block_id}, {beacon_block_root},
  {builder_index}, {validator_index}) are auto-resolved from chain state;
  offsets like {slot+5} / {epoch-1} are supported.
- HTTP classification splits expectStatuses into successStatuses (run
  responseSchema) and errorStatuses (run errorSchema = ErrorMessage
  shape). Schema-valid 4xx still counts as 'pass' because it proves the
  endpoint exists and parses the request.
- SSE mode subscribes to /eth/v1/events?topics=..., waits a configurable
  window for at least minEvents matching events, optionally validates
  each event payload against eventSchema.
- requireForkActive skips clients whose head hasn't reached the named
  fork's activation epoch.
- Per-client probes run concurrently up to the configured concurrency
  cap; the task aggregates pass/partial/fail counts and either succeeds,
  fails-on-any, or fails-on-all-error depending on config.

Adds github.com/santhosh-tekuri/jsonschema/v5 for inline JSON-Schema
validation of response / event bodies.
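The HTTP classification rule described above can be sketched roughly like this. The `Validator` function type stands in for a compiled JSON Schema, and all type and function names are hypothetical. Treating "expected status, but schema mismatch" as `partial` is an assumption inferred from the matrix footnotes, not a statement of the task's exact logic.

```go
package main

import (
	"errors"
	"fmt"
)

type Result string

const (
	Pass    Result = "pass"
	Partial Result = "partial"
	Fail    Result = "fail"
)

// Validator stands in for a compiled JSON Schema; nil means "no schema".
type Validator func(body []byte) error

// ProbeConfig is a hypothetical subset of the task config.
type ProbeConfig struct {
	SuccessStatuses []int
	ErrorStatuses   []int
	ResponseSchema  Validator
	ErrorSchema     Validator
}

func contains(list []int, v int) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}

	return false
}

// classify picks the schema by status class, then validates the body.
// A schema-valid error status still yields Pass: it shows the route
// exists and parsed the request.
func classify(cfg ProbeConfig, status int, body []byte) (Result, string) {
	var schema Validator

	switch {
	case contains(cfg.SuccessStatuses, status):
		schema = cfg.ResponseSchema
	case contains(cfg.ErrorStatuses, status):
		schema = cfg.ErrorSchema
	default:
		return Fail, fmt.Sprintf("unexpected status %d", status)
	}

	if schema != nil {
		if err := schema(body); err != nil {
			return Partial, fmt.Sprintf("status %d but body does not match schema: %v", status, err)
		}
	}

	return Pass, ""
}

// looksLikeObject is a toy validator used only for this demo.
func looksLikeObject(body []byte) error {
	if len(body) == 0 || body[0] != '{' {
		return errors.New("body is not a JSON object")
	}

	return nil
}

func main() {
	cfg := ProbeConfig{
		SuccessStatuses: []int{200},
		ErrorStatuses:   []int{400, 404},
		ResponseSchema:  looksLikeObject,
		ErrorSchema:     looksLikeObject,
	}

	res, note := classify(cfg, 404, []byte(`{"code":404,"message":"not found"}`))
	fmt.Println(res, note)

	res, note = classify(cfg, 404, []byte(`<html>not found</html>`))
	fmt.Println(res, note)
}
```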
New aggregator task that walks the test run, picks up every
check_consensus_api task's matrixRow output, and renders a markdown
compatibility table. The same markdown is stored both as the task's
'summary' result (so it renders inline at the top of the task pane in
the UI) and as a downloadable matrix.md result file, alongside a
matrix.json dump for machine consumption.

Columns are ordered by client-type (Lighthouse, Teku, Prysm, Grandine,
Nimbus, Lodestar, Caplin) with two-letter labels matching the
human-readable style; unused client-type columns are dropped by default
(toggle with showAllClientTypes). Notes on partial/fail cells are
emitted as numbered footnotes with deduplication.

Also extends the task UI to render summary files and any .md result
file inline using ReactMarkdown + remark-gfm, so the matrix shows up in
the UI without manual download.
New playbooks/api-compatibility/ folder for cross-client beacon-API
compatibility checks. The folder is intentionally fork-agnostic so
future forks can drop in a new file alongside.

gloas-api-check.yaml exercises every beacon-API endpoint / SSE topic
introduced or changed by the GLOAS / ePBS spec PRs (#552, #580, #586,
#587, #588, #592) against every connected consensus client:

  - 11 HTTP endpoints (publish/produce/retrieve blocks, bids, envelopes,
    PTC duties, payload attestation data + pool ops).
  - 5 SSE topics (execution_payload_bid, execution_payload_available,
    payload_attestation_message, execution_payload,
    execution_payload_gossip), subscribed concurrently to share the
    same wait window.

Each row uses an inline JSON Schema derived from the spec to validate
either the success body or the documented ErrorMessage body. The final
step calls generate_api_compatibility_matrix to render the markdown
matrix as a result artifact.

Playbook index is regenerated to include the new folder.
…accepts them

run_shell's ::set-output stores values as plain strings; downstream
tasks that consume the value as uint64 (minSlotNumber, etc.) fail
yaml-unmarshal. Switch to ::set-output-json so the integers are stored
typed.
End-to-end polish surfaced by the first kurtosis test run:

- matrixCell now has explicit JSON tags so matrix.json uses camelCase
  ('result' / 'note' / 'httpStatus') instead of the Go-default
  capitalised field names.
- Row titles in the rendered table wrap only the HTTP method+path (or
  SSE topic) in code-ticks; the optional trailing descriptor (e.g.
  '(Gloas)') stays outside the code span, matching the requested
  matrix layout.
- /api/v1/test_run/{runId}/details now includes result_files for each
  task, so the UI's TaskList -> TaskDetails Overview pane has the
  metadata it needs to render the inline markdown summary. Previously
  only /api/v1/test_run/{runId} carried the result-file headers and
  the UI's per-task fetch path was missing them.
New utility task that runs a JavaScript snippet via Node.js. Modeled
on run_shell with the same stdout-marker protocol (::set-output,
::set-output-json, ::set-var, ::set-json) and the same
$ASSERTOOR_SUMMARY / $ASSERTOOR_RESULT_DIR semantics, but built for
shapes that are awkward in shell/jq — collecting and rendering tabular
data from sibling task outputs, generating JSON configs from chain
state, etc.

Highlights:
- envVars are JSON-decoded by a preamble and exposed as a single 'env'
  object so user code does not have to JSON.parse(process.env.X)
  manually.
- Helpers: setOutput[JSON], setVar[JSON], writeResultFile,
  writeSummary.
- The user script is wrapped in '(async () => { ... })()' so top-level
  'await' works.
- The wrapped source is written to a file before invoking node, so
  stack traces refer to the user's line numbers.

Dockerfile / Dockerfile-local install Node.js 20 from NodeSource so
the task works in the published image.
…ript

The matrix-rendering aggregator is no longer a Go task — the playbook
now produces the markdown directly with a run_javascript step that
walks every sibling task's matrixRow output. ~600 lines of Go +
README disappear in favor of ~80 lines of self-contained JavaScript
inside the playbook.

In passing:
- Rename check_consensus_api outputs/config 'checkId'/'checkTitle' to
  generic 'rowId'/'rowTitle'. The matrix consumer can target any task
  conforming to the {rowId, rowTitle, referenceUrl, matrixRow}
  convention — not just check_* tasks.
- Tighten lint pass on check_consensus_api: extract methodGet /
  clientTypeUnk / maxResponseLen / defaultSSETimeoutSeconds constants,
  pass Config by pointer to classifyHTTPResult, name resolvePath
  results, and add the wsl_v5 blank-line padding the project requires.
ResolveQuery prepends '.' before parsing, so '| tasks | to_entries[]'
expands to '. | tasks | to_entries[]' — and gojq has no global 'tasks'
function so the iter yielded nothing, env.ROWS arrived undefined, and
the renderer threw 'filter is not a function'.

Existing playbooks (builder-lifecycle.yaml etc.) consistently use the
'.tasks.<id>.outputs.<field>' form. Match it.
…urrent are collected

Top-level '.tasks' only contains direct children of the test; child
tasks of run_tasks_concurrent (etc.) register their status vars in the
wrapper's scope. Use jq's recursive '..' to walk every nested map and
pick out matrixRow-bearing objects, then unique_by(rowId) to drop any
duplicates from the walk.
The 'tasks | to_entries[]' query crashed gojq with a panic on time.Time
values nested in get_consensus_specs' spec map (MIN_GENESIS_TIME etc.).
yaml roundtrip in ResolveQuery turns the iso8601 strings back into
time.Time, and gojq's 'type' builtin (and friends) panic on that type.

Switch to explicit per-row env vars — each query is now a simple path
expression that never iterates a heterogeneous map. The JS side glues
them back together. More verbose but robust.
Children of run_tasks_concurrent register their ids under the
wrapper's variable scope, not the root — so the matrix renderer
(running at root) couldn't resolve 'tasks.row12_sse_*.outputs'.

Flat layout: sequential SSE subscription per topic. Adds ~4 min to the
playbook (5 × 48s) but lets the matrix collector find every row by id.
Also drops the temporary DEBUG dump task.
If the SSE subscription was accepted (route exists, topic name
recognized) but no events arrive in the window, we've still proved
the endpoint syntax — that's what 'api-compatibility' is about.
Whether or not events actually fire during the window is a chain-state
question, not a client compatibility one.

Status moves from partial -> pass; note rewrites to drop the 'but'.
Previously the only way for a playbook to surface a top-level
artefact like a compatibility matrix was to drop it under a specific
task's result files — three clicks deep, hard to find unless you
already know which task to look in.

This patch adds a centralised run-level Result:

- New env var $ASSERTOOR_TEST_RESULT exposed by run_shell and
  run_javascript. Points at a shared markdown file owned by the
  test-run's scheduler; every task in the run sees the same path.
  Multiple tasks can write to / append to it.
- TaskScheduler owns the file (lazy-created in a per-run temp dir,
  cleaned up at end of RunTasks). After every task finishes, the
  current file contents are upserted into the existing task_results
  table under sentinel TaskID=0 / Type="test_result".
- run_javascript preamble grows two helpers: writeTestResult(s) and
  appendTestResult(s) so user scripts don't have to fs.writeFileSync
  by hand. (Execute is split into Execute + buildCommandEnv to keep
  cyclomatic complexity in line.)
- New API endpoint GET /api/v1/test_run/{runId}/result returns the
  markdown (text/markdown) or 204 when nothing was written.
- New web-ui component RunResultPanel renders the markdown inline on
  the test-run page, just below the timeline. Polls every 5s while
  the run is in flight, stops once it ends. A "view raw" link points
  at the API for direct download.
- get_test_run_details_api.go already excluded TaskID=0 from
  result_file headers (the task-level pane); fine-tunes a wsl_v5
  whitespace nit while there.
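The response contract of the result endpoint (markdown as `text/markdown`, or 204 when nothing was written) could be served by a handler along these lines. This is a sketch using only the standard library; the `lookup` callback and the `probe` helper are hypothetical and stand in for the real database read and HTTP routing.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// resultHandler serves the stored run-level markdown. lookup is a
// hypothetical callback that returns the blob and whether one exists.
func resultHandler(lookup func() (string, bool)) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		md, ok := lookup()
		if !ok || md == "" {
			// Nothing was written to the result file during this run.
			w.WriteHeader(http.StatusNoContent)
			return
		}

		w.Header().Set("Content-Type", "text/markdown; charset=utf-8")
		_, _ = w.Write([]byte(md))
	}
}

// probe invokes the handler in-process and reports status, content type
// and body.
func probe(h http.Handler) (int, string, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/test_run/1/result", nil)
	h.ServeHTTP(rec, req)

	return rec.Code, rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	empty := resultHandler(func() (string, bool) { return "", false })
	filled := resultHandler(func() (string, bool) { return "# Matrix\n", true })

	code, _, _ := probe(empty)
	fmt.Println(code)

	code, ctype, body := probe(filled)
	fmt.Println(code, ctype, len(body))
}
```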
Keeps the per-task matrix.md too (still downloadable + inline-rendered
in the task pane), but also publishes it to the run-level Result
panel — one click from the test-run page.
Walks recent runs of the test (newest first) and returns the first
$ASSERTOOR_TEST_RESULT markdown blob found, so the dashboard can
surface the latest meaningful result with one round trip instead of
fanning out to fetch every run individually.

Two response modes: raw markdown (default) plus metadata in X-Run-*
headers, or a JSON envelope with run_id/status/start_time/markdown
when ?meta=1 is set.
Adds LatestResultResponse type, getLatestTestResult() client call,
and useLatestTestResult() react-query hook. The hook is used by the
new dashboard's latest-result tile.
Adds a fully configurable tile grid that backs the new dashboard:

- types.ts: tile model (4 types — success_rate, latest_result,
  recent_runs, text), persisted config schema (versioned, localStorage)
- useDashboardConfig: hook owning the config + imperative API
  (add/remove/move/resize/reset)
- TileGrid: 12-col responsive layout, edit-mode action strip
- SuccessRateTile: rate ring + per-run swatches over last N runs
- LatestResultTile: renders $ASSERTOOR_TEST_RESULT markdown of the
  most-recent run that produced one
- RecentRunsTile: compact live-updating list of runs
- TextTile: free-form markdown
- AddTileModal: type-picker for new tiles
- TileEditorModal: per-tile configuration (test picker, window size,
  markdown body, etc.)
The old Dashboard was 500+ lines that did one thing: paginate the
test-runs table. That table now lives on the new /runs page; the
home page becomes a real overview.

The new Dashboard wires up the tile system: an edit-mode toggle, an
Add-tile modal, a per-tile editor modal, and grid mutations
(add/remove/move/resize). Out of edit mode it is pure read-only
output.
New page hosts the test-runs table that used to live on /. It's a
split pane:

- left: registered tests with search; clicking a test scopes the
  right pane to its runs
- right: paginated runs table (status, started, duration, actions),
  with bulk delete + 'Start test' from the active scope

The selected test is reflected in the URL (?testId=…) so deep links
and reloads work. Routing & nav are updated: 'Runs' sits between
Dashboard and Library.
The 'SUCCESS RATE · DEMO · RUN-LEVEL RESULT · width-picker · ↑↓✎✕'
strip was wrapping awkwardly on small (3/12) tiles, occluding the
tile body. Two changes fix it:

- drop the title from the strip; it's already shown inside the tile,
  and the type label alone is enough context
- replace the wordy width-picker ('small (3/12)') with a single-letter
  variant (S / M / L / XL); full label is in the title attribute

Now the entire strip fits on one line at every supported width.
- All three Dockerfiles now pull node 24 from NodeSource (20 is EOL).
- Dockerfile-stub (the CI stub image that copies a pre-built binary)
  was missing node entirely — added it the same way as the real
  multi-stage images, so playbooks using the run_javascript task can
  run inside the stub image too.
- Removed the now-redundant 'nodejs is required by run_javascript'
  comments from the Dockerfiles; the dependency is self-evident.
Adds two new endpoints powering the reworked dashboard:

- GET  /api/v1/network_status — aggregated chain & orchestrator
  snapshot (head slot/epoch, finalized + justified checkpoints,
  EL head, per-layer client readiness counts, queue depth). Cheap;
  served entirely from the in-process pool.

- GET  /api/v1/dashboard_config (public)
  PUT  /api/v1/dashboard_config (auth-required)

  The dashboard layout now lives on the server in a small key/value
  table (sqlite + pgsql migrations), so it can be edited only by
  authenticated users — anonymous viewers see the same dashboard
  but can't mutate it. The body is treated as an opaque JSON blob;
  schema validation lives in the client.
Major rework of the dashboard editing experience:

Data model
- Tiles now live inside named **rows** (horizontal containers).
  Rows visually segment the dashboard and can be reordered as units.
- Config schema bumped to version 2; no migration from v1 (per user
  request) — invalid blobs reset to the default dashboard.
- Config is server-backed (replaces localStorage). The hook is a
  react-query mutation that PUTs after a 250ms debounce so rapid
  drag-and-drop edits coalesce into one round-trip.

Edit mode
- Each tile is now visually framed in edit mode (chip header +
  bordered body), so the type label can't be confused with the
  tile above it — the bug from the previous edit-mode screenshot.
- Each row gets its own dashed-border frame with row-level
  controls (rename, move up/down, remove).
- Add-tile modal is gone. Replaced with a sidebar palette (like
  the builder) of draggable tile types — drop them on any row's
  drop zone, or click to append to the last row.
- Tiles are draggable too: rearrange within a row, move across
  rows. Powered by @dnd-kit/core.
- Export / Import buttons round-trip the config as JSON (download
  + file picker).
- Edit-mode controls are hidden when the user isn't authenticated;
  the PUT endpoint requires auth, so anonymous editing would
  silently fail to persist.

New tile types
- client_status — per-client EL/CL liveness dots + heads, polls
  the existing /clients endpoint.
- network_status — head slot/epoch, finalized + justified
  checkpoints, EL head, queue depth, ready-client counts;
  polls the new /api/v1/network_status endpoint.
Edits during edit-mode now stage to a local 'draft' state. The
react-query cache still mirrors the server-side baseline; the draft
only flushes to the server when the user clicks 'Save changes'.

This fixes the regression where rapid mutations (add row, drag tile
from palette, etc.) would appear to disappear: the previous debounced
PUT raced with re-renders and was overwriting optimistic updates on
some auth states. With an explicit save we never touch the server
during edits, so there's no race.

UX changes:
- 'UNSAVED' badge in the title when the draft differs from the
  server config.
- New toolbar buttons: 'Save changes' (primary, enabled when dirty)
  and 'Discard' (only shown when dirty).
- 'Done' confirms before exiting with unsaved changes.
- beforeunload guard so tab close / refresh prompts when dirty.
- Reset and Import both stage into the draft — neither hits the
  server until the user saves.
The latest_result, recent_runs, client_status and text tiles can
all grow with their content. On a dashboard with a long matrix or
a 30-client fleet, that pushes everything else off-screen.

Add an optional heightPx field on those four tile configs. When
set, the tile's outer card gets max-height: <px>px and the inner
body becomes scrollable (the flex-col layout was already in place
for three of them; TextTile got refactored to match).

The tile editor exposes heightPx as a 'Max height (px, optional)'
input — leave it blank for the previous behaviour (size to content).

Also tidies the editor: client_status now has its own editor body
(showExecution checkbox + height), and network_status renders a
short 'no extra settings' note instead of an empty form.
Three new HTTP endpoints powering the schedule rework, plus an
extended schedule API that preserves the legacy shape:

- PUT /api/v1/test/{testId}/schedule (auth)
  Replace a registered test's schedule (startup, cron[], skipQueue).
  Cron expressions are validated up-front; invalid input rejects
  the whole request.

- GET /api/v1/test/{testId}/next_run
  Walks each cron expression and returns the next firing time per
  expression plus the overall earliest. Drives the 'next in 7m'
  hint shown on the runs page banner.

- GET /api/v1/test_queue
  Returns the runner's running + pending queue in execution order.
  Feeds the StartTestModal's QueuePicker.

POST /api/v1/test_runs/schedule gains a new optional 'queue'
field (mode: immediate|end|after, after_run_id). The legacy
'skip_queue' boolean stays as a deprecated fallback — when both
are supplied 'queue' wins.

Internally, types.Coordinator gains ScheduleTestWithOptions; the
4-arg ScheduleTest is preserved and now forwards to it. The
testrunner can splice into c.testQueue at an arbitrary position
(after the named RunID); falls back to 'append' silently if the
target has already started/finished.

TestRegistry.UpdateTestSchedule mutates the descriptor under a
schedule mutex and re-upserts the test_configs row so changes
survive restarts.
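The splice-with-fallback behaviour can be illustrated with a small slice helper. The `queuedRun` type and `insertAfter` function are hypothetical; the real test runner presumably also holds a lock around `c.testQueue`, which is omitted in this sketch.

```go
package main

import "fmt"

// queuedRun is a hypothetical pending-queue entry.
type queuedRun struct {
	RunID uint64
	Name  string
}

// insertAfter returns a new queue with run placed directly behind the
// entry whose RunID equals afterRunID. If that entry is no longer in the
// pending queue (already started or finished), it falls back to append.
func insertAfter(queue []queuedRun, run queuedRun, afterRunID uint64) []queuedRun {
	pos := len(queue) // default: append at the end

	for i, q := range queue {
		if q.RunID == afterRunID {
			pos = i + 1
			break
		}
	}

	out := make([]queuedRun, 0, len(queue)+1)
	out = append(out, queue[:pos]...)
	out = append(out, run)
	out = append(out, queue[pos:]...)

	return out
}

func main() {
	queue := []queuedRun{{RunID: 10, Name: "a"}, {RunID: 11, Name: "b"}, {RunID: 12, Name: "c"}}

	spliced := insertAfter(queue, queuedRun{RunID: 13, Name: "new"}, 10)
	for _, q := range spliced {
		fmt.Print(q.RunID, " ")
	}

	fmt.Println()

	// Run 7 is not pending any more, so the new run lands at the end.
	appended := insertAfter(queue, queuedRun{RunID: 14, Name: "late"}, 7)
	fmt.Println(appended[len(appended)-1].RunID)
}
```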
Adds a CronEditor and a ScheduleCard component that surface a
test's schedule wherever it matters:

  - Library → Local Playbooks → expanded test row: card variant
    with the full breakdown (Startup / Skip-queue toggles, each
    cron expression + cronstrue-rendered description).
  - Runs page → above the runs table when a test is selected:
    banner variant — one-line summary + 'next in 7m (HH:MM)'.

The editor itself (opened from either surface) supports:
  - 7 one-click presets (every minute, hourly, daily, weekly, …)
  - Free-form crontab input validated by cron-parser
  - cronstrue translation under each expression
  - The next 3 firings rendered inline
  - Per-row remove + 'add another expression'
  - 'Clear schedule' to wipe everything

Edits are auth-gated end-to-end: the modal's Save button calls the
new auth-required PUT endpoint; anonymous users see the card in
read-only mode (no Edit button).
The Run dialog gets two clearly distinct sections:
  - 'Test configuration' — the per-test knobs (now in a bordered
    card with an uppercase header + config icon).
  - 'Run options' — bordered card with a settings icon, holding:
      - 'Queue placement': the new QueuePicker dropdown
      - 'Allow duplicate' checkbox

QueuePicker is a rich, 2-level nested dropdown:
  Level 1:
    - Run immediately   (mode='immediate'; parallel slot)
    - Add to queue · End  (mode='end'; append)
    - Add to queue · After…  (opens level 2)
  Level 2:
    - One entry per currently running or queued test, each with a
      status dot + run id + name. Selecting one sets mode='after',
      after_run_id=<run id>.

The dropdown renders in-flow (not absolutely positioned) so the
containing modal scrolls naturally when the menu overflows.

Also fixes an unrelated 'renders 0' bug — when a test's default
timeout was exactly 0, React rendered the falsy 0 inside the
test-info card.
pk910 changed the title from "dashboard rework" to "dashboard rework & api check workflow" on May 18, 2026
pk910 changed the title from "dashboard rework & api check workflow" to "Dashboard rework, GLOAS beacon-API compatibility playbook & cron scheduling" on May 18, 2026
pk910 added 20 commits May 18, 2026 23:19
No need for a dedicated table + migration just to hold one row.
The existing assertoor_state KV store (key TEXT primary, value TEXT)
already serves exactly this purpose for other singletons in the
codebase, so reuse it.

Changes:
- Drop pkg/db/dashboard_config.go and the two 20260518150000_*.sql
  migrations (sqlite + pgsql).
- Rewrite GetDashboardConfig / PutDashboardConfig to round-trip the
  JSON blob through GetAssertoorState / SetAssertoorState under the
  fixed key 'dashboard_config'. json.RawMessage is the carrier type
  so the bytes pass through verbatim (RawMessage.MarshalJSON is
  identity), avoiding the double-encoding the KV's generic
  interface{} path would otherwise do.
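The difference between the two encodings can be reproduced with `encoding/json` alone. The sketch below assumes the generic path ends up marshalling the blob as a Go `string`, which is one plausible way the double-encoding arises; the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeAsString marshals the blob as a plain Go string behind an
// interface{}, which wraps it in quotes and escapes the inner JSON.
func encodeAsString(blob []byte) string {
	out, err := json.Marshal(interface{}(string(blob)))
	if err != nil {
		panic(err)
	}

	return string(out)
}

// encodeAsRaw marshals the blob as json.RawMessage, whose MarshalJSON
// returns the bytes as-is, so the JSON passes through unchanged.
func encodeAsRaw(blob []byte) string {
	out, err := json.Marshal(interface{}(json.RawMessage(blob)))
	if err != nil {
		panic(err)
	}

	return string(out)
}

func main() {
	blob := []byte(`{"version":2,"rows":[]}`)

	fmt.Println(encodeAsString(blob)) // a JSON string literal containing escaped JSON
	fmt.Println(encodeAsRaw(blob))    // the original object, verbatim
}
```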

Also gofmt-fixes a couple of struct-tag tables in the new API files
and tidies an exhaustive-switch warning by listing the terminal
TestStatuses explicitly.

Verified end-to-end:
- GET (no config) → 204
- PUT valid JSON → 200, persisted
- GET → returns the same JSON verbatim (not double-encoded)
- PUT invalid JSON → 400 with a clear error
- migration log shows only the 4 pre-existing migrations
`make lint` now passes with 0 issues. None of the touched code paths
change behaviour — every fix is annotation-only.

Three buckets:

- **errcheck on crypto/rand.Read** in the three spam-transaction
  generators (random-target address paths). crypto/rand never fails
  on Linux; the zero fallback would be a harmless spam target if it
  ever did. Now explicitly discarded with `_, _ = rand.Read(...)`
  and a one-line note.

- **gosec G115 integer-overflow conversions** (12 sites across
  pkg/clients, pkg/tasks/generate_*, pkg/txmgr). Each is suppressed
  with `//nolint:gosec` and a short reason — most are guarded by a
  `> 0` check on the line above, or operate on non-negative slice
  indices.

- **nolintlint 'unused directive'** in pkg/web/api/get_test_yaml_api.go.
  Three stale `//nolint:gosec` comments left over from a refactor —
  the underlying gosec rules no longer fire here, so the directives
  themselves are now lint findings. Removed.
My earlier 'pre-existing lint cleanup' commit was run against local
golangci-lint v2.6.0, which flags G115 integer-overflow conversions.
CI runs v2.11.3, which does NOT flag those — so every G115 nolint
became an 'unused directive' nolintlint error in CI.

Two corrections:

- Restore the three //nolint:gosec annotations on
  pkg/web/api/get_test_yaml_api.go that v2.11.3 still flags (G704
  SSRF on the HTTP request paths, G703 path-traversal on the local
  file read). Both inputs come from configured test sources, never
  from request bodies.

- Drop the 13 //nolint:gosec annotations on G115 integer-overflow
  conversions across pkg/clients/, pkg/tasks/generate_*, and
  pkg/txmgr/spamoor.go. v2.11.3 doesn't fire on these in the first
  place, so the directives are unused.

Verified clean under golangci-lint v2.11.3 (the version pinned in
.github/workflows/_shared-check.yaml):

    $ golangci-lint run --timeout 5m ./...
    0 issues.
Resolves to a stable point a few slots back from head (default 4 slots
≈ 48s) plus the canonical block root at that slot. This lets the
GLOAS API playbook target a slot where derived state like
execution-payload envelopes and PTC committee data is reliably
available across every client, instead of racing against head.

New placeholders, all offset-arithmetic capable:
  - {recent_slot}, {recent_slot+N}, {recent_slot-N}
  - {recent_epoch}, {recent_epoch+N}, {recent_epoch-N}
  - {recent_block_root}  (root fetched via GetBlockHeaderBySlot)

Implementation details:
  - resolvePath() builds a pathContext once per task execution, doing
    a single best-effort GetBlockHeaderBySlot RPC for the recent root.
  - resolvePlaceholder() now takes the pathContext value-struct
    instead of an ever-growing tuple of head{Slot,Epoch,Root}.
  - The signed-offset parser (+N/-N) is unchanged, so '_' in
    'recent_slot' is correctly part of the keyword.
  - On lookup failure the recent root falls back to head, so the
    task still produces something usable even if /beacon/headers
    is briefly unavailable.
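A rough sketch of placeholder resolution with signed offsets follows. The `pathContext` fields, the regular expression, and the clamping of negative results to zero are assumptions made for the example, and the paths in `main` are made up; only the placeholder names and the `+N`/`-N` syntax come from the description above.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pathContext is a hypothetical value-struct holding everything a
// placeholder can resolve to; it is built once per task execution.
type pathContext struct {
	HeadSlot        uint64
	RecentSlot      uint64
	SlotsPerEpoch   uint64
	RecentBlockRoot string
}

// The keyword may contain '_' (as in recent_slot); the optional signed
// offset is captured separately.
var placeholderRe = regexp.MustCompile(`\{([a-z_]+)([+-]\d+)?\}`)

// resolvePath substitutes every known placeholder in path.
func resolvePath(path string, ctx pathContext) string {
	return placeholderRe.ReplaceAllStringFunc(path, func(m string) string {
		parts := placeholderRe.FindStringSubmatch(m)

		var base int64

		switch parts[1] {
		case "slot":
			base = int64(ctx.HeadSlot)
		case "epoch":
			base = int64(ctx.HeadSlot / ctx.SlotsPerEpoch)
		case "recent_slot":
			base = int64(ctx.RecentSlot)
		case "recent_epoch":
			base = int64(ctx.RecentSlot / ctx.SlotsPerEpoch)
		case "recent_block_root":
			return ctx.RecentBlockRoot // roots take no offset
		default:
			return m // unknown keyword: leave untouched
		}

		if parts[2] != "" {
			// ParseInt accepts a leading '+' or '-' sign.
			if off, err := strconv.ParseInt(parts[2], 10, 64); err == nil {
				base += off
			}
		}

		if base < 0 {
			base = 0 // assumption: clamp instead of underflowing
		}

		return strconv.FormatInt(base, 10)
	})
}

func main() {
	ctx := pathContext{HeadSlot: 100, RecentSlot: 96, SlotsPerEpoch: 32, RecentBlockRoot: "0xabc"}

	fmt.Println(resolvePath("/example/{recent_slot}/{recent_block_root}", ctx))
	fmt.Println(resolvePath("/example/{slot+5}/{epoch-1}", ctx))
}
```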
The original matrix was dominated by 🟡 footnotes saying 'error status
404 but body does not match ErrorMessage schema'. Those reflected our
strict OpenAPI ErrorMessage requirement (`required: [code, message]`)
more than they reflected anything about endpoint compatibility. CL
implementations diverge wildly on error-body shape (some omit `code`,
Lodestar wraps differently, Nimbus/Grandine return extra fields,
etc.).

Four changes:

- **Permissive errorSchema everywhere**: `errorSchema: { type: object }`.
  We only require that the body parses as a JSON object — which is
  enough to distinguish 'route registered, returned a structured
  error' from 'connection refused / HTML 404 / random text'. The
  spec-correct `{code, message}` schema would still flag legitimate
  outliers, but for a cross-client compatibility matrix it produced
  more noise than signal.

- **Stable inputs via {recent_*}**: row 7 (envelope/{slot}/{root})
  was using head-slot/head-root, but the envelope for the very head
  may not be produced yet on every client. Now it uses
  {recent_slot}/{recent_block_root} (head - 4 with the canonical
  root fetched via /beacon/headers), so each client has a stable,
  agreed-upon target. Same idea for rows 2, 4, 8, 9.

- **Expanded expectStatuses**: rows 2/4 now accept 500 and 501. A
  500 from a stub handler on synthetic input still tells us the
  route is registered (which is the question we're answering); 501
  with a structured body is a partial-implementation signal worth
  capturing as 🟡-or-better, not ❌.

- **Loosened success schema on row 6**: clients return the envelope
  in slightly different wrappings (some {version,data}, some bare).
  Accept any JSON object — the structural validation belongs in
  spec-conformance tests, not the compatibility matrix.

Description, version (1.0.0 → 1.1.0) and emoji-legend updated to
match the new contract.
The eventstream library's Stream.Ready is an unbuffered channel that
the stream goroutine writes to before entering its read loop (see
eventstream.go:169, `stream.Ready <- true`). If no consumer drains
it, that send blocks forever, the read loop never starts, and
every SSE topic appears to produce zero events even when the server
is actively pushing them.

That was the root cause of the SSE matrix rows always showing
'subscription opened (no events within window)' — including for
high-volume topics like execution_payload where events are
guaranteed under a healthy GLOAS chain.

Drain Ready immediately after Subscribe returns, gated on the same
context the rest of the SSE loop uses so a stalled subscription
still terminates the task cleanly.
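
The blocking behaviour and the fix can be reproduced with a small stand-in. The `stream` type below only mimics the one relevant property of the library (an unbuffered Ready channel written before the read loop); `startStream`, `collect`, and `runDemo` are hypothetical names for this sketch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// stream is a stand-in for the eventstream library's Stream type; only
// the unbuffered Ready channel matters here.
type stream struct {
	Ready  chan bool
	Events chan string
}

// startStream mimics the library: the goroutine signals Ready before it
// enters its read loop, so it stalls until someone receives from Ready.
func startStream(events []string) *stream {
	s := &stream{Ready: make(chan bool), Events: make(chan string)}
	go func() {
		s.Ready <- true // blocks until a consumer drains Ready
		for _, e := range events {
			s.Events <- e
		}
		close(s.Events)
	}()
	return s
}

// collect drains Ready first, then reads events until the stream closes
// or ctx is done. Both waits are gated on the same context.
func collect(ctx context.Context, s *stream) []string {
	select {
	case <-s.Ready:
	case <-ctx.Done():
		return nil
	}
	var got []string
	for {
		select {
		case e, ok := <-s.Events:
			if !ok {
				return got
			}
			got = append(got, e)
		case <-ctx.Done():
			return got
		}
	}
}

// runDemo wires the pieces together with a one-second timeout.
func runDemo() []string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return collect(ctx, startStream([]string{"a", "b"}))
}

func main() {
	fmt.Println(runDemo())
}
```

Removing the first `select` in `collect` makes the sketch return no events once the timeout fires, which mirrors the "no events within window" symptom.
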
Two focused, generic tasks that playbooks can compose with configVars to
feed realistic inputs into other tasks:

- get_consensus_block_header — fetch one beacon-block header by
  slot, root, or 'head - headOffset' (walking past missed slots).
  Outputs slot, root, proposerIndex, parentRoot, stateRoot.

- get_consensus_proposer_duties — fetch the proposer schedule for
  one epoch (absolute, or current + epochOffset). Outputs the
  duties array plus convenience fields: the first duty strictly
  after the current head (proposerSlot + validator index for
  produceBlockV4 inputs) and up to N deduped real validator
  indices for body-payloaded endpoints.

These replace the earlier monolithic gather_gloas_context concept:
they're small and orthogonal, can be used independently in any
playbook, and don't carry domain assumptions about which fields a
particular check needs.
…l SSE

The matrix now actually verifies the response shape: each GET row
declares a responseSchema listing every required field from the
spec PR (#552, #580, etc.), so a client returning the wrong wrapper
or missing a GLOAS-new field (execution_payload_bid, payload_attestations,
the envelope's payload sub-fields, etc.) earns a 🟡 instead of a
false ✅.

Inputs come from two new generic tasks rather than synthetic
placeholders:

  - tasks.recent  -- get_consensus_block_header(headOffset=4): a
                     canonical slot/root pair guaranteed to have a
                     committed envelope on every client.
  - tasks.duties  -- get_consensus_proposer_duties(epochOffset=1):
                     the next-epoch proposer schedule, which gives
                     us a real future-slot proposer for
                     produceBlockV4 and a list of live validator
                     indices for PTC duties.

Per-row wiring via configVars/pathParams:

  - row02 (produceBlockV4):  slot = first future proposer slot
  - row04 (get_bid):         slot + builder_index = future proposer pair
  - row06 (envelope/{id}):   block_id = recent root
  - row07 (envelope/{s}/{r}): slot + beacon_block_root = recent pair
  - row08 (ptc duties):      epoch = duties.epoch, body = real validator indices
  - row09 (payload_attestation_data): slot = recent canonical slot

SSE rows (12-16) now run concurrently inside one run_tasks_concurrent
wrapper with newVariableScope:false — children still register at the
root tasks scope so the matrix renderer can find them by id. That
cuts the SSE wallclock from ~4 minutes (5 × 48s sequential) to ~48s.

Permissive errorSchema everywhere ({type:object}) is unchanged from
the previous pass — error-body shapes diverge legitimately and that
divergence isn't what the matrix is trying to measure.
vars.Variables.ConsumeVars prefixes every query with '.' unless it
already starts with one, so '{ key: val }' becomes '.{ key: val }',
which is a syntax error.

Three-fold fix:

- Lead each query with a '|' so the auto-prefix produces '.| { ... }'
  (identity | object constructor — valid jq).
- Inside the object constructor, field accesses need a leading '.'
  (jq's bare 'foo.bar' shorthand only works at the very start of an
  expression). 'tasks.X.Y' becomes '.tasks.X.Y'.
- 'body' uses no constructor so just keeps the canonical
  'tasks.duties.outputs.validatorIndices | map(tostring)' form
  (which the auto-prefix turns into a valid '.tasks…').

Every query parsed with gojq before commit; all seven now compile.
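
The effect of the auto-prefix on each query form can be shown with a few lines of Go. `prefixQuery` is a hypothetical re-creation of the behaviour described above, not the actual ConsumeVars code, and the `tasks.recent.outputs.slot` path is only illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixQuery mirrors the described behaviour: every query gets a
// leading '.' unless it already starts with one.
func prefixQuery(q string) string {
	if strings.HasPrefix(q, ".") {
		return q
	}
	return "." + q
}

func main() {
	queries := []string{
		// Bare object constructor: becomes ".{ ... }", a jq syntax error.
		"{ slot: tasks.recent.outputs.slot }",
		// Leading pipe: becomes ".| { ... }", identity piped into the object.
		"| { slot: .tasks.recent.outputs.slot }",
		// Plain path: becomes ".tasks...", already valid.
		"tasks.duties.outputs.validatorIndices | map(tostring)",
	}
	for _, q := range queries {
		fmt.Printf("%q -> %q\n", q, prefixQuery(q))
	}
}
```
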
…assify empty body as missing

Two bugs the live e2e against glamsterdam-devnet-3 surfaced:

- resolvePath had an early return when every placeholder came from
  explicit pathParams. The substitution loop was only hit when the
  function had to walk chain state, so requests like
  'GET /eth/v1/beacon/execution_payload_envelope/{block_id}' with
  block_id supplied via configVars went out with the literal
  {block_id} URL-encoded as %7Bblock_id%7D and clients responded
  with 400 'invalid block ID'. The matrix then graded the test as
  'well-formed error' (✅) — a false positive across the board.

- Empty (or non-JSON) bodies on a documented error status were
  silently being scored as schema-mismatch partial (🟡). Empty 4xx
  bodies are how most web frameworks signal a fully-missing route
  (the global not-found handler), so classify them as ❌ 'route
  likely missing' instead. Same for non-JSON bodies — every client
  that actually implements an endpoint sends a structured error
  body.
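
A minimal sketch of the second rule, assuming the permissive `{type: object}` error schema: empty and non-JSON bodies grade as a failure, any JSON object grades as a well-formed error. The function name and verdict strings are made up for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// classifyErrorBody grades the body that came back with a documented
// error status. It returns a verdict ("pass" or "fail") and a reason.
func classifyErrorBody(body []byte) (verdict, reason string) {
	if len(bytes.TrimSpace(body)) == 0 {
		return "fail", "route likely missing (empty body)"
	}
	// Decoding into a map only succeeds for a JSON object; arrays,
	// strings and numbers produce an error, and null leaves the map nil.
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil || obj == nil {
		return "fail", "route likely missing (body is not a JSON object)"
	}
	return "pass", "well-formed error"
}

func main() {
	samples := []string{
		``,
		`<html>404 Not Found</html>`,
		`{"code":400,"message":"invalid block ID"}`,
	}
	for _, s := range samples {
		v, r := classifyErrorBody([]byte(s))
		fmt.Printf("%q -> %s: %s\n", s, v, r)
	}
}
```
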

Also rewrites every responseSchema and eventSchema in the playbook
to match the canonical beacon-APIs types (~/repos/beacon-APIs):

- Row 2 (produceBlockV4): body uses 'signed_execution_payload_bid'
  (NOT 'execution_payload_bid'); 'blob_kzg_commitments' and
  'execution_requests' are NOT body fields in gloas.
- Row 4 / SSE 12 (bid): full ExecutionPayloadBid required-field list
  (parent_block_hash, parent_block_root, block_hash, prev_randao,
  fee_recipient, gas_limit, builder_index, slot, value,
  execution_payment, blob_kzg_commitments). The previous
  'blob_kzg_commitments_root' field doesn't exist in the spec.
- Row 6/7 (envelope GETs): wrapper is {version, execution_optimistic,
  finalized, data}; ExecutionPayloadEnvelope required fields are
  exactly 5: payload, execution_requests, builder_index,
  beacon_block_root, parent_beacon_block_root (no slot,
  state_root, blob_kzg_commitments — those live elsewhere).
- Rows 9/11 (payload attestation data, pool/payload_attestations):
  PayloadAttestationData uses 'payload_present' + 'blob_data_available'
  booleans (not 'payload_status'); pool response is wrapped {version,
  data}.
- SSE 13 (execution_payload_available): field is 'block_root' (not
  'beacon_block_root').
- SSE 14 (payload_attestation_message): wrapped {version, data},
  inner data carries 'payload_present'/'blob_data_available'.
- SSE 15 (execution_payload): flat {slot, builder_index, block_hash,
  block_root, execution_optimistic}.
- SSE 16 (execution_payload_gossip): same as 15 minus
  execution_optimistic.

Also fixes row 10's POST body to use the correct PayloadAttestationMessage
shape (validator_index outside, nested data with the four
PayloadAttestationData fields).
The previous run revealed false-positive ✅s on routes that DON'T exist
but for which the framework returns a structured-JSON 404. Lighthouse's
global not-found handler returns `{code,message:NOT_FOUND}`; Lodestar's
returns `{code,message:"Route POST:/eth/v1/foo not found"}`. Both
are technically valid JSON objects that pass an open schema, but
neither indicates that the route is registered.

Add a tiny content-aware classifier that scans the error `message`
for generic framework-level phrases and treats those as 'route
likely missing' instead of 'well-formed error'. Patterns covered:

  NOT_FOUND, route X not found, unsupported endpoint version,
  endpoint not found, unknown endpoint, unknown route,
  no matching route, method not allowed, no handler

The patterns are tight enough to ignore domain-specific 404s
('Execution payload envelope not found', 'block not found at slot
100', 'Currently syncing', etc.). Verified on 9 sample messages —
all classify correctly.
…work vs domain 404s

Previous run flipped Lighthouse's row 6 (envelope/{block_id}) to a
false ❌ because its body was 'NOT_FOUND: execution payload envelope
for block root 0xe77…' — the route IS registered, the specific
resource just isn't available. LH happens to prefix every error
message with 'NOT_FOUND:' (its global error format), so the
substring match was overreaching.

Tighten: only treat the literal 'NOT_FOUND' / 'not found' message
(no detail) as a framework miss. Anything with a colon and detail
('NOT_FOUND: execution payload envelope for block …') stays as a
real domain 404 and counts as ✅ for endpoint existence.

Lodestar's 'Route POST:/eth/v1/foo not found' continues to match
via the dedicated 'route … not found' branch; the other patterns
(unsupported endpoint, unknown route, method not allowed, …) are
unchanged.

Verified on 11 sample messages — all classify correctly including
the new LH domain case.
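
The tightened rule might look roughly like the sketch below. The phrase list is taken from the commit text; the function name, the regular expression, and the exact matching strategy are assumptions, so the real classifier may differ in detail.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// routeNotFoundRe covers messages such as "Route POST:/eth/v1/foo not found".
var routeNotFoundRe = regexp.MustCompile(`\broute\b.*\bnot found\b`)

// frameworkPhrases are generic framework-level phrases matched as substrings.
var frameworkPhrases = []string{
	"unsupported endpoint version",
	"endpoint not found",
	"unknown endpoint",
	"unknown route",
	"no matching route",
	"method not allowed",
	"no handler",
}

// isFrameworkMiss reports whether an error message looks like a generic
// "no such route" answer rather than a domain-specific 404.
func isFrameworkMiss(message string) bool {
	m := strings.ToLower(strings.TrimSpace(message))
	// Only the bare literal counts; "NOT_FOUND: <detail>" is a domain 404.
	if m == "not_found" || m == "not found" {
		return true
	}
	if routeNotFoundRe.MatchString(m) {
		return true
	}
	for _, p := range frameworkPhrases {
		if strings.Contains(m, p) {
			return true
		}
	}
	return false
}

func main() {
	for _, msg := range []string{
		"NOT_FOUND",
		"Route POST:/eth/v1/foo not found",
		"NOT_FOUND: execution payload envelope for block root 0xe77",
		"Execution payload envelope not found",
	} {
		fmt.Printf("%q -> framework miss: %v\n", msg, isFrameworkMiss(msg))
	}
}
```
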
…ents)

beacon-APIs PR #580 was revised mid-flight: the response shape now uses
`execution_payload_included` (boolean flag) instead of the originally
proposed `execution_payload_value` (string), and `data` is anyOf
{BeaconBlock, BlockContents} where BlockContents = {block,
execution_payload_envelope, kzg_proofs, blobs}. Top-level fields are
now { version, consensus_block_value, execution_payload_included,
data }.

Prysm already implements the revised shape (returns
{version, consensus_block_value, execution_payload_included, data:
{block,...}} for self-built blocks). The old schema required
`execution_payload_value` and put block fields directly under `data`,
which incorrectly flagged Prysm's spec-conformant response as 🟡
"response body does not match success schema".

Rewrite the schema with $defs + anyOf to accept either shape — clients
that diverge from BOTH variants still surface mismatches.
Lighthouse strictly requires `skip_randao_verification` to be a
presence-only flag — the underlying SkipRandaoVerification type
accepts None (= No) and Some("") (= Yes) but rejects any other value
with `Invalid query string`. Sending `skip_randao_verification=true`
made LH return 400 at the query-parse stage, so we never reached the
handler and never validated the success response body.

Switching to the spec-conformant empty-value form (`?skip_randao_verification=`)
keeps Prysm and Lodestar happy (both accept either form) and lets LH
actually produce a block, so its response is validated against the
PR #580 schema instead of being credited as a "well-formed 400".
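
For reference, the empty-value form falls out of Go's standard `net/url` encoder when the flag is set to an empty string. `buildQuery` is a hypothetical helper for this sketch; the other query parameters a real request carries are omitted.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildQuery encodes skip_randao_verification as a presence-only flag:
// the key appears with an empty value when enabled and is absent otherwise.
func buildQuery(skipRandao bool) string {
	q := url.Values{}
	if skipRandao {
		q.Set("skip_randao_verification", "")
	}
	return q.Encode()
}

func main() {
	fmt.Printf("enabled:  %q\n", buildQuery(true))
	fmt.Printf("disabled: %q\n", buildQuery(false))
}
```
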
…it rate

The envelope-by-block-id row (row 6) probes each client with a single
root picked from the first online consensus client. With headOffset=4
the chosen slot was often past Nimbus's envelope retention window, so
Nimbus returned 404 instead of 200 and we never validated its actual
response schema (which lacks the spec-required `version` field).

Drop the offset to 1 so the root we probe is the most-recent slot
that's almost guaranteed to be canonical on every client AND still
have its envelope cached. Keeps the existing maxLookback fallback
(now 4) for the rare case where head-1 was a missed slot.
A stability run failed when slots 190..193 on the devnet were all
missed in quick succession (the devnet briefly desynced and recovered
through a small run of empty slots). With maxLookback=4 the recent
block resolver only looked at slots N..N-3 and bailed, taking the
whole playbook down before it ever reached the matrix.

16 slots is roughly half an epoch — enough to absorb any realistic
run of misses on a healthy devnet while still picking a recent root.
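
The failure mode can be replayed with a small lookback walker. The function below is a sketch under assumptions: `findRecentBlock` and its `hasBlock` callback are hypothetical, and the head slot of 194 is chosen only so that the walk starts at the missed range.

```go
package main

import "fmt"

// findRecentBlock starts at head-headOffset and walks backwards past
// missed slots, giving up after maxLookback slots have been examined.
func findRecentBlock(head, headOffset, maxLookback uint64, hasBlock func(slot uint64) bool) (uint64, bool) {
	if headOffset > head {
		return 0, false
	}
	start := head - headOffset
	for i := uint64(0); i < maxLookback && i <= start; i++ {
		if slot := start - i; hasBlock(slot) {
			return slot, true
		}
	}
	return 0, false
}

func main() {
	// Slots 190..193 are missed, as in the failed stability run.
	hasBlock := func(slot uint64) bool { return slot < 190 || slot > 193 }

	_, ok := findRecentBlock(194, 1, 4, hasBlock)
	fmt.Println("maxLookback=4 found a block:", ok)

	slot, ok := findRecentBlock(194, 1, 16, hasBlock)
	fmt.Println("maxLookback=16 found a block:", ok, "at slot", slot)
}
```
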
…y param

Per beacon-APIs PR #580, the canonical endpoint is
GET /eth/v1/validator/execution_payload_envelope/{slot}
with `beacon_block_root` as a *query parameter* (re-org resistance:
the BN returns 404 when its cached envelope is for a different
block root). Response data is the unwrapped ExecutionPayloadEnvelope
struct — no `message`/`signature` wrap; the VC signs the envelope.

The previous test queried `{slot}/{beacon_block_root}` as path
segments and validated a `{message, signature}` shape inside `data`.
That layout matches Lodestar's (non-spec) implementation but missed
on Lighthouse and Prysm, both of which correctly register the
spec-conformant 1-segment path. Net effect: the matrix flipped the
verdicts (Lodestar showed ✅, LH/Prysm showed ❌), exactly the opposite
of reality.

Fix: probe the spec-conformant URL with query param and validate the
spec-conformant response shape. LH/Prysm should now correctly resolve
the route (status depends on cache state), Lodestar should 404 (no
route for the 1-segment form on Lodestar).
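
Building the one-segment URL with the root as a query parameter might look like the sketch below. The endpoint path comes from the commit text; `envelopeURL`, the base address, and the sample root are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strconv"
)

// envelopeURL builds the one-segment form of the endpoint, carrying the
// block root as a query parameter instead of a second path segment.
func envelopeURL(base string, slot uint64, beaconBlockRoot string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = path.Join(u.Path, "/eth/v1/validator/execution_payload_envelope", strconv.FormatUint(slot, 10))

	q := url.Values{}
	q.Set("beacon_block_root", beaconBlockRoot)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// Base address and root are placeholder values.
	u, err := envelopeURL("http://localhost:5052", 123, "0xabc")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```
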
@pk910 pk910 merged commit edeb601 into master May 19, 2026
9 checks passed
@pk910 pk910 deleted the research/dashboard-rework branch May 19, 2026 22:03