Feat/restore observability panel#591
- Header: add Observability button (pulse icon) that opens the existing OperationsPage (All Operations / Performance / Logs / Errors tabs). Entry point was orphaned after the v10 UI rewrite; page itself and its PanelCenter wiring were already intact.
- PanelBottom: replace fixed 200px bottom panel with a fully resizable, draggable panel (vertical drag handle on top edge, persisted height via layout store). Adds a maximise/restore toggle (80vh overlay). Height is stored alongside leftWidth/rightWidth and survives reloads.
- Transactions tab: wired from existing /api/operations (publish/update ops that reached the chain phase), giving on-chain activity without any new backend routes. Expandable rows show tx hash, peer, phase waterfall.
- Gossip tab: live-filtered view of the node log showing only libp2p / gossipsub / peer / SWM lines. Keyword list is broad enough to catch relay, DHT and protocol events.
- Node Log tab: adds level filter (error/warn/info/debug), pause button, auto-scroll that respects manual scroll position.

Note: all three tabs are currently backed by the local SQLite-backed /api/* endpoints. Once the OTEL telemetry stack is live these tabs will be replaced by Tempo trace / Loki log streams at the fleet level.

Co-authored-by: Cursor <cursoragent@cursor.com>
OperationName in packages/core/src/logger.ts now covers:
publish, publishFromSWM, update, ka-update, query, resolve, connect,
sync, share, gossip, reconstruct, verify, init, system
Operations.tsx OP_TYPE_COLORS and OP_TYPE_DESCRIPTIONS were missing
the seven new ones (share, publishFromSWM, ka-update, reconstruct,
verify, init, resolve). All added with distinct colours and descriptions.
PanelBottom Transactions tab TX_OP_TYPES expanded from {publish, update}
to {publish, publishFromSWM, update, ka-update, reconstruct} — all op
types that can reach the chain phase and submit a tx.
Co-authored-by: Cursor <cursoragent@cursor.com>
- Strip ANSI escape sequences from all log lines (Node Log + Gossip) so raw color codes no longer appear as literal text
- Remove pause/play button from Node Log toolbar (simplified to filter + level select only)
- Extract useAutoScroll() hook shared by Node Log and Gossip
- Gossip tab now shows only real libp2p events (Connection opened/closed, ProtocolRouter timings, Circuit relay, GossipSub, FinalizationHandler) instead of leaking general DKGAgent structured log lines
- CSS: remove browser focus outlines on tab/toggle buttons, fix toolbar flex layout, add v10-log-level-select class, use pre-wrap + word-break on log lines so long lines wrap instead of overflowing, active tab now uses accent-blue underline

Co-authored-by: Cursor <cursoragent@cursor.com>
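A minimal sketch of the ANSI stripping described in this commit, assuming the log lines only carry CSI-style colour sequences. The name `stripAnsi` and the regular expression are illustrative and are not taken from the PR's code.

```typescript
// Matches CSI sequences such as "\x1b[31m" (set colour) and "\x1b[0m" (reset):
// ESC, "[", optional parameter bytes, optional intermediate bytes, one final byte.
// Assumption: other escape families (for example OSC) do not occur in the node log.
const ANSI_CSI = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

// Hypothetical helper: returns the line with all CSI sequences removed.
function stripAnsi(line: string): string {
  return line.replace(ANSI_CSI, "");
}
```

Applying a helper like this once, where lines enter the shared buffer, would let both Node Log and Gossip render the same cleaned text.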
The v10 UI rewrite left Operations.tsx without corresponding stylesheet
definitions for its legacy v9 class names. Added:
tab-group / tab-item — horizontal tab bar with accent-blue underline
input / select.input — themed form controls with custom select arrow
data-table — striped/hoverable table with uppercase headers
badge / badge-{success,error,warn,info} — coloured type/status pills
empty-state / --compact / --rich — centred placeholder layouts
page-section / page-title — page wrapper and heading
card-title — section heading inside cards
filters — flex filter bar
phase-bar-wrap/seg — inline phase progress bars
tx-link-icon — subtle tx hash link styling
Co-authored-by: Cursor <cursoragent@cursor.com>
…ph_* Databases created before the v10 terminology rename still have `paranet_count` and `paranet_id` columns. INSERT statements targeting the new `contextGraph_*` names fail silently, preventing metrics and operations from being stored. Adds schema version 14 which detects the old column names via pragma table_info and renames them in-place. Co-authored-by: Cursor <cursoragent@cursor.com>
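The detect-then-rename step can be sketched as a pure planning function. This is an illustration under assumptions: the table name passed in and the helper name `planColumnRenames` are hypothetical, and the caller is assumed to supply the column names that `PRAGMA table_info` reports. Only the old and new column names come from the commit message.

```typescript
// Old -> new column names from the v10 terminology rename.
const COLUMN_RENAMES: Record<string, string> = {
  paranet_count: "contextGraph_count",
  paranet_id: "contextGraph_id",
};

// `existingColumns` stands for the `name` field of each row returned by
// `PRAGMA table_info(<table>)`. Returns the ALTER statements to run, if any.
function planColumnRenames(table: string, existingColumns: string[]): string[] {
  const present = new Set(existingColumns);
  const statements: string[] = [];
  for (const [oldName, newName] of Object.entries(COLUMN_RENAMES)) {
    // Rename only when the old column exists and the new one does not,
    // so fresh and already-migrated databases are left untouched.
    if (present.has(oldName) && !present.has(newName)) {
      statements.push(`ALTER TABLE ${table} RENAME COLUMN ${oldName} TO ${newName}`);
    }
  }
  return statements;
}
```

Keeping the plan separate from execution makes the "safe for fresh DBs" property easy to check: a fresh schema yields an empty list.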
Adds OperationTracker instrumentation for DKG events that were previously untracked: PROJECT_SYNCED (sync), GOSSIP_MESSAGE (gossip), KC_CONFIRMED (verify), and KA_UPDATED (ka-update). These are event-based records (work completed in the core before the event fires), so they capture occurrence + metadata rather than multi-phase timing. Co-authored-by: Cursor <cursoragent@cursor.com>
- Reduce SNAPSHOT_INTERVAL_MS from 120s to 30s for more responsive hardware metrics in the Observability panel.
- Handle 401 responses in useFetch by triggering a page reload so the server re-injects a fresh auth token after node restarts.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Split the old "Performance" tab into dedicated "All Operations" and "Hardware" tabs. Operations tab shows operation stats charts + the full operations list; Hardware tab shows live stat cards and time-series graphs.
- Redesign MiniGantt component to display phase name pills with colored dots and durations inline (no hover required).
- Show "event-based" label for operations without phases instead of a bare dash.
- Fix header showing "**" instead of node name when agent identity has a placeholder name.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Remove sync (PROJECT_SYNCED) tracker: event fires after catch-up completes so it recorded misleading 0ms durations. Sync events are still surfaced via SSE notifications.
- Remove gossip (GOSSIP_MESSAGE) tracker: one operation row per inbound message would flood the DB on busy nodes. Gossip activity is better served by aggregate metrics.
- Fix ka-update tracker to use data.ual and data.rootEntities instead of the non-existent data.kaUri field.
- Route PanelBottom's fetchOperationsWithPhases through api-wrapper for consistent mock/offline fallback behavior.
- Fix PanelBottom error display: read op.error_message (the actual API field) instead of op.error.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ility-panel

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#   packages/node-ui/src/ui/components/Shell/PanelBottom.tsx
#   packages/node-ui/src/ui/stores/layout.ts
#   packages/node-ui/src/ui/styles.css
- Remove duplicate BOTTOM_HEIGHT_MIN/MAX declarations in layout.ts (keep the single source at the clamp-constants block).
- Remove dead KC_CONFIRMED listener: the event is never emitted in the current tree, so it was a no-op.
- Gate KA_UPDATED tracker to remote-origin updates only (fromPeerId present) to avoid duplicating the operation already tracked by /api/update for local publishes.
- Add 'verify' to TX_OP_TYPES in PanelBottom so verification chain transactions appear in the Transactions tab.
- Make 401 reload a one-shot recovery: only reload if a token was injected and no prior reload was attempted this session, preventing infinite refresh loops in dev mode or persistent auth failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Use sessionStorage for 401 reload marker so it persists across the reload and prevents infinite loops (window properties are lost).
- Replace invalid periodMs param with from (timestamp) which the /api/operations endpoint actually supports for time-range filtering.
- Restore viewport clamp on bottom panel height via maxBottomHeight() so a value saved on a tall screen can't overflow on shorter ones.
- Fix auto-scroll dependency: use last visible line content instead of array length, so scroll advances when new logs arrive at the buffer cap (length stays constant but content changes).

Co-authored-by: Cursor <cursoragent@cursor.com>
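The viewport clamp on the persisted panel height could look roughly like the following. This is a sketch under assumptions: the minimum of 120px and the 80% ceiling are placeholder values, and unlike the PR's maxBottomHeight() this variant takes the viewport height as a parameter so it can run outside a browser.

```typescript
// Placeholder limits; the real constants live in layout.ts and may differ.
const BOTTOM_HEIGHT_MIN = 120;
const MAX_VIEWPORT_PERCENT = 80;

// Largest height the panel may take for a given viewport height (in px).
function maxBottomHeight(viewportHeight: number): number {
  const ceiling = Math.floor((viewportHeight * MAX_VIEWPORT_PERCENT) / 100);
  return Math.max(BOTTOM_HEIGHT_MIN, ceiling);
}

// Clamp a persisted height so a value saved on a tall screen
// cannot overflow a shorter one.
function clampBottomHeight(saved: number, viewportHeight: number): number {
  const lowerBounded = Math.max(saved, BOTTOM_HEIGHT_MIN);
  return Math.min(lowerBounded, maxBottomHeight(viewportHeight));
}
```

In the browser the second argument would typically be window.innerHeight, re-evaluated on resize so the stored value is clamped at read time and never rewritten.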
…r-side

- Restore totalMs <= 0 early return in MiniGantt to prevent NaN/Infinity percentages when phases exist but have 0ms duration.
- Add `names` query param (comma-separated) to /api/operations so the Transactions tab can request only tx-capable operation types server-side. This prevents non-tx operations (query, share) from pushing real transactions out of the 100-row result set on busy nodes.
- Update DashboardDB.getOperations to support IN-clause filtering when multiple names are provided.

Co-authored-by: Cursor <cursoragent@cursor.com>
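One way to turn the comma-separated `names` param into a parameterised IN clause is sketched below. The column name `operation_name` and the helper `buildNameFilter` are assumptions for illustration; the actual DashboardDB.getOperations query may be shaped differently.

```typescript
interface SqlFragment {
  clause: string;   // WHERE fragment, empty when no filter applies
  params: string[]; // values to bind, in placeholder order
}

// Hypothetical helper: parses "publish,update" into a bound IN clause.
function buildNameFilter(namesParam: string | undefined): SqlFragment {
  const names = (namesParam ?? "")
    .split(",")
    .map((n) => n.trim())
    .filter((n) => n.length > 0);

  if (names.length === 0) return { clause: "", params: [] };
  if (names.length === 1) return { clause: "operation_name = ?", params: names };

  // One "?" per name keeps user input out of the SQL text itself.
  const placeholders = names.map(() => "?").join(", ");
  return { clause: `operation_name IN (${placeholders})`, params: names };
}
```

Binding each name as a parameter, rather than interpolating the raw query string, avoids SQL injection through the new param.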
tracker.start(ctx, {
  contextGraphId: data.contextGraphId,
  peerId: data.fromPeerId,
  details: {
🔴 Bug: the remote KA_UPDATED event emitted by update-handler carries batchId and txHash, not ual. As written, these new ka-update rows lose the useful chain metadata, so Observability can't link them back to the underlying transaction. Persist the actual remote-event fields here and call tracker.setTxHash(ctx, data.txHash) before complete().
api.fetchOperationsWithPhases({ limit: '100', from, names })
  .then((data: any) => {
    const filtered = (data?.operations ?? []).filter((op: any) =>
      (op.phases ?? []).some((p: any) => p.phase === 'chain')
🔴 Bug: this filter only keeps operations with a chain phase, which drops the event-based ka-update records added by this PR because those rows are inserted without any phases. If this tab is meant to include ka-update, filter on tx-backed rows as well (for example op.tx_hash) or synthesize a phase when tracking the event.
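The first option suggested here (also accept rows that carry a tx hash) can be sketched as a small predicate. The `Operation` shape is an assumption based on the fields named in this thread (`phases`, `tx_hash`); `operation_name` and `isTxBacked` are hypothetical names.

```typescript
interface Phase {
  phase: string;
}

interface Operation {
  operation_name: string;
  tx_hash?: string | null;
  phases?: Phase[];
}

// Keep a row if it reached the chain phase OR carries a tx hash,
// so event-based records without phases are not dropped.
function isTxBacked(op: Operation): boolean {
  const hasChainPhase = (op.phases ?? []).some((p) => p.phase === "chain");
  const hasTxHash = typeof op.tx_hash === "string" && op.tx_hash.length > 0;
  return hasChainPhase || hasTxHash;
}
```

Note that this only helps event-based rows if the tracker actually stores the tx hash on them.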
const hasToken = !!(window as any).__DKG_TOKEN__;
const alreadyRetried = sessionStorage.getItem('__dkg_401_reloaded') === '1';
if (hasToken && !alreadyRetried) {
  sessionStorage.setItem('__dkg_401_reloaded', '1');
🟡 Issue: __dkg_401_reloaded is set but never cleared. After the first stale-token incident in a tab, every later 401 will skip the auto-reload path forever and only show the error banner. Clear this flag after a successful fetch or once a fresh token has been bootstrapped.
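A possible shape for the suggested fix, clearing the marker after a successful fetch, is sketched below. The function names and the `MarkerStore` interface are hypothetical; in the browser the store would be `sessionStorage`, which already satisfies this interface.

```typescript
const RELOAD_MARKER = "__dkg_401_reloaded";

// Minimal slice of the Web Storage API, so the logic can run outside a browser.
interface MarkerStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Returns true when the caller should reload the page; sets the marker
// first so the next 401 in the same session does not reload again.
function shouldReloadOn401(store: MarkerStore, hasToken: boolean): boolean {
  if (!hasToken || store.getItem(RELOAD_MARKER) === "1") return false;
  store.setItem(RELOAD_MARKER, "1");
  return true;
}

// Call after any successful fetch: the token is known to be fresh again,
// so a later stale-token incident may trigger one more reload.
function noteSuccessfulFetch(store: MarkerStore): void {
  store.removeItem(RELOAD_MARKER);
}
```

With this arrangement the reload stays one-shot per failure streak instead of one-shot per tab lifetime.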
const TX_OP_TYPES = new Set(['publish', 'publishFromSWM', 'update', 'ka-update', 'verify', 'reconstruct']);

const TX_TYPE_COLORS: Record<string, string> = {
🟡 Issue: this introduces a second operation-type color map, and it already drifts from Operations.tsx (verify is purple here but green there). Pull the shared operation metadata from one constant so the same operation type renders consistently across the UI.
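A single shared constant along the lines suggested here might look like this. Everything in the sketch is illustrative: the `OP_TYPE_META` name, the hex colours, the descriptions, and the `txCapable` flag are placeholders, and only three of the operation types are shown.

```typescript
interface OpTypeMeta {
  color: string;       // placeholder hex values, not the PR's palette
  description: string; // placeholder wording
  txCapable: boolean;  // whether the op can reach the chain phase
}

// Hypothetical single source of truth for per-operation display metadata.
const OP_TYPE_META: Record<string, OpTypeMeta> = {
  publish: { color: "#3b82f6", description: "Publish to the chain", txCapable: true },
  verify: { color: "#22c55e", description: "Verification transaction", txCapable: true },
  query: { color: "#a855f7", description: "Local query", txCapable: false },
};

const FALLBACK_COLOR = "#9ca3af";

// Derived, so the Transactions tab cannot drift from the shared map.
const TX_OP_TYPES = new Set(
  Object.entries(OP_TYPE_META)
    .filter(([, meta]) => meta.txCapable)
    .map(([name]) => name),
);

function opTypeColor(name: string): string {
  return OP_TYPE_META[name]?.color ?? FALLBACK_COLOR;
}
```

Both Operations.tsx and PanelBottom could then import from the same module, so an operation type such as verify resolves to one colour everywhere.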
Summary
Restores the Observability page that was lost during the v10 merge, adds comprehensive operation tracking for all DKG event types, and includes a critical database migration fix for pre-existing nodes.
What's included
Observability UI (restored & enhanced)
Operation Tracking (daemon)
- sync (PROJECT_SYNCED), gossip (GOSSIP_MESSAGE), verify (KC_CONFIRMED), ka-update (KA_UPDATED)
- query (with parse/execute phases), publish, connect, share (with validate/store phases)
Database Migration (critical fix)
- paranet_count → contextGraph_count and paranet_id → contextGraph_id
- pragma table_info before rename (safe for fresh DBs)
Quality-of-life fixes
- ** from agent identity
Test plan
- contextGraph_* columns, Observability page loads correctly
- parse + execute phases visible in Operations tab
- sync operations appear