
Feat/restore observability panel#591

Open
Niks988 wants to merge 14 commits into main from feat/restore-observability-panel


@Niks988
Collaborator

@Niks988 Niks988 commented May 20, 2026

Summary

Restores the Observability page that was lost during the v10 merge, adds comprehensive operation tracking for all DKG event types, and includes a critical database migration fix for pre-existing nodes.

What's included

Observability UI (restored & enhanced)

  • Full Observability page with dedicated "All Operations" and "Hardware" tabs
  • Expandable bottom panel with live logs (ANSI-stripped), level filtering, and gossip noise filtering
  • Operation list with inline phase visualization (MiniGantt with labeled pills)
  • Hardware live stat cards (CPU, RAM, Heap, Disk, Peers, RPC Latency) + time-series charts
  • Operation success rate charts and per-type time series
  • Click-to-expand detail panel with Phase Timeline waterfall and correlated logs

Operation Tracking (daemon)

  • Tracks all DKG event types: sync (PROJECT_SYNCED), gossip (GOSSIP_MESSAGE), verify (KC_CONFIRMED), ka-update (KA_UPDATED)
  • Existing tracking preserved: query (with parse/execute phases), publish, connect, share (with validate/store phases)
  • All operation types now appear in the type filter dropdown

Database Migration (critical fix)

  • Schema version 14: auto-renames legacy paranet_count → contextGraph_count and paranet_id → contextGraph_id
  • Fixes silent INSERT failures on nodes that were created before the v10 terminology rename
  • Column existence checked via pragma table_info before rename (safe for fresh DBs)
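
A minimal sketch of the rename check described above, assuming the rows returned by `PRAGMA table_info` have already been read into an array. The function name `planColumnRenames` and the table name used in the usage note are hypothetical; only the column names come from this PR.

```typescript
// Shape of one row returned by `PRAGMA table_info(<table>)`; only `name` is used here.
interface ColumnInfo {
  name: string;
}

// Legacy -> current column names, as listed in the PR description.
const COLUMN_RENAMES: ReadonlyArray<readonly [string, string]> = [
  ["paranet_count", "contextGraph_count"],
  ["paranet_id", "contextGraph_id"],
];

// Hypothetical helper: returns the ALTER TABLE statements one table needs.
// A fresh database (new names already present) yields an empty list.
function planColumnRenames(table: string, columns: ColumnInfo[]): string[] {
  const existing = new Set(columns.map((c) => c.name));
  const statements: string[] = [];
  for (const [oldName, newName] of COLUMN_RENAMES) {
    // Rename only when the legacy column exists and the new one does not.
    if (existing.has(oldName) && !existing.has(newName)) {
      statements.push(
        `ALTER TABLE "${table}" RENAME COLUMN "${oldName}" TO "${newName}"`,
      );
    }
  }
  return statements;
}
```

Calling `planColumnRenames("metric_snapshots", rows)` with a hypothetical table name returns nothing for a fresh schema, which is what makes the migration safe to run unconditionally.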

Quality-of-life fixes

  • Metrics collection interval reduced from 120s to 30s for more responsive hardware display
  • Auto page reload on 401 (stale auth token after node restart)
  • Header shows the configured node name instead of the ** placeholder taken from the agent identity
  • Log level select properly styled (no native browser chrome)

Test plan

  • Fresh node: verify schema creates with contextGraph_* columns, Observability page loads correctly
  • Existing node (pre-rename DB): verify migration renames columns, metrics and operations appear in UI
  • Run queries via CLI → confirm parse + execute phases visible in Operations tab
  • Join a context graph → confirm sync operations appear
  • Verify hardware stats populate within 30s of node start
  • Restart node → confirm UI auto-reloads and continues working (no permanent 401 loop)
  • Check header displays configured node name, not **

Niks988 and others added 9 commits May 19, 2026 14:46
- Header: add Observability button (pulse icon) that opens the existing
  OperationsPage (All Operations / Performance / Logs / Errors tabs).
  Entry point was orphaned after the v10 UI rewrite; page itself and its
  PanelCenter wiring were already intact.

- PanelBottom: replace fixed 200px bottom panel with a fully resizable,
  draggable panel (vertical drag handle on top edge, persisted height via
  layout store). Adds a maximise/restore toggle (80vh overlay). Height
  is stored alongside leftWidth/rightWidth and survives reloads.

- Transactions tab: wired from existing /api/operations (publish/update
  ops that reached the chain phase) — on-chain activity without any new
  backend routes. Expandable rows show tx hash, peer, phase waterfall.

- Gossip tab: live-filtered view of the node log showing only libp2p /
  gossipsub / peer / SWM lines. Keyword list is broad enough to catch
  relay, DHT and protocol events.

- Node Log tab: adds level filter (error/warn/info/debug), pause button,
  auto-scroll that respects manual scroll position.

Note: all three tabs are currently backed by the local SQLite-backed
/api/* endpoints. Once the OTEL telemetry stack is live these tabs will
be replaced by Tempo trace / Loki log streams at the fleet level.

Co-authored-by: Cursor <cursoragent@cursor.com>
OperationName in packages/core/src/logger.ts now covers:
  publish, publishFromSWM, update, ka-update, query, resolve, connect,
  sync, share, gossip, reconstruct, verify, init, system

Operations.tsx OP_TYPE_COLORS and OP_TYPE_DESCRIPTIONS were missing
the seven new ones (share, publishFromSWM, ka-update, reconstruct,
verify, init, resolve). All added with distinct colours and descriptions.

PanelBottom Transactions tab TX_OP_TYPES expanded from {publish, update}
to {publish, publishFromSWM, update, ka-update, reconstruct} — all op
types that can reach the chain phase and submit a tx.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Strip ANSI escape sequences from all log lines (Node Log + Gossip)
  so raw color codes no longer appear as literal text
- Remove pause/play button from Node Log toolbar (simplified to filter
  + level select only)
- Extract useAutoScroll() hook shared by Node Log and Gossip
- Gossip tab now shows only real libp2p events (Connection opened/closed,
  ProtocolRouter timings, Circuit relay, GossipSub, FinalizationHandler)
  instead of leaking general DKGAgent structured log lines
- CSS: remove browser focus outlines on tab/toggle buttons, fix toolbar
  flex layout, add v10-log-level-select class, use pre-wrap + word-break
  on log lines so long lines wrap instead of overflowing, active tab now
  uses accent-blue underline

Co-authored-by: Cursor <cursoragent@cursor.com>
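
For illustration, ANSI stripping of the kind described in this commit can be sketched with a single regular expression. This is an assumption about the approach, not the code in the PR; it covers CSI sequences (colours, resets, cursor moves) only, and the name `stripAnsi` is hypothetical.

```typescript
// Matches CSI escape sequences such as "\x1b[31m" (red) or "\x1b[0m" (reset).
const ANSI_CSI = /\x1b\[[0-9;?]*[A-Za-z]/g;

// Removes colour codes so log lines render as plain text.
function stripAnsi(line: string): string {
  return line.replace(ANSI_CSI, "");
}
```

With this, `stripAnsi("\x1b[31mERROR\x1b[0m peer closed")` yields `"ERROR peer closed"`.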
The v10 UI rewrite left Operations.tsx without corresponding stylesheet
definitions for its legacy v9 class names. Added:

  tab-group / tab-item     — horizontal tab bar with accent-blue underline
  input / select.input     — themed form controls with custom select arrow
  data-table               — striped/hoverable table with uppercase headers
  badge / badge-{success,error,warn,info} — coloured type/status pills
  empty-state / --compact / --rich — centred placeholder layouts
  page-section / page-title — page wrapper and heading
  card-title               — section heading inside cards
  filters                  — flex filter bar
  phase-bar-wrap/seg       — inline phase progress bars
  tx-link-icon             — subtle tx hash link styling

Co-authored-by: Cursor <cursoragent@cursor.com>
…ph_*

Databases created before the v10 terminology rename still have
`paranet_count` and `paranet_id` columns. INSERT statements targeting
the new `contextGraph_*` names fail silently, preventing metrics and
operations from being stored.

Adds schema version 14 which detects the old column names via
pragma table_info and renames them in-place.

Co-authored-by: Cursor <cursoragent@cursor.com>
Adds OperationTracker instrumentation for DKG events that were
previously untracked: PROJECT_SYNCED (sync), GOSSIP_MESSAGE (gossip),
KC_CONFIRMED (verify), and KA_UPDATED (ka-update).

These are event-based records (work completed in the core before
the event fires), so they capture occurrence + metadata rather than
multi-phase timing.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Reduce SNAPSHOT_INTERVAL_MS from 120s to 30s for more responsive
  hardware metrics in the Observability panel.
- Handle 401 responses in useFetch by triggering a page reload so
  the server re-injects a fresh auth token after node restarts.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Split the old "Performance" tab into dedicated "All Operations" and
  "Hardware" tabs. Operations tab shows operation stats charts + the
  full operations list; Hardware tab shows live stat cards and
  time-series graphs.
- Redesign MiniGantt component to display phase name pills with
  colored dots and durations inline (no hover required).
- Show "event-based" label for operations without phases instead of
  a bare dash.
- Fix header showing "**" instead of node name when agent identity
  has a placeholder name.

Co-authored-by: Cursor <cursoragent@cursor.com>
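
The inline pills need each phase's share of the total duration. A rough sketch of that calculation follows, including the `totalMs <= 0` early return that a later commit in this PR restores. The field name `durationMs` and the helper `toPills` are assumptions for illustration.

```typescript
interface Phase {
  phase: string;
  durationMs: number; // hypothetical field name
}

interface Pill extends Phase {
  percent: number; // share of the operation's total duration, 0..100
}

// Returns null when there is nothing meaningful to draw, so the caller
// can fall back to a label instead of rendering NaN/Infinity widths.
function toPills(phases: Phase[]): Pill[] | null {
  const totalMs = phases.reduce((sum, p) => sum + p.durationMs, 0);
  if (phases.length === 0 || totalMs <= 0) return null;
  return phases.map((p) => ({ ...p, percent: (p.durationMs / totalMs) * 100 }));
}
```

Returning `null` rather than an empty array keeps the "no phases" and "zero-duration phases" cases on the same code path in the component.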
Comment thread packages/cli/src/daemon/lifecycle.ts Outdated
Comment thread packages/cli/src/daemon/lifecycle.ts Outdated
Comment thread packages/cli/src/daemon/lifecycle.ts Outdated
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx Outdated
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx Outdated
Niks988 and others added 2 commits May 22, 2026 17:33
- Remove sync (PROJECT_SYNCED) tracker: event fires after catch-up
  completes so it recorded misleading 0ms durations. Sync events are
  still surfaced via SSE notifications.
- Remove gossip (GOSSIP_MESSAGE) tracker: one operation row per inbound
  message would flood the DB on busy nodes. Gossip activity is better
  served by aggregate metrics.
- Fix ka-update tracker to use data.ual and data.rootEntities instead
  of the non-existent data.kaUri field.
- Route PanelBottom's fetchOperationsWithPhases through api-wrapper
  for consistent mock/offline fallback behavior.
- Fix PanelBottom error display: read op.error_message (the actual API
  field) instead of op.error.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ility-panel

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	packages/node-ui/src/ui/components/Shell/PanelBottom.tsx
#	packages/node-ui/src/ui/stores/layout.ts
#	packages/node-ui/src/ui/styles.css
Comment thread packages/node-ui/src/ui/stores/layout.ts Outdated
Comment thread packages/cli/src/daemon/lifecycle.ts Outdated
Comment thread packages/cli/src/daemon/lifecycle.ts
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx Outdated
Comment thread packages/node-ui/src/ui/hooks.ts
- Remove duplicate BOTTOM_HEIGHT_MIN/MAX declarations in layout.ts
  (keep the single source at the clamp-constants block).
- Remove dead KC_CONFIRMED listener — the event is never emitted in
  the current tree, so it was a no-op.
- Gate KA_UPDATED tracker to remote-origin updates only (fromPeerId
  present) to avoid duplicating the operation already tracked by
  /api/update for local publishes.
- Add 'verify' to TX_OP_TYPES in PanelBottom so verification chain
  transactions appear in the Transactions tab.
- Make 401 reload a one-shot recovery: only reload if a token was
  injected and no prior reload was attempted this session, preventing
  infinite refresh loops in dev mode or persistent auth failures.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread packages/node-ui/src/ui/hooks.ts Outdated
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx Outdated
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx Outdated
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx
- Use sessionStorage for 401 reload marker so it persists across the
  reload and prevents infinite loops (window properties are lost).
- Replace invalid periodMs param with from (timestamp) which the
  /api/operations endpoint actually supports for time-range filtering.
- Restore viewport clamp on bottom panel height via maxBottomHeight()
  so a value saved on a tall screen can't overflow on shorter ones.
- Fix auto-scroll dependency: use last visible line content instead of
  array length, so scroll advances when new logs arrive at the buffer
  cap (length stays constant but content changes).

Co-authored-by: Cursor <cursoragent@cursor.com>
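
The viewport clamp can be sketched as below. The minimum height and the 80% ceiling are assumed values (the ceiling mirrors the 80vh maximise overlay mentioned earlier), and unlike the PR's `maxBottomHeight()` this version takes the viewport height as a parameter so it can run outside a browser.

```typescript
const BOTTOM_HEIGHT_MIN = 120; // assumed value, px
const MAX_VIEWPORT_FRACTION = 0.8; // assumed ceiling, mirrors the 80vh overlay

// Largest panel height allowed for a given viewport.
function maxBottomHeight(viewportHeight: number): number {
  return Math.max(BOTTOM_HEIGHT_MIN, Math.floor(viewportHeight * MAX_VIEWPORT_FRACTION));
}

// Clamps a persisted height so a value saved on a tall screen
// cannot overflow a shorter one.
function clampBottomHeight(savedHeight: number, viewportHeight: number): number {
  const upper = maxBottomHeight(viewportHeight);
  return Math.min(Math.max(savedHeight, BOTTOM_HEIGHT_MIN), upper);
}
```

Applying the clamp when the stored height is read, and again on window resize, would cover both the reload case and live resizing.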
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx Outdated
Comment thread packages/node-ui/src/ui/components/Shell/PanelBottom.tsx
Comment thread packages/node-ui/src/ui/hooks.ts
Comment thread packages/node-ui/src/ui/pages/Operations.tsx
…r-side

- Restore totalMs <= 0 early return in MiniGantt to prevent NaN/Infinity
  percentages when phases exist but have 0ms duration.
- Add `names` query param (comma-separated) to /api/operations so the
  Transactions tab can request only tx-capable operation types server-side.
  This prevents non-tx operations (query, share) from pushing real
  transactions out of the 100-row result set on busy nodes.
- Update DashboardDB.getOperations to support IN-clause filtering when
  multiple names are provided.

Co-authored-by: Cursor <cursoragent@cursor.com>
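
As a sketch of the server-side `names` filter, the comma-separated parameter can be turned into a parameterised WHERE fragment. The column name `operation_name` and the helper `buildNamesFilter` are assumptions; the actual `DashboardDB.getOperations` signature is not shown in this PR page.

```typescript
interface SqlFragment {
  where: string; // empty string means "no filter"
  params: string[]; // values bound to the "?" placeholders, in order
}

// Parses "publish,update,verify" into a parameterised filter.
function buildNamesFilter(namesParam: string | undefined): SqlFragment {
  const names = (namesParam ?? "")
    .split(",")
    .map((n) => n.trim())
    .filter((n) => n.length > 0);
  if (names.length === 0) return { where: "", params: [] };
  if (names.length === 1) return { where: "operation_name = ?", params: names };
  // One placeholder per name keeps user input out of the SQL text.
  const placeholders = names.map(() => "?").join(", ");
  return { where: `operation_name IN (${placeholders})`, params: names };
}
```

Filtering before the `LIMIT` is applied is what keeps non-tx operations from crowding transactions out of the 100-row window.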
tracker.start(ctx, {
  contextGraphId: data.contextGraphId,
  peerId: data.fromPeerId,
  details: {

🔴 Bug: the remote KA_UPDATED event emitted by update-handler carries batchId and txHash, not ual. As written, these new ka-update rows lose the useful chain metadata, so Observability can't link them back to the underlying transaction. Persist the actual remote-event fields here and call tracker.setTxHash(ctx, data.txHash) before complete().
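
One way to read this suggestion, sketched under assumptions: map the remote event to a record that keeps `batchId` and `txHash`, and skip local updates (no `fromPeerId`) as the earlier commit does. The event and record shapes, the string type of `batchId`, and the helper name are hypothetical; the real handler would pass these values to `tracker.start` and `tracker.setTxHash`.

```typescript
// Assumed shape of the remote KA_UPDATED payload, based on the review comment.
interface KaUpdatedEvent {
  contextGraphId: string;
  fromPeerId?: string;
  batchId?: string;
  txHash?: string;
}

interface KaUpdateRecord {
  contextGraphId: string;
  peerId: string;
  details: Record<string, string>;
  txHash?: string;
}

// Returns null for local updates, which are already tracked via /api/update.
function toKaUpdateRecord(data: KaUpdatedEvent): KaUpdateRecord | null {
  if (!data.fromPeerId) return null;
  const details: Record<string, string> = {};
  if (data.batchId !== undefined) details.batchId = data.batchId;
  const record: KaUpdateRecord = {
    contextGraphId: data.contextGraphId,
    peerId: data.fromPeerId,
    details,
  };
  // Keep the chain metadata so the row can be linked to its transaction.
  if (data.txHash !== undefined) record.txHash = data.txHash;
  return record;
}
```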

api.fetchOperationsWithPhases({ limit: '100', from, names })
  .then((data: any) => {
    const filtered = (data?.operations ?? []).filter((op: any) =>
      (op.phases ?? []).some((p: any) => p.phase === 'chain')

🔴 Bug: this filter only keeps operations with a chain phase, which drops the event-based ka-update records added by this PR because those rows are inserted without any phases. If this tab is meant to include ka-update, filter on tx-backed rows as well (for example op.tx_hash) or synthesize a phase when tracking the event.
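
The first option in this comment could look roughly like the predicate below: keep a row if it has a chain phase or carries a transaction hash. The row shape is an assumption pieced together from the snippet above and the `op.tx_hash` field named in the comment.

```typescript
// Assumed subset of the /api/operations row shape.
interface OperationRow {
  tx_hash?: string | null;
  phases?: { phase: string }[];
}

// True for operations that reached the chain phase or are otherwise tx-backed,
// so event-based rows without phases are not dropped.
function isTxBacked(op: OperationRow): boolean {
  const hasChainPhase = (op.phases ?? []).some((p) => p.phase === "chain");
  return hasChainPhase || Boolean(op.tx_hash);
}
```

Used as `operations.filter(isTxBacked)`, this keeps phase-less ka-update rows only when they actually carry a hash.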

const hasToken = !!(window as any).__DKG_TOKEN__;
const alreadyRetried = sessionStorage.getItem('__dkg_401_reloaded') === '1';
if (hasToken && !alreadyRetried) {
  sessionStorage.setItem('__dkg_401_reloaded', '1');

🟡 Issue: __dkg_401_reloaded is set but never cleared. After the first stale-token incident in a tab, every later 401 will skip the auto-reload path forever and only show the error banner. Clear this flag after a successful fetch or once a fresh token has been bootstrapped.
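
A sketch of the one-shot reload with the flag cleared on success, as suggested here. Storage and the reload action are injected so the logic runs outside a browser; in the UI they would be `sessionStorage` and `window.location.reload`. The function name and the "any 2xx clears the flag" rule are assumptions.

```typescript
// Minimal subset of the Web Storage API used by the guard.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const RELOAD_FLAG = "__dkg_401_reloaded";

// Reloads at most once per stale-token incident; a later successful
// response re-arms the guard for the next incident.
function handleFetchStatus(
  status: number,
  hasToken: boolean,
  store: KeyValueStore,
  reload: () => void,
): void {
  if (status === 401) {
    const alreadyRetried = store.getItem(RELOAD_FLAG) === "1";
    if (hasToken && !alreadyRetried) {
      store.setItem(RELOAD_FLAG, "1");
      reload();
    }
    return;
  }
  if (status >= 200 && status < 300) {
    store.removeItem(RELOAD_FLAG);
  }
}
```

Clearing on the first successful fetch keeps the loop protection for persistent auth failures while still recovering from a second node restart in the same tab.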


const TX_OP_TYPES = new Set(['publish', 'publishFromSWM', 'update', 'ka-update', 'verify', 'reconstruct']);

const TX_TYPE_COLORS: Record<string, string> = {

🟡 Issue: this introduces a second operation-type color map, and it already drifts from Operations.tsx (verify is purple here but green there). Pull the shared operation metadata from one constant so the same operation type renders consistently across the UI.
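
A possible shape for the shared constant, sketched with placeholder colour values. The operation names come from the `OperationName` list earlier in this PR; the hex values, `FALLBACK_COLOR`, and `colorForOp` are hypothetical.

```typescript
type OperationName =
  | "publish" | "publishFromSWM" | "update" | "ka-update" | "query"
  | "resolve" | "connect" | "sync" | "share" | "gossip"
  | "reconstruct" | "verify" | "init" | "system";

// Single source of truth; Record<OperationName, ...> makes a missing
// entry a compile error when a new operation type is added.
const OP_TYPE_COLORS: Record<OperationName, string> = {
  publish: "#3b82f6",
  publishFromSWM: "#60a5fa",
  update: "#f59e0b",
  "ka-update": "#fbbf24",
  query: "#8b5cf6",
  resolve: "#a78bfa",
  connect: "#14b8a6",
  sync: "#06b6d4",
  share: "#ec4899",
  gossip: "#94a3b8",
  reconstruct: "#f97316",
  verify: "#22c55e",
  init: "#64748b",
  system: "#475569",
};

const FALLBACK_COLOR = "#9ca3af";

// Lookup used by every view so one type always renders in one colour.
function colorForOp(name: string): string {
  return Object.prototype.hasOwnProperty.call(OP_TYPE_COLORS, name)
    ? OP_TYPE_COLORS[name as OperationName]
    : FALLBACK_COLOR;
}
```

Both Operations.tsx and PanelBottom.tsx could import the same map, leaving `TX_OP_TYPES` as the only Transactions-specific constant.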
