observability: add Overview + per-service stub dashboards #4699
Draft
lukemelia wants to merge 1 commit into grafana-stack/05-realms-and-users-entities from
The navigation backbone for the rationalized dashboard tree. This PR lands last because it links to dashboards introduced in #4696, #4697, and #4698.

* `overview.json`: top-of-tree dashboard tagged `overview`. It contains five service-health stats (Loki log volume over the last 5m as a liveness proxy, each with a panel link drilling down to that service's stub), a full-width alert list (replacing the deleted `worker-status.json`), four indexing-pipeline stats, an indexing-throughput time series, and a markdown text panel listing the other dashboards by category.
* `service-{realm-server,prerender-server,prerender-manager,worker}.json`: minimal per-service deep-dives tagged `service:<name>`. Each is a Loki-filtered logs panel plus a service-specific top-row chart:
  * realm-server: HTTP request rate (total/4xx/5xx) parsed from the `httpLogging` exit-log lines (`--> METHOD ACCEPT URL: STATUS`)
  * prerender / prerender-manager: the same HTTP rate chart
  * worker: indexed files per second plus error rate, from the indexing-progress event stream

  CloudWatch ECS metrics (CPU / memory / RunningTaskCount) are intentionally deferred until cluster and task-family naming is standardized in the observability config; these stubs are the natural home for them.
* `worker-status.json` is deleted: its sole alertlist panel is folded into Overview row 2.

The three governing principles for the rationalized tree are:

1. System dashboards aggregate; entity dashboards filter.
2. Dashboard = context; action lives in its context.
3. Drill-downs go down or sideways, never up.

Overview is the only dashboard that links downward; deep-dives link sideways to peers. Logs is the universal forensics terminal, with `entity:*` sideways links that pass `realm_url` / `matrix_user_id` so the user pivots rather than drilling back up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
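The third principle ("drill-downs go down or sideways, never up") can be checked mechanically against the dashboard JSON, since the links in this PR are tag-resolved (`"type": "dashboards"` plus a `tags` list). The sketch below is a minimal, hypothetical lint under one stated assumption: a two-level tiering where `overview` is the top of the tree and every other tag family (`service:*`, `entity:*`, `workflow:*`, `forensics`) sits one level below it. The function names are illustrative and are not part of the repo.

```python
import json
from pathlib import Path


def tier(tag: str) -> int:
    # Assumption: "overview" is tier 0 (top of tree); every other tag family
    # (service:*, entity:*, workflow:*, forensics) is treated as tier 1.
    return 0 if tag == "overview" else 1


def dashboard_tier(dashboard: dict) -> int:
    # A dashboard takes the highest (smallest-numbered) tier among its tags;
    # untagged dashboards are treated as deep-dives (tier 1).
    return min((tier(t) for t in dashboard.get("tags", [])), default=1)


def upward_links(dashboard: dict) -> list[str]:
    """Titles of tag-resolved links that point to a higher tier, i.e. 'up'."""
    own = dashboard_tier(dashboard)
    offenders = []
    for link in dashboard.get("links", []):
        if link.get("type") != "dashboards":
            continue  # plain URL links are not resolved by tag; ignore them here
        if any(tier(t) < own for t in link.get("tags", [])):
            offenders.append(link.get("title", "<untitled>"))
    return offenders


def lint(paths: list[Path]) -> dict[str, list[str]]:
    """Map each dashboard file name to its upward links; an empty list passes."""
    results = {}
    for path in paths:
        dashboard = json.loads(path.read_text())
        results[path.name] = upward_links(dashboard)
    return results
```

For example, the Logs dashboard in the diff below (tagged `forensics`, linking to `entity:realm` and `entity:user`) yields no offenders, while a service stub linking to a tag of `overview` would be flagged. Running `lint(sorted(Path("dashboards").rglob("*.json")))` from the observability config root would apply the check to the whole tree; a finer-grained tiering (for instance, ranking `entity:*` above `forensics`) would only require changing `tier`.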
Observability diff (vs staging)

Diff truncated (201178 bytes; limit 60000). Full diff: https://github.com/cardstack/boxel/actions/runs/25474811501

diff --git a/tmp/remote-canon.4VIHeN/dashboards/boxel-status/boxel-logs.json b/tmp/committed-canon.PyiEU3/dashboards/boxel-status/boxel-logs.json
index 02a4a9a..7f380ac 100644
--- a/tmp/remote-canon.4VIHeN/dashboards/boxel-status/boxel-logs.json
+++ b/tmp/committed-canon.PyiEU3/dashboards/boxel-status/boxel-logs.json
@@ -27,7 +27,30 @@
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
- "links": [],
+ "links": [
+ {
+ "asDropdown": false,
+ "icon": "external link",
+ "includeVars": true,
+ "keepTime": true,
+ "tags": [
+ "entity:realm"
+ ],
+ "title": "Realms",
+ "type": "dashboards"
+ },
+ {
+ "asDropdown": false,
+ "icon": "external link",
+ "includeVars": true,
+ "keepTime": true,
+ "tags": [
+ "entity:user"
+ ],
+ "title": "Users",
+ "type": "dashboards"
+ }
+ ],
"panels": [
{
"datasource": {
@@ -58,7 +81,7 @@
"type": "loki",
"uid": "loki"
},
- "expr": "{service=~\"realm-server|worker\"} |= \"[indexing-progress]\" |= \"${realm_url}\"",
+ "expr": "{service=~\"realm-server|worker\"} |= \"[indexing-progress]\" |= \"${realm_url}\" |= \"${matrix_user_id}\"",
"queryType": "range",
"refId": "A"
}
@@ -95,7 +118,7 @@
"type": "loki",
"uid": "loki"
},
- "expr": "{service=~\"realm-server|worker|prerender|prerender-manager\"} |= \"[job: $job_id]\"",
+ "expr": "{service=~\"realm-server|worker|prerender|prerender-manager\"} |= \"[job: $job_id]\" |= \"${realm_url}\" |= \"${matrix_user_id}\"",
"queryType": "range",
"refId": "A"
}
@@ -132,7 +155,7 @@
"type": "loki",
"uid": "loki"
},
- "expr": "{service=\"worker\", worker_id=\"$worker_id\"}",
+ "expr": "{service=\"worker\", worker_id=\"$worker_id\"} |= \"${realm_url}\" |= \"${matrix_user_id}\"",
"queryType": "range",
"refId": "A"
}
@@ -169,29 +192,83 @@
"type": "loki",
"uid": "loki"
},
- "expr": "{service=\"realm-server\"}",
+ "expr": "{service=\"realm-server\"} |= \"${realm_url}\" |= \"${matrix_user_id}\"",
"queryType": "range",
"refId": "A"
}
],
"title": "Realm Server Logs",
"type": "logs"
+ },
+ {
+ "datasource": {
+ "type": "loki",
+ "uid": "loki"
+ },
+ "description": "Logs from realm-server, worker, prerender, and prerender-manager for a specific indexing job. Two query patterns: (A) `[job: J.R]` substring — emitted by all four services after CS-XXXXX wired the `x-boxel-job-id` header through the prerender call chain; (B) `[indexing-progress] ... job=J` events emitted by the worker (file-by-file progress feed). Both vars pre-set by drill-throughs from the Indexing dashboard.",
+ "gridPos": {
+ "h": 14,
+ "w": 24,
+ "x": 0,
+ "y": 34
+ },
+ "id": 20,
+ "options": {
+ "dedupStrategy": "none",
+ "enableLogDetails": true,
+ "prettifyLogMessage": false,
+ "showCommonLabels": false,
+ "showLabels": false,
+ "showTime": true,
+ "sortOrder": "Descending",
+ "wrapLogMessage": true
+ },
+ "targets": [
+ {
+ "datasource": {
+ "type": "loki",
+ "uid": "loki"
+ },
+ "expr": "{service=~\"realm-server|worker|prerender|prerender-manager\"} |= \"[job: ${indexing_job_full}]\"",
+ "queryType": "range",
+ "refId": "A"
+ },
+ {
+ "datasource": {
+ "type": "loki",
+ "uid": "loki"
+ },
+ "expr": "{service=~\"realm-server|worker\"} |= \"[indexing-progress]\" |= \"job=${indexing_job_short}\"",
+ "queryType": "range",
+ "refId": "B"
+ }
+ ],
+ "title": "Indexing Job Activity",
+ "type": "logs"
}
],
"refresh": "",
"schemaVersion": 42,
- "tags": [],
+ "tags": [
+ "forensics"
+ ],
"templating": {
"list": [
{
+ "allValue": "",
+ "current": {
+ "selected": true,
+ "text": "All",
+ "value": "$__all"
+ },
"datasource": {
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
"definition": "SELECT DISTINCT realm_url AS __value, REPLACE(REPLACE(realm_url, '${realm_server}', ''), 'https://', '') AS __text FROM boxel_index ORDER BY __text;",
- "description": "Filter the Indexing Activity Feed to one realm. Drill-throughs from Boxel Jobs prefill this.",
+ "description": "Filter all log panels to lines mentioning this realm. 'All' = no realm filter. Drill-throughs from other dashboards prefill this.",
"hide": 0,
- "includeAll": false,
+ "includeAll": true,
"label": "Realm",
"multi": false,
"name": "realm_url",
@@ -210,6 +287,32 @@
"skipUrlSync": false,
"type": "constant"
},
+ {
+ "allValue": "",
+ "current": {
+ "selected": true,
+ "text": "All",
+ "value": "$__all"
+ },
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "definition": "SELECT DISTINCT matrix_user_id FROM users ORDER BY matrix_user_id;",
+ "description": "Filter all log panels to lines mentioning this user. 'All' = no user filter. Drill-throughs from other dashboards prefill this.",
+ "hide": 0,
+ "includeAll": true,
+ "label": "User",
+ "multi": false,
+ "name": "matrix_user_id",
+ "options": [],
+ "query": "SELECT DISTINCT matrix_user_id FROM users ORDER BY matrix_user_id;",
+ "refresh": 1,
+ "regex": "",
+ "skipUrlSync": false,
+ "sort": 1,
+ "type": "query"
+ },
{
"datasource": {
"type": "grafana-postgresql-datasource",
@@ -249,6 +352,36 @@
"skipUrlSync": false,
"sort": 0,
"type": "query"
+ },
+ {
+ "current": {
+ "selected": false,
+ "text": "",
+ "value": ""
+ },
+ "description": "Full job.reservation id (e.g., 20678.26619) for the Indexing Job Activity panel. Set automatically by drill-throughs from the Indexing dashboard.",
+ "hide": 0,
+ "label": "Indexing job (full id)",
+ "name": "indexing_job_full",
+ "options": [],
+ "query": "",
+ "skipUrlSync": false,
+ "type": "textbox"
+ },
+ {
+ "current": {
+ "selected": false,
+ "text": "",
+ "value": ""
+ },
+ "description": "Short job id (just j.id, no reservation suffix) for matching `[indexing-progress] ... job=J` events.",
+ "hide": 0,
+ "label": "Indexing job (short id)",
+ "name": "indexing_job_short",
+ "options": [],
+ "query": "",
+ "skipUrlSync": false,
+ "type": "textbox"
}
]
},
diff --git a/tmp/remote-canon.4VIHeN/dashboards/boxel-status/indexing.json b/tmp/committed-canon.PyiEU3/dashboards/boxel-status/indexing.json
index 6a40566..21ebdf5 100644
--- a/tmp/remote-canon.4VIHeN/dashboards/boxel-status/indexing.json
+++ b/tmp/committed-canon.PyiEU3/dashboards/boxel-status/indexing.json
@@ -27,108 +27,264 @@
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
- "links": [],
+ "links": [
+ {
+ "asDropdown": false,
+ "icon": "external link",
+ "includeVars": false,
+ "keepTime": true,
+ "tags": [
+ "entity:realm"
+ ],
+ "title": "Realms",
+ "type": "dashboards"
+ },
+ {
+ "asDropdown": false,
+ "icon": "external link",
+ "includeVars": false,
+ "keepTime": true,
+ "tags": [
+ "workflow:queue"
+ ],
+ "title": "Job Queue",
+ "type": "dashboards"
+ },
+ {
+ "asDropdown": false,
+ "icon": "external link",
+ "includeVars": false,
+ "keepTime": true,
+ "tags": [
+ "forensics"
+ ],
+ "title": "Logs",
+ "type": "dashboards"
+ }
+ ],
"panels": [
{
"datasource": {
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
- "description": "Operator actions: trigger a reindex via the realm-server. Each button POSTs with an Authorization: Bearer header (token substituted into a hidden constant template variable at apply time from SSM, CS-10929) and shows a confirmation dialog. Single-realm reindex targets the realm in the variable picker above; full reindex hits every realm and is the more disruptive option.",
+ "description": "Operator actions: trigger a reindex via the realm-server. Live blast-radius (pending / in-flight / oldest pending) is fetched from boxel_index/jobs every refresh; the reindex buttons disable themselves while an indexing job is already in flight for the selected realm. Each click POSTs with `Authorization: Bearer ${grafana_secret}` (substituted from SSM at apply time, CS-10929).",
"fieldConfig": {
- "defaults": {
- "actions": [
- {
- "confirmation": "Reindex ${full_index_realm}?",
- "fetch": {
- "body": "",
- "headers": [
- [
- "Authorization",
- "Bearer ${grafana_secret}"
- ]
- ],
- "method": "POST",
- "queryParams": [
- [
- "realm",
- "${full_index_realm}"
- ]
- ],
- "url": "${realm_server}_grafana-reindex"
- },
- "oneClick": false,
- "title": "Reindex ${full_index_realm}",
- "type": "fetch"
- },
- {
- "confirmation": "Reindex ALL realms? This kicks off an indexing job for every realm on the server and can take a long time.",
- "fetch": {
- "body": "",
- "headers": [
- [
- "Authorization",
- "Bearer ${grafana_secret}"
- ]
- ],
- "method": "POST",
- "queryParams": [],
- "url": "${realm_server}_grafana-full-reindex"
- },
- "oneClick": false,
- "title": "Reindex ALL realms",
- "type": "fetch"
- }
- ],
- "color": {
- "mode": "thresholds"
- },
- "mappings": [
- {
- "options": {
- "from": 0,
- "result": {
- "index": 0,
- "text": "Reindex"
- },
- "to": 9999999999999
- },
- "type": "range"
- }
- ],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "blue"
- }
- ]
- }
- },
+ "defaults": {},
"overrides": []
},
"gridPos": {
- "h": 4,
+ "h": 11,
"w": 24,
"x": 0,
"y": 0
},
"id": 4,
"options": {
- "colorMode": "value",
- "graphMode": "none",
- "justifyMode": "center",
- "orientation": "horizontal",
- "reduceOptions": {
- "calcs": [
- "lastNotNull"
+ "buttonGroup": {
+ "orientation": "center",
+ "size": "md"
+ },
+ "confirmModal": {
+ "body": "Please confirm the action.",
+ "cancel": "Cancel",
+ "columns": {
+ "include": [
+ "name",
+ "newValue"
+ ],
+ "name": "Field",
+ "newValue": "Value",
+ "oldValue": "Previous"
+ },
+ "confirm": "Confirm",
+ "elementDisplayMode": "modified",
+ "title": "Confirm operator action"
+ },
+ "elementValueChanged": "if (context.element.id === 'realm_picker' && context.element.value) {\n context.grafana.locationService.partial({ 'var-full_index_realm': context.element.value }, true);\n}\n",
+ "elements": [
+ {
+ "id": "realm_picker",
+ "labelWidth": 28,
+ "options": [],
+ "optionsSource": "Query",
+ "queryField": {
+ "refId": "A",
+ "value": "realm"
+ },
+ "queryOptions": {
+ "label": "label",
+ "source": "B",
+ "value": "value"
+ },
+ "section": "current",
+ "title": "Realm",
+ "tooltip": "Pick the realm to operate on. Selection is mirrored into the URL (?var-full_index_realm=…) so links to this dashboard preselect a realm.",
+ "type": "select",
+ "value": ""
+ },
+ {
+ "fieldName": "pending",
+ "id": "pending",
+ "labelWidth": 28,
+ "queryField": {
+ "refId": "A",
+ "value": "pending"
+ },
+ "section": "current",
+ "title": "Pending jobs (this realm)",
+ "tooltip": "Indexing jobs queued for the selected realm with no live worker reservation.",
+ "type": "disabled"
+ },
+ {
+ "fieldName": "in_flight",
+ "id": "in_flight",
+ "labelWidth": 28,
+ "queryField": {
+ "refId": "A",
+ "value": "in_flight"
+ },
+ "section": "current",
+ "title": "In-flight (this realm)",
+ "tooltip": "Indexing jobs currently held by a worker for the selected realm. While > 0, the per-realm reindex button is disabled.",
+ "type": "disabled"
+ },
+ {
+ "fieldName": "pending_full_reindex",
+ "id": "pending_full_reindex",
+ "labelWidth": 28,
+ "queryField": {
+ "refId": "A",
+ "value": "pending_full_reindex"
+ },
+ "section": "current",
+ "title": "Pending full-reindex",
+ "tooltip": "Number of `full-reindex` orchestration jobs currently queued or running. While > 0, the \"Reindex ALL realms\" button is disabled to prevent stacking duplicate full reindexes.",
+ "type": "disabled"
+ },
+ {
+ "fieldName": "oldest_pending_human",
+ "id": "oldest_pending_human",
+ "labelWidth": 28,
+ "queryField": {
+ "refId": "A",
+ "value": "oldest_pending_human"
+ },
+ "section": "current",
+ "title": "Oldest pending",
+ "tooltip": "Age of the oldest pending indexing job for the selected realm. Sustained age usually means workers are saturated or stuck.",
+ "type": "disabled"
+ },
+ {
+ "fieldName": "last_reindex_status",
+ "id": "last_reindex_status",
+ "labelWidth": 28,
+ "queryField": {
+ "refId": "A",
+ "value": "last_reindex_status"
+ },
+ "rows": 2,
+ "section": "current",
+ "title": "Last reindex (this realm)",
+ "tooltip": "Most recent from-scratch-index for the selected realm. Reads from the jobs / job_progress tables (CS-10930).",
+ "type": "disabledTextarea"
+ },
+ {
+ "buttonLabel": "Reindex ${full_index_realm:text}",
+ "customCode": "const realm = '${full_index_realm}';\nconst inFlight = Number((context.panel.elements.find(function(e){return e.id==='in_flight';})||{}).value || 0);\nconst pending = Number((context.panel.elements.find(function(e){return e.id==='pending';})||{}).value || 0);\nconst oldest = (context.panel.elements.find(function(e){return e.id==='oldest_pending_human';})||{}).value || 'n/a';\nif (inFlight > 0) {\n context.grafana.notifyWarning(['Reindex blocked', 'An indexing job is already in flight for this realm. Wait for it to finish before triggering a new one.']);\n return;\n}\nif (!window.confirm('Reindex ' + realm + '?\\n\\nBlast radius:\\n pending: ' + pending + '\\n oldest pending: ' + oldest + '\\n\\nThis will queue a from-scratch index for the selected realm only.')) { return; }\ntry {\n const r = await fetch('${realm_server}_grafana-reindex?realm=' + encodeURIComponent(realm), { method: 'POST', headers: { 'Authorization': 'Bearer ${grafana_secret}' } });\n if (r.ok) {\n context.grafana.notifySuccess(['Reindex queued', 'Started reindex of ' + realm]);\n if (typeof context.grafana.refresh === 'function') { context.grafana.refresh(); }\n } else {\n const txt = await r.text();\n context.grafana.notifyError(['Reindex failed', 'HTTP ' + r.status + ': ' + txt]);\n }\n} catch (err) {\n context.grafana.notifyError(['Reindex failed', String(err)]);\n}\n",
+ "disableIf": "return Number((context.panel.elements.find(function(e){return e.id==='in_flight';})||{}).value || 0) > 0;",
+ "id": "btn_reindex_realm",
+ "labelWidth": 28,
+ "section": "actions",
+ "show": "form",
+ "size": "md",
+ "title": "",
+ "tooltip": "POST /_grafana-reindex?realm=${full_index_realm}. Disabled while an indexing job is in flight for this realm.",
+ "type": "button",
+ "value": "",
+ "variant": "primary"
+ },
+ {
+ "buttonLabel": "Reindex ALL realms",
+ "customCode": "const pendingFull = Number((context.panel.elements.find(function(e){return e.id==='pending_full_reindex';})||{}).value || 0);\nif (pendingFull > 0) {\n context.grafana.notifyWarning(['Full reindex blocked', 'A full-reindex job is already pending or running. Wait for it to finish before triggering another.']);\n return;\n}\nif (!window.confirm('Reindex ALL realms?\\n\\nThis kicks off an indexing job for every realm on the server and can take a long time.')) { return; }\ntry {\n const r = await fetch('${realm_server}_grafana-full-reindex', { method: 'POST', headers: { 'Authorization': 'Bearer ${grafana_secret}' } });\n if (r.ok) {\n context.grafana.notifySuccess(['Full reindex queued', 'Started reindex of all realms.']);\n if (typeof context.grafana.refresh === 'function') { context.grafana.refresh(); }\n } else {\n const txt = await r.text();\n context.grafana.notifyError(['Full reindex failed', 'HTTP ' + r.status + ': ' + txt]);\n }\n} catch (err) {\n context.grafana.notifyError(['Full reindex failed', String(err)]);\n}\n",
+ "disableIf": "return Number((context.panel.elements.find(function(e){return e.id==='pending_full_reindex';})||{}).value || 0) > 0;",
+ "id": "btn_reindex_all",
+ "labelWidth": 28,
+ "section": "actions",
+ "show": "form",
+ "size": "md",
+ "title": "",
+ "tooltip": "POST /_grafana-full-reindex. Disabled while a `full-reindex` orchestration job is already pending or running. Long-running — every realm is reindexed.",
+ "type": "button",
+ "value": "",
+ "variant": "destructive"
+ }
+ ],
+ "initial": {
+ "code": "",
+ "contentType": "application/json",
+ "getPayload": "return {};",
+ "highlight": false,
+ "method": "query",
+ "payload": {}
+ },
+ "layout": {
+ "orientation": "horizontal",
+ "padding": 10,
+ "sectionVariant": "default",
+ "sections": [
+ {
+ "id": "current",
+ "name": "Current state"
+ },
+ {
+ "id": "actions",
+ "name": "Actions"
+ }
],
- "fields": "",
- "values": false
+ "variant": "split"
+ },
+ "reset": {
+ "backgroundColor": "purple",
+ "foregroundColor": "yellow",
+ "icon": "process",
+ "text": "Refresh",
+ "variant": "hidden"
},
- "textMode": "name"
+ "resetAction": {
+ "code": "",
+ "confirm": false,
+ "contentType": "application/json",
+ "getPayload": "return {};",
+ "method": "-",
+ "mode": "initial",
+ "payload": {}
+ },
+ "saveDefault": {
+ "icon": "save",
+ "text": "Save Default",
+ "variant": "hidden"
+ },
+ "submit": {
+ "backgroundColor": "purple",
+ "foregroundColor": "yellow",
+ "icon": "cloud-upload",
+ "text": "Submit",
+ "variant": "hidden"
+ },
+ "sync": false,
+ "update": {
+ "code": "",
+ "confirm": false,
+ "contentType": "application/json",
+ "getPayload": "return {};",
+ "method": "-",
+ "payload": {},
+ "payloadMode": "all"
+ },
+ "updateEnabled": "disabled"
},
- "pluginVersion": "12.4.3",
+ "pluginVersion": "6.2.0",
"targets": [
{
"datasource": {
@@ -138,36 +294,30 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT 1 AS click;",
- "refId": "A",
- "sql": {
- "columns": [
- {
- "parameters": [],
- "type": "function"
- }
- ],
- "groupBy": [
- {
- "property": {
- "type": "string"
- },
- "type": "groupBy"
- }
- ],
- "limit": 50
- }
+ "rawSql": "WITH realm_jobs AS (\n SELECT j.*\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND COALESCE(j.args->>'realmURL','') = '${full_index_realm}'\n),\nrealm_pending AS (\n SELECT COUNT(*) AS n,\n MIN(j.created_at) AS oldest_created\n FROM realm_jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled' AND jr.id IS NULL\n),\nrealm_in_flight AS (\n SELECT COUNT(*) AS n\n FROM realm_jobs j\n JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.finished_at IS NULL\n),\npending_full_reindex AS (\n SELECT COUNT(*) AS n\n FROM jobs j\n WHERE j.job_type = 'full-reindex'\n AND j.finished_at IS NULL\n),\nlast_reindex AS (\n SELECT j.id, j.created_at AS started, j.finished_at AS finished,\n j.status,\n COALESCE(jp.files_completed, 0) AS files_completed,\n COALESCE(jp.total_files, 0) AS total_files\n FROM realm_jobs j\n LEFT JOIN job_progress jp ON jp.job_id = j.id\n WHERE j.job_type = 'from-scratch-index'\n ORDER BY j.created_at DESC\n LIMIT 1\n)\nSELECT\n '${full_index_realm}' AS realm,\n COALESCE((SELECT n FROM realm_pending), 0) AS pending,\n COALESCE((SELECT n FROM realm_in_flight), 0) AS in_flight,\n COALESCE((SELECT n FROM pending_full_reindex), 0) AS pending_full_reindex,\n CASE\n WHEN (SELECT oldest_created FROM realm_pending) IS NULL THEN '—'\n ELSE TO_CHAR(NOW() - (SELECT oldest_created FROM realm_pending), 'HH24:MI:SS')\n || ' (since ' || TO_CHAR((SELECT oldest_created FROM realm_pending) AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS') || ' UTC)'\n END AS oldest_pending_human,\n COALESCE(\n (SELECT\n CASE\n WHEN finished IS NULL AND started IS NOT NULL THEN\n 'running — ' || files_completed || '/' || NULLIF(total_files,0) || ' files, started ' || TO_CHAR(started AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS') || ' UTC'\n WHEN finished IS NOT NULL THEN\n INITCAP(COALESCE(status::text,'finished')) || ' at ' || TO_CHAR(finished AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS') || ' UTC'\n ELSE COALESCE(status::text,'unknown')\n END\n FROM last_reindex),\n 'never'\n ) AS last_reindex_status;",
+ "refId": "A"
+ },
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "editorMode": "code",
+ "format": "table",
+ "rawQuery": true,
+ "rawSql": "SELECT REGEXP_REPLACE(url, '^https?://', '') AS label, url AS value FROM realm_registry WHERE kind IN ('bootstrap', 'source') ORDER BY 1;",
+ "refId": "B"
}
],
"title": "Operator Actions",
- "type": "stat"
+ "type": "volkovlabs-form-panel"
},
{
"datasource": {
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
- "description": "Indexing jobs waiting for a worker across all realm-server tasks (CS-10930). Reconciles with `SELECT count(*) FROM jobs WHERE status='unfulfilled' AND job_type IN ('from-scratch-index','incremental-index')` minus those with an active reservation.",
+ "description": "Indexing jobs waiting for a worker across all realm-server tasks (CS-10930). Reconciles with `SELECT count(*) FROM jobs WHERE status='unfulfilled' AND job_type IN ('from-scratch-index','incremental-index','full-reindex')` minus those with an active reservation.",
"fieldConfig": {
"defaults": {
"color": {
@@ -225,7 +375,7 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT COUNT(*) AS pending\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n AND j.job_type IN ('from-scratch-index','incremental-index')\n AND jr.id IS NULL;",
+ "rawSql": "SELECT COUNT(*) AS pending\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n AND j.job_type IN ('from-scratch-index','incremental-index','full-reindex')\n AND jr.id IS NULL;",
"refId": "A"
}
],
@@ -291,7 +441,7 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT COUNT(*) AS in_flight\n FROM jobs j\n JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.finished_at IS NULL\n AND j.job_type IN ('from-scratch-index','incremental-index');",
+ "rawSql": "SELECT COUNT(*) AS in_flight\n FROM jobs j\n JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.finished_at IS NULL\n AND j.job_type IN ('from-scratch-index','incremental-index','full-reindex');",
"refId": "A"
}
],
@@ -361,7 +511,7 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n AND j.job_type IN ('from-scratch-index','incremental-index')\n AND jr.id IS NULL;",
+ "rawSql": "SELECT EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n AND j.job_type IN ('from-scratch-index','incremental-index','full-reindex')\n AND jr.id IS NULL;",
"refId": "A"
}
],
@@ -435,7 +585,7 @@
"editorMode": "code",
"format": "time_series",
"rawQuery": true,
- "rawSql": "SELECT $__timeGroupAlias(j.created_at, '30s') AS time,\n COUNT(*) AS arrived\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND $__timeFilter(j.created_at)\n GROUP BY 1\n ORDER BY 1;",
+ "rawSql": "SELECT $__timeGroupAlias(j.created_at, '30s') AS time,\n COUNT(*) AS arrived\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index','full-reindex')\n AND $__timeFilter(j.created_at)\n GROUP BY 1\n ORDER BY 1;",
"refId": "A"
},
{
@@ -446,7 +596,7 @@
"editorMode": "code",
"format": "time_series",
"rawQuery": true,
- "rawSql": "SELECT $__timeGroupAlias(jr.created_at, '30s') AS time,\n COUNT(*) AS started\n FROM job_reservations jr\n JOIN jobs j ON j.id = jr.job_id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND $__timeFilter(jr.created_at)\n GROUP BY 1\n ORDER BY 1;",
+ "rawSql": "SELECT $__timeGroupAlias(jr.created_at, '30s') AS time,\n COUNT(*) AS started\n FROM job_reservations jr\n JOIN jobs j ON j.id = jr.job_id\n WHERE j.job_type IN ('from-scratch-index','incremental-index','full-reindex')\n AND $__timeFilter(jr.created_at)\n GROUP BY 1\n ORDER BY 1;",
"refId": "B"
},
{
@@ -457,7 +607,7 @@
"editorMode": "code",
"format": "time_series",
"rawQuery": true,
- "rawSql": "SELECT $__timeGroupAlias(j.finished_at, '30s') AS time,\n COUNT(*) AS completed\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.finished_at IS NOT NULL\n AND $__timeFilter(j.finished_at)\n GROUP BY 1\n ORDER BY 1;",
+ "rawSql": "SELECT $__timeGroupAlias(j.finished_at, '30s') AS time,\n COUNT(*) AS completed\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index','full-reindex')\n AND j.finished_at IS NOT NULL\n AND $__timeFilter(j.finished_at)\n GROUP BY 1\n ORDER BY 1;",
"refId": "C"
}
],
@@ -531,7 +681,7 @@
"editorMode": "code",
"format": "time_series",
"rawQuery": true,
- "rawSql": "WITH buckets AS (\n SELECT generate_series($__timeFrom()::timestamptz, $__timeTo()::timestamptz, '1 minute') AS bucket\n),\nindexing_jobs AS (\n SELECT j.id, j.created_at, j.finished_at,\n (SELECT MIN(jr.created_at) FROM job_reservations jr WHERE jr.job_id = j.id) AS first_started_at\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.created_at <= $__timeTo()::timestamptz\n)\nSELECT b.bucket AS time,\n COUNT(*) FILTER (WHERE ij.created_at <= b.bucket\n AND (ij.first_started_at IS NULL OR ij.first_started_at > b.bucket)\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS pending,\n COUNT(*) FILTER (WHERE ij.first_started_at IS NOT NULL\n AND ij.first_started_at <= b.bucket\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS in_flight,\n COUNT(*) FILTER (WHERE ij.finished_at IS NOT NULL AND ij.finished_at <= b.bucket) AS completed\n FROM buckets b LEFT JOIN indexing_jobs ij ON TRUE\n GROUP BY b.bucket\n ORDER BY b.bucket;",
+ "rawSql": "WITH buckets AS (\n SELECT generate_series($__timeFrom()::timestamptz, $__timeTo()::timestamptz, '1 minute') AS bucket\n),\nindexing_jobs AS (\n SELECT j.id, j.created_at, j.finished_at,\n (SELECT MIN(jr.created_at) FROM job_reservations jr WHERE jr.job_id = j.id) AS first_started_at\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index','full-reindex')\n AND j.created_at <= $__timeTo()::timestamptz\n)\nSELECT b.bucket AS time,\n COUNT(*) FILTER (WHERE ij.created_at <= b.bucket\n AND (ij.first_started_at IS NULL OR ij.first_started_at > b.bucket)\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS pending,\n COUNT(*) FILTER (WHERE ij.first_started_at IS NOT NULL\n AND ij.first_started_at <= b.bucket\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS in_flight,\n COUNT(*) FILTER (WHERE ij.finished_at IS NOT NULL AND ij.finished_at <= b.bucket) AS completed\n FROM buckets b LEFT JOIN indexing_jobs ij ON TRUE\n GROUP BY b.bucket\n ORDER BY b.bucket;",
"refId": "A"
}
],
@@ -665,8 +815,8 @@
"value": [
{
"targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
+ "title": "View job logs",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-indexing_job_full=${__data.fields.job_id}.${__data.fields.reservation_id}&var-indexing_job_short=${__data.fields.job_id}&orgId=1&viewPanel=20"
}
]
},
@@ -864,6 +1014,7 @@
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
+ "description": "Indexing jobs (from-scratch-index, incremental-index) waiting on a worker. Same query as Job Queue → Waiting Jobs, narrowed to indexing job types.",
"fieldConfig": {
"defaults": {
"color": {
@@ -957,12 +1108,12 @@
]
},
"gridPos": {
- "h": 9,
+ "h": 10,
"w": 24,
"x": 0,
"y": 40
},
- "id": 2,
+ "id": 18,
"options": {
"cellHeight": "sm",
"footer": {
@@ -987,7 +1138,7 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT \n j.id, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n j.id as job_id\n\nFROM \n jobs j\n \nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' \n \nORDER BY \n j.created_at ASC\nLIMIT 500;",
+ "rawSql": "SELECT \n j.id, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n j.id as job_id\n\nFROM \n jobs j\n \nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' AND j.job_type IN ('from-scratch-index','incremental-index','full-reindex') \n \nORDER BY \n j.created_at ASC\nLIMIT 500;",
"refId": "A",
"sql": {
"columns": [
@@ -1008,334 +1159,15 @@
}
}
],
- "title": "Waiting Jobs",
- "type": "table"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "thresholds"
- },
- "custom": {
- "align": "left",
- "cellOptions": {
- "type": "auto"
- },
- "filterable": true,
- "inspect": false,
- "minWidth": 150
- },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green"
- },
- {
- "color": "red",
- "value": 80
- }
- ]
- }
- },
- "overrides": [
- {
- "matcher": {
- "id": "byName",
- "options": "worker_id"
- },
- "properties": [
- {
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "pattern": "^(.{6}).*$",
- "result": {
- "index": 0,
- "text": "View logs ($1)"
- }
- },
- "type": "regex"
- }
- ]
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "reservation_id"
- },
- "properties": [
- {
- "id": "actions",
- "value": [
- {
- "confirmation": "Cancel running reservation ${__value.raw}? The worker will stop processing it.",
- "fetch": {
- "body": "",
- "headers": [
- [
- "Authorization",
- "Bearer ${grafana_secret}"
- ]
- ],
- "method": "POST",
- "queryParams": [
- [
- "reservation_id",
- "${__value.raw}"
- ]
- ],
- "url": "${realm_server}_grafana-complete-job"
- },
- "oneClick": false,
- "title": "Delete reservation ${__value.raw}",
- "type": "fetch"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "from": 0,
- "result": {
- "color": "red",
- "index": 0,
- "text": "Delete"
- },
- "to": 9999999999999
- },
- "type": "range"
- }
- ]
- },
- {
- "id": "displayName",
- "value": "Action"
- },
- {
- "id": "custom.filterable",
- "value": false
- }
- ]
- }
- ]
- },
- "gridPos": {
- "h": 11,
- "w": 24,
- "x": 0,
- "y": 49
- },
- "id": 1,
- "options": {
- "cellHeight": "sm",
- "footer": {
- "countRows": false,
- "enablePagination": false,
- "fields": "",
- "reducer": [
- "sum"
- ],
- "show": false
- },
- "showHeader": true,
- "sortBy": []
- },
- "pluginVersion": "10.4.1",
- "targets": [
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "table",
- "rawQuery": true,
- "rawSql": "SELECT \n j.id,\n COALESCE(jrc.attempt, 0) AS attempt, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n\n jr.created_at AS started_at, \n\n\n -- Run time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL THEN\n CASE \n WHEN j.finished_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n END\n ELSE NULL\n END\n AS run_seconds\n, jr.worker_id,\n jr.id as reservation_id \n\nFROM \n jobs j\nJOIN \n job_reservations jr ON j.id = jr.job_id AND jr.completed_at IS NULL AND jr.locked_until > NOW()\nLEFT JOIN \n (SELECT job_id, COUNT(*) AS attempt FROM job_reservations GROUP BY job_id) jrc ON j.id = jrc.job_id\nWHERE j.finished_at IS NULL\nORDER BY \n jr.created_at DESC\nLIMIT 500;",
- "refId": "A",
- "sql": {
- "columns": [
- {
- "parameters": [],
- "type": "function"
- }
- ],
- "groupBy": [
- {
- "property": {
- "type": "string"
- },
- "type": "groupBy"
- }
- ],
- "limit": 50
- }
- }
- ],
- "title": "Running Jobs",
- "type": "table"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "thresholds"
- },
- "custom": {
- "align": "left",
- "cellOptions": {
- "type": "auto"
- },
- "filterable": true,
- "inspect": false,
- "minWidth": 150
- },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green"
- },
- {
- "color": "red",
- "value": 80
- }
- ]
- }
- },
- "overrides": [
- {
- "matcher": {
- "id": "byName",
- "options": "worker_id"
- },
- "properties": [
- {
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3\n\n\n"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "pattern": "^(.{6}).*$",
- "result": {
- "index": 0,
- "text": "View logs ($1)"
- }
- },
- "type": "regex"
- }
- ]
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "reservation_id"
- },
- "properties": [
- {
- "id": "custom.hidden",
- "value": true
- }
- ]
- }
- ]
- },
- "gridPos": {
- "h": 18,
- "w": 24,
- "x": 0,
- "y": 60
- },
- "id": 3,
- "options": {
- "cellHeight": "sm",
- "footer": {
- "countRows": false,
- "enablePagination": false,
- "fields": "",
- "reducer": [
- "sum"
- ],
- "show": false
- },
- "showHeader": true,
- "sortBy": []
- },
- "pluginVersion": "10.4.1",
- "targets": [
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "table",
- "rawQuery": true,
- "rawSql": "SELECT \n j.id, \n jr.id as reservation_id, \n ROW_NUMBER() OVER (PARTITION BY j.id ORDER BY jr.created_at) AS attempt, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n\n jr.created_at AS started_at, \n\n\n -- Run time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL THEN\n CASE \n WHEN j.finished_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n END\n ELSE NULL\n END\n AS run_seconds,\n j.finished_at AS finished_at, \n jr.worker_id\n\nFROM \n jobs j\nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\nWHERE j.finished_at IS NOT NULL\nORDER BY \n j.finished_at DESC\nLIMIT 500;",
- "refId": "A",
- "sql": {
- "columns": [
- {
- "parameters": [],
- "type": "function"
- }
- ],
- "groupBy": [
- {
- "property": {
- "type": "string"
- },
- "type": "groupBy"
- }
- ],
- "limit": 50
- }
- }
- ],
- "title": "Finished Jobs (limit 500)",
+ "title": "Queued Indexing Jobs",
"type": "table"
}
],
"refresh": "5s",
"schemaVersion": 42,
- "tags": [],
+ "tags": [
+ "workflow:indexing"
+ ],
"templating": {
"list": [
{
@@ -1343,14 +1175,14 @@
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
- "definition": "SELECT DISTINCT REPLACE(REPLACE(realm_url, '${realm_server}', ''), 'https://', '') AS __text, realm_url as __value\nFROM boxel_index;",
- "hide": 0,
+ "definition": "SELECT REGEXP_REPLACE(url, '^https?://', '') AS __text, url AS __value FROM realm_registry WHERE kind IN ('bootstrap', 'source') ORDER BY 1;",
+ "hide": 2,
"includeAll": false,
"label": "Realm to Full Index",
"multi": false,
"name": "full_index_realm",
"options": [],
- "query": "SELECT DISTINCT REPLACE(REPLACE(realm_url, '${realm_server}', ''), 'https://', '') AS __text, realm_url as __value\nFROM boxel_index;",
+ "query": "SELECT REGEXP_REPLACE(url, '^https?://', '') AS __text, url AS __value FROM realm_registry WHERE kind IN ('bootstrap', 'source') ORDER BY 1;",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
@@ -1379,7 +1211,7 @@
},
"timepicker": {},
"timezone": "browser",
- "title": "Boxel Jobs",
+ "title": "Indexing",
"weekStart": ""
}
}
diff --git a/tmp/committed-canon.PyiEU3/dashboards/boxel-status/job-queue.json b/tmp/committed-canon.PyiEU3/dashboards/boxel-status/job-queue.json
new file mode 100644
index 0000000..1bab6c0
--- /dev/null
+++ b/tmp/committed-canon.PyiEU3/dashboards/boxel-status/job-queue.json
@@ -0,0 +1,546 @@
+{
+ "apiVersion": "dashboard.grafana.app/v1beta1",
+ "kind": "Dashboard",
+ "metadata": {
+ "annotations": {
+ "grafana.app/folder": "defd2d156sav4d"
+ },
+ "name": "boxeljobqueue1"
+ },
+ "spec": {
+ "annotations": {
+ "list": [
+ {
+ "builtIn": 1,
+ "datasource": {
+ "type": "grafana",
+ "uid": "-- Grafana --"
+ },
+ "enable": true,
+ "hide": true,
+ "iconColor": "rgba(0, 211, 255, 1)",
+ "name": "Annotations & Alerts",
+ "type": "dashboard"
+ }
+ ]
+ },
+ "editable": true,
+ "fiscalYearStartMonth": 0,
+ "graphTooltip": 0,
+ "links": [
+ {
+ "asDropdown": false,
+ "icon": "external link",
+ "includeVars": false,
+ "keepTime": true,
+ "tags": [
+ "workflow:indexing"
+ ],
+ "title": "Indexing",
+ "type": "dashboards"
+ },
+ {
+ "asDropdown": false,
+ "icon": "external link",
+ "includeVars": false,
+ "keepTime": true,
+ "tags": [
+ "forensics"
+ ],
+ "title": "Logs",
+ "type": "dashboards"
+ }
+ ],
+ "panels": [
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "custom": {
+ "align": "left",
+ "cellOptions": {
+ "type": "auto"
+ },
+ "filterable": true,
+ "inspect": false,
+ "minWidth": 150
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
+ },
+ {
+ "color": "red",
+ "value": 80
+ }
+ ]
+ }
+ },
+ "overrides": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "job_id"
+ },
+ "properties": [
+ {
+ "id": "actions",
+ "value": [
+ {
+ "confirmation": "Delete waiting job ${__value.raw}? This marks it as completed without running it.",
+ "fetch": {
+ "body": "",
+ "headers": [
+ [
+ "Authorization",
+ "Bearer ${grafana_secret}"
+ ]
+ ],
+ "method": "POST",
+ "queryParams": [
+ [
+ "job_id",
+ "${__value.raw}"
+ ]
+ ],
+ "url": "${realm_server}_grafana-complete-job"
+ },
+ "oneClick": false,
+ "title": "Delete job ${__value.raw}",
+ "type": "fetch"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "from": 0,
+ "result": {
+ "color": "red",
+ "index": 0,
+ "text": "Delete"
+ },
+ "to": 9999999999999
+ },
+ "type": "range"
+ }
+ ]
+ },
+ {
+ "id": "displayName",
+ "value": "Action"
+ },
+ {
+ "id": "custom.filterable",
+ "value": false
+ }
+ ]
+ }
+ ]
+ },
+ "gridPos": {
+ "h": 9,
+ "w": 24,
+ "x": 0,
+ "y": 0
+ },
+ "id": 2,
+ "options": {
+ "cellHeight": "sm",
+ "footer": {
+ "countRows": false,
+ "enablePagination": false,
+ "fields": "",
+ "reducer": [
+ "sum"
+ ],
+ "show": false
+ },
+ "showHeader": true,
+ "sortBy": []
+ },
+ "pluginVersion": "10.4.1",
+ "targets": [
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "editorMode": "code",
+ "format": "table",
+ "rawQuery": true,
+ "rawSql": "SELECT \n j.id, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n j.id as job_id\n\nFROM \n jobs j\n \nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' \n \nORDER BY \n j.created_at ASC\nLIMIT 500;",
+ "refId": "A",
+ "sql": {
+ "columns": [
+ {
+ "parameters": [],
+ "type": "function"
+ }
+ ],
+ "groupBy": [
+ {
+ "property": {
+ "type": "string"
+ },
+ "type": "groupBy"
+ }
+ ],
+ "limit": 50
+ }
+ }
+ ],
+ "title": "Waiting Jobs",
+ "type": "table"
+ },
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "custom": {
+ "align": "left",
+ "cellOptions": {
+ "type": "auto"
+ },
+ "filterable": true,
+ "inspect": false,
+ "minWidth": 150
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
+ },
+ {
+ "color": "red",
+ "value": 80
+ }
+ ]
+ }
+ },
+ "overrides": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "worker_id"
+ },
+ "properties": [
+ {
+ "id": "links",
+ "value": [
+ {
+ "targetBlank": true,
+ "title": "View logs",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "pattern": "^(.{6}).*$",
+ "result": {
+ "index": 0,
+ "text": "View logs ($1)"
+ }
+ },
+ "type": "regex"
+ }
+ ]
+ }
+ ]
+ |
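## Summary

The navigation backbone for the rationalized dashboard tree. It lands last in the stack because it links to dashboards introduced in earlier PRs.

### `overview.json` (new, tag `overview`)

- Five service-health stats (Loki log volume in the last 5m as a liveness proxy), each with a drill-down panel link to that service's stub.
- A full-width alertlist, replacing the deleted `worker-status.json`.
- Four indexing pipeline stats.
- An indexing throughput timeseries.
- A markdown text panel listing the other dashboards by category.

### Per-service stubs (`service-{realm-server,prerender-server,prerender-manager,worker}.json`)

Minimal per-service deep-dives tagged `service:<name>`. Each has:

- a Loki-filtered logs panel;
- a service-specific top-row chart:
  - realm-server: HTTP request rate (total/4xx/5xx) parsed from the `httpLogging` exit-log lines (`--> METHOD ACCEPT URL: STATUS`);
  - prerender-server / prerender-manager: the same HTTP rate chart;
  - worker: indexed files per second plus error rate from the `[indexing-progress]` events.

CloudWatch ECS metrics (CPU / memory / RunningTaskCount) are intentionally deferred until cluster and task-family naming is standardized in the observability config. These stubs are the natural home to add them.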
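The Overview treats a service's Loki log volume over the last 5 minutes as a liveness proxy. The service stubs chart HTTP rates parsed from `httpLogging` exit lines (`--> METHOD ACCEPT URL: STATUS`). The minimal Python sketch below shows the same classification in plain code. It is illustrative only, because the panels compute these signals in Loki, not in Python. The regex assumes the exit line contains `--> METHOD ACCEPT URL: STATUS` somewhere in the message, followed by a three-digit status. The function names, sample log lines, and Unix-seconds timestamps are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed shape of an httpLogging exit line: "--> METHOD ACCEPT URL: STATUS".
# re.search (not re.match) tolerates any prefix such as a timestamp.
EXIT_LINE = re.compile(
    r"-->\s+(?P<method>[A-Z]+)\s+(?P<accept>.*?)\s+(?P<url>\S+):\s+(?P<status>\d{3})\b"
)


@dataclass
class HttpRates:
    total: float          # requests per second
    client_errors: float  # 4xx per second
    server_errors: float  # 5xx per second


def parse_status(line: str) -> Optional[int]:
    """Return the HTTP status from an exit-log line, or None for other lines."""
    match = EXIT_LINE.search(line)
    return int(match.group("status")) if match else None


def http_rates(lines: Iterable[str], window_seconds: float) -> HttpRates:
    """Average total/4xx/5xx rates over one window, like the stubs' top-row chart."""
    total = client = server = 0
    for line in lines:
        status = parse_status(line)
        if status is None:
            continue  # not an exit line; ignore it
        total += 1
        if 400 <= status < 500:
            client += 1
        elif status >= 500:
            server += 1
    return HttpRates(total / window_seconds, client / window_seconds, server / window_seconds)


def is_live(timestamps: Iterable[float], now: float, window_seconds: float = 300.0) -> bool:
    """Liveness proxy: at least one log line in the last window (5 minutes by default)."""
    return any(now - window_seconds <= ts <= now for ts in timestamps)


# Hypothetical sample lines; only the first three are exit lines.
sample = [
    "2024-01-01T00:00:00Z --> GET text/html https://realm.example/cards/: 200",
    "--> POST application/json http://localhost:4201/base/_search: 404",
    "--> GET */* https://realm.example/x: 503",
    "unrelated application log line",
]
print(http_rates(sample, window_seconds=60))
print(is_live([1_000.0], now=1_200.0))  # → True (200 s ago is inside 300 s)
```

The URL group cannot contain whitespace. A URL with a port, such as `http://localhost:4201/base/_search`, is therefore kept intact: only a colon followed by whitespace and the status ends the URL.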
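### Deleted

- `worker-status.json`: its sole alertlist panel folds into Overview row 2.

### Three governing principles

1. System dashboards aggregate; entity dashboards filter.
2. Dashboard = context; action lives in its context.
3. Drill-downs go down or sideways, never up.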
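The navigation rules are mechanical enough to lint. Drill-downs go down or sideways, never up. Only Overview links downward. Sideways pivots carry context such as `realm_url` or `matrix_user_id`. The Python sketch below checks and builds such links. It assumes a hypothetical depth table keyed by dashboard tag and a made-up dashboard UID, and it is not part of this PR. The only Grafana behavior it relies on is that a `/d/<uid>` dashboard URL accepts `from`/`to` for the time range and `var-<name>` for template variables.

```python
from typing import Dict, List, Mapping, Sequence, Tuple
from urllib.parse import urlencode

# Hypothetical tree depth per dashboard tag: Overview is the root and every
# deep-dive (service stubs, workflows, Logs) sits one level below it.
DEPTH: Dict[str, int] = {
    "overview": 0,
    "service:realm-server": 1,
    "service:worker": 1,
    "workflow:indexing": 1,
    "forensics": 1,
}


def link_violations(
    links: Sequence[Tuple[str, str]],
    depth: Mapping[str, int],
    root: str = "overview",
) -> List[str]:
    """Flag links that go up, or that go down from anywhere but the root."""
    problems = []
    for src, dst in links:
        if depth[dst] < depth[src]:
            problems.append(f"{src} -> {dst}: links up")
        elif depth[dst] > depth[src] and src != root:
            problems.append(f"{src} -> {dst}: only {root} may link down")
    return problems


def pivot_url(
    dashboard_uid: str,
    variables: Mapping[str, str],
    time_from: str = "now-1h",
    time_to: str = "now",
) -> str:
    """Build a sideways link that carries context as Grafana var-<name> params."""
    params = {"from": time_from, "to": time_to}
    params.update({f"var-{name}": value for name, value in variables.items()})
    return f"/d/{dashboard_uid}?{urlencode(params)}"


# Overview drilling down and a workflow dashboard linking sideways both pass.
print(link_violations([("overview", "service:worker"), ("workflow:indexing", "forensics")], DEPTH))  # → []
print(pivot_url("abc123", {"realm_url": "https://realm.example/"}))
# → /d/abc123?from=now-1h&to=now&var-realm_url=https%3A%2F%2Frealm.example%2F
```

Tagging each dashboard with its position (for example `overview`, `service:<name>`, `workflow:indexing`, `forensics`) keeps a check like this cheap. A real version would read the depth from the dashboard JSON rather than hard-coding it.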
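Overview is the only dashboard that links downward; deep-dives link sideways to peers. Logs is the universal forensics terminal, with `entity:*` sideways links that pass `realm_url` / `matrix_user_id` to pivot rather than drill back up.

## Test plan

- `cd packages/observability && ./scripts/apply.sh`

🤖 Generated with Claude Code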