observability: fix local Loki ingest stalls and Alloy flush latency #4694
Draft
Two related local-dev observability fixes. Both surfaced as blank Grafana log panels with no obvious errors anywhere.

* `loki/config.yaml`: explicitly set `storage_config.filesystem.directory` so Loki's chunks store has a writable base directory. Without it, Loki 3.x falls back to `mkdir <tenant>` relative to the container's CWD (`/`), which the `loki` user cannot write to. That produces "mkdir fake: permission denied" flush errors, which cascade: the ingester ring marks itself unhealthy and Alloy's writes return 500 "empty ring". Verified via the `/config` endpoint that this is the setting the chunks flusher reads; it is separate from `common.storage.filesystem.chunks_directory`, which does not propagate to it.
* `alloy/config.alloy`: set `loki.write.local.endpoint.batch_wait` to 500ms (default 1s). An incremental indexing job emits only 3 `[indexing-progress]` lines, which doesn't reliably trip the default flush trigger; a tighter `batch_wait` keeps low-volume host-process streams visible in real time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
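
For reviewers who want to picture the `loki/config.yaml` change, here is a minimal sketch of the two settings the description contrasts. The path `/loki/chunks` is an assumption for illustration and is not taken from this PR; any directory the `loki` user can write to would serve. Only the relevant keys are shown.

```yaml
# Illustrative fragment only; /loki/chunks is an assumed path, not the PR's value.
common:
  storage:
    filesystem:
      # Per the PR description, this value does not propagate to the
      # chunks flusher, so setting it alone is not enough.
      chunks_directory: /loki/chunks

storage_config:
  filesystem:
    # The setting this PR sets explicitly, giving the chunks store
    # a writable base directory instead of the container's CWD.
    directory: /loki/chunks
```

The effective value can be checked the same way the description does, by reading Loki's `/config` endpoint after startup and looking at the `storage_config` section.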
Contributor
Observability diff (vs staging)
Diff truncated (62400 bytes; limit 60000). Full diff: https://github.com/cardstack/boxel/actions/runs/25474686445
diff --git a/tmp/remote-canon.lmpVWy/dashboards/boxel-status/boxel-jobs.json b/tmp/committed-canon.tc1erx/dashboards/boxel-status/boxel-jobs.json
index 6a40566..607199f 100644
--- a/tmp/remote-canon.lmpVWy/dashboards/boxel-status/boxel-jobs.json
+++ b/tmp/committed-canon.tc1erx/dashboards/boxel-status/boxel-jobs.json
@@ -162,6 +162,479 @@
"title": "Operator Actions",
"type": "stat"
},
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "custom": {
+ "align": "left",
+ "cellOptions": {
+ "type": "auto"
+ },
+ "filterable": true,
+ "inspect": false,
+ "minWidth": 150
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
+ },
+ {
+ "color": "red",
+ "value": 80
+ }
+ ]
+ }
+ },
+ "overrides": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "job_id"
+ },
+ "properties": [
+ {
+ "id": "actions",
+ "value": [
+ {
+ "confirmation": "Delete waiting job ${__value.raw}? This marks it as completed without running it.",
+ "fetch": {
+ "body": "",
+ "headers": [
+ [
+ "Authorization",
+ "Bearer ${grafana_secret}"
+ ]
+ ],
+ "method": "POST",
+ "queryParams": [
+ [
+ "job_id",
+ "${__value.raw}"
+ ]
+ ],
+ "url": "${realm_server}_grafana-complete-job"
+ },
+ "oneClick": false,
+ "title": "Delete job ${__value.raw}",
+ "type": "fetch"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "from": 0,
+ "result": {
+ "color": "red",
+ "index": 0,
+ "text": "Delete"
+ },
+ "to": 9999999999999
+ },
+ "type": "range"
+ }
+ ]
+ },
+ {
+ "id": "displayName",
+ "value": "Action"
+ },
+ {
+ "id": "custom.filterable",
+ "value": false
+ }
+ ]
+ }
+ ]
+ },
+ "gridPos": {
+ "h": 9,
+ "w": 24,
+ "x": 0,
+ "y": 40
+ },
+ "id": 2,
+ "options": {
+ "cellHeight": "sm",
+ "footer": {
+ "countRows": false,
+ "enablePagination": false,
+ "fields": "",
+ "reducer": [
+ "sum"
+ ],
+ "show": false
+ },
+ "showHeader": true,
+ "sortBy": []
+ },
+ "pluginVersion": "10.4.1",
+ "targets": [
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "editorMode": "code",
+ "format": "table",
+ "rawQuery": true,
+ "rawSql": "SELECT \n j.id, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n j.id as job_id\n\nFROM \n jobs j\n \nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' \n \nORDER BY \n j.created_at ASC\nLIMIT 500;",
+ "refId": "A",
+ "sql": {
+ "columns": [
+ {
+ "parameters": [],
+ "type": "function"
+ }
+ ],
+ "groupBy": [
+ {
+ "property": {
+ "type": "string"
+ },
+ "type": "groupBy"
+ }
+ ],
+ "limit": 50
+ }
+ }
+ ],
+ "title": "Waiting Jobs",
+ "type": "table"
+ },
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "custom": {
+ "align": "left",
+ "cellOptions": {
+ "type": "auto"
+ },
+ "filterable": true,
+ "inspect": false,
+ "minWidth": 150
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
+ },
+ {
+ "color": "red",
+ "value": 80
+ }
+ ]
+ }
+ },
+ "overrides": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "worker_id"
+ },
+ "properties": [
+ {
+ "id": "links",
+ "value": [
+ {
+ "targetBlank": true,
+ "title": "View logs",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "pattern": "^(.{6}).*$",
+ "result": {
+ "index": 0,
+ "text": "View logs ($1)"
+ }
+ },
+ "type": "regex"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "reservation_id"
+ },
+ "properties": [
+ {
+ "id": "actions",
+ "value": [
+ {
+ "confirmation": "Cancel running reservation ${__value.raw}? The worker will stop processing it.",
+ "fetch": {
+ "body": "",
+ "headers": [
+ [
+ "Authorization",
+ "Bearer ${grafana_secret}"
+ ]
+ ],
+ "method": "POST",
+ "queryParams": [
+ [
+ "reservation_id",
+ "${__value.raw}"
+ ]
+ ],
+ "url": "${realm_server}_grafana-complete-job"
+ },
+ "oneClick": false,
+ "title": "Delete reservation ${__value.raw}",
+ "type": "fetch"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "from": 0,
+ "result": {
+ "color": "red",
+ "index": 0,
+ "text": "Delete"
+ },
+ "to": 9999999999999
+ },
+ "type": "range"
+ }
+ ]
+ },
+ {
+ "id": "displayName",
+ "value": "Action"
+ },
+ {
+ "id": "custom.filterable",
+ "value": false
+ }
+ ]
+ }
+ ]
+ },
+ "gridPos": {
+ "h": 11,
+ "w": 24,
+ "x": 0,
+ "y": 49
+ },
+ "id": 1,
+ "options": {
+ "cellHeight": "sm",
+ "footer": {
+ "countRows": false,
+ "enablePagination": false,
+ "fields": "",
+ "reducer": [
+ "sum"
+ ],
+ "show": false
+ },
+ "showHeader": true,
+ "sortBy": []
+ },
+ "pluginVersion": "10.4.1",
+ "targets": [
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "editorMode": "code",
+ "format": "table",
+ "rawQuery": true,
+ "rawSql": "SELECT \n j.id,\n COALESCE(jrc.attempt, 0) AS attempt, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n\n jr.created_at AS started_at, \n\n\n -- Run time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL THEN\n CASE \n WHEN j.finished_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n END\n ELSE NULL\n END\n AS run_seconds\n, jr.worker_id,\n jr.id as reservation_id \n\nFROM \n jobs j\nJOIN \n job_reservations jr ON j.id = jr.job_id AND jr.completed_at IS NULL AND jr.locked_until > NOW()\nLEFT JOIN \n (SELECT job_id, COUNT(*) AS attempt FROM job_reservations GROUP BY job_id) jrc ON j.id = jrc.job_id\nWHERE j.finished_at IS NULL\nORDER BY \n jr.created_at DESC\nLIMIT 500;",
+ "refId": "A",
+ "sql": {
+ "columns": [
+ {
+ "parameters": [],
+ "type": "function"
+ }
+ ],
+ "groupBy": [
+ {
+ "property": {
+ "type": "string"
+ },
+ "type": "groupBy"
+ }
+ ],
+ "limit": 50
+ }
+ }
+ ],
+ "title": "Running Jobs",
+ "type": "table"
+ },
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "custom": {
+ "align": "left",
+ "cellOptions": {
+ "type": "auto"
+ },
+ "filterable": true,
+ "inspect": false,
+ "minWidth": 150
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
+ },
+ {
+ "color": "red",
+ "value": 80
+ }
+ ]
+ }
+ },
+ "overrides": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "worker_id"
+ },
+ "properties": [
+ {
+ "id": "links",
+ "value": [
+ {
+ "targetBlank": true,
+ "title": "View logs",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3\n\n\n"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "pattern": "^(.{6}).*$",
+ "result": {
+ "index": 0,
+ "text": "View logs ($1)"
+ }
+ },
+ "type": "regex"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "reservation_id"
+ },
+ "properties": [
+ {
+ "id": "custom.hidden",
+ "value": true
+ }
+ ]
+ }
+ ]
+ },
+ "gridPos": {
+ "h": 18,
+ "w": 24,
+ "x": 0,
+ "y": 60
+ },
+ "id": 3,
+ "options": {
+ "cellHeight": "sm",
+ "footer": {
+ "countRows": false,
+ "enablePagination": false,
+ "fields": "",
+ "reducer": [
+ "sum"
+ ],
+ "show": false
+ },
+ "showHeader": true,
+ "sortBy": []
+ },
+ "pluginVersion": "10.4.1",
+ "targets": [
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "editorMode": "code",
+ "format": "table",
+ "rawQuery": true,
+ "rawSql": "SELECT \n j.id, \n jr.id as reservation_id, \n ROW_NUMBER() OVER (PARTITION BY j.id ORDER BY jr.created_at) AS attempt, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n\n jr.created_at AS started_at, \n\n\n -- Run time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL THEN\n CASE \n WHEN j.finished_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n END\n ELSE NULL\n END\n AS run_seconds,\n j.finished_at AS finished_at, \n jr.worker_id\n\nFROM \n jobs j\nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\nWHERE j.finished_at IS NOT NULL\nORDER BY \n j.finished_at DESC\nLIMIT 500;",
+ "refId": "A",
+ "sql": {
+ "columns": [
+ {
+ "parameters": [],
+ "type": "function"
+ }
+ ],
+ "groupBy": [
+ {
+ "property": {
+ "type": "string"
+ },
+ "type": "groupBy"
+ }
+ ],
+ "limit": 50
+ }
+ }
+ ],
+ "title": "Finished Jobs (limit 500)",
+ "type": "table"
+ },
{
"datasource": {
"type": "grafana-postgresql-datasource",
@@ -346,84 +819,10 @@
"calcs": [
"lastNotNull"
],
- "fields": "",
- "values": false
- },
- "textMode": "auto"
- },
- "pluginVersion": "12.4.3",
- "targets": [
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "table",
- "rawQuery": true,
- "rawSql": "SELECT EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n AND j.job_type IN ('from-scratch-index','incremental-index')\n AND jr.id IS NULL;",
- "refId": "A"
- }
- ],
- "title": "Oldest Pending",
- "type": "stat"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "description": "Indexing-job flow rates. Three series — `arrived` (jobs queued), `started` (a worker reserved them), `completed` (jobs.finished_at set). Bucketed by 30s. A persistent gap between arrived and completed indicates worker saturation.",
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "palette-classic"
- },
- "custom": {
- "axisGridShow": true,
- "axisLabel": "jobs / 30s",
- "axisPlacement": "auto",
- "drawStyle": "line",
- "fillOpacity": 10,
- "gradientMode": "none",
- "lineInterpolation": "smooth",
- "lineWidth": 1,
- "pointSize": 4,
- "scaleDistribution": {
- "type": "linear"
- },
- "showPoints": "never",
- "spanNulls": true,
- "stacking": {
- "mode": "none"
- }
- },
- "mappings": [],
- "unit": "short"
- },
- "overrides": []
- },
- "gridPos": {
- "h": 8,
- "w": 24,
- "x": 0,
- "y": 8
- },
- "id": 14,
- "options": {
- "legend": {
- "calcs": [
- "mean",
- "lastNotNull"
- ],
- "displayMode": "list",
- "placement": "bottom",
- "showLegend": true
+ "fields": "",
+ "values": false
},
- "tooltip": {
- "mode": "multi",
- "sort": "none"
- }
+ "textMode": "auto"
},
"pluginVersion": "12.4.3",
"targets": [
@@ -433,43 +832,21 @@
"uid": "cef5v5sl9k7i8f"
},
"editorMode": "code",
- "format": "time_series",
+ "format": "table",
"rawQuery": true,
- "rawSql": "SELECT $__timeGroupAlias(j.created_at, '30s') AS time,\n COUNT(*) AS arrived\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND $__timeFilter(j.created_at)\n GROUP BY 1\n ORDER BY 1;",
+ "rawSql": "SELECT EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n AND j.job_type IN ('from-scratch-index','incremental-index')\n AND jr.id IS NULL;",
"refId": "A"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "time_series",
- "rawQuery": true,
- "rawSql": "SELECT $__timeGroupAlias(jr.created_at, '30s') AS time,\n COUNT(*) AS started\n FROM job_reservations jr\n JOIN jobs j ON j.id = jr.job_id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND $__timeFilter(jr.created_at)\n GROUP BY 1\n ORDER BY 1;",
- "refId": "B"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "time_series",
- "rawQuery": true,
- "rawSql": "SELECT $__timeGroupAlias(j.finished_at, '30s') AS time,\n COUNT(*) AS completed\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.finished_at IS NOT NULL\n AND $__timeFilter(j.finished_at)\n GROUP BY 1\n ORDER BY 1;",
- "refId": "C"
}
],
- "title": "Indexing throughput (arrived / started / completed)",
- "type": "timeseries"
+ "title": "Oldest Pending",
+ "type": "stat"
},
{
"datasource": {
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
- "description": "Stocks of indexing jobs over time. `pending` = queued, no live reservation; `in_flight` = a worker has an open reservation; `completed` = cumulative finished. Bucketed at 1 minute over the panel's time range. Cost: O(buckets × indexing-jobs in window) per refresh — if this gets slow, switch to a snapshot/materialized table.",
+ "description": "Indexing-job flow rates. Three series — `arrived` (jobs queued), `started` (a worker reserved them), `completed` (jobs.finished_at set). Bucketed by 30s. A persistent gap between arrived and completed indicates worker saturation.",
"fieldConfig": {
"defaults": {
"color": {
@@ -477,7 +854,7 @@
},
"custom": {
"axisGridShow": true,
- "axisLabel": "jobs",
+ "axisLabel": "jobs / 30s",
"axisPlacement": "auto",
"drawStyle": "line",
"fillOpacity": 10,
@@ -503,13 +880,13 @@
"h": 8,
"w": 24,
"x": 0,
- "y": 16
+ "y": 8
},
- "id": 15,
+ "id": 14,
"options": {
"legend": {
"calcs": [
- "max",
+ "mean",
"lastNotNull"
],
"displayMode": "list",
@@ -531,316 +908,91 @@
"editorMode": "code",
"format": "time_series",
"rawQuery": true,
- "rawSql": "WITH buckets AS (\n SELECT generate_series($__timeFrom()::timestamptz, $__timeTo()::timestamptz, '1 minute') AS bucket\n),\nindexing_jobs AS (\n SELECT j.id, j.created_at, j.finished_at,\n (SELECT MIN(jr.created_at) FROM job_reservations jr WHERE jr.job_id = j.id) AS first_started_at\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.created_at <= $__timeTo()::timestamptz\n)\nSELECT b.bucket AS time,\n COUNT(*) FILTER (WHERE ij.created_at <= b.bucket\n AND (ij.first_started_at IS NULL OR ij.first_started_at > b.bucket)\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS pending,\n COUNT(*) FILTER (WHERE ij.first_started_at IS NOT NULL\n AND ij.first_started_at <= b.bucket\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS in_flight,\n COUNT(*) FILTER (WHERE ij.finished_at IS NOT NULL AND ij.finished_at <= b.bucket) AS completed\n FROM buckets b LEFT JOIN indexing_jobs ij ON TRUE\n GROUP BY b.bucket\n ORDER BY b.bucket;",
+ "rawSql": "SELECT $__timeGroupAlias(j.created_at, '30s') AS time,\n COUNT(*) AS arrived\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND $__timeFilter(j.created_at)\n GROUP BY 1\n ORDER BY 1;",
"refId": "A"
- }
- ],
- "title": "Pending vs in-flight vs completed (over time)",
- "type": "timeseries"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "description": "One row per indexing job currently held by a worker (across all realm-server / worker tasks). `progress` and `current files` come from `job_progress` (CS-10930), populated by each realm-server's IndexingEventSink write-through. Click the realm cell to drill into the live activity feed.",
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "thresholds"
- },
- "custom": {
- "align": "left",
- "cellOptions": {
- "type": "auto"
- },
- "filterable": true,
- "inspect": false,
- "minWidth": 100
- },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green"
- },
- {
- "color": "red",
- "value": 80
- }
- ]
- }
},
- "overrides": [
- {
- "matcher": {
- "id": "byName",
- "options": "percent"
- },
- "properties": [
- {
- "id": "custom.cellOptions",
- "value": {
- "mode": "gradient",
- "type": "gauge"
- }
- },
- {
- "id": "unit",
- "value": "percent"
- },
- {
- "id": "min",
- "value": 0
- },
- {
- "id": "max",
- "value": 100
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "elapsed_seconds"
- },
- "properties": [
- {
- "id": "unit",
- "value": "s"
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "realm_url"
- },
- "properties": [
- {
- "id": "custom.hidden",
- "value": true
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "reservation_id"
- },
- "properties": [
- {
- "id": "custom.hidden",
- "value": true
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "realm"
- },
- "properties": [
- {
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View activity feed",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=11"
- }
- ]
- }
- ]
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
},
- {
- "matcher": {
- "id": "byName",
- "options": "worker_id"
- },
- "properties": [
- {
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "pattern": "^(.{6}).*$",
- "result": {
- "index": 0,
- "text": "View logs ($1)"
- }
- },
- "type": "regex"
- }
- ]
- }
- ]
- }
- ]
- },
- "gridPos": {
- "h": 10,
- "w": 24,
- "x": 0,
- "y": 24
- },
- "id": 16,
- "options": {
- "cellHeight": "sm",
- "footer": {
- "countRows": false,
- "enablePagination": false,
- "fields": "",
- "reducer": [
- "sum"
- ],
- "show": false
+ "editorMode": "code",
+ "format": "time_series",
+ "rawQuery": true,
+ "rawSql": "SELECT $__timeGroupAlias(jr.created_at, '30s') AS time,\n COUNT(*) AS started\n FROM job_reservations jr\n JOIN jobs j ON j.id = jr.job_id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND $__timeFilter(jr.created_at)\n GROUP BY 1\n ORDER BY 1;",
+ "refId": "B"
},
- "showHeader": true,
- "sortBy": []
- },
- "pluginVersion": "12.4.3",
- "targets": [
{
"datasource": {
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
"editorMode": "code",
- "format": "table",
+ "format": "time_series",
"rawQuery": true,
- "rawSql": "SELECT\n j.id AS job_id,\n RTRIM(REGEXP_REPLACE(j.concurrency_group, '^indexing:https?://[^/]+/', ''), '/') AS realm,\n COALESCE(j.args->>'realmURL','') AS realm_url,\n j.job_type,\n COALESCE(jp.files_completed, 0) AS files_completed,\n COALESCE(jp.total_files, 0) AS total_files,\n CASE WHEN COALESCE(jp.total_files, 0) > 0\n THEN (jp.files_completed::float / jp.total_files) * 100\n ELSE 0\n END AS percent,\n EXTRACT(EPOCH FROM (NOW() - jr.created_at)) AS elapsed_seconds,\n jr.created_at AS started_at,\n jr.worker_id,\n jr.id AS reservation_id\n FROM jobs j\n JOIN job_reservations jr ON jr.job_id = j.id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n LEFT JOIN job_progress jp ON jp.job_id = j.id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.finished_at IS NULL\n ORDER BY jr.created_at DESC;",
- "refId": "A"
+ "rawSql": "SELECT $__timeGroupAlias(j.finished_at, '30s') AS time,\n COUNT(*) AS completed\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.finished_at IS NOT NULL\n AND $__timeFilter(j.finished_at)\n GROUP BY 1\n ORDER BY 1;",
+ "refId": "C"
}
],
- "title": "Active Indexing",
- "type": "table"
+ "title": "Indexing throughput (arrived / started / completed)",
+ "type": "timeseries"
},
{
"datasource": {
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
- "description": "Per-realm aggregate of indexing-job state. `oldest_pending_seconds` red after 5 min flags realms whose backlog isn't draining.",
+ "description": "Stocks of indexing jobs over time. `pending` = queued, no live reservation; `in_flight` = a worker has an open reservation; `completed` = cumulative finished. Bucketed at 1 minute over the panel's time range. Cost: O(buckets × indexing-jobs in window) per refresh — if this gets slow, switch to a snapshot/materialized table.",
"fieldConfig": {
"defaults": {
"color": {
- "mode": "thresholds"
+ "mode": "palette-classic"
},
- "custom": {
- "align": "left",
- "cellOptions": {
- "type": "auto"
+ "custom": {
+ "axisGridShow": true,
+ "axisLabel": "jobs",
+ "axisPlacement": "auto",
+ "drawStyle": "line",
+ "fillOpacity": 10,
+ "gradientMode": "none",
+ "lineInterpolation": "smooth",
+ "lineWidth": 1,
+ "pointSize": 4,
+ "scaleDistribution": {
+ "type": "linear"
},
- "filterable": true,
- "inspect": false,
- "minWidth": 100
+ "showPoints": "never",
+ "spanNulls": true,
+ "stacking": {
+ "mode": "none"
+ }
},
"mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green"
- }
- ]
- }
+ "unit": "short"
},
- "overrides": [
- {
- "matcher": {
- "id": "byName",
- "options": "realm_url"
- },
- "properties": [
- {
- "id": "custom.hidden",
- "value": true
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "oldest_pending_seconds"
- },
- "properties": [
- {
- "id": "unit",
- "value": "s"
- },
- {
- "id": "thresholds",
- "value": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green"
- },
- {
- "color": "yellow",
- "value": 60
- },
- {
- "color": "red",
- "value": 300
- }
- ]
- }
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "realm"
- },
- "properties": [
- {
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View activity feed",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&orgId=1&viewPanel=11"
- }
- ]
- }
- ]
- }
- ]
+ "overrides": []
},
"gridPos": {
- "h": 6,
+ "h": 8,
"w": 24,
"x": 0,
- "y": 34
+ "y": 16
},
- "id": 17,
+ "id": 15,
"options": {
- "cellHeight": "sm",
- "footer": {
- "countRows": false,
- "enablePagination": false,
- "fields": "",
- "reducer": [
- "sum"
+ "legend": {
+ "calcs": [
+ "max",
+ "lastNotNull"
],
- "show": false
+ "displayMode": "list",
+ "placement": "bottom",
+ "showLegend": true
},
- "showHeader": true,
- "sortBy": []
+ "tooltip": {
+ "mode": "multi",
+ "sort": "none"
+ }
},
"pluginVersion": "12.4.3",
"targets": [
@@ -850,20 +1002,21 @@
"uid": "cef5v5sl9k7i8f"
},
"editorMode": "code",
- "format": "table",
+ "format": "time_series",
"rawQuery": true,
- "rawSql": "SELECT\n RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/') AS realm,\n COALESCE(j.args->>'realmURL','') AS realm_url,\n COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL) AS pending,\n COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NOT NULL) AS in_flight,\n MAX(j.finished_at) AS last_completed_at,\n EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at)\n FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n GROUP BY j.args->>'realmURL'\n ORDER BY pending DESC, in_flight DESC, last_completed_at DESC NULLS LAST\n LIMIT 200;",
+ "rawSql": "WITH buckets AS (\n SELECT generate_series($__timeFrom()::timestamptz, $__timeTo()::timestamptz, '1 minute') AS bucket\n),\nindexing_jobs AS (\n SELECT j.id, j.created_at, j.finished_at,\n (SELECT MIN(jr.created_at) FROM job_reservations jr WHERE jr.job_id = j.id) AS first_started_at\n FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.created_at <= $__timeTo()::timestamptz\n)\nSELECT b.bucket AS time,\n COUNT(*) FILTER (WHERE ij.created_at <= b.bucket\n AND (ij.first_started_at IS NULL OR ij.first_started_at > b.bucket)\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS pending,\n COUNT(*) FILTER (WHERE ij.first_started_at IS NOT NULL\n AND ij.first_started_at <= b.bucket\n AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS in_flight,\n COUNT(*) FILTER (WHERE ij.finished_at IS NOT NULL AND ij.finished_at <= b.bucket) AS completed\n FROM buckets b LEFT JOIN indexing_jobs ij ON TRUE\n GROUP BY b.bucket\n ORDER BY b.bucket;",
"refId": "A"
}
],
- "title": "Per-realm indexing status",
- "type": "table"
+ "title": "Pending vs in-flight vs completed (over time)",
+ "type": "timeseries"
},
{
"datasource": {
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
+ "description": "One row per indexing job currently held by a worker (across all realm-server / worker tasks). `progress` and `current files` come from `job_progress` (CS-10930), populated by each realm-server's IndexingEventSink write-through. Click the realm cell to drill into the live activity feed.",
"fieldConfig": {
"defaults": {
"color": {
@@ -876,7 +1029,7 @@
},
"filterable": true,
"inspect": false,
- "minWidth": 150
+ "minWidth": 100
},
"mappings": [],
"thresholds": {
@@ -896,159 +1049,70 @@
{
"matcher": {
"id": "byName",
- "options": "job_id"
+ "options": "percent"
},
"properties": [
{
- "id": "actions",
- "value": [
- {
- "confirmation": "Delete waiting job ${__value.raw}? This marks it as completed without running it.",
- "fetch": {
- "body": "",
- "headers": [
- [
- "Authorization",
- "Bearer ${grafana_secret}"
- ]
- ],
- "method": "POST",
- "queryParams": [
- [
- "job_id",
- "${__value.raw}"
- ]
- ],
- "url": "${realm_server}_grafana-complete-job"
- },
- "oneClick": false,
- "title": "Delete job ${__value.raw}",
- "type": "fetch"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "from": 0,
- "result": {
- "color": "red",
- "index": 0,
- "text": "Delete"
- },
- "to": 9999999999999
- },
- "type": "range"
- }
- ]
+ "id": "custom.cellOptions",
+ "value": {
+ "mode": "gradient",
+ "type": "gauge"
+ }
},
{
- "id": "displayName",
- "value": "Action"
+ "id": "unit",
+ "value": "percent"
},
{
- "id": "custom.filterable",
- "value": false
- }
- ]
- }
- ]
- },
- "gridPos": {
- "h": 9,
- "w": 24,
- "x": 0,
- "y": 40
- },
- "id": 2,
- "options": {
- "cellHeight": "sm",
- "footer": {
- "countRows": false,
- "enablePagination": false,
- "fields": "",
- "reducer": [
- "sum"
- ],
- "show": false
- },
- "showHeader": true,
- "sortBy": []
- },
- "pluginVersion": "10.4.1",
- "targets": [
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "table",
- "rawQuery": true,
- "rawSql": "SELECT \n j.id, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n j.id as job_id\n\nFROM \n jobs j\n \nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' \n \nORDER BY \n j.created_at ASC\nLIMIT 500;",
- "refId": "A",
- "sql": {
- "columns": [
- {
- "parameters": [],
- "type": "function"
- }
- ],
- "groupBy": [
- {
- "property": {
- "type": "string"
- },
- "type": "groupBy"
- }
- ],
- "limit": 50
- }
- }
- ],
- "title": "Waiting Jobs",
- "type": "table"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "thresholds"
+ "id": "min",
+ "value": 0
+ },
+ {
+ "id": "max",
+ "value": 100
+ }
+ ]
},
- "custom": {
- "align": "left",
- "cellOptions": {
- "type": "auto"
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "elapsed_seconds"
},
- "filterable": true,
- "inspect": false,
- "minWidth": 150
+ "properties": [
+ {
+ "id": "unit",
+ "value": "s"
+ }
+ ]
},
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "realm_url"
+ },
+ "properties": [
{
- "color": "green"
- },
+ "id": "custom.hidden",
+ "value": true
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "reservation_id"
+ },
+ "properties": [
{
- "color": "red",
- "value": 80
+ "id": "custom.hidden",
+ "value": true
}
]
- }
- },
- "overrides": [
+ },
{
"matcher": {
"id": "byName",
- "options": "worker_id"
+ "options": "realm"
},
"properties": [
{
@@ -1056,23 +1120,8 @@
"value": [
{
"targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "pattern": "^(.{6}).*$",
- "result": {
- "index": 0,
- "text": "View logs ($1)"
- }
- },
- "type": "regex"
+ "title": "View activity feed",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=11"
}
]
}
@@ -1081,34 +1130,16 @@
{
"matcher": {
"id": "byName",
- "options": "reservation_id"
+ "options": "worker_id"
},
"properties": [
{
- "id": "actions",
+ "id": "links",
"value": [
{
- "confirmation": "Cancel running reservation ${__value.raw}? The worker will stop processing it.",
- "fetch": {
- "body": "",
- "headers": [
- [
- "Authorization",
- "Bearer ${grafana_secret}"
- ]
- ],
- "method": "POST",
- "queryParams": [
- [
- "reservation_id",
- "${__value.raw}"
- ]
- ],
- "url": "${realm_server}_grafana-complete-job"
- },
- "oneClick": false,
- "title": "Delete reservation ${__value.raw}",
- "type": "fetch"
+ "targetBlank": true,
+ "title": "View logs",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
}
]
},
@@ -1117,37 +1148,27 @@
"value": [
{
"options": {
- "from": 0,
+ "pattern": "^(.{6}).*$",
"result": {
- "color": "red",
"index": 0,
- "text": "Delete"
- },
- "to": 9999999999999
+ "text": "View logs ($1)"
+ }
},
- "type": "range"
+ "type": "regex"
}
]
- },
- {
- "id": "displayName",
- "value": "Action"
- },
- {
- "id": "custom.filterable",
- "value": false
}
]
}
]
},
"gridPos": {
- "h": 11,
+ "h": 10,
"w": 24,
"x": 0,
- "y": 49
+ "y": 24
},
- "id": 1,
+ "id": 16,
"options": {
"cellHeight": "sm",
"footer": {
@@ -1162,7 +1183,7 @@
"showHeader": true,
"sortBy": []
},
- "pluginVersion": "10.4.1",
+ "pluginVersion": "12.4.3",
"targets": [
{
"datasource": {
@@ -1172,28 +1193,11 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT \n j.id,\n COALESCE(jrc.attempt, 0) AS attempt, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n\n jr.created_at AS started_at, \n\n\n -- Run time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL THEN\n CASE \n WHEN j.finished_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n END\n ELSE NULL\n END\n AS run_seconds\n, jr.worker_id,\n jr.id as reservation_id \n\nFROM \n jobs j\nJOIN \n job_reservations jr ON j.id = jr.job_id AND jr.completed_at IS NULL AND jr.locked_until > NOW()\nLEFT JOIN \n (SELECT job_id, COUNT(*) AS attempt FROM job_reservations GROUP BY job_id) jrc ON j.id = jrc.job_id\nWHERE j.finished_at IS NULL\nORDER BY \n jr.created_at DESC\nLIMIT 500;",
- "refId": "A",
- "sql": {
- "columns": [
- {
- "parameters": [],
- "type": "function"
- }
- ],
- "groupBy": [
- {
- "property": {
- "type": "string"
- },
- "type": "groupBy"
- }
- ],
- "limit": 50
- }
+ "rawSql": "SELECT\n j.id AS job_id,\n RTRIM(REGEXP_REPLACE(j.concurrency_group, '^indexing:https?://[^/]+/', ''), '/') AS realm,\n COALESCE(j.args->>'realmURL','') AS realm_url,\n j.job_type,\n COALESCE(jp.files_completed, 0) AS files_completed,\n COALESCE(jp.total_files, 0) AS total_files,\n CASE WHEN COALESCE(jp.total_files, 0) > 0\n THEN (jp.files_completed::float / jp.total_files) * 100\n ELSE 0\n END AS percent,\n EXTRACT(EPOCH FROM (NOW() - jr.created_at)) AS elapsed_seconds,\n jr.created_at AS started_at,\n jr.worker_id,\n jr.id AS reservation_id\n FROM jobs j\n JOIN job_reservations jr ON jr.job_id = j.id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n LEFT JOIN job_progress jp ON jp.job_id = j.id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.finished_at IS NULL\n ORDER BY jr.created_at DESC;",
+ "refId": "A"
}
],
- "title": "Running Jobs",
+ "title": "Active Indexing",
"type": "table"
},
{
@@ -1201,6 +1205,7 @@
"type": "grafana-postgresql-datasource",
"uid": "cef5v5sl9k7i8f"
},
+ "description": "Per-realm aggregate of indexing-job state. `oldest_pending_seconds` red after 5 min flags realms whose backlog isn't draining.",
"fieldConfig": {
"defaults": {
"color": {
@@ -1213,7 +1218,7 @@
},
"filterable": true,
"inspect": false,
- "minWidth": 150
+ "minWidth": 100
},
"mappings": [],
"thresholds": {
@@ -1221,10 +1226,6 @@
"steps": [
{
"color": "green"
- },
- {
- "color": "red",
- "value": 80
}
]
}
@@ -1233,57 +1234,73 @@
{
"matcher": {
"id": "byName",
- "options": "worker_id"
+ "options": "realm_url"
},
"properties": [
{
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3\n\n\n"
- }
- ]
+ "id": "custom.hidden",
+ "value": true
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "oldest_pending_seconds"
+ },
+ "properties": [
+ {
+ "id": "unit",
+ "value": "s"
},
{
- "id": "mappings",
- "value": [
- {
- "options": {
- "pattern": "^(.{6}).*$",
- "result": {
- "index": 0,
- "text": "View logs ($1)"
- }
+ "id": "thresholds",
+ "value": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
},
- "type": "regex"
- }
- ]
+ {
+ "color": "yellow",
+ "value": 60
+ },
+ {
+ "color": "red",
+ "value": 300
+ }
+ ]
+ }
}
]
},
{
"matcher": {
"id": "byName",
- "options": "reservation_id"
+ "options": "realm"
},
"properties": [
{
- "id": "custom.hidden",
- "value": true
+ "id": "links",
+ "value": [
+ {
+ "targetBlank": true,
+ "title": "View activity feed",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&orgId=1&viewPanel=11"
+ }
+ ]
}
]
}
]
},
"gridPos": {
- "h": 18,
+ "h": 6,
"w": 24,
"x": 0,
- "y": 60
+ "y": 34
},
- "id": 3,
+ "id": 17,
"options": {
"cellHeight": "sm",
"footer": {
@@ -1298,7 +1315,7 @@
"showHeader": true,
"sortBy": []
},
- "pluginVersion": "10.4.1",
+ "pluginVersion": "12.4.3",
"targets": [
{
"datasource": {
@@ -1308,28 +1325,11 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT \n j.id, \n jr.id as reservation_id, \n ROW_NUMBER() OVER (PARTITION BY j.id ORDER BY jr.created_at) AS attempt, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS co |
This was referenced May 7, 2026
Summary
Two related local-dev observability fixes. The symptom was blank Grafana log panels with no errors visible.
* `loki/config.yaml`: explicit `storage_config.filesystem.directory: /loki/chunks`. Without it, Loki 3.x's filesystem object store falls back to `mkdir <tenant>` against the container's CWD `/` (unwritable for the `loki` user), producing `mkdir fake: permission denied` flush errors. The flush failures cascade: chunks pile up in memory → ingester ring marks itself unhealthy → Alloy's writes return 500 `empty ring` → Alloy's retry budget exhausts → batches silently drop. Verified via the `/config` endpoint that this is the slot the chunks flusher reads (separate from `common.storage.filesystem.chunks_directory`, which only feeds a few subsystems and does not propagate to chunks).
* `alloy/config.alloy`: `batch_wait = "500ms"` on `loki.write.local`. The 1s default was too patient for low-volume host-process streams (a successful incremental indexing job emits only 3 `[indexing-progress]` lines), leaving batches sitting un-flushed for minutes.
Test plan
* `docker compose up -d` (or restart `observability-loki-1` and `observability-alloy-1` if already up)
* Confirm there are no `mkdir fake` / `permission denied` errors: `docker logs observability-loki-1 | grep permission`
* Confirm `/loki/chunks/fake/` gets populated as ingest happens: `docker exec observability-loki-1 ls /loki/chunks`
🤖 Generated with Claude Code
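For reference, the Loki change described in the summary would look roughly like the fragment below. This is a minimal sketch: only the `storage_config.filesystem.directory` key and its `/loki/chunks` value come from the description; the comments are illustrative, and every other key in `loki/config.yaml` is omitted rather than guessed.

```yaml
# loki/config.yaml (fragment only; surrounding keys omitted)
storage_config:
  filesystem:
    # Explicit, writable base directory for the chunks store. Per the
    # description, without this key the flusher attempts `mkdir <tenant>`
    # relative to the container's working directory `/` and fails with
    # "mkdir fake: permission denied".
    directory: /loki/chunks
```

After restarting the Loki container, the effective value can be checked through the `/config` endpoint mentioned in the summary.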