observability: fix local Loki ingest stalls and Alloy flush latency #4694

Draft
lukemelia wants to merge 1 commit into main from grafana-stack/01-loki-alloy-infra

@lukemelia
Contributor

Summary

Two related local-dev observability fixes. The shared symptom was blank Grafana log panels with no visible errors.

  • loki/config.yaml: set storage_config.filesystem.directory: /loki/chunks explicitly. Without it, Loki 3.x's filesystem object store falls back to mkdir <tenant> against the container's CWD / (unwritable for the loki user), producing mkdir fake: permission denied flush errors. The flush failures cascade: chunks pile up in memory → the ingester ring marks itself unhealthy → Alloy's writes return 500 empty ring → Alloy's retry budget is exhausted → batches are silently dropped. Verified via the /config endpoint that this is the slot the chunks flusher reads (separate from common.storage.filesystem.chunks_directory, which only feeds a few subsystems and does not propagate to chunks).
  • alloy/config.alloy: set batch_wait = "500ms" in the endpoint block of loki.write.local. The 1s default was too patient for low-volume host-process streams (a successful incremental indexing job emits only 3 [indexing-progress] lines), leaving batches sitting unflushed for minutes.
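
The /config check mentioned in the first bullet can be repeated from a terminal. The following is a minimal sketch, assuming Loki listens on localhost:3100 (the port used in the test plan) and that /config returns block-style YAML with two-space indentation. `loki_fs_directory` is a hypothetical helper name introduced here for illustration; it is not part of Loki or of this PR.

```shell
# Hypothetical helper: read Loki's /config YAML on stdin and print the value of
# storage_config.filesystem.directory (prints nothing if the key is absent).
# Assumes block-style YAML with two-space indentation.
loki_fs_directory() {
  awk '
    # A new top-level key: track whether we are inside storage_config.
    /^[^ #]/ { in_storage = ($0 ~ /^storage_config:/); in_fs = 0; next }
    # A second-level key inside storage_config: track the filesystem block.
    in_storage && /^  [^ ]/ { in_fs = ($0 ~ /^  filesystem:/); next }
    # The third-level key we are looking for.
    in_storage && in_fs && /^    directory:/ {
      sub(/^    directory:[ ]*/, "")
      print
      exit
    }
  '
}
```

Usage against the local stack would look like `curl -s http://localhost:3100/config | loki_fs_directory`. With the fix applied, the expectation is that it prints /loki/chunks. The helper only looks under storage_config, so a value set under common.storage.filesystem does not satisfy it.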

Test plan

  • Bounce local stack: docker compose up -d (or restart observability-loki-1 and observability-alloy-1 if already up)
  • Verify no mkdir fake / permission denied errors (2>&1 so that log lines written to stderr are also searched): docker logs observability-loki-1 2>&1 | grep permission
  • Verify /loki/chunks/fake/ gets populated as ingest happens: docker exec observability-loki-1 ls /loki/chunks
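  • Probe latency check: the probe line should be queryable about a second after it is written, and the query below should then print a count of at least 1 (jq prints null if nothing has matched yet):
    echo "PROBE-$(date +%s)" >> /tmp/boxel-logs/worker.log; sleep 2
    # date -v-1M is BSD/macOS syntax; with GNU date use: date -u -d '1 minute ago'
    curl -sG http://localhost:3100/loki/api/v1/query_range \
      --data-urlencode 'query={service="worker"} |= "PROBE-"' \
      --data-urlencode "start=$(date -u -v-1M '+%Y-%m-%dT%H:%M:%SZ')" \
      --data-urlencode "end=$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
      | jq '.data.result | map(.values | length) | add'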
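
The probe above uses a fixed sleep and a macOS-specific date flag. As a hedged alternative, the sketch below polls until the probe line shows up and works with either BSD or GNU date. It assumes the same log path, port, and service="worker" label as the test plan; the helper names (`minutes_ago_utc`, `probe_count`, `wait_for_probe`) are hypothetical and exist only for this illustration.

```shell
# Print the UTC time N minutes ago in RFC 3339 form.
# Tries BSD/macOS date syntax first, then falls back to GNU date.
minutes_ago_utc() {
  mins="${1:-1}"
  date -u -v-"${mins}"M '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    date -u -d "${mins} minutes ago" '+%Y-%m-%dT%H:%M:%SZ'
}

# Hypothetical helper: count the log lines Loki currently returns for a marker.
# Prints 0 when nothing matches (jq's `add` yields null on an empty list).
probe_count() {
  pc_marker="$1"
  curl -sG http://localhost:3100/loki/api/v1/query_range \
    --data-urlencode "query={service=\"worker\"} |= \"${pc_marker}\"" \
    --data-urlencode "start=$(minutes_ago_utc 1)" \
    --data-urlencode "end=$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
    | jq '.data.result | map(.values | length) | add // 0'
}

# Append a unique probe line, then poll once per second until Loki returns it
# or the timeout (in seconds) has passed.
wait_for_probe() {
  wp_log="${1:-/tmp/boxel-logs/worker.log}"
  wp_timeout="${2:-10}"
  wp_marker="PROBE-$(date +%s)"
  echo "$wp_marker" >> "$wp_log"
  wp_start=$(date +%s)
  while [ $(( $(date +%s) - wp_start )) -lt "$wp_timeout" ]; do
    # The numeric test fails quietly if curl or jq produced no output.
    if [ "$(probe_count "$wp_marker")" -gt 0 ] 2>/dev/null; then
      echo "visible after ~$(( $(date +%s) - wp_start ))s"
      return 0
    fi
    sleep 1
  done
  echo "not visible after ${wp_timeout}s" >&2
  return 1
}
```

Running `wait_for_probe` with no arguments writes to /tmp/boxel-logs/worker.log and waits up to 10 seconds. The reported delay has one-second resolution, so it is a coarse check on flush latency rather than a precise measurement.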

🤖 Generated with Claude Code

Two related local-dev observability fixes that surfaced as blank Grafana
log panels with no errors anywhere obvious.

* `loki/config.yaml`: explicitly set `storage_config.filesystem.directory`
  so Loki's chunks store has a writable base. Without it, Loki 3.x falls
  back to `mkdir <tenant>` against the container's CWD `/`, which the
  `loki` user cannot write. That produces "mkdir fake: permission denied"
  flush errors, which cascade into the ingester ring marking itself
  unhealthy and Alloy's writes returning 500 "empty ring". Verified via
  the /config endpoint that this is the slot the chunks flusher reads
  (separate from `common.storage.filesystem.chunks_directory`, which
  does not propagate to chunks).

* `alloy/config.alloy`: set `loki.write.local.endpoint.batch_wait` to
  500ms (default 1s). An incremental indexing job emits only 3
  `[indexing-progress]` lines, which doesn't reliably trip the default
  flush trigger; tighter batch_wait keeps low-volume host-process
  streams visible in real time.
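
Because the symptom was described as silent, it can help to look for the two error strings quoted above directly in the container logs. This is a minimal sketch: `count_ingest_errors` is a hypothetical helper, and it assumes the log lines contain the quoted strings verbatim.

```shell
# Hypothetical helper: read log lines on stdin and report how many carry each
# error signature quoted in the commit message above.
count_ingest_errors() {
  awk '
    /mkdir fake: permission denied/ { flush++ }
    /empty ring/                    { ring++ }
    END { printf "flush_permission_denied=%d empty_ring=%d\n", flush, ring }
  '
}
```

For example, `docker logs observability-loki-1 2>&1 | count_ingest_errors` and the same command against observability-alloy-1 would be expected to report zero for both counters once the fix is in place.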

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented May 7, 2026

Observability diff (vs staging)

Diff truncated (62400 bytes; limit 60000). Full diff: https://github.com/cardstack/boxel/actions/runs/25474686445

diff --git a/tmp/remote-canon.lmpVWy/dashboards/boxel-status/boxel-jobs.json b/tmp/committed-canon.tc1erx/dashboards/boxel-status/boxel-jobs.json
index 6a40566..607199f 100644
--- a/tmp/remote-canon.lmpVWy/dashboards/boxel-status/boxel-jobs.json
+++ b/tmp/committed-canon.tc1erx/dashboards/boxel-status/boxel-jobs.json
@@ -162,6 +162,479 @@
         "title": "Operator Actions",
         "type": "stat"
       },
+      {
+        "datasource": {
+          "type": "grafana-postgresql-datasource",
+          "uid": "cef5v5sl9k7i8f"
+        },
+        "fieldConfig": {
+          "defaults": {
+            "color": {
+              "mode": "thresholds"
+            },
+            "custom": {
+              "align": "left",
+              "cellOptions": {
+                "type": "auto"
+              },
+              "filterable": true,
+              "inspect": false,
+              "minWidth": 150
+            },
+            "mappings": [],
+            "thresholds": {
+              "mode": "absolute",
+              "steps": [
+                {
+                  "color": "green"
+                },
+                {
+                  "color": "red",
+                  "value": 80
+                }
+              ]
+            }
+          },
+          "overrides": [
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "job_id"
+              },
+              "properties": [
+                {
+                  "id": "actions",
+                  "value": [
+                    {
+                      "confirmation": "Delete waiting job ${__value.raw}? This marks it as completed without running it.",
+                      "fetch": {
+                        "body": "",
+                        "headers": [
+                          [
+                            "Authorization",
+                            "Bearer ${grafana_secret}"
+                          ]
+                        ],
+                        "method": "POST",
+                        "queryParams": [
+                          [
+                            "job_id",
+                            "${__value.raw}"
+                          ]
+                        ],
+                        "url": "${realm_server}_grafana-complete-job"
+                      },
+                      "oneClick": false,
+                      "title": "Delete job ${__value.raw}",
+                      "type": "fetch"
+                    }
+                  ]
+                },
+                {
+                  "id": "mappings",
+                  "value": [
+                    {
+                      "options": {
+                        "from": 0,
+                        "result": {
+                          "color": "red",
+                          "index": 0,
+                          "text": "Delete"
+                        },
+                        "to": 9999999999999
+                      },
+                      "type": "range"
+                    }
+                  ]
+                },
+                {
+                  "id": "displayName",
+                  "value": "Action"
+                },
+                {
+                  "id": "custom.filterable",
+                  "value": false
+                }
+              ]
+            }
+          ]
+        },
+        "gridPos": {
+          "h": 9,
+          "w": 24,
+          "x": 0,
+          "y": 40
+        },
+        "id": 2,
+        "options": {
+          "cellHeight": "sm",
+          "footer": {
+            "countRows": false,
+            "enablePagination": false,
+            "fields": "",
+            "reducer": [
+              "sum"
+            ],
+            "show": false
+          },
+          "showHeader": true,
+          "sortBy": []
+        },
+        "pluginVersion": "10.4.1",
+        "targets": [
+          {
+            "datasource": {
+              "type": "grafana-postgresql-datasource",
+              "uid": "cef5v5sl9k7i8f"
+            },
+            "editorMode": "code",
+            "format": "table",
+            "rawQuery": true,
+            "rawSql": "SELECT \n  j.id, \n  j.priority, \n  j.job_type, \n  CASE \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n    ELSE j.concurrency_group \n  END AS concurrency_group, \n  j.status AS status, \n  j.created_at AS created_at, \n\n\n  -- Wait time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL \n        THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n      ELSE \n        EXTRACT(EPOCH FROM (NOW() - j.created_at))\n    END\n    AS wait_seconds,\n j.id as job_id\n\nFROM \n  jobs j\n  \nLEFT JOIN \n  job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' \n  \nORDER BY \n  j.created_at ASC\nLIMIT 500;",
+            "refId": "A",
+            "sql": {
+              "columns": [
+                {
+                  "parameters": [],
+                  "type": "function"
+                }
+              ],
+              "groupBy": [
+                {
+                  "property": {
+                    "type": "string"
+                  },
+                  "type": "groupBy"
+                }
+              ],
+              "limit": 50
+            }
+          }
+        ],
+        "title": "Waiting Jobs",
+        "type": "table"
+      },
+      {
+        "datasource": {
+          "type": "grafana-postgresql-datasource",
+          "uid": "cef5v5sl9k7i8f"
+        },
+        "fieldConfig": {
+          "defaults": {
+            "color": {
+              "mode": "thresholds"
+            },
+            "custom": {
+              "align": "left",
+              "cellOptions": {
+                "type": "auto"
+              },
+              "filterable": true,
+              "inspect": false,
+              "minWidth": 150
+            },
+            "mappings": [],
+            "thresholds": {
+              "mode": "absolute",
+              "steps": [
+                {
+                  "color": "green"
+                },
+                {
+                  "color": "red",
+                  "value": 80
+                }
+              ]
+            }
+          },
+          "overrides": [
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "worker_id"
+              },
+              "properties": [
+                {
+                  "id": "links",
+                  "value": [
+                    {
+                      "targetBlank": true,
+                      "title": "View logs",
+                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
+                    }
+                  ]
+                },
+                {
+                  "id": "mappings",
+                  "value": [
+                    {
+                      "options": {
+                        "pattern": "^(.{6}).*$",
+                        "result": {
+                          "index": 0,
+                          "text": "View logs ($1)"
+                        }
+                      },
+                      "type": "regex"
+                    }
+                  ]
+                }
+              ]
+            },
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "reservation_id"
+              },
+              "properties": [
+                {
+                  "id": "actions",
+                  "value": [
+                    {
+                      "confirmation": "Cancel running reservation ${__value.raw}? The worker will stop processing it.",
+                      "fetch": {
+                        "body": "",
+                        "headers": [
+                          [
+                            "Authorization",
+                            "Bearer ${grafana_secret}"
+                          ]
+                        ],
+                        "method": "POST",
+                        "queryParams": [
+                          [
+                            "reservation_id",
+                            "${__value.raw}"
+                          ]
+                        ],
+                        "url": "${realm_server}_grafana-complete-job"
+                      },
+                      "oneClick": false,
+                      "title": "Delete reservation ${__value.raw}",
+                      "type": "fetch"
+                    }
+                  ]
+                },
+                {
+                  "id": "mappings",
+                  "value": [
+                    {
+                      "options": {
+                        "from": 0,
+                        "result": {
+                          "color": "red",
+                          "index": 0,
+                          "text": "Delete"
+                        },
+                        "to": 9999999999999
+                      },
+                      "type": "range"
+                    }
+                  ]
+                },
+                {
+                  "id": "displayName",
+                  "value": "Action"
+                },
+                {
+                  "id": "custom.filterable",
+                  "value": false
+                }
+              ]
+            }
+          ]
+        },
+        "gridPos": {
+          "h": 11,
+          "w": 24,
+          "x": 0,
+          "y": 49
+        },
+        "id": 1,
+        "options": {
+          "cellHeight": "sm",
+          "footer": {
+            "countRows": false,
+            "enablePagination": false,
+            "fields": "",
+            "reducer": [
+              "sum"
+            ],
+            "show": false
+          },
+          "showHeader": true,
+          "sortBy": []
+        },
+        "pluginVersion": "10.4.1",
+        "targets": [
+          {
+            "datasource": {
+              "type": "grafana-postgresql-datasource",
+              "uid": "cef5v5sl9k7i8f"
+            },
+            "editorMode": "code",
+            "format": "table",
+            "rawQuery": true,
+            "rawSql": "SELECT \n  j.id,\n  COALESCE(jrc.attempt, 0) AS attempt, \n  j.priority, \n  j.job_type, \n  CASE \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n    ELSE j.concurrency_group \n  END AS concurrency_group, \n  j.status AS status, \n  j.created_at AS created_at, \n\n\n  -- Wait time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL \n        THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n      ELSE \n        EXTRACT(EPOCH FROM (NOW() - j.created_at))\n    END\n    AS wait_seconds,\n\n    jr.created_at AS started_at, \n\n\n  -- Run time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL THEN\n        CASE \n          WHEN j.finished_at IS NOT NULL \n            THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n          ELSE \n            EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n        END\n      ELSE NULL\n    END\n    AS run_seconds\n, jr.worker_id,\n jr.id as reservation_id \n\nFROM \n  jobs j\nJOIN \n  job_reservations jr ON j.id = jr.job_id AND jr.completed_at IS NULL AND jr.locked_until > NOW()\nLEFT JOIN \n  (SELECT job_id, COUNT(*) AS attempt FROM job_reservations GROUP BY job_id) jrc ON j.id = jrc.job_id\nWHERE j.finished_at IS NULL\nORDER BY \n  jr.created_at DESC\nLIMIT 500;",
+            "refId": "A",
+            "sql": {
+              "columns": [
+                {
+                  "parameters": [],
+                  "type": "function"
+                }
+              ],
+              "groupBy": [
+                {
+                  "property": {
+                    "type": "string"
+                  },
+                  "type": "groupBy"
+                }
+              ],
+              "limit": 50
+            }
+          }
+        ],
+        "title": "Running Jobs",
+        "type": "table"
+      },
+      {
+        "datasource": {
+          "type": "grafana-postgresql-datasource",
+          "uid": "cef5v5sl9k7i8f"
+        },
+        "fieldConfig": {
+          "defaults": {
+            "color": {
+              "mode": "thresholds"
+            },
+            "custom": {
+              "align": "left",
+              "cellOptions": {
+                "type": "auto"
+              },
+              "filterable": true,
+              "inspect": false,
+              "minWidth": 150
+            },
+            "mappings": [],
+            "thresholds": {
+              "mode": "absolute",
+              "steps": [
+                {
+                  "color": "green"
+                },
+                {
+                  "color": "red",
+                  "value": 80
+                }
+              ]
+            }
+          },
+          "overrides": [
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "worker_id"
+              },
+              "properties": [
+                {
+                  "id": "links",
+                  "value": [
+                    {
+                      "targetBlank": true,
+                      "title": "View logs",
+                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3\n\n\n"
+                    }
+                  ]
+                },
+                {
+                  "id": "mappings",
+                  "value": [
+                    {
+                      "options": {
+                        "pattern": "^(.{6}).*$",
+                        "result": {
+                          "index": 0,
+                          "text": "View logs ($1)"
+                        }
+                      },
+                      "type": "regex"
+                    }
+                  ]
+                }
+              ]
+            },
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "reservation_id"
+              },
+              "properties": [
+                {
+                  "id": "custom.hidden",
+                  "value": true
+                }
+              ]
+            }
+          ]
+        },
+        "gridPos": {
+          "h": 18,
+          "w": 24,
+          "x": 0,
+          "y": 60
+        },
+        "id": 3,
+        "options": {
+          "cellHeight": "sm",
+          "footer": {
+            "countRows": false,
+            "enablePagination": false,
+            "fields": "",
+            "reducer": [
+              "sum"
+            ],
+            "show": false
+          },
+          "showHeader": true,
+          "sortBy": []
+        },
+        "pluginVersion": "10.4.1",
+        "targets": [
+          {
+            "datasource": {
+              "type": "grafana-postgresql-datasource",
+              "uid": "cef5v5sl9k7i8f"
+            },
+            "editorMode": "code",
+            "format": "table",
+            "rawQuery": true,
+            "rawSql": "SELECT \n  j.id, \n jr.id as reservation_id, \n  ROW_NUMBER() OVER (PARTITION BY j.id ORDER BY jr.created_at) AS attempt, \n  j.priority, \n  j.job_type, \n  CASE \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n    ELSE j.concurrency_group \n  END AS concurrency_group, \n  j.status AS status, \n  j.created_at AS created_at, \n\n\n  -- Wait time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL \n        THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n      ELSE \n        EXTRACT(EPOCH FROM (NOW() - j.created_at))\n    END\n    AS wait_seconds,\n\n    jr.created_at AS started_at, \n\n\n  -- Run time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL THEN\n        CASE \n          WHEN j.finished_at IS NOT NULL \n            THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n          ELSE \n            EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n        END\n      ELSE NULL\n    END\n    AS run_seconds,\n    j.finished_at AS finished_at, \n jr.worker_id\n\nFROM \n  jobs j\nLEFT JOIN \n  job_reservations jr ON j.id = jr.job_id\nWHERE j.finished_at IS NOT NULL\nORDER BY \n  j.finished_at DESC\nLIMIT 500;",
+            "refId": "A",
+            "sql": {
+              "columns": [
+                {
+                  "parameters": [],
+                  "type": "function"
+                }
+              ],
+              "groupBy": [
+                {
+                  "property": {
+                    "type": "string"
+                  },
+                  "type": "groupBy"
+                }
+              ],
+              "limit": 50
+            }
+          }
+        ],
+        "title": "Finished Jobs (limit 500)",
+        "type": "table"
+      },
       {
         "datasource": {
           "type": "grafana-postgresql-datasource",
@@ -346,84 +819,10 @@
             "calcs": [
               "lastNotNull"
             ],
-            "fields": "",
-            "values": false
-          },
-          "textMode": "auto"
-        },
-        "pluginVersion": "12.4.3",
-        "targets": [
-          {
-            "datasource": {
-              "type": "grafana-postgresql-datasource",
-              "uid": "cef5v5sl9k7i8f"
-            },
-            "editorMode": "code",
-            "format": "table",
-            "rawQuery": true,
-            "rawSql": "SELECT EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at))) AS oldest_pending_seconds\n  FROM jobs j\n  LEFT JOIN job_reservations jr ON j.id = jr.job_id\n    AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n   AND j.job_type IN ('from-scratch-index','incremental-index')\n   AND jr.id IS NULL;",
-            "refId": "A"
-          }
-        ],
-        "title": "Oldest Pending",
-        "type": "stat"
-      },
-      {
-        "datasource": {
-          "type": "grafana-postgresql-datasource",
-          "uid": "cef5v5sl9k7i8f"
-        },
-        "description": "Indexing-job flow rates. Three series — `arrived` (jobs queued), `started` (a worker reserved them), `completed` (jobs.finished_at set). Bucketed by 30s. A persistent gap between arrived and completed indicates worker saturation.",
-        "fieldConfig": {
-          "defaults": {
-            "color": {
-              "mode": "palette-classic"
-            },
-            "custom": {
-              "axisGridShow": true,
-              "axisLabel": "jobs / 30s",
-              "axisPlacement": "auto",
-              "drawStyle": "line",
-              "fillOpacity": 10,
-              "gradientMode": "none",
-              "lineInterpolation": "smooth",
-              "lineWidth": 1,
-              "pointSize": 4,
-              "scaleDistribution": {
-                "type": "linear"
-              },
-              "showPoints": "never",
-              "spanNulls": true,
-              "stacking": {
-                "mode": "none"
-              }
-            },
-            "mappings": [],
-            "unit": "short"
-          },
-          "overrides": []
-        },
-        "gridPos": {
-          "h": 8,
-          "w": 24,
-          "x": 0,
-          "y": 8
-        },
-        "id": 14,
-        "options": {
-          "legend": {
-            "calcs": [
-              "mean",
-              "lastNotNull"
-            ],
-            "displayMode": "list",
-            "placement": "bottom",
-            "showLegend": true
+            "fields": "",
+            "values": false
           },
-          "tooltip": {
-            "mode": "multi",
-            "sort": "none"
-          }
+          "textMode": "auto"
         },
         "pluginVersion": "12.4.3",
         "targets": [
@@ -433,43 +832,21 @@
               "uid": "cef5v5sl9k7i8f"
             },
             "editorMode": "code",
-            "format": "time_series",
+            "format": "table",
             "rawQuery": true,
-            "rawSql": "SELECT $__timeGroupAlias(j.created_at, '30s') AS time,\n       COUNT(*) AS arrived\n  FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND $__timeFilter(j.created_at)\n GROUP BY 1\n ORDER BY 1;",
+            "rawSql": "SELECT EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at))) AS oldest_pending_seconds\n  FROM jobs j\n  LEFT JOIN job_reservations jr ON j.id = jr.job_id\n    AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.status = 'unfulfilled'\n   AND j.job_type IN ('from-scratch-index','incremental-index')\n   AND jr.id IS NULL;",
             "refId": "A"
-          },
-          {
-            "datasource": {
-              "type": "grafana-postgresql-datasource",
-              "uid": "cef5v5sl9k7i8f"
-            },
-            "editorMode": "code",
-            "format": "time_series",
-            "rawQuery": true,
-            "rawSql": "SELECT $__timeGroupAlias(jr.created_at, '30s') AS time,\n       COUNT(*) AS started\n  FROM job_reservations jr\n  JOIN jobs j ON j.id = jr.job_id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND $__timeFilter(jr.created_at)\n GROUP BY 1\n ORDER BY 1;",
-            "refId": "B"
-          },
-          {
-            "datasource": {
-              "type": "grafana-postgresql-datasource",
-              "uid": "cef5v5sl9k7i8f"
-            },
-            "editorMode": "code",
-            "format": "time_series",
-            "rawQuery": true,
-            "rawSql": "SELECT $__timeGroupAlias(j.finished_at, '30s') AS time,\n       COUNT(*) AS completed\n  FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND j.finished_at IS NOT NULL\n   AND $__timeFilter(j.finished_at)\n GROUP BY 1\n ORDER BY 1;",
-            "refId": "C"
           }
         ],
-        "title": "Indexing throughput (arrived / started / completed)",
-        "type": "timeseries"
+        "title": "Oldest Pending",
+        "type": "stat"
       },
       {
         "datasource": {
           "type": "grafana-postgresql-datasource",
           "uid": "cef5v5sl9k7i8f"
         },
-        "description": "Stocks of indexing jobs over time. `pending` = queued, no live reservation; `in_flight` = a worker has an open reservation; `completed` = cumulative finished. Bucketed at 1 minute over the panel's time range. Cost: O(buckets × indexing-jobs in window) per refresh — if this gets slow, switch to a snapshot/materialized table.",
+        "description": "Indexing-job flow rates. Three series — `arrived` (jobs queued), `started` (a worker reserved them), `completed` (jobs.finished_at set). Bucketed by 30s. A persistent gap between arrived and completed indicates worker saturation.",
         "fieldConfig": {
           "defaults": {
             "color": {
@@ -477,7 +854,7 @@
             },
             "custom": {
               "axisGridShow": true,
-              "axisLabel": "jobs",
+              "axisLabel": "jobs / 30s",
               "axisPlacement": "auto",
               "drawStyle": "line",
               "fillOpacity": 10,
@@ -503,13 +880,13 @@
           "h": 8,
           "w": 24,
           "x": 0,
-          "y": 16
+          "y": 8
         },
-        "id": 15,
+        "id": 14,
         "options": {
           "legend": {
             "calcs": [
-              "max",
+              "mean",
               "lastNotNull"
             ],
             "displayMode": "list",
@@ -531,316 +908,91 @@
             "editorMode": "code",
             "format": "time_series",
             "rawQuery": true,
-            "rawSql": "WITH buckets AS (\n  SELECT generate_series($__timeFrom()::timestamptz, $__timeTo()::timestamptz, '1 minute') AS bucket\n),\nindexing_jobs AS (\n  SELECT j.id, j.created_at, j.finished_at,\n    (SELECT MIN(jr.created_at) FROM job_reservations jr WHERE jr.job_id = j.id) AS first_started_at\n   FROM jobs j\n   WHERE j.job_type IN ('from-scratch-index','incremental-index')\n     AND j.created_at <= $__timeTo()::timestamptz\n)\nSELECT b.bucket AS time,\n  COUNT(*) FILTER (WHERE ij.created_at <= b.bucket\n    AND (ij.first_started_at IS NULL OR ij.first_started_at > b.bucket)\n    AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS pending,\n  COUNT(*) FILTER (WHERE ij.first_started_at IS NOT NULL\n    AND ij.first_started_at <= b.bucket\n    AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS in_flight,\n  COUNT(*) FILTER (WHERE ij.finished_at IS NOT NULL AND ij.finished_at <= b.bucket) AS completed\n FROM buckets b LEFT JOIN indexing_jobs ij ON TRUE\n GROUP BY b.bucket\n ORDER BY b.bucket;",
+            "rawSql": "SELECT $__timeGroupAlias(j.created_at, '30s') AS time,\n       COUNT(*) AS arrived\n  FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND $__timeFilter(j.created_at)\n GROUP BY 1\n ORDER BY 1;",
             "refId": "A"
-          }
-        ],
-        "title": "Pending vs in-flight vs completed (over time)",
-        "type": "timeseries"
-      },
-      {
-        "datasource": {
-          "type": "grafana-postgresql-datasource",
-          "uid": "cef5v5sl9k7i8f"
-        },
-        "description": "One row per indexing job currently held by a worker (across all realm-server / worker tasks). `progress` and `current files` come from `job_progress` (CS-10930), populated by each realm-server's IndexingEventSink write-through. Click the realm cell to drill into the live activity feed.",
-        "fieldConfig": {
-          "defaults": {
-            "color": {
-              "mode": "thresholds"
-            },
-            "custom": {
-              "align": "left",
-              "cellOptions": {
-                "type": "auto"
-              },
-              "filterable": true,
-              "inspect": false,
-              "minWidth": 100
-            },
-            "mappings": [],
-            "thresholds": {
-              "mode": "absolute",
-              "steps": [
-                {
-                  "color": "green"
-                },
-                {
-                  "color": "red",
-                  "value": 80
-                }
-              ]
-            }
           },
-          "overrides": [
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "percent"
-              },
-              "properties": [
-                {
-                  "id": "custom.cellOptions",
-                  "value": {
-                    "mode": "gradient",
-                    "type": "gauge"
-                  }
-                },
-                {
-                  "id": "unit",
-                  "value": "percent"
-                },
-                {
-                  "id": "min",
-                  "value": 0
-                },
-                {
-                  "id": "max",
-                  "value": 100
-                }
-              ]
-            },
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "elapsed_seconds"
-              },
-              "properties": [
-                {
-                  "id": "unit",
-                  "value": "s"
-                }
-              ]
-            },
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "realm_url"
-              },
-              "properties": [
-                {
-                  "id": "custom.hidden",
-                  "value": true
-                }
-              ]
-            },
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "reservation_id"
-              },
-              "properties": [
-                {
-                  "id": "custom.hidden",
-                  "value": true
-                }
-              ]
-            },
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "realm"
-              },
-              "properties": [
-                {
-                  "id": "links",
-                  "value": [
-                    {
-                      "targetBlank": true,
-                      "title": "View activity feed",
-                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=11"
-                    }
-                  ]
-                }
-              ]
+          {
+            "datasource": {
+              "type": "grafana-postgresql-datasource",
+              "uid": "cef5v5sl9k7i8f"
             },
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "worker_id"
-              },
-              "properties": [
-                {
-                  "id": "links",
-                  "value": [
-                    {
-                      "targetBlank": true,
-                      "title": "View logs",
-                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
-                    }
-                  ]
-                },
-                {
-                  "id": "mappings",
-                  "value": [
-                    {
-                      "options": {
-                        "pattern": "^(.{6}).*$",
-                        "result": {
-                          "index": 0,
-                          "text": "View logs ($1)"
-                        }
-                      },
-                      "type": "regex"
-                    }
-                  ]
-                }
-              ]
-            }
-          ]
-        },
-        "gridPos": {
-          "h": 10,
-          "w": 24,
-          "x": 0,
-          "y": 24
-        },
-        "id": 16,
-        "options": {
-          "cellHeight": "sm",
-          "footer": {
-            "countRows": false,
-            "enablePagination": false,
-            "fields": "",
-            "reducer": [
-              "sum"
-            ],
-            "show": false
+            "editorMode": "code",
+            "format": "time_series",
+            "rawQuery": true,
+            "rawSql": "SELECT $__timeGroupAlias(jr.created_at, '30s') AS time,\n       COUNT(*) AS started\n  FROM job_reservations jr\n  JOIN jobs j ON j.id = jr.job_id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND $__timeFilter(jr.created_at)\n GROUP BY 1\n ORDER BY 1;",
+            "refId": "B"
           },
-          "showHeader": true,
-          "sortBy": []
-        },
-        "pluginVersion": "12.4.3",
-        "targets": [
           {
             "datasource": {
               "type": "grafana-postgresql-datasource",
               "uid": "cef5v5sl9k7i8f"
             },
             "editorMode": "code",
-            "format": "table",
+            "format": "time_series",
             "rawQuery": true,
-            "rawSql": "SELECT\n  j.id AS job_id,\n  RTRIM(REGEXP_REPLACE(j.concurrency_group, '^indexing:https?://[^/]+/', ''), '/') AS realm,\n  COALESCE(j.args->>'realmURL','') AS realm_url,\n  j.job_type,\n  COALESCE(jp.files_completed, 0) AS files_completed,\n  COALESCE(jp.total_files, 0) AS total_files,\n  CASE WHEN COALESCE(jp.total_files, 0) > 0\n    THEN (jp.files_completed::float / jp.total_files) * 100\n    ELSE 0\n  END AS percent,\n  EXTRACT(EPOCH FROM (NOW() - jr.created_at)) AS elapsed_seconds,\n  jr.created_at AS started_at,\n  jr.worker_id,\n  jr.id AS reservation_id\n FROM jobs j\n JOIN job_reservations jr ON jr.job_id = j.id\n   AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n LEFT JOIN job_progress jp ON jp.job_id = j.id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND j.finished_at IS NULL\n ORDER BY jr.created_at DESC;",
-            "refId": "A"
+            "rawSql": "SELECT $__timeGroupAlias(j.finished_at, '30s') AS time,\n       COUNT(*) AS completed\n  FROM jobs j\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND j.finished_at IS NOT NULL\n   AND $__timeFilter(j.finished_at)\n GROUP BY 1\n ORDER BY 1;",
+            "refId": "C"
           }
         ],
-        "title": "Active Indexing",
-        "type": "table"
+        "title": "Indexing throughput (arrived / started / completed)",
+        "type": "timeseries"
       },
       {
         "datasource": {
           "type": "grafana-postgresql-datasource",
           "uid": "cef5v5sl9k7i8f"
         },
-        "description": "Per-realm aggregate of indexing-job state. `oldest_pending_seconds` red after 5 min flags realms whose backlog isn't draining.",
+        "description": "Stocks of indexing jobs over time. `pending` = queued, no live reservation; `in_flight` = a worker has an open reservation; `completed` = cumulative finished. Bucketed at 1 minute over the panel's time range. Cost: O(buckets × indexing-jobs in window) per refresh — if this gets slow, switch to a snapshot/materialized table.",
         "fieldConfig": {
           "defaults": {
             "color": {
-              "mode": "thresholds"
+              "mode": "palette-classic"
             },
-            "custom": {
-              "align": "left",
-              "cellOptions": {
-                "type": "auto"
+            "custom": {
+              "axisGridShow": true,
+              "axisLabel": "jobs",
+              "axisPlacement": "auto",
+              "drawStyle": "line",
+              "fillOpacity": 10,
+              "gradientMode": "none",
+              "lineInterpolation": "smooth",
+              "lineWidth": 1,
+              "pointSize": 4,
+              "scaleDistribution": {
+                "type": "linear"
               },
-              "filterable": true,
-              "inspect": false,
-              "minWidth": 100
+              "showPoints": "never",
+              "spanNulls": true,
+              "stacking": {
+                "mode": "none"
+              }
             },
             "mappings": [],
-            "thresholds": {
-              "mode": "absolute",
-              "steps": [
-                {
-                  "color": "green"
-                }
-              ]
-            }
+            "unit": "short"
           },
-          "overrides": [
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "realm_url"
-              },
-              "properties": [
-                {
-                  "id": "custom.hidden",
-                  "value": true
-                }
-              ]
-            },
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "oldest_pending_seconds"
-              },
-              "properties": [
-                {
-                  "id": "unit",
-                  "value": "s"
-                },
-                {
-                  "id": "thresholds",
-                  "value": {
-                    "mode": "absolute",
-                    "steps": [
-                      {
-                        "color": "green"
-                      },
-                      {
-                        "color": "yellow",
-                        "value": 60
-                      },
-                      {
-                        "color": "red",
-                        "value": 300
-                      }
-                    ]
-                  }
-                }
-              ]
-            },
-            {
-              "matcher": {
-                "id": "byName",
-                "options": "realm"
-              },
-              "properties": [
-                {
-                  "id": "links",
-                  "value": [
-                    {
-                      "targetBlank": true,
-                      "title": "View activity feed",
-                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&orgId=1&viewPanel=11"
-                    }
-                  ]
-                }
-              ]
-            }
-          ]
+          "overrides": []
         },
         "gridPos": {
-          "h": 6,
+          "h": 8,
           "w": 24,
           "x": 0,
-          "y": 34
+          "y": 16
         },
-        "id": 17,
+        "id": 15,
         "options": {
-          "cellHeight": "sm",
-          "footer": {
-            "countRows": false,
-            "enablePagination": false,
-            "fields": "",
-            "reducer": [
-              "sum"
+          "legend": {
+            "calcs": [
+              "max",
+              "lastNotNull"
             ],
-            "show": false
+            "displayMode": "list",
+            "placement": "bottom",
+            "showLegend": true
           },
-          "showHeader": true,
-          "sortBy": []
+          "tooltip": {
+            "mode": "multi",
+            "sort": "none"
+          }
         },
         "pluginVersion": "12.4.3",
         "targets": [
@@ -850,20 +1002,21 @@
               "uid": "cef5v5sl9k7i8f"
             },
             "editorMode": "code",
-            "format": "table",
+            "format": "time_series",
             "rawQuery": true,
-            "rawSql": "SELECT\n  RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/') AS realm,\n  COALESCE(j.args->>'realmURL','') AS realm_url,\n  COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL) AS pending,\n  COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NOT NULL) AS in_flight,\n  MAX(j.finished_at) AS last_completed_at,\n  EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at)\n    FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n   AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n GROUP BY j.args->>'realmURL'\n ORDER BY pending DESC, in_flight DESC, last_completed_at DESC NULLS LAST\n LIMIT 200;",
+            "rawSql": "WITH buckets AS (\n  SELECT generate_series($__timeFrom()::timestamptz, $__timeTo()::timestamptz, '1 minute') AS bucket\n),\nindexing_jobs AS (\n  SELECT j.id, j.created_at, j.finished_at,\n    (SELECT MIN(jr.created_at) FROM job_reservations jr WHERE jr.job_id = j.id) AS first_started_at\n   FROM jobs j\n   WHERE j.job_type IN ('from-scratch-index','incremental-index')\n     AND j.created_at <= $__timeTo()::timestamptz\n)\nSELECT b.bucket AS time,\n  COUNT(*) FILTER (WHERE ij.created_at <= b.bucket\n    AND (ij.first_started_at IS NULL OR ij.first_started_at > b.bucket)\n    AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS pending,\n  COUNT(*) FILTER (WHERE ij.first_started_at IS NOT NULL\n    AND ij.first_started_at <= b.bucket\n    AND (ij.finished_at IS NULL OR ij.finished_at > b.bucket)) AS in_flight,\n  COUNT(*) FILTER (WHERE ij.finished_at IS NOT NULL AND ij.finished_at <= b.bucket) AS completed\n FROM buckets b LEFT JOIN indexing_jobs ij ON TRUE\n GROUP BY b.bucket\n ORDER BY b.bucket;",
             "refId": "A"
           }
         ],
-        "title": "Per-realm indexing status",
-        "type": "table"
+        "title": "Pending vs in-flight vs completed (over time)",
+        "type": "timeseries"
       },
       {
         "datasource": {
           "type": "grafana-postgresql-datasource",
           "uid": "cef5v5sl9k7i8f"
         },
+        "description": "One row per indexing job currently held by a worker (across all realm-server / worker tasks). `progress` and `current files` come from `job_progress` (CS-10930), populated by each realm-server's IndexingEventSink write-through. Click the realm cell to drill into the live activity feed.",
         "fieldConfig": {
           "defaults": {
             "color": {
@@ -876,7 +1029,7 @@
               },
               "filterable": true,
               "inspect": false,
-              "minWidth": 150
+              "minWidth": 100
             },
             "mappings": [],
             "thresholds": {
@@ -896,159 +1049,70 @@
             {
               "matcher": {
                 "id": "byName",
-                "options": "job_id"
+                "options": "percent"
               },
               "properties": [
                 {
-                  "id": "actions",
-                  "value": [
-                    {
-                      "confirmation": "Delete waiting job ${__value.raw}? This marks it as completed without running it.",
-                      "fetch": {
-                        "body": "",
-                        "headers": [
-                          [
-                            "Authorization",
-                            "Bearer ${grafana_secret}"
-                          ]
-                        ],
-                        "method": "POST",
-                        "queryParams": [
-                          [
-                            "job_id",
-                            "${__value.raw}"
-                          ]
-                        ],
-                        "url": "${realm_server}_grafana-complete-job"
-                      },
-                      "oneClick": false,
-                      "title": "Delete job ${__value.raw}",
-                      "type": "fetch"
-                    }
-                  ]
-                },
-                {
-                  "id": "mappings",
-                  "value": [
-                    {
-                      "options": {
-                        "from": 0,
-                        "result": {
-                          "color": "red",
-                          "index": 0,
-                          "text": "Delete"
-                        },
-                        "to": 9999999999999
-                      },
-                      "type": "range"
-                    }
-                  ]
+                  "id": "custom.cellOptions",
+                  "value": {
+                    "mode": "gradient",
+                    "type": "gauge"
+                  }
                 },
                 {
-                  "id": "displayName",
-                  "value": "Action"
+                  "id": "unit",
+                  "value": "percent"
                 },
                 {
-                  "id": "custom.filterable",
-                  "value": false
-                }
-              ]
-            }
-          ]
-        },
-        "gridPos": {
-          "h": 9,
-          "w": 24,
-          "x": 0,
-          "y": 40
-        },
-        "id": 2,
-        "options": {
-          "cellHeight": "sm",
-          "footer": {
-            "countRows": false,
-            "enablePagination": false,
-            "fields": "",
-            "reducer": [
-              "sum"
-            ],
-            "show": false
-          },
-          "showHeader": true,
-          "sortBy": []
-        },
-        "pluginVersion": "10.4.1",
-        "targets": [
-          {
-            "datasource": {
-              "type": "grafana-postgresql-datasource",
-              "uid": "cef5v5sl9k7i8f"
-            },
-            "editorMode": "code",
-            "format": "table",
-            "rawQuery": true,
-            "rawSql": "SELECT \n  j.id, \n  j.priority, \n  j.job_type, \n  CASE \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n    ELSE j.concurrency_group \n  END AS concurrency_group, \n  j.status AS status, \n  j.created_at AS created_at, \n\n\n  -- Wait time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL \n        THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n      ELSE \n        EXTRACT(EPOCH FROM (NOW() - j.created_at))\n    END\n    AS wait_seconds,\n j.id as job_id\n\nFROM \n  jobs j\n  \nLEFT JOIN \n  job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' \n  \nORDER BY \n  j.created_at ASC\nLIMIT 500;",
-            "refId": "A",
-            "sql": {
-              "columns": [
-                {
-                  "parameters": [],
-                  "type": "function"
-                }
-              ],
-              "groupBy": [
-                {
-                  "property": {
-                    "type": "string"
-                  },
-                  "type": "groupBy"
-                }
-              ],
-              "limit": 50
-            }
-          }
-        ],
-        "title": "Waiting Jobs",
-        "type": "table"
-      },
-      {
-        "datasource": {
-          "type": "grafana-postgresql-datasource",
-          "uid": "cef5v5sl9k7i8f"
-        },
-        "fieldConfig": {
-          "defaults": {
-            "color": {
-              "mode": "thresholds"
+                  "id": "min",
+                  "value": 0
+                },
+                {
+                  "id": "max",
+                  "value": 100
+                }
+              ]
             },
-            "custom": {
-              "align": "left",
-              "cellOptions": {
-                "type": "auto"
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "elapsed_seconds"
               },
-              "filterable": true,
-              "inspect": false,
-              "minWidth": 150
+              "properties": [
+                {
+                  "id": "unit",
+                  "value": "s"
+                }
+              ]
             },
-            "mappings": [],
-            "thresholds": {
-              "mode": "absolute",
-              "steps": [
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "realm_url"
+              },
+              "properties": [
                 {
-                  "color": "green"
-                },
+                  "id": "custom.hidden",
+                  "value": true
+                }
+              ]
+            },
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "reservation_id"
+              },
+              "properties": [
                 {
-                  "color": "red",
-                  "value": 80
+                  "id": "custom.hidden",
+                  "value": true
                 }
               ]
-            }
-          },
-          "overrides": [
+            },
             {
               "matcher": {
                 "id": "byName",
-                "options": "worker_id"
+                "options": "realm"
               },
               "properties": [
                 {
@@ -1056,23 +1120,8 @@
                   "value": [
                     {
                       "targetBlank": true,
-                      "title": "View logs",
-                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
-                    }
-                  ]
-                },
-                {
-                  "id": "mappings",
-                  "value": [
-                    {
-                      "options": {
-                        "pattern": "^(.{6}).*$",
-                        "result": {
-                          "index": 0,
-                          "text": "View logs ($1)"
-                        }
-                      },
-                      "type": "regex"
+                      "title": "View activity feed",
+                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=11"
                     }
                   ]
                 }
@@ -1081,34 +1130,16 @@
             {
               "matcher": {
                 "id": "byName",
-                "options": "reservation_id"
+                "options": "worker_id"
               },
               "properties": [
                 {
-                  "id": "actions",
+                  "id": "links",
                   "value": [
                     {
-                      "confirmation": "Cancel running reservation ${__value.raw}? The worker will stop processing it.",
-                      "fetch": {
-                        "body": "",
-                        "headers": [
-                          [
-                            "Authorization",
-                            "Bearer ${grafana_secret}"
-                          ]
-                        ],
-                        "method": "POST",
-                        "queryParams": [
-                          [
-                            "reservation_id",
-                            "${__value.raw}"
-                          ]
-                        ],
-                        "url": "${realm_server}_grafana-complete-job"
-                      },
-                      "oneClick": false,
-                      "title": "Delete reservation ${__value.raw}",
-                      "type": "fetch"
+                      "targetBlank": true,
+                      "title": "View logs",
+                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.job_id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
                     }
                   ]
                 },
@@ -1117,37 +1148,27 @@
                   "value": [
                     {
                       "options": {
-                        "from": 0,
+                        "pattern": "^(.{6}).*$",
                         "result": {
-                          "color": "red",
                           "index": 0,
-                          "text": "Delete"
-                        },
-                        "to": 9999999999999
+                          "text": "View logs ($1)"
+                        }
                       },
-                      "type": "range"
+                      "type": "regex"
                     }
                   ]
-                },
-                {
-                  "id": "displayName",
-                  "value": "Action"
-                },
-                {
-                  "id": "custom.filterable",
-                  "value": false
                 }
               ]
             }
           ]
         },
         "gridPos": {
-          "h": 11,
+          "h": 10,
           "w": 24,
           "x": 0,
-          "y": 49
+          "y": 24
         },
-        "id": 1,
+        "id": 16,
         "options": {
           "cellHeight": "sm",
           "footer": {
@@ -1162,7 +1183,7 @@
           "showHeader": true,
           "sortBy": []
         },
-        "pluginVersion": "10.4.1",
+        "pluginVersion": "12.4.3",
         "targets": [
           {
             "datasource": {
@@ -1172,28 +1193,11 @@
             "editorMode": "code",
             "format": "table",
             "rawQuery": true,
-            "rawSql": "SELECT \n  j.id,\n  COALESCE(jrc.attempt, 0) AS attempt, \n  j.priority, \n  j.job_type, \n  CASE \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n    ELSE j.concurrency_group \n  END AS concurrency_group, \n  j.status AS status, \n  j.created_at AS created_at, \n\n\n  -- Wait time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL \n        THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n      ELSE \n        EXTRACT(EPOCH FROM (NOW() - j.created_at))\n    END\n    AS wait_seconds,\n\n    jr.created_at AS started_at, \n\n\n  -- Run time in seconds\n    CASE \n      WHEN jr.created_at IS NOT NULL THEN\n        CASE \n          WHEN j.finished_at IS NOT NULL \n            THEN EXTRACT(EPOCH FROM (j.finished_at - jr.created_at))\n          ELSE \n            EXTRACT(EPOCH FROM (NOW() - jr.created_at))\n        END\n      ELSE NULL\n    END\n    AS run_seconds\n, jr.worker_id,\n jr.id as reservation_id \n\nFROM \n  jobs j\nJOIN \n  job_reservations jr ON j.id = jr.job_id AND jr.completed_at IS NULL AND jr.locked_until > NOW()\nLEFT JOIN \n  (SELECT job_id, COUNT(*) AS attempt FROM job_reservations GROUP BY job_id) jrc ON j.id = jrc.job_id\nWHERE j.finished_at IS NULL\nORDER BY \n  jr.created_at DESC\nLIMIT 500;",
-            "refId": "A",
-            "sql": {
-              "columns": [
-                {
-                  "parameters": [],
-                  "type": "function"
-                }
-              ],
-              "groupBy": [
-                {
-                  "property": {
-                    "type": "string"
-                  },
-                  "type": "groupBy"
-                }
-              ],
-              "limit": 50
-            }
+            "rawSql": "SELECT\n  j.id AS job_id,\n  RTRIM(REGEXP_REPLACE(j.concurrency_group, '^indexing:https?://[^/]+/', ''), '/') AS realm,\n  COALESCE(j.args->>'realmURL','') AS realm_url,\n  j.job_type,\n  COALESCE(jp.files_completed, 0) AS files_completed,\n  COALESCE(jp.total_files, 0) AS total_files,\n  CASE WHEN COALESCE(jp.total_files, 0) > 0\n    THEN (jp.files_completed::float / jp.total_files) * 100\n    ELSE 0\n  END AS percent,\n  EXTRACT(EPOCH FROM (NOW() - jr.created_at)) AS elapsed_seconds,\n  jr.created_at AS started_at,\n  jr.worker_id,\n  jr.id AS reservation_id\n FROM jobs j\n JOIN job_reservations jr ON jr.job_id = j.id\n   AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n LEFT JOIN job_progress jp ON jp.job_id = j.id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n   AND j.finished_at IS NULL\n ORDER BY jr.created_at DESC;",
+            "refId": "A"
           }
         ],
-        "title": "Running Jobs",
+        "title": "Active Indexing",
         "type": "table"
       },
       {
@@ -1201,6 +1205,7 @@
           "type": "grafana-postgresql-datasource",
           "uid": "cef5v5sl9k7i8f"
         },
+        "description": "Per-realm aggregate of indexing-job state. `oldest_pending_seconds` red after 5 min flags realms whose backlog isn't draining.",
         "fieldConfig": {
           "defaults": {
             "color": {
@@ -1213,7 +1218,7 @@
               },
               "filterable": true,
               "inspect": false,
-              "minWidth": 150
+              "minWidth": 100
             },
             "mappings": [],
             "thresholds": {
@@ -1221,10 +1226,6 @@
               "steps": [
                 {
                   "color": "green"
-                },
-                {
-                  "color": "red",
-                  "value": 80
                 }
               ]
             }
@@ -1233,57 +1234,73 @@
             {
               "matcher": {
                 "id": "byName",
-                "options": "worker_id"
+                "options": "realm_url"
               },
               "properties": [
                 {
-                  "id": "links",
-                  "value": [
-                    {
-                      "targetBlank": true,
-                      "title": "View logs",
-                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3\n\n\n"
-                    }
-                  ]
+                  "id": "custom.hidden",
+                  "value": true
+                }
+              ]
+            },
+            {
+              "matcher": {
+                "id": "byName",
+                "options": "oldest_pending_seconds"
+              },
+              "properties": [
+                {
+                  "id": "unit",
+                  "value": "s"
                 },
                 {
-                  "id": "mappings",
-                  "value": [
-                    {
-                      "options": {
-                        "pattern": "^(.{6}).*$",
-                        "result": {
-                          "index": 0,
-                          "text": "View logs ($1)"
-                        }
+                  "id": "thresholds",
+                  "value": {
+                    "mode": "absolute",
+                    "steps": [
+                      {
+                        "color": "green"
                       },
-                      "type": "regex"
-                    }
-                  ]
+                      {
+                        "color": "yellow",
+                        "value": 60
+                      },
+                      {
+                        "color": "red",
+                        "value": 300
+                      }
+                    ]
+                  }
                 }
               ]
             },
             {
               "matcher": {
                 "id": "byName",
-                "options": "reservation_id"
+                "options": "realm"
               },
               "properties": [
                 {
-                  "id": "custom.hidden",
-                  "value": true
+                  "id": "links",
+                  "value": [
+                    {
+                      "targetBlank": true,
+                      "title": "View activity feed",
+                      "url": "/d/fetquzizsej28b?${__url_time_range}&var-realm_url=${__data.fields.realm_url:queryparam}&orgId=1&viewPanel=11"
+                    }
+                  ]
                 }
               ]
             }
           ]
         },
         "gridPos": {
-          "h": 18,
+          "h": 6,
           "w": 24,
           "x": 0,
-          "y": 60
+          "y": 34
         },
-        "id": 3,
+        "id": 17,
         "options": {
           "cellHeight": "sm",
           "footer": {
@@ -1298,7 +1315,7 @@
           "showHeader": true,
           "sortBy": []
         },
-        "pluginVersion": "10.4.1",
+        "pluginVersion": "12.4.3",
         "targets": [
           {
             "datasource": {
@@ -1308,28 +1325,11 @@
             "editorMode": "code",
             "format": "table",
             "rawQuery": true,
-            "rawSql": "SELECT \n  j.id, \n jr.id as reservation_id, \n  ROW_NUMBER() OVER (PARTITION BY j.id ORDER BY jr.created_at) AS attempt, \n  j.priority, \n  j.job_type, \n  CASE \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n    WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n    ELSE j.concurrency_group \n  END AS co
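
The `Pending vs in-flight vs completed (over time)` query in the diff above assigns every indexing job to at most one stock per one-minute bucket. The following is a minimal Python sketch of that classification, not code from this PR. The `Job`, `classify`, and `stocks` names are hypothetical and exist only for illustration. `first_started_at` stands in for the `MIN(jr.created_at)` subquery.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Job(NamedTuple):
    # Hypothetical stand-in for one row of the indexing_jobs CTE.
    created_at: datetime
    first_started_at: Optional[datetime]  # earliest reservation, or None
    finished_at: Optional[datetime]


def classify(job: Job, bucket: datetime) -> Optional[str]:
    """Return the stock a job falls into at `bucket`, mirroring the FILTER clauses."""
    if job.finished_at is not None and job.finished_at <= bucket:
        return "completed"
    if job.first_started_at is not None and job.first_started_at <= bucket:
        return "in_flight"
    if job.created_at <= bucket:
        return "pending"
    return None  # the job did not exist yet at this bucket


def stocks(jobs, start, end, step=timedelta(minutes=1)):
    """Count jobs per stock for every bucket from start to end inclusive."""
    rows = []
    bucket = start
    while bucket <= end:
        counts = {"pending": 0, "in_flight": 0, "completed": 0}
        for job in jobs:  # every bucket scans every job
            state = classify(job, bucket)
            if state is not None:
                counts[state] += 1
        rows.append((bucket, counts))
        bucket += step
    return rows
```

The nested loop is the same shape as the `LEFT JOIN ... ON TRUE` in the query, which is where the O(buckets × indexing-jobs) cost noted in the panel description comes from.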
