Add /metrics-viz page for browsing Prometheus metrics#34968
Add /metrics-viz page for browsing Prometheus metrics#34968antiguru wants to merge 6 commits intoMaterializeInc:mainfrom
Conversation
f3494e4 to
8fb5b03
Compare
SangJunBak
left a comment
There was a problem hiding this comment.
Reviewed the first commit
| .route( | ||
| "/metrics-viz", | ||
| routing::get(metrics_viz::handle_metrics_viz), | ||
| ) |
There was a problem hiding this comment.
Realizing that these interactive JS pages are going to not work if password authentication is enabled. This was already an issue, but might be worth documenting this somewhere
| async function query(sql) { | ||
| const response = await fetch('/api/sql', { | ||
| method: 'POST', | ||
| body: JSON.stringify({ query: sql }), | ||
| headers: { 'Content-Type': 'application/json' }, | ||
| }); | ||
| if (!response.ok) { | ||
| const text = await response.text(); | ||
| throw `request failed: ${response.status} ${response.statusText}: ${text}`; | ||
| } | ||
| const data = await response.json(); | ||
| for (const result of data.results) { | ||
| if (result.error) { | ||
| throw `SQL error: ${result.error.message}`; | ||
| } | ||
| } | ||
| return data; |
There was a problem hiding this comment.
Seeing this repeated in Memory.jsx. and hierarchical-memory.jsx. We should probably extract this into like a top level fetch.js
| } | ||
|
|
||
| async function discoverEndpoints() { | ||
| const endpoints = [{ label: 'environmentd', url: '/metrics' }]; |
There was a problem hiding this comment.
Note that it is possible for the /metrics endpoint to be on a different port than these pages! I think this is in the same vein as the password comment, i.e. this only works with the src/materialized/ci/listener_configs/no_auth.json listener config
| JOIN mz_catalog.mz_clusters c ON c.id = r.cluster_id | ||
| ORDER BY c.name, r.name | ||
| `); | ||
| const rows = data.results[0].rows; |
There was a problem hiding this comment.
results can be empty. Maybe data.results[0]?.rows ?? [];?
| ); | ||
| } | ||
|
|
||
| function CollapsibleScalarTable({ family }) { |
There was a problem hiding this comment.
Maybe we just group this and RawData into an Expandable component?
function Expandable({ children, type, numValues }) {
const [expanded, setExpanded] = useState(false);
return (
<div>
<button style={styles.rawToggle} onClick={() => setExpanded(!expanded)}>
{expanded ? `hide ${type}` : `show ${type}`} ({numValues} values)
</button>
{expanded && children}
</div>
);
}
...
<Expandable type="table" numValues={family.series.length}>
<ScalarTable family={family} />
</Expandable>
| placeholder="Search metrics..." | ||
| value={search} | ||
| onChange={e => setSearch(e.target.value)} | ||
| className="search-box" |
There was a problem hiding this comment.
Weird that we're using a classname
| * @param {string} text - Raw Prometheus metrics text | ||
| * @returns {{ families: Map<string, MetricFamily>, groups: Map<string, string[]> }} | ||
| */ | ||
| function parsePrometheusText(text) { |
There was a problem hiding this comment.
We should write some tests for each of these functions, if anything just for documentation
| if (spaceIdx === -1) continue; | ||
| const name = rest.slice(0, spaceIdx); | ||
| const help = rest.slice(spaceIdx + 1); | ||
| ensureFamily(families, name).help = help; |
There was a problem hiding this comment.
Weird how we're mutating in this function, then mutating right after. Maybe rename "ensureFamily" to "updateFamily" and have help and type as optional variables
| } | ||
| } | ||
|
|
||
| function buildScalarSeries(family) { |
There was a problem hiding this comment.
nit: Maybe make this function immutable then do the mutation in the caller?
| const labels = {}; | ||
| if (!str) return labels; | ||
| let i = 0; | ||
| while (i < str.length) { | ||
| while (i < str.length && (str[i] === ',' || str[i] === ' ')) i++; | ||
| if (i >= str.length) break; | ||
| const eqIdx = str.indexOf('=', i); | ||
| if (eqIdx === -1) break; | ||
| const key = str.slice(i, eqIdx); | ||
| if (str[eqIdx + 1] !== '"') break; | ||
| let j = eqIdx + 2; | ||
| let value = ''; | ||
| while (j < str.length) { | ||
| if (str[j] === '\\' && j + 1 < str.length) { | ||
| value += str[j + 1]; | ||
| j += 2; | ||
| } else if (str[j] === '"') { | ||
| break; | ||
| } else { | ||
| value += str[j]; | ||
| j++; | ||
| } | ||
| } | ||
| labels[key] = value; | ||
| i = j + 1; | ||
| } |
8fb5b03 to
10ccbb0
Compare
SangJunBak
left a comment
There was a problem hiding this comment.
Overall code looks correct! Would like to see my comments addressed, but none of them need to block this from merging.
Some general things I noticed that would help in maintainability but aren't blocking:
- It's not easy to look at each function and predict what the input is going to be. Static type inference via js-docs could help with this for just the function. If Claude can one-shot it, might be worth doing
- A lot of the "aggregation" / math functions are basically just doing a lot of
countBy,sumBy,flatMap, andgroupByoperations. A lot of these functions can become a lot more readable if we implemented it in a more declarative way.
lodash has an implementation for each of these by functions, but they shouldn't be hard to implement either
|
|
||
| // --- Copy to clipboard --- | ||
|
|
||
| function CopyButton({ getText, style: extraStyle }) { |
There was a problem hiding this comment.
Noticing extraStyle isn't being used
| } | ||
| const group = groups.get(key); | ||
| for (const b of series.deCumulatedBuckets) { | ||
| const leKey = b.le === Infinity ? '+Inf' : String(b.le); |
There was a problem hiding this comment.
Seeing a couple of times where we do this cast of '+inf' <-> Infinity. Might be simpler to just do String(b.le) when b.le is Infinity, then display it as +Inf when we're actually displaying it in the html-like components.
| const [data, setData] = useState(null); | ||
| const [loading, setLoading] = useState(true); | ||
| const [error, setError] = useState(null); | ||
| const [search, setSearch] = useState(params.get('search') || ''); | ||
| const [polling, setPolling] = useState(false); | ||
| const [pollInterval, setPollInterval] = useState(5000); | ||
| const snapshotsRef = useRef([]); // [{ families, timestamp }, ...] | ||
| const [snapshotVersion, setSnapshotVersion] = useState(0); // trigger re-render on snapshot update |
There was a problem hiding this comment.
I think we can simplify the state here by combining data, snapshotVersion, and snapshotsRef into one state variable. Then we can derive each of the respective values from this one state variable.
Add an interactive /metrics-viz page that fetches and parses Prometheus exposition format from environmentd and cluster replica endpoints. Features: - Prometheus text format parser with histogram de-cumulation - Grouped metric display with search filtering - Label dimension toggles for rollup granularity - Copy-to-clipboard for tables, histograms, and raw data - URL query param persistence for endpoint and search state - Polling mode with rate/delta computation across snapshots - Save/load metrics snapshots to/from files - Auto-discovery of per-process replica endpoints Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use port 6878 (internal HTTP) instead of 6876 for test base URL - Fix dropdown option visibility check to use count assertion - Add more robust waiting for async endpoint discovery Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dback - Extract shared query() into fetch.js, removing duplicates from metrics.jsx, memory.jsx, and hierarchical-memory.jsx - Create generic Expandable component replacing CollapsibleScalarTable and RawData - Replace className selectors with aria-label for test targeting - Rename ensureFamily to updateFamily with optional help/type params - Make buildScalarSeries return value instead of mutating - Replace manual parseLabels with regex-based implementation - Add unit tests for prometheus.js parsing functions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused `extraStyle` param from CopyButton - Simplify Infinity handling in aggregateHistogramSeries - Consolidate snapshot state (data, snapshotsRef, snapshotVersion -> snapshots) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add JSDoc @typedef for Labels, Sample, MetricFamily, HistogramSeries, ScalarSeries, and delta-enriched variants - Add @param/@returns annotations to all non-component functions - Introduce groupBy, keyBy, pickLabels utilities to replace manual Map-building patterns in aggregation functions - Refactor buildHistogramSeries/buildScalarSeries to take explicit parameters and return values instead of mutating the family object Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10ccbb0 to
436e893
Compare
Runs `tsc --checkJs --noEmit` in CI (before Playwright tests) to validate JSDoc type annotations in prometheus.js, metrics.jsx, and fetch.js. Uses @types/react@16, @types/react-dom@16, and @types/d3@5 matching the vendored runtime versions. Fixes minor type issues caught by the new check. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
436e893 to
487b6c7
Compare
def-
left a comment
There was a problem hiding this comment.
Test/lint changes lgtm, only one note about helping devs make the lint green locally
| try npm install --prefix test/dataflow-visualizer --silent | ||
| try npx --prefix test/dataflow-visualizer tsc -p test/dataflow-visualizer/tsconfig.typecheck.json |
There was a problem hiding this comment.
We could check that npm exists and if not print a note for how to install it


/metrics-vizpage that parses and visualizes Prometheus metrics from environmentd and clusterd endpoints🤖 Generated with Claude Code