feat(mcp): rewrite dashboard authoring prompts; expose filters on save tool#2264
feat(mcp): rewrite dashboard authoring prompts; expose filters on save tool#2264alex-fedotyev wants to merge 3 commits into
Conversation
…ilters on save tool
The `create_dashboard` prompt now leads with a ten-rule design checklist
(RED columns with aliases, per-series numberFormat for durations,
groupByColumnsOnLeft for inventory tables, dashboard-level filters
instead of per-tile where literals, one-metric-per-tile for metric
sources, containers and tabs for grouping). The wall-of-JSON canonical
example is gone; the four dashboard_examples patterns carry the
concrete shapes.
The dashboard_examples set is replaced with four verified patterns
(service_inventory, service_detail, log_analytics,
backend_dependencies) plus the existing infrastructure_sql. Each
non-SQL example leads with a "When to use" header and a "Why this
shape" note so the model picks by intent, not by surface keyword
match. Examples were built and rendered on a live dev stack before
landing, which surfaced two design issues now captured as rules in
the prompt:
- chart-level numberFormat on a table that mixes counts and durations
formats every value as a duration; per-series numberFormat is the fix.
- metric tiles take exactly one select item (renderer destructures
select[0] and ignores the rest).
The query_guide prompt gains a DASHBOARD FILTERS section that
documents the filters: [{ type, name, expression, sourceId, where?,
whereLanguage? }] shape, a NUMBER FORMAT section that explains the
per-series vs. chart-level distinction, and a PER-TILE TYPE
CONSTRAINTS note about the metric one-select rule.
hyperdx_save_dashboard now accepts `filters` on its input schema,
reusing externalDashboardFilterSchemaWithId so the MCP and REST
surfaces stay in lockstep and the existing
convertExternalFiltersToInternal helper handles the conversion
without translation. Filters round-trip through create, get, and
update.
Voice pass: every prompt string is now em-dash-free, with a snapshot
test guarding regressions.
Drill-down behavior (onClick from a service inventory row into the
service detail dashboard) is a separate follow-up PR; landing the
prompt rewrite first.
🦋 Changeset detectedLatest commit: cd37010 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
🟡 Tier 3 — StandardIntroduces new logic, modifies core functionality, or touches areas with non-trivial risk. Why this tier:
Review process: Full human review — logic, architecture, edge cases. Stats
|
PR Review✅ No critical issues found. Nice surface coverage: the create/update filter-id normalization ( Minor (non-blocking) observations:
|
E2E Test Results✅ All tests passed • 176 passed • 3 skipped • 1176s
Tests ran across 4 shards in parallel. |
The buildSourceSummary helper is the source-list block that buildCreateDashboardPrompt prepends to the prompt body. It still carried em-dashes from before the voice pass, so the assembled `create_dashboard` prompt that the MCP transport returns to a client contained eight em-dashes even though every builder in content.ts is clean. Caught by an end-to-end check that fetched the prompt through the live MCP transport and counted em-dashes in the response payload.
Deep Review🟡 P2 -- recommended
🔵 P3 nitpicks (8)
Reviewers (5): correctness, testing, maintainability, kieran-typescript, api-contract. Testing gaps:
|
…rompts
claude-review flagged:
- `service_detail` example uses `having: "StatusMessage != ''"` on a
table tile, but mcpTableTileSchema.config does not include `having`.
Zod's `.strip()` silently drops the field, so an LLM following the
example through MCP would save a table that includes empty-message
rows. Added `having: z.string().max(10000).optional()` mirroring
externalDashboardTableChartConfigSchema and a round-trip test.
- Heatmap tile's numberFormat describe still referenced
`output: "time"` while the rest of the file uses `output: "duration"`.
Aligned to "duration" for consistency with the prompt guidance.
deep-review additionally flagged:
- query_guide told the model that metric tiles take "1 select item with
metricName + metricType", but mcpTileSelectItemSchema does not expose
those fields, so the keys are silently stripped. Replaced the metric
authoring guidance with an explicit "not currently exposed via the MCP
schema; use raw SQL for infrastructure metrics" note. Same edit on
the design-checklist rule 9 (replaced with a "replace, not merge"
update-semantic note that addresses a separate review concern about
partial-payload data loss on update).
- mcpFiltersParam advertised id as optional via .partial({ id: true }),
but createDashboardBodySchema rejects id (strict, no id) and
updateDashboardBodySchema requires id. So an LLM copy-pasting a
filter from get-dashboard into create, or adding a brand-new filter
on update without an id, would hit a confusing strict-validation
rejection. saveDashboard.ts now normalizes per flow: stripFilterIds()
on create, assignFilterIds() on update.
- Em-dash regression check did not cover buildQueryGuidePrompt (largest
prompt, most likely to regress) or buildSourceSummary (helpers.ts
carried em-dashes through the initial PR until fix `3539649e`). Added
assertions for both.
- Bad-sourceId test asserted only `text.toContain('source')`, which
matches almost any error string. Tightened to assert
"Could not find source IDs" plus the literal bad id.
- Numbered-rule check used `prompt.toContain('${i}.')` which would
match substrings like "1.2s" or "0.000000001". Anchored the check
to line start (`/^${i}\\. /m`) and scoped it to the DESIGN CHECKLIST
section.
- DASHBOARD FILTERS and NUMBER FORMAT body content was not asserted
beyond the heading. Now asserts substantive content (QUERY_EXPRESSION
/ expression / sourceId for filters; factor: 0.000000001 / duration /
per-series for number format).
- Filter round-trip did not cover the freshly-generated-id branch of
convertExternalFiltersToInternal at the MCP layer. Added a test that
ships a new no-id filter on update and asserts saveDashboard assigns
a fresh id.
Summary
I rewrote the three MCP dashboard authoring prompts and added
filterstohyperdx_save_dashboard's input schema. The goal: when an agent calls the MCP server to build a dashboard, it picks the right pattern by intent, follows the renderer's actual constraints, and produces a JSON shape that renders correctly on the first try.What changed in the prompts:
create_dashboardnow leads with a ten-rule design checklist (RED columns with aliases, per-seriesnumberFormatfor durations,groupByColumnsOnLeftfor inventory tables, dashboard-level filters instead of per-tilewhereliterals, one-metric-per-tile for metric sources, containers and tabs for grouping). The wall-of-JSON canonical example is gone; the fourdashboard_examplespatterns carry the concrete shapes. The prompt is shorter and the rules are easier for the model to scan.dashboard_examplesis now four verified patterns (service_inventory,service_detail,log_analytics,backend_dependencies) plus the existinginfrastructure_sql. Each non-SQL example leads with a "When to use" header and a "Why this shape" note so the model picks by intent, not by surface keyword match. I built and rendered every example on a live dev stack before landing the prompt; that surfaced two design issues now captured as rules:numberFormaton a table that mixes counts and durations formats every value as a duration. Per-seriesnumberFormatis the fix.select[0]and ignores the rest; multi-metric authoring needs one tile per metric.query_guidegains aDASHBOARD FILTERSsection that documents thefilters: [{ type, name, expression, sourceId, where?, whereLanguage? }]shape, aNUMBER FORMATsection that explains the per-series vs. chart-level distinction, and aPER-TILE TYPE CONSTRAINTSnote about the metric one-select rule.What changed in
hyperdx_save_dashboard:filtersinput field on the tool'sinputSchema. ReusesexternalDashboardFilterSchemaWithIdfrom@/utils/zodso the MCP and REST surfaces stay in lockstep and the existingconvertExternalFiltersToInternalhelper handles the conversion without any translation layer. Same body schema (createDashboardBodySchema/updateDashboardBodySchema) that the v2 REST handler already uses.Voice pass: every prompt string is now em-dash-free, with a snapshot test guarding regressions.
Drill-down behavior (onClick from a service-inventory row into the service-detail dashboard) is a separate follow-up PR; landing the prompt rewrite first so review surface stays focused.
Test plan
yarn workspace @hyperdx/api jest mcp/__tests__/dashboards(51 passed, 0 failed)yarn workspace @hyperdx/api ci:lint(lint + tsc + openapi lint, clean)sourceIdreturns 4xxfiltersreturnsfilters: []prose-lint.pyon every changed file