Skip to content

feat(mcp): rewrite dashboard authoring prompts; expose filters on save tool#2264

Open
alex-fedotyev wants to merge 3 commits into
mainfrom
alex/HDX-mcp-prompts-overhaul
Open

feat(mcp): rewrite dashboard authoring prompts; expose filters on save tool#2264
alex-fedotyev wants to merge 3 commits into
mainfrom
alex/HDX-mcp-prompts-overhaul

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

Summary

I rewrote the three MCP dashboard authoring prompts and added filters to hyperdx_save_dashboard's input schema. The goal: when an agent calls the MCP server to build a dashboard, it picks the right pattern by intent, follows the renderer's actual constraints, and produces a JSON shape that renders correctly on the first try.

What changed in the prompts:

  • create_dashboard now leads with a ten-rule design checklist (RED columns with aliases, per-series numberFormat for durations, groupByColumnsOnLeft for inventory tables, dashboard-level filters instead of per-tile where literals, one-metric-per-tile for metric sources, containers and tabs for grouping). The wall-of-JSON canonical example is gone; the four dashboard_examples patterns carry the concrete shapes. The prompt is shorter and the rules are easier for the model to scan.

  • dashboard_examples is now four verified patterns (service_inventory, service_detail, log_analytics, backend_dependencies) plus the existing infrastructure_sql. Each non-SQL example leads with a "When to use" header and a "Why this shape" note so the model picks by intent, not by surface keyword match. I built and rendered every example on a live dev stack before landing the prompt; that surfaced two design issues now captured as rules:

    • chart-level numberFormat on a table that mixes counts and durations formats every value as a duration. Per-series numberFormat is the fix.
    • metric tiles take exactly one select item. The renderer destructures select[0] and ignores the rest; multi-metric authoring needs one tile per metric.
  • query_guide gains a DASHBOARD FILTERS section that documents the filters: [{ type, name, expression, sourceId, where?, whereLanguage? }] shape, a NUMBER FORMAT section that explains the per-series vs. chart-level distinction, and a PER-TILE TYPE CONSTRAINTS note about the metric one-select rule.

What changed in hyperdx_save_dashboard:

  • New filters input field on the tool's inputSchema. Reuses externalDashboardFilterSchemaWithId from @/utils/zod so the MCP and REST surfaces stay in lockstep and the existing convertExternalFiltersToInternal helper handles the conversion without any translation layer. Same body schema (createDashboardBodySchema / updateDashboardBodySchema) that the v2 REST handler already uses.

Voice pass: every prompt string is now em-dash-free, with a snapshot test guarding regressions.

Drill-down behavior (onClick from a service-inventory row into the service-detail dashboard) is a separate follow-up PR; landing the prompt rewrite first so review surface stays focused.

Test plan

  • yarn workspace @hyperdx/api jest mcp/__tests__/dashboards (51 passed, 0 failed)
  • yarn workspace @hyperdx/api ci:lint (lint + tsc + openapi lint, clean)
  • Filter round-trip: save dashboard with two filters via MCP, get back, update with one renamed filter, fetch again, assert shape
  • Filter rejection: save with bogus sourceId returns 4xx
  • Backward compat: save dashboard without filters returns filters: []
  • Snapshot tests: design checklist present, four patterns listed, each example carries "When to use", zero em-dashes in any prompt string
  • Built every example dashboard on a live dev stack (slot 10) and confirmed every tile renders with realistic synthetic data: 90K traces across 8 services with realistic StatusMessage values, 17K logs across 8 services with severity distribution
  • Em-dash-free prose verified by prose-lint.py on every changed file

…ilters on save tool

The `create_dashboard` prompt now leads with a ten-rule design checklist
(RED columns with aliases, per-series numberFormat for durations,
groupByColumnsOnLeft for inventory tables, dashboard-level filters
instead of per-tile where literals, one-metric-per-tile for metric
sources, containers and tabs for grouping). The wall-of-JSON canonical
example is gone; the four dashboard_examples patterns carry the
concrete shapes.

The dashboard_examples set is replaced with four verified patterns
(service_inventory, service_detail, log_analytics,
backend_dependencies) plus the existing infrastructure_sql. Each
non-SQL example leads with a "When to use" header and a "Why this
shape" note so the model picks by intent, not by surface keyword
match. Examples were built and rendered on a live dev stack before
landing, which surfaced two design issues now captured as rules in
the prompt:

- chart-level numberFormat on a table that mixes counts and durations
  formats every value as a duration; per-series numberFormat is the fix.
- metric tiles take exactly one select item (renderer destructures
  select[0] and ignores the rest).

The query_guide prompt gains a DASHBOARD FILTERS section that
documents the filters: [{ type, name, expression, sourceId, where?,
whereLanguage? }] shape, a NUMBER FORMAT section that explains the
per-series vs. chart-level distinction, and a PER-TILE TYPE
CONSTRAINTS note about the metric one-select rule.

hyperdx_save_dashboard now accepts `filters` on its input schema,
reusing externalDashboardFilterSchemaWithId so the MCP and REST
surfaces stay in lockstep and the existing
convertExternalFiltersToInternal helper handles the conversion
without translation. Filters round-trip through create, get, and
update.

Voice pass: every prompt string is now em-dash-free, with a snapshot
test guarding regressions.

Drill-down behavior (onClick from a service inventory row into the
service detail dashboard) is a separate follow-up PR; landing the
prompt rewrite first.
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 12, 2026

🦋 Changeset detected

Latest commit: cd37010

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 12, 2026 11:36pm

Request Review

@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label May 12, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

🟡 Tier 3 — Standard

Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.

Why this tier:

  • Diff size: 988 production lines changed (Tier 2 max: < 250)

Review process: Full human review — logic, architecture, edge cases.
SLA: First-pass feedback within 1 business day.

Stats
  • Production files changed: 5
  • Production lines changed: 988 (+ 474 in test files, excluded from tier calculation)
  • Branch: alex/HDX-mcp-prompts-overhaul
  • Author: alex-fedotyev

To override this classification, remove the review/tier-3 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

PR Review

✅ No critical issues found.

Nice surface coverage: the create/update filter-id normalization (stripFilterIds for create, assignFilterIds for update) is the right shape to bridge the mcpFiltersParam.partial({ id: true }) input schema with the stricter createDashboardBodySchema / updateDashboardBodySchema, and both branches are exercised by tests. getMissingSources already takes filters, so source-existence is gated correctly. The having clause on mcpTableTileSchema was a real silent-strip risk in the prior round — good catch and matching round-trip test.

Minor (non-blocking) observations:

  • assignFilterIds populates an id for brand-new filters, but convertExternalFiltersToInternal will then immediately generate yet another fresh id (since the assigned id is not in existingFilterIds). Harmless waste, but the assigned id is never the persisted one — fine to leave.
  • mcpFiltersParam uses .partial({ id: true }) over a .strict() schema. Zod preserves strictness through .partial, so unknown keys still reject — confirmed safe.
  • Prompt voice and em-dash guards look thorough; buildSourceSummary regression coverage is welcome.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

E2E Test Results

All tests passed • 176 passed • 3 skipped • 1176s

Status Count
✅ Passed 176
❌ Failed 0
⚠️ Flaky 3
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

The buildSourceSummary helper is the source-list block that
buildCreateDashboardPrompt prepends to the prompt body. It still
carried em-dashes from before the voice pass, so the assembled
`create_dashboard` prompt that the MCP transport returns to a client
contained eight em-dashes even though every builder in content.ts is
clean. Caught by an end-to-end check that fetched the prompt through
the live MCP transport and counted em-dashes in the response payload.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 12, 2026

Deep Review

🟡 P2 -- recommended

  • packages/api/src/mcp/__tests__/dashboards.test.ts:1448 -- The new filter test block exercises round-trip and "add new filter on update," but never asserts the contract for omitting filters on update (preserve) or sending filters: [] on update (wipe), and updateDashboard at saveDashboard.ts:453 silently gates that behavior on filters !== undefined.
    • Fix: Add two update-path tests against a seeded dashboard -- one that omits filters and asserts the persisted filters are unchanged, one that sends filters: [] and asserts the persisted array is empty.
    • ce-testing-reviewer, ce-api-contract-reviewer
🔵 P3 nitpicks (8)
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:124 -- On the update path, an LLM that resends an existing filter without its id triggers assignFilterIds to mint a fresh ObjectId, which convertExternalFiltersToInternal then mints over again, so the persisted filter ends up with a brand-new id and the prior filter id is silently lost; no consumer keys persistent state by filter id today, so the failure is latent rather than active.
    • Fix: Either document this "no id means new filter" semantic on mcpFiltersParam and add a dedicated test, or change assignFilterIds to fall back to content-matching against existingFilterIds so a re-sent filter without an id preserves the original id.
    • ce-correctness-reviewer, ce-maintainability-reviewer, ce-api-contract-reviewer
  • packages/api/src/mcp/tools/dashboards/schemas.ts:468 -- mcpFiltersParam's description lists the filter shape as { type, name, expression, sourceId, where?, whereLanguage? } with no mention of id, so an LLM has no signal about copy-back behavior between create (id stripped) and update (id preserved when provided, re-minted when missing).
    • Fix: Extend the description to say id is optional, ignored on create, and required-for-preservation on update so callers know to round-trip it from hyperdx_get_dashboard.
    • ce-api-contract-reviewer
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:45 -- The tool description doesn't say that on update, omitting filters preserves existing filters while filters: [] wipes them; the asymmetric behavior is only encoded in the filters !== undefined guard at saveDashboard.ts:453.
    • Fix: Add a one-line note to the hyperdx_save_dashboard description (or mcpFiltersParam description) clarifying the omit-to-preserve / empty-array-to-wipe contract.
    • ce-api-contract-reviewer
  • packages/api/src/mcp/__tests__/dashboards.test.ts:1449 -- The new having-clause test only exercises the create path, so a future regression where updateDashboardBodySchema drops having from the table-tile config wouldn't be caught.
    • Fix: Extend the test to also update the saved dashboard with a different having value and assert it persists.
    • ce-testing-reviewer
  • packages/api/src/mcp/__tests__/dashboards.test.ts:1322 -- The "reject filter with missing source" test runs against a create call (no id), so the equivalent check on the update path (saveDashboard.ts:358, which calls the same getMissingSources(sources, tilesWithId, filters) helper) is not directly covered.
    • Fix: Add a parallel test that creates a dashboard, then issues an update with a filter whose sourceId doesn't exist, and asserts the same Could not find source IDs envelope.
    • ce-testing-reviewer
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:119 -- stripFilterIds and assignFilterIds cast each filter to ExternalDashboardFilterWithId before destructuring id, but the actual input type is the union ExternalDashboardFilter | ExternalDashboardFilterWithId; the cast is fine today because runtime fields overlap, but it sidesteps the type system and would let a future schema split go unnoticed.
    • Fix: Drop the union and type the parameter as (ExternalDashboardFilter & { id?: string })[] (or rely on the inferred zod schema type) so the destructure is type-checked rather than cast-through.
    • ce-maintainability-reviewer
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:102 -- The Create helper section divider sits directly above stripFilterIds and assignFilterIds, but only stripFilterIds is used by create; assignFilterIds is used exclusively by update, which is mildly confusing when scanning by section.
    • Fix: Move both helpers above the Create helper divider into a shared Filter normalization section, or move assignFilterIds next to updateDashboard.
    • ce-maintainability-reviewer
  • packages/api/src/mcp/tools/dashboards/saveDashboard.ts:132 -- assignFilterIds treats an empty-string id as "missing" via id.length > 0, but this branch has no test; a future refactor that flips the check to id != null would silently start trying to persist a filter with id: ''.
    • Fix: Add a small update-path test that sends a filter with id: '' and asserts the persisted filter has a valid, non-empty id.
    • ce-testing-reviewer

Reviewers (5): correctness, testing, maintainability, kieran-typescript, api-contract.

Testing gaps:

  • Update-path filter behavior when filters is omitted (preserve) vs [] (wipe).
  • Update-path having round-trip and update-path filter sourceId rejection.
  • assignFilterIds empty-string id branch and the re-mint-on-no-id case for an existing filter.

…rompts

claude-review flagged:
- `service_detail` example uses `having: "StatusMessage != ''"` on a
  table tile, but mcpTableTileSchema.config does not include `having`.
  Zod's `.strip()` silently drops the field, so an LLM following the
  example through MCP would save a table that includes empty-message
  rows. Added `having: z.string().max(10000).optional()` mirroring
  externalDashboardTableChartConfigSchema and a round-trip test.
- Heatmap tile's numberFormat describe still referenced
  `output: "time"` while the rest of the file uses `output: "duration"`.
  Aligned to "duration" for consistency with the prompt guidance.

deep-review additionally flagged:
- query_guide told the model that metric tiles take "1 select item with
  metricName + metricType", but mcpTileSelectItemSchema does not expose
  those fields, so the keys are silently stripped. Replaced the metric
  authoring guidance with an explicit "not currently exposed via the MCP
  schema; use raw SQL for infrastructure metrics" note. Same edit on
  the design-checklist rule 9 (replaced with a "replace, not merge"
  update-semantic note that addresses a separate review concern about
  partial-payload data loss on update).
- mcpFiltersParam advertised id as optional via .partial({ id: true }),
  but createDashboardBodySchema rejects id (strict, no id) and
  updateDashboardBodySchema requires id. So an LLM copy-pasting a
  filter from get-dashboard into create, or adding a brand-new filter
  on update without an id, would hit a confusing strict-validation
  rejection. saveDashboard.ts now normalizes per flow: stripFilterIds()
  on create, assignFilterIds() on update.
- Em-dash regression check did not cover buildQueryGuidePrompt (largest
  prompt, most likely to regress) or buildSourceSummary (helpers.ts
  carried em-dashes through the initial PR until fix `3539649e`). Added
  assertions for both.
- Bad-sourceId test asserted only `text.toContain('source')`, which
  matches almost any error string. Tightened to assert
  "Could not find source IDs" plus the literal bad id.
- Numbered-rule check used `prompt.toContain('${i}.')` which would
  match substrings like "1.2s" or "0.000000001". Anchored the check
  to line start (`/^${i}\\. /m`) and scoped it to the DESIGN CHECKLIST
  section.
- DASHBOARD FILTERS and NUMBER FORMAT body content was not asserted
  beyond the heading. Now asserts substantive content (QUERY_EXPRESSION
  / expression / sourceId for filters; factor: 0.000000001 / duration /
  per-series for number format).
- Filter round-trip did not cover the freshly-generated-id branch of
  convertExternalFiltersToInternal at the MCP layer. Added a test that
  ships a new no-id filter on update and asserts saveDashboard assigns
  a fresh id.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-3 Standard — full human review required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant