feat(seer): migrate autofix tool to explorer-mode endpoint#961
JoshFerge wants to merge 2 commits into
Switch `analyze_issue_with_seer` and the issue-details Seer summary off the legacy `steps[]` payload and onto the explorer schema (`blocks` keyed by `message.metadata.step`, typed `artifacts`, `merged_file_patches`, `repo_pr_states`). Both autofix endpoints now hit `?mode=explorer`, with POSTs driving the run one step at a time (`root_cause`, then `solution`).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/eARu0DHoypI1IqAI03asQcPX__pjX3J8ESCzdONusbs
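The step-at-a-time flow can be sketched roughly as below. This is a minimal illustration, not the PR's implementation: `STEP_ORDER`, `StartPayload`, and `nextStartPayload` are hypothetical names, and only the step names (`root_cause`, `solution`) and the `step` / `run_id` POST fields are taken from the description.

```typescript
// Step names come from the PR description; everything else is illustrative.
type Step = "root_cause" | "solution";

const STEP_ORDER: readonly Step[] = ["root_cause", "solution"];

// Hypothetical subset of the POST body ({step, run_id?, ...}).
interface StartPayload {
  step: Step;
  run_id?: number;
}

// Pick the first step that has no completed section yet and build the
// payload for it. An existing run id is reused when one is known.
function nextStartPayload(
  completed: ReadonlySet<Step>,
  runId?: number,
): StartPayload | undefined {
  const step = STEP_ORDER.find((s) => !completed.has(s));
  if (step === undefined) {
    return undefined; // every step already has a section
  }
  return runId === undefined ? { step } : { step, run_id: runId };
}
```

A caller would POST the returned payload, wait for that section to finish, add the step to `completed`, and repeat until the function returns `undefined`.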
- Drop the stale `--all-scopes` flag the bin no longer accepts; the scorer-side stdio spawn was printing usage and exiting, surfacing as `MCPClientError: Connection closed` across every eval.
- Switch the `predictedTools[*].arguments` field to a JSON-encoded string; `z.record(z.any())` emits `additionalProperties` with no `type`, which OpenAI's strict response_format rejects.

With both fixes, `autofix.eval.ts` scores 1.00 / 1.00 against the new explorer-mode tool flow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
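To illustrate the second fix, here is a small sketch of carrying tool arguments as a JSON-encoded string and decoding them after the model responds. `PredictedTool` and `decodeArguments` are hypothetical names for this example, and returning an empty object on malformed input is an assumption; the scorer's real handling may differ.

```typescript
// Hypothetical shape of one predicted tool call. `arguments` is a plain
// string, so the response schema needs no open-ended object map.
interface PredictedTool {
  name: string;
  arguments: string;
}

// Decode the string back into an object. Anything that is not a JSON
// object (malformed text, arrays, primitives) yields an empty object.
function decodeArguments(tool: PredictedTool): Record<string, unknown> {
  try {
    const parsed: unknown = JSON.parse(tool.arguments);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      return parsed as Record<string, unknown>;
    }
  } catch {
    // Malformed JSON falls through to the empty default below.
  }
  return {};
}
```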
if (!hasSection(sections, step) && runId !== undefined) {
  try {
    await apiService.startAutofix({
      organizationSlug: orgSlug,
      issueId,
      step,
      runId,
    });
Existing terminal/errored run still triggers solution step start
When an existing autofix run is already in a terminal state (e.g., error) or awaiting_user_input and only has a completed root_cause section, the new loop will still call startAutofix({step: "solution", runId}) on that dead run because it only short-circuits per-section via findCompletedSection, not on the run's overall status. The previous implementation returned terminal results immediately without issuing further POSTs.
Verification
Read lines 238-300 of the new handler. For an existing run with status === "error" and only a completed root_cause section: the loop iteration for root_cause hits findCompletedSection and continues. The iteration for solution sees !hasSection(sections, "solution") (true) and runId !== undefined (true, since runId = state.autofix?.run_id). It therefore POSTs startAutofix with step: "solution" against a run that is already terminal. Old code at the deleted lines 154-176 short-circuited via isTerminalStatus(existingStatus) and returned without further POSTs. No guard in the new flow inspects state.autofix.status before issuing the start, so this regression is reachable from a normal saved-state path.
Identified by Warden code-review · YBH-RMV
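One possible shape for the missing guard is sketched below. It is an assumption-laden illustration rather than the fix that was applied: `RunState`, `BLOCKING_STATUSES`, and `shouldStartStep` are hypothetical names, and treating `error` and `awaiting_user_input` as the statuses that block further POSTs is inferred from the finding above, not confirmed by the code.

```typescript
type Step = "root_cause" | "solution";
type RunStatus = "processing" | "completed" | "error" | "awaiting_user_input";

// Assumption: a run in one of these statuses should not receive another
// step POST. `completed` is left out because a run may complete one step
// before the next step is started.
const BLOCKING_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "error",
  "awaiting_user_input",
]);

// Hypothetical summary of the saved state for an existing run.
interface RunState {
  status?: RunStatus;
  completedSteps: readonly Step[];
}

// Check the run's overall status as well as the per-section state before
// deciding to start a step.
function shouldStartStep(state: RunState, step: Step): boolean {
  if (state.completedSteps.includes(step)) {
    return false; // section already exists, nothing to start
  }
  if (state.status !== undefined && BLOCKING_STATUSES.has(state.status)) {
    return false; // run is dead or waiting on the user
  }
  return true;
}
```

With a check like this in front of the `startAutofix` call, the scenario from the finding (status `error`, only `root_cause` completed) would skip the `solution` POST.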
Superseded by a 3-PR stack for easier review:
Closing here so review happens on the stack.
Summary
- Switch `analyze_issue_with_seer` and the issue-details Seer summary off the legacy `steps[]` autofix payload and onto the explorer-mode schema (`blocks` keyed by `message.metadata.step`, typed `artifacts`, `merged_file_patches`, `repo_pr_states`).
- Both autofix endpoints now hit `?mode=explorer`. The POST takes `{step, run_id?, user_context?, insert_index?}`; the tool drives the run sequentially: `root_cause` → `solution`. A `run_id` is reused from the GET if one already exists, mirroring the upstream `useExplorerAutofix` flow.
- `getOrderedAutofixSections` mirrors `getOrderedAutofixSections` in `static/app/components/events/autofix/useExplorerAutofix.tsx` and surfaces the typed `root_cause` / `solution` artifacts plus a synthesized `pull_request` section from `repo_pr_states`.
- … `processing` / `completed` / `error` / `awaiting_user_input`).
- `ToolPredictionScorer` was broken in two ways unrelated to autofix (the `--all-scopes` flag no longer exists; `z.record(z.any())` is rejected by OpenAI strict response_format). Both are fixed in a follow-up commit so the eval can actually run.

Test plan
- `pnpm run tsc`
- `pnpm run lint`
- `pnpm run test`: 307 tests pass across all packages
- `pnpm run --filter @sentry/mcp-core generate-definitions` (regenerated tool description)
- `pnpm vitest run autofix.eval.ts`: both cases score 1.00 with the new tool flow

Notes for reviewers
- … `api-client/schema.ts` are passthrough where the upstream endpoint is marked experimental, so additive fields shouldn't break parsing.
- … `waitForSection`) rather than a single global terminal-state poll. The 5-minute timeout still applies and renders a partial-output summary on timeout.
- `awaiting_user_input` is surfaced via `getHumanInterventionGuidance` and ends the run gracefully (same retry hint as before).
- `formatSeerSummary` falls back to the most substantive assistant message in a section when the structured artifact is missing, preserving the prior "give the reader something" behavior.

🤖 Generated with Claude Code
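The fallback described in the last reviewer note could look something like the sketch below. The types and the function name `summarizeSection` are hypothetical, the artifact is simplified to a string, and "most substantive" is approximated here as the longest non-empty assistant message, which is an assumption rather than the rule `formatSeerSummary` actually uses.

```typescript
// Hypothetical, simplified shapes for a section and its messages.
interface SectionMessage {
  role: "assistant" | "user" | "tool";
  content: string;
}

interface Section {
  artifact?: string; // the real artifact is typed; a string keeps the sketch short
  messages: readonly SectionMessage[];
}

// Prefer the structured artifact. If it is missing or blank, fall back to
// the longest non-empty assistant message (assumed proxy for "substantive").
function summarizeSection(section: Section): string | undefined {
  if (section.artifact !== undefined && section.artifact.trim() !== "") {
    return section.artifact;
  }
  let best: string | undefined;
  for (const message of section.messages) {
    if (message.role !== "assistant") {
      continue;
    }
    const text = message.content.trim();
    if (text !== "" && (best === undefined || text.length > best.length)) {
      best = text;
    }
  }
  return best;
}
```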