Tolerate spec-violating list methods on backend init#5232
Tolerate spec-violating list methods on backend init#5232tgrunnagle wants to merge 1 commit intomainfrom
Conversation
When a backend advertises resources or prompts capability in its initialize response but returns JSON-RPC -32601 to resources/list or prompts/list, treat that as "the backend has no resources/prompts" rather than a fatal init error. This unblocks third-party MCP servers (e.g. Atlassian Rovo) whose initialize response contradicts their implemented method set, so users still get the backend's tools instead of silently losing them. Implements changes for issue #5231: - Recover from errors.Is(err, mcp.ErrMethodNotFound) on resources/list and prompts/list with a WARN log naming the backend and method. - Keep tools/list failures and non-(-32601) errors from list methods fatal so we are not silencing arbitrary failures. - Add table-driven unit tests with a fake JSON-RPC backend covering the recoverable, fatal, and regression-guard cases.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5232 +/- ##
==========================================
+ Coverage 67.86% 67.93% +0.06%
==========================================
Files 610 610
Lines 62522 62536 +14
==========================================
+ Hits 42431 42482 +51
+ Misses 16910 16873 -37
Partials 3181 3181 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
lorr1
left a comment
There was a problem hiding this comment.
Multi-agent review summary
Recommendation: COMMENT (non-blocking).
The fix is correct, well-scoped, and end-to-end safe. errors.Is(listErr, mcp.ErrMethodNotFound) is the right discriminator against the mcp-go v0.49.0 sentinel (JSONRPCErrorDetails.AsError() wraps it via fmt.Errorf("%w: ..."), so errors.Is matches both bare and wrapped variants). Switch ordering is nil-safe. tools/list strictness is preserved. Empty caps.Resources/caps.Prompts flow correctly through pkg/vmcp/session/factory.go and the aggregator merge — a downstream client resources/list against vMCP simply returns an empty merged set.
Four specialist reviewers (Go correctness, MCP protocol, test coverage, general code quality) consulted. Codex cross-review was attempted but timed out; not blocking.
Summary of inline findings
| # | Severity | Theme |
|---|---|---|
| 1 | MEDIUM | Switch default arm (the success-path populate loop) is unverified |
| 2 | LOW | Comment phrasing ("spec violation" / "empty resource set") |
| 3 | LOW | wantToolsCalled bool redundant with int counters |
| 4 | LOW | assert.True(strings.Contains(...)) should be assert.Contains |
| 5 | LOW | WARN log lacks workload name and base URL |
| 6 | LOW | errors.Is won't match HTTP-level method-missing variants (404/501) |
Out-of-scope / informational
- Pre-existing: list calls do not loop on
nextCursor— file a follow-up issue if pagination matters. - PR description references
mcp-go v0.43.2;go.modactually pinsv0.49.0. Sentinel and wrapping semantics are identical, so the code is correct — doc nit only. - WARN may be noisy under reconnect churn; rate-limit per
(backendID, method)only if it shows up in practice. - vMCP does not currently forward
notifications/resources/list_changed, so the empty-recovery path does not create phantom subscriptions today.
| // returns JSON-RPC -32601 to the corresponding list method, init must succeed | ||
| // with an empty capability set rather than aborting and dropping the backend's | ||
| // tools. | ||
| func TestInitAndQueryCapabilities_RecoversFromMethodNotFound(t *testing.T) { |
There was a problem hiding this comment.
[MEDIUM] Success-path (default switch arm) is unverified.
The production refactor moves the populate loop into a new default arm of the switch. None of the cases here exercise that arm with a non-empty payload — every recovery test injects an error, and every fatal-error test aborts before reaching it. A future regression that inverts a condition or drops a field assignment in the populate loop would not be caught.
Consider adding at least one positive subtable row that returns a non-empty resources (and one for prompts, ideally with arguments) and asserts field-by-field mapping including BackendID == target.WorkloadID. The existing fakeBackend already supports this — set advertiseResources: true and populate resources: []mcp.Resource{{...}}.
| // Spec violation: backend advertised the resources capability in its | ||
| // initialize response but does not implement resources/list. Recover | ||
| // with an empty resource set so the backend's tools remain usable | ||
| // instead of failing init outright. See issue #5231. |
There was a problem hiding this comment.
[LOW] Comment phrasing. Three concerns converge on this comment block (mirrored in the prompts case at line 418):
- "Recover with an empty resource set" reads as if a slice is allocated. Actually
caps.Resourcessimply remains at its zero value (nil) — range over nil andlenare both zero, so the behavior is correct, but the wording mildly misleads about state. - The MCP spec doesn't contain a normative MUST for "if you advertise capability X you must implement X/list." The label "spec violation" is rhetorically reasonable but not directly citable from the spec.
- The "why" (keep tools usable) is the third sentence in; the first two restate what the surrounding
errors.Is(...)andslog.Warnalready convey.
Suggest tightening to one rationale-focused sentence, e.g.:
| // Spec violation: backend advertised the resources capability in its | |
| // initialize response but does not implement resources/list. Recover | |
| // with an empty resource set so the backend's tools remain usable | |
| // instead of failing init outright. See issue #5231. | |
| // Tolerate -32601 here so a backend that advertises the resources | |
| // capability but does not implement resources/list (e.g. Atlassian Rovo, | |
| // see #5231) still contributes its tools instead of being dropped. |
Apply symmetrically to the prompts block at line 418.
| wantTools int | ||
| wantResources int | ||
| wantPrompts int | ||
| wantToolsCalled bool |
There was a problem hiding this comment.
[LOW] wantToolsCalled bool is redundant and asymmetric.
Every populated row sets wantToolsCalled: true and the assertion at line 305 is hard-coded to 1, so the bool encodes no information beyond "do this assert." Meanwhile resources and prompts use explicit int counters (wantResListCalls, wantPromptCalls).
Suggest replacing with wantToolsCalls int, set to 1 in each row, and applying the same pattern as the other counters at the assertion site:
if tc.wantToolsCalls > 0 {
assert.Equal(t, tc.wantToolsCalls, tc.fb.callCount(string(mcp.MethodToolsList)))
}Makes future cases that assert tools/list was not called (e.g., when neither resources nor prompts is advertised) trivial to add.
|
|
||
| _, err := initAndQueryCapabilities(context.Background(), c, target) | ||
| require.Error(t, err) | ||
| assert.True(t, strings.Contains(err.Error(), tc.errSubstr), |
There was a problem hiding this comment.
[LOW] Use assert.Contains (or require.ErrorContains) instead of reinventing it.
Gives a clearer failure message (testify shows both haystack and needle automatically) and lets you drop the strings import.
| assert.True(t, strings.Contains(err.Error(), tc.errSubstr), | |
| assert.ErrorContains(t, err, tc.errSubstr) |
| slog.Warn("backend advertised resources capability but does not implement resources/list", | ||
| "backendID", target.WorkloadID, | ||
| "method", "resources/list", | ||
| ) |
There was a problem hiding this comment.
[LOW] WARN log lacks workload name and base URL.
When this WARN fires in production it's the operator's primary breadcrumb that a backend silently produced an empty resource/prompt set. backendID (= target.WorkloadID) requires an extra ID-to-name lookup to correlate to a specific MCP server. Adding name and baseURL makes the log directly greppable. Same applies to the prompts WARN at line 422.
| slog.Warn("backend advertised resources capability but does not implement resources/list", | |
| "backendID", target.WorkloadID, | |
| "method", "resources/list", | |
| ) | |
| slog.Warn("backend advertised resources capability but does not implement resources/list", | |
| "backendID", target.WorkloadID, | |
| "name", target.WorkloadName, | |
| "baseURL", target.BaseURL, | |
| "method", "resources/list", | |
| ) |
| resResult, listErr := c.ListResources(ctx, mcp.ListResourcesRequest{}) | ||
| if listErr != nil { | ||
| switch { | ||
| case errors.Is(listErr, mcp.ErrMethodNotFound): |
There was a problem hiding this comment.
[LOW] errors.Is will not match HTTP-level method-missing variants.
Confirmed mcp-go@v0.49.0's JSONRPCErrorDetails.AsError() wraps mcp.ErrMethodNotFound for the -32601 JSON-RPC envelope, so this discriminator is correct for the Atlassian Rovo case (which returns a proper JSON-RPC error response).
Caveat worth noting: some real backends behind reverse proxies or API gateways surface "this method isn't here" as HTTP 404 / 405 / 501 rather than a JSON-RPC -32601 envelope. Those return non-sentinel errors from the transport layer, so they will fall through to the fatal case listErr != nil arm under this PR. The narrow scope is defensible (the issue specifically calls out -32601), but the next operator hitting an HTTP-404 backend will wonder why their case wasn't recovered.
No change required for this PR. Worth either (a) a follow-up issue to enumerate which transport-level signals should also be treated as method-missing, or (b) a half-line in the recovery comment noting that only JSON-RPC -32601 envelopes are recovered (HTTP-level method absence is intentionally fatal).
Summary
A vMCP backend that advertises
capabilities.resources(orcapabilities.prompts) in itsinitializeresponse but does not actually implementresources/list(orprompts/list) currently fails per-session backend init outright. Every tool from that backend silently disappears fromtools/list, leaving users with no signal that anything went wrong — a real problem against third-party servers like Atlassian's Rovo MCP that we don't control. This PR makes the per-session bootstrap tolerate that specific spec violation while keeping all other failure modes fatal.In
initAndQueryCapabilities, a JSON-RPC-32601 Method not foundfromresources/listorprompts/listis now treated as "the backend has no resources/prompts" rather than a fatal init error, with a WARN log recording the spec violation.tools/listerrors and any non--32601error from the resources/prompts list calls remain fatal.Closes #5231
Changes Made
pkg/vmcp/session/internal/backend/mcp_session.gomcp.ErrMethodNotFoundfromresources/listandprompts/listviaerrors.Isand recover with an empty result set instead of returning an error.backendIDand the offending list method, so operators can flag the upstream bug without ERROR-level noise.tools/liststrict (a backend with no tool surface is not useful to expose) and leave non--32601list errors fatal (we are not silencing arbitrary failures).pkg/vmcp/session/internal/backend/mcp_session_capabilities_test.go(new)Implementation Details
-32601detection relies onmcp-go'sJSONRPCErrorDetails.AsError()wrapping the sentinelmcp.ErrMethodNotFound, soerrors.Is(listErr, mcp.ErrMethodNotFound)is the correct discriminator across upstream message variations (e.g., the doubled"method not found: Method not found"string seen in production).Testing
pkg/vmcp/session/internal/backend/mcp_session_capabilities_test.gocover the acceptance criteria:resources/listreturns-32601after the server advertisedresources-> init succeeds with no resources, tools remain reachable.prompts/listreturns-32601after the server advertisedprompts-> init succeeds with no prompts, tools remain reachable.-32601simultaneously -> init still succeeds, tools remain reachable.tools/listreturns-32601-> init still fails (regression guard).resources/listreturns-32601INTERNAL_ERROR(non--32601) -> init still fails (regression guard).prompts/listreturnsINVALID_PARAMS(non--32601) -> init still fails (regression guard).task test(unit tests) andtask lint-fix(linting) run clean on the touched package.Additional Notes
https://mcp.atlassian.com/v1/mcp) called out in the issue's acceptance criteria has not been performed in this PR; the unit tests exercise the same code path with a fake backend that produces the equivalent-32601response. Reviewers who have the Rovo wiring available should confirmtools/listreturns the workload-prefixed Atlassian tools after OAuth completes.initializeresponse and then unconditionally call X. Out of scope for this PR.