Tolerate spec-violating list methods on backend init by tgrunnagle · Pull Request #5232 · stacklok/toolhive

tgrunnagle · 2026-05-08T20:20:24Z

Summary

A vMCP backend that advertises capabilities.resources (or capabilities.prompts) in its initialize response but does not actually implement resources/list (or prompts/list) currently fails per-session backend init outright. Every tool from that backend silently disappears from tools/list, leaving users with no signal that anything went wrong — a real problem against third-party servers like Atlassian's Rovo MCP that we don't control. This PR makes the per-session bootstrap tolerate that specific spec violation while keeping all other failure modes fatal.

In initAndQueryCapabilities, a JSON-RPC -32601 Method not found from resources/list or prompts/list is now treated as "the backend has no resources/prompts" rather than a fatal init error, with a WARN log recording the spec violation. tools/list errors and any non--32601 error from the resources/prompts list calls remain fatal.

Closes #5231

Changes Made

`pkg/vmcp/session/internal/backend/mcp_session.go`

Detect mcp.ErrMethodNotFound from resources/list and prompts/list via errors.Is and recover with an empty result set instead of returning an error.
Emit a WARN log on recovery, including backendID and the offending list method, so operators can flag the upstream bug without ERROR-level noise.
Leave tools/list strict (a backend with no tool surface is not useful to expose) and leave non--32601 list errors fatal (we are not silencing arbitrary failures).

`pkg/vmcp/session/internal/backend/mcp_session_capabilities_test.go` (new)

Table-driven tests covering each acceptance-criteria scenario via a fake streamable-http MCP backend.

Implementation Details

The -32601 detection relies on mcp-go's JSONRPCErrorDetails.AsError() wrapping the sentinel mcp.ErrMethodNotFound, so errors.Is(listErr, mcp.ErrMethodNotFound) is the correct discriminator across upstream message variations (e.g., the doubled "method not found: Method not found" string seen in production).
The recovery is scoped narrowly to the spec-violation case. Transport errors, timeouts, and other JSON-RPC error codes still abort init, preserving visibility into genuine problems.

Testing

New unit tests in pkg/vmcp/session/internal/backend/mcp_session_capabilities_test.go cover the acceptance criteria:
- resources/list returns -32601 after the server advertised resources -> init succeeds with no resources, tools remain reachable.
- prompts/list returns -32601 after the server advertised prompts -> init succeeds with no prompts, tools remain reachable.
- Both list methods return -32601 simultaneously -> init still succeeds, tools remain reachable.
- tools/list returns -32601 -> init still fails (regression guard).
- resources/list returns -32601 INTERNAL_ERROR (non--32601) -> init still fails (regression guard).
- prompts/list returns INVALID_PARAMS (non--32601) -> init still fails (regression guard).
task test (unit tests) and task lint-fix (linting) run clean on the touched package.

Additional Notes

Manual verification against the Atlassian Rovo MCP server (https://mcp.atlassian.com/v1/mcp) called out in the issue's acceptance criteria has not been performed in this PR; the unit tests exercise the same code path with a fake backend that produces the equivalent -32601 response. Reviewers who have the Rovo wiring available should confirm tools/list returns the workload-prefixed Atlassian tools after OAuth completes.
The same fragility class is worth a follow-up audit elsewhere in the session bootstrap (subscriptions, completions) — anywhere we derive "the server supports method X" from the initialize response and then unconditionally call X. Out of scope for this PR.

When a backend advertises resources or prompts capability in its initialize response but returns JSON-RPC -32601 to resources/list or prompts/list, treat that as "the backend has no resources/prompts" rather than a fatal init error. This unblocks third-party MCP servers (e.g. Atlassian Rovo) whose initialize response contradicts their implemented method set, so users still get the backend's tools instead of silently losing them. Implements changes for issue #5231: - Recover from errors.Is(err, mcp.ErrMethodNotFound) on resources/list and prompts/list with a WARN log naming the backend and method. - Keep tools/list failures and non-(-32601) errors from list methods fatal so we are not silencing arbitrary failures. - Add table-driven unit tests with a fake JSON-RPC backend covering the recoverable, fatal, and regression-guard cases.

codecov · 2026-05-08T20:45:02Z

Codecov Report

❌ Patch coverage is 86.48649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.93%. Comparing base (9211a36) to head (b8c1d63).

Files with missing lines	Patch %	Lines
pkg/vmcp/session/internal/backend/mcp_session.go	86.48%	4 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5232      +/-   ##
==========================================
+ Coverage   67.86%   67.93%   +0.06%     
==========================================
  Files         610      610              
  Lines       62522    62536      +14     
==========================================
+ Hits        42431    42482      +51     
+ Misses      16910    16873      -37     
  Partials     3181     3181

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

lorr1

Multi-agent review summary

Recommendation: COMMENT (non-blocking).

The fix is correct, well-scoped, and end-to-end safe. errors.Is(listErr, mcp.ErrMethodNotFound) is the right discriminator against the mcp-go v0.49.0 sentinel (JSONRPCErrorDetails.AsError() wraps it via fmt.Errorf("%w: ..."), so errors.Is matches both bare and wrapped variants). Switch ordering is nil-safe. tools/list strictness is preserved. Empty caps.Resources/caps.Prompts flow correctly through pkg/vmcp/session/factory.go and the aggregator merge — a downstream client resources/list against vMCP simply returns an empty merged set.

Four specialist reviewers (Go correctness, MCP protocol, test coverage, general code quality) consulted. Codex cross-review was attempted but timed out; not blocking.

Summary of inline findings

#	Severity	Theme
1	MEDIUM	Switch `default` arm (the success-path populate loop) is unverified
2	LOW	Comment phrasing ("spec violation" / "empty resource set")
3	LOW	`wantToolsCalled bool` redundant with int counters
4	LOW	`assert.True(strings.Contains(...))` should be `assert.Contains`
5	LOW	WARN log lacks workload name and base URL
6	LOW	`errors.Is` won't match HTTP-level method-missing variants (404/501)

Out-of-scope / informational

Pre-existing: list calls do not loop on nextCursor — file a follow-up issue if pagination matters.
PR description references mcp-go v0.43.2; go.mod actually pins v0.49.0. Sentinel and wrapping semantics are identical, so the code is correct — doc nit only.
WARN may be noisy under reconnect churn; rate-limit per (backendID, method) only if it shows up in practice.
vMCP does not currently forward notifications/resources/list_changed, so the empty-recovery path does not create phantom subscriptions today.

lorr1 · 2026-05-08T23:24:26Z

+// returns JSON-RPC -32601 to the corresponding list method, init must succeed
+// with an empty capability set rather than aborting and dropping the backend's
+// tools.
+func TestInitAndQueryCapabilities_RecoversFromMethodNotFound(t *testing.T) {


[MEDIUM] Success-path (default switch arm) is unverified.

The production refactor moves the populate loop into a new default arm of the switch. None of the cases here exercise that arm with a non-empty payload — every recovery test injects an error, and every fatal-error test aborts before reaching it. A future regression that inverts a condition or drops a field assignment in the populate loop would not be caught.

Consider adding at least one positive subtable row that returns a non-empty resources (and one for prompts, ideally with arguments) and asserts field-by-field mapping including BackendID == target.WorkloadID. The existing fakeBackend already supports this — set advertiseResources: true and populate resources: []mcp.Resource{{...}}.

lorr1 · 2026-05-08T23:24:26Z

+			// Spec violation: backend advertised the resources capability in its
+			// initialize response but does not implement resources/list. Recover
+			// with an empty resource set so the backend's tools remain usable
+			// instead of failing init outright. See issue #5231.


[LOW] Comment phrasing. Three concerns converge on this comment block (mirrored in the prompts case at line 418):

"Recover with an empty resource set" reads as if a slice is allocated. Actually caps.Resources simply remains at its zero value (nil) — range over nil and len are both zero, so the behavior is correct, but the wording mildly misleads about state.

The MCP spec doesn't contain a normative MUST for "if you advertise capability X you must implement X/list." The label "spec violation" is rhetorically reasonable but not directly citable from the spec.

The "why" (keep tools usable) is the third sentence in; the first two restate what the surrounding errors.Is(...) and slog.Warn already convey.

Suggest tightening to one rationale-focused sentence, e.g.:

Suggested change

// Spec violation: backend advertised the resources capability in its

// initialize response but does not implement resources/list. Recover

// with an empty resource set so the backend's tools remain usable

// instead of failing init outright. See issue #5231.

// Tolerate -32601 here so a backend that advertises the resources

// capability but does not implement resources/list (e.g. Atlassian Rovo,

// see #5231) still contributes its tools instead of being dropped.

Apply symmetrically to the prompts block at line 418.

lorr1 · 2026-05-08T23:24:26Z

+		wantTools        int
+		wantResources    int
+		wantPrompts      int
+		wantToolsCalled  bool


[LOW] wantToolsCalled bool is redundant and asymmetric.

Every populated row sets wantToolsCalled: true and the assertion at line 305 is hard-coded to 1, so the bool encodes no information beyond "do this assert." Meanwhile resources and prompts use explicit int counters (wantResListCalls, wantPromptCalls).

Suggest replacing with wantToolsCalls int, set to 1 in each row, and applying the same pattern as the other counters at the assertion site:

if tc.wantToolsCalls > 0 { assert.Equal(t, tc.wantToolsCalls, tc.fb.callCount(string(mcp.MethodToolsList))) }

Makes future cases that assert tools/list was not called (e.g., when neither resources nor prompts is advertised) trivial to add.

lorr1 · 2026-05-08T23:24:26Z

+
+			_, err := initAndQueryCapabilities(context.Background(), c, target)
+			require.Error(t, err)
+			assert.True(t, strings.Contains(err.Error(), tc.errSubstr),


[LOW] Use assert.Contains (or require.ErrorContains) instead of reinventing it.

Gives a clearer failure message (testify shows both haystack and needle automatically) and lets you drop the strings import.

Suggested change

assert.True(t, strings.Contains(err.Error(), tc.errSubstr),

assert.ErrorContains(t, err, tc.errSubstr)

lorr1 · 2026-05-08T23:24:26Z

+			slog.Warn("backend advertised resources capability but does not implement resources/list",
+				"backendID", target.WorkloadID,
+				"method", "resources/list",
+			)


[LOW] WARN log lacks workload name and base URL.

When this WARN fires in production it's the operator's primary breadcrumb that a backend silently produced an empty resource/prompt set. backendID (= target.WorkloadID) requires an extra ID-to-name lookup to correlate to a specific MCP server. Adding name and baseURL makes the log directly greppable. Same applies to the prompts WARN at line 422.

Suggested change

slog.Warn("backend advertised resources capability but does not implement resources/list",

"backendID", target.WorkloadID,

"method", "resources/list",

)

slog.Warn("backend advertised resources capability but does not implement resources/list",

"backendID", target.WorkloadID,

"name", target.WorkloadName,

"baseURL", target.BaseURL,

"method", "resources/list",

)

lorr1 · 2026-05-08T23:24:26Z

 		resResult, listErr := c.ListResources(ctx, mcp.ListResourcesRequest{})
-		if listErr != nil {
+		switch {
+		case errors.Is(listErr, mcp.ErrMethodNotFound):


[LOW] errors.Is will not match HTTP-level method-missing variants.

Confirmed mcp-go@v0.49.0's JSONRPCErrorDetails.AsError() wraps mcp.ErrMethodNotFound for the -32601 JSON-RPC envelope, so this discriminator is correct for the Atlassian Rovo case (which returns a proper JSON-RPC error response).

Caveat worth noting: some real backends behind reverse proxies or API gateways surface "this method isn't here" as HTTP 404 / 405 / 501 rather than a JSON-RPC -32601 envelope. Those return non-sentinel errors from the transport layer, so they will fall through to the fatal case listErr != nil arm under this PR. The narrow scope is defensible (the issue specifically calls out -32601), but the next operator hitting an HTTP-404 backend will wonder why their case wasn't recovered.

No change required for this PR. Worth either (a) a follow-up issue to enumerate which transport-level signals should also be treated as method-missing, or (b) a half-line in the recovery comment noting that only JSON-RPC -32601 envelopes are recovered (HTTP-level method absence is intentionally fatal).

github-actions Bot added the size/M Medium PR: 300-599 lines changed label May 8, 2026

tgrunnagle marked this pull request as ready for review May 8, 2026 20:36

tgrunnagle requested review from JAORMX, amirejaz, jerm-dro, jhrozek and yrobla as code owners May 8, 2026 20:36

lorr1 reviewed May 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tolerate spec-violating list methods on backend init#5232

Tolerate spec-violating list methods on backend init#5232
tgrunnagle wants to merge 1 commit intomainfrom
issue_5231

tgrunnagle commented May 8, 2026

Uh oh!

codecov Bot commented May 8, 2026

Uh oh!

lorr1 left a comment

Uh oh!

lorr1 May 8, 2026

Uh oh!

lorr1 May 8, 2026

Uh oh!

lorr1 May 8, 2026

Uh oh!

lorr1 May 8, 2026

Uh oh!

lorr1 May 8, 2026

Uh oh!

lorr1 May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	assert.True(t, strings.Contains(err.Error(), tc.errSubstr),
	assert.ErrorContains(t, err, tc.errSubstr)

Conversation

tgrunnagle commented May 8, 2026

Summary

Changes Made

pkg/vmcp/session/internal/backend/mcp_session.go

pkg/vmcp/session/internal/backend/mcp_session_capabilities_test.go (new)

Implementation Details

Testing

Additional Notes

Uh oh!

codecov Bot commented May 8, 2026

Codecov Report

Uh oh!

lorr1 left a comment

Choose a reason for hiding this comment

Multi-agent review summary

Summary of inline findings

Out-of-scope / informational

Uh oh!

lorr1 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

lorr1 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

lorr1 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

lorr1 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

lorr1 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

lorr1 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`pkg/vmcp/session/internal/backend/mcp_session.go`

`pkg/vmcp/session/internal/backend/mcp_session_capabilities_test.go` (new)