Skip to content

feat: Add requireConfirmation elicitation helper for write gating#75

Open
egorpavlikhin wants to merge 7 commits intomainfrom
egorp/aif-356-require-confirmation
Open

feat: Add requireConfirmation elicitation helper for write gating#75
egorpavlikhin wants to merge 7 commits intomainfrom
egorp/aif-356-require-confirmation

Conversation

@egorpavlikhin
Copy link
Copy Markdown
Contributor

Summary

  • New src/helpers/requireConfirmation.ts — single shared helper for write-gating tool calls. Resolves through three branches: OCTOPUS_SKIP_ELICITATION=true env override, client elicitation capability (elicitation/create), or fallback to a confirm arg on the tool's own schema.
  • Returns a discriminated ConfirmationResult so callers can tell an explicit user "no" (declined / cancelled) apart from "the user was never asked" (confirmationRequired). Without this distinction the LLM can mistake the no-elicitation fallback for a real cancellation, or — worse — retry blindly with confirm: true.
  • Wires the helper into create_release and deploy_release. The confirmationRequired branch returns isError: true with directive prose so the LLM stops and asks the user instead of treating it as a user cancellation.
  • Documents the pattern in CLAUDE.md next to the existing MCP SDK guidance.

Refs AIF-356.

Scope notes

  • The Linear issue declares wiring as out-of-scope; per direct user instruction, create_release and deploy_release are wired here. Other write tools (run_runbook, submit_interruption, non-GET execute) remain out of scope and will land per-tool.
  • Existing server.tool(...) deprecated form is kept on the wired tools to minimize diff. Migrating those to registerTool is a separate cleanup.
  • Stacked on egorp/task-resources — merge the parent first.

Test plan

  • npm run build clean
  • npm run lint clean
  • npm test — 21 new helper unit tests cover all three branches plus the new reason discriminator (envSkip / accepted / fallbackConfirm / declined / cancelled / confirmationRequired). All 58 non-integration tests pass; the 13 *.integration.test.ts failures are pre-existing and need a live Octopus at localhost:8065.
  • Manual sanity in a real client:
    • Claude Code (advertises elicitation): create_release → Accept / Decline / Cancel each surface the right outcome; deploy_release likewise.
    • A client without elicitation: omitting confirm returns confirmationRequired with isError: true; passing confirm: true proceeds; passing confirm: false cancels.
    • OCTOPUS_SKIP_ELICITATION=true in server env bypasses the prompt entirely.

🤖 Generated with Claude Code

egorpavlikhin and others added 7 commits May 5, 2026 16:49
Task data now flows through three space-scoped resource templates instead
of three dedicated tools. Bulky activity logs and step timings only travel
when the agent explicitly fetches the URI, so per-call response weight
drops dramatically on the common deployment-investigation path.

URIs (registered into the existing release/dispatch registry):
- octopus://spaces/{spaceName}/tasks/{taskId}         -> ServerTask metadata (JSON)
- octopus://spaces/{spaceName}/tasks/{taskId}/details -> ServerTaskDetails (JSON)
- octopus://spaces/{spaceName}/tasks/{taskId}/log     -> raw activity log (text/plain)

BREAKING CHANGE: removes get_task_by_id, get_task_details, and get_task_raw.
Callers should switch to resources/read on the URI above (or the existing
read_resource tool backstop on clients without native resource support).
get_task_from_url is unchanged; its consolidation is tracked separately
under the from_url issue.

Discoverability:
- Adds a top-level instructions string on the McpServer so dynamic-discovery
  clients learn the resourceUri pattern and the read_resource backstop at
  initialize time, before any tool list is fetched.
- Tightens read_resource's tool description with concrete URI examples and
  cross-references to the tools that emit URIs.

References on existing tools updated to point at resource URIs:
- get_deployment_from_url's nextSteps now returns taskResourceUri /
  taskLogResourceUri instead of suggesting the deleted get_task_details.
- deploy_release's post-deploy helpText points at the task resource URI.
- get_task_from_url's error and description references updated.
- README and docs/working-with-urls.md workflows rewritten to dereference
  URIs via resources/read or read_resource.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /log resource returns the entire activity log as plain text, which is
fine for short tasks but expensive for long-running deployments. When the
agent already knows what it is looking for (an error string, a step name,
a regex), grep_task_log returns only matching lines instead.

Parameter shape mirrors GNU grep so the schema is self-explanatory to any
LLM that knows grep:

  pattern (regex by default; fixedString:true for literal text)
  caseInsensitive  (-i)
  invertMatch      (-v)
  fixedString      (-F)
  beforeContext    (-B)
  afterContext     (-A)
  maxCount         (-m)

The response includes totalMatches across the whole log (so the caller
sees the true count even when truncated), 1-indexed line numbers, optional
before/after context arrays, and the fullLogResourceUri for fall-through
to the resource when grep was too narrow.

Implementation reuses SpaceServerTaskRepository.getRaw — no new HTTP code.
Pure-function grepLines() exported for unit tests; 12 tests cover each
flag, context-window edge cases, regex error handling, and totalMatches
accounting under maxCount.

Server-level instructions and README updated so dynamic-discovery clients
discover the tool alongside the /log resource.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al path

The octopus://spaces/{spaceName}/tasks/{taskId}/log resource is deliberately
removed. Reasoning:

- Activity logs can be multi-megabyte. Exposing them as an addressable
  resource invites agents to fetch the entire body when grep_task_log
  returns only the matching lines (and a totalMatches count) at a tiny
  fraction of the context cost.
- The "one canonical way to fetch any given shape" rule from the proposed
  architecture: a single primitive per use case is easier to discover
  correctly than two paths with subtly different cost profiles.
- The /tasks/{id}/details resource already covers structured access (the
  ActivityLogs[] tree with categories, timings, and embedded entries) for
  callers that need programmatic traversal rather than text search.

Cross-references swept:

- get_deployment_from_url's nextSteps payload now returns a grepTaskLogHint
  object (pre-filled tool name, spaceName, taskId) instead of a
  taskLogResourceUri field. Integration test updated.
- getTaskFromUrl, deployRelease, README, docs/working-with-urls.md and the
  server-level instructions all rewritten to point at grep_task_log for
  log search and the /details URI for structured traversal.
- read_resource's description explicitly notes there is no /log URI and
  redirects callers to grep_task_log.
- grepTaskLog response field renamed fullLogResourceUri -> taskDetailsResourceUri
  (pointing at /details) since the previous URI no longer exists.
- task.test.ts replaces the /log descriptor test with two guard tests:
  (1) no descriptor named "task-log" is registered, (2) dispatching a /log
  URI returns null. Re-introducing the resource would fail both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The server-level `instructions` string sent at MCP handshake now includes
the configured Octopus server URL on its first line. Lets agents tell
the user (or themselves, when reasoning about scope) which Octopus
instance they are talking to without an extra round trip.

Resolution mirrors getClientConfigurationFromEnvironment's precedence:
--server-url flag wins, then OCTOPUS_SERVER_URL env var. If neither is
set, the instructions show a placeholder pointing at the right
configuration entry points instead of failing silently — the actual
connection still fails at runServer time with the existing error path.

Pulled the CLI_SERVER_URL assignment earlier in src/index.ts so the
instructions are constructed after the resolved URL is known. Removed
the previous duplicate assignment further down.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single shared helper that any write tool can call before performing a
mutation. Negotiates against client capabilities and degrades gracefully:

1. OCTOPUS_SKIP_ELICITATION=true env var bypasses the gate (automation/CI).
2. Client advertises elicitation capability -> SDK emits elicitation/create
   and the helper resolves accept/decline/cancel from the user response.
3. Client without elicitation -> helper falls back to a confirm: boolean
   arg the tool surfaces in its own input schema.

Returns a discriminated ConfirmationResult so callers can tell an explicit
user "no" (declined / cancelled) apart from "the user was never asked"
(confirmationRequired). Tools surface the latter as isError: true so the
LLM stops and asks the user instead of treating it as a real cancellation.

Wires the helper into create_release and deploy_release, and documents
the pattern in CLAUDE.md alongside the existing MCP SDK guidance.

Refs AIF-356.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e prose

The confirmationRequired / cancelled response shapes were duplicated
verbatim in createRelease and deployRelease, with only the action noun
("release creation" vs "deployment") differing. As more write tools get
gated, this would multiply.

Move the response shapes into requireConfirmation.ts as a single
unconfirmedResponse(result, { action }) builder. The builder picks the
right shape (isError: true for confirmationRequired, soft cancellation
for declined/cancelled) and capitalizes the action for the cancelled
message. Tools collapse to two lines:

    if (!confirmation.confirmed) {
      return unconfirmedResponse(confirmation, { action: "deployment" });
    }

Adds 6 builder tests covering both branches and the capitalization rule.
Updates CLAUDE.md to show the simpler pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@egorpavlikhin egorpavlikhin changed the base branch from egorp/task-resources to main May 6, 2026 07:06
@egorpavlikhin
Copy link
Copy Markdown
Contributor Author

Works in Cursor that supports elicitation. Claude Desktop skips elicitation but I tried to harden the response it gets from the MCP so it goes back and asks the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant