feat(agents): Agent Studio command tree (+ provider CLI flags) by cmarguta-alg · Pull Request #214 · algolia/cli

cmarguta-alg · 2026-05-07T12:21:32Z

Summary

Full Agent Studio CLI surface (algolia agents, api/agentstudio/) as developed on lab_week_3, plus provider flag shortcuts for common agents providers create|update flows (see docs/agents.md and docs/qa/arg_friendly_providers_SIGNOFF.md).

Notes

Intended to land as the main integration PR if PR #212 is superseded by this branch, or to be reconciled during review with that PR’s commits.
Anya QA vet notes remain local under tmp/ only (not committed).

Test plan

task test / go test ./pkg/cmd/agents/... and go test ./api/agentstudio/...
go test ./e2e -tags=e2e -run TestAgents with ALGOLIA_APPLICATION_ID + ALGOLIA_API_KEY

Introduces a stand-alone, hand-written client in api/agentstudio/ that later cmd/agent-studio/* commands will sit on top of. - host.go: resolve base URL from --agent-studio-url override, region (eu/us, staging only in eu), or {appID}.algolia.net/agent-studio cluster-proxy fallback. - types.go: Go structs for Agent and PaginatedAgentsResponse, matching the camelCase wire format produced by the conversational-ai backend. - errors.go: structured APIError + sentinels (ErrUnauthorized, ErrForbidden, ErrFeatureDisabled, ErrNotFound, ErrServer) so callers can branch without parsing strings. - client.go: ListAgents end-to-end, with X-Algolia-Application-Id / X-Algolia-API-Key / X-Algolia-User-ID headers and JSON-error parsing. Tests cover region+env+override+fallback resolution, request/response round-trips, pagination params, context cancellation, and every documented HTTP error mapping. Verified live against a beta app on the deployed conversational-ai backend; no factory wiring yet. A TODO marks future replacement with a generated client once the backend publishes one from specs/agent-studio/spec.yml. Co-authored-by: Cursor <cursoragent@cursor.com>

Wires a new agentstudio client into the factory and exposes the first two read-only commands. `list` supports --page, --per-page, --provider-id, and --output (table on TTY, JSON for piping). `get <id>` defaults to JSON. Host resolution uses, in order: 1. ALGOLIA_AGENT_STUDIO_URL env var, or the profile's "agent_studio_url" field 2. Build-time agentstudio.DefaultBaseURL (ldflag-injected from .env's ALGOLIA_AGENT_STUDIO_URL — internal beta builds set this to https://agent-studio.staging.eu.algolia.com) 3. Cluster-proxy fallback https://<appID>.algolia.net/agent-studio (recommended for production end-users — the application's own cluster routes the request to the right region, per the conversational-ai README) Verified live against the beta backend (staging EU) using only `task build` defaults; fixes a regression where 422 validation errors were surfaced with the generic "Input is invalid, see detail/body:" prefix instead of the structured FastAPI detail.msg — extractDetail now prefers structured detail[] over the generic message field. Co-authored-by: Cursor <cursoragent@cursor.com>

Six new mutating commands under `algolia agents`, completing parity with the conversational-ai backend's `/v1/agents/*` surface. Body for `create`/`update` is passed as opaque `json.RawMessage`. The AgentConfigCreate schema is large, deeply validated, and evolves frequently — the CLI is a pass-through; the backend validates; our 422-detail surfacing (Phase 2 fix) makes errors actionable. Mirroring the schema in Go would lie about parity and force a CLI release on every backend change. `--dry-run` shows the resolved request body in human mode (more useful than the existing `objects/update` count-only summary), and emits a structured `{action, request, source, bytes, body, dryRun}` envelope only when `--output` is *explicitly* set — `WithDefaultOutput ("json")` from the success path no longer steals the human dry-run path silently. `delete` follows `objects/delete` exactly for the confirmation contract: `-y/--confirm` skips the prompt; non-TTY without `--confirm` refuses with `cmdutil.FlagErrorf`; `--dry-run` GETs the agent and prints "would DELETE" without calling DELETE. Pre-fetching lets us show name + status both at the prompt and in dry-run output. DELETE handles 204 No Content correctly (no body decode); the other five mutating endpoints share `doAgentMutation` (POST/PATCH + JSON Agent response). Live-verified end-to-end against staging EU: 13/13 checks pass — real create, get, update (rename), duplicate, delete; publish correctly surfaces backend `409 Cannot publish an agent without a model or provider` for a model-less smoke agent. Co-authored-by: Cursor <cursoragent@cursor.com>

Two new commands close the developer loop on the agents tree: `agents test` for iterating on an in-memory agent configuration without persisting it (backend special-cases agent_id="test"), and `agents run <id>` for invoking a persisted agent — the same call shape downstream apps make in production. The wire format is not standard SSE. Recon caught this before code was written: v5 IS standard SSE (`data: <json>\n\n` with a `[DONE]` sentinel), but v4 is a Vercel-AI bespoke protocol — `<type>:<json>\n` per line, no terminator, with 16 single-character type codes mapped to names (`0`=text, `9`=tool-call, `d`=finish-message, …). The `compatibilityMode` query parameter is mandatory server-side with no default; the CLI defaults to v5 (standard, easier to parse defensively) and exposes `--compatibility v4|v5`. The new `api/agentstudio/sse.go` parser sniffs the line prefix and handles both formats transparently — same approach the backend's own acceptance test helpers use. `client.Completions(...)` returns the raw `*http.Response` so the caller decides streaming vs buffered based on `Content-Type`. One method instead of two avoids duplicating request building for one endpoint. The error path closes the body explicitly (the existing `checkResponse` only drains a 64 KiB prefix). Streaming output is NDJSON regardless of TTY — one parsed event per line as `{"type":"...","data":{...}}`. Predictable; pipes cleanly into `jq -r 'select(.type=="text-delta") | .data.delta'` for the "just show me the text" use case. Rich TTY rendering (incremental token print + dim tool-calls) is deferred — not worth growing this PR another ~150 LOC for a UX nicety. `--no-stream` returns the buffered JSON response verbatim. `--dry-run` prints the resolved body and skips the HTTP call. `--message` is a one-shot convenience that wraps as `[{"role":"user","content":"<text>"}]`; `--input` reads a messages JSON array from a file or `-` (stdin); cobra's `MarkFlagsMutuallyExclusive` enforces exactly one. `signal.NotifyContext(SIGINT)` is wired locally inside `test/run`, not in `root.go`. Streaming commands are the only ones that need mid-call cancellation; touching the global root context risked silently regressing every other command's signal behavior. Backend's discriminated `messages` union is pre-validated client- side with a cheap `[]any` shape check so the obvious "passed an object instead of an array" case names the file in the error instead of producing a 422. Live-verified end-to-end against staging EU: 11/11 checks pass. The test app has no LLM provider configured, so a real completion can't run — but the 422s we get back ("Field required" from the test endpoint, "Agent is not published" from `agents run` against the existing draft agent) confirm URL routing, query params, headers, body framing, and 422-detail extraction all land where intended. The streaming parser path is unit-tested against a synthetic SSE source (8 cases covering both wire formats, malformed lines, large frames, [DONE], onEvent error propagation). Other pieces: - `NormalizeCompatibility` lives in `pkg/cmd/agents/shared/` (extracted on second use, same rule as `printDryRun` in Phase 3). - `BuildMessages` / `ReadJSONFile` / `MarshalCompletionBody` / `RenderCompletion` shared between the two commands; all unit- tested in isolation. Co-authored-by: Cursor <cursoragent@cursor.com>

…idation order Three pieces that round out the agents tree for review. AGENTS.md grew an "Agent Studio" section so the next contributor doesn't replay the recon: architecture, auth model, base-URL resolution priority chain, SSE wire format gotchas (v4 IS NOT standard SSE; `compatibilityMode` is mandatory; CLI defaults to v5), dry-run convention (`cmd.Flags().Changed("output")` gating, not `PrintFlags.HasStructuredOutput()`), and a brief on why no bespoke telemetry events were added. Two new e2e testscripts: - testscripts/agents/dry-run.txtar (ungated): the contract test that the dry-run paths still produce stable output through the full cobra tree under the e2e binary. Exercises create/update structured + human modes, test with --input/--message, run with the agent id in the path, and the flag-validation guards. Covers the create/update/test/run dry-run paths in one file (delete dry-run is exercised in unit tests; it doesn't fit the same shape because it pre-fetches the agent and so still needs network). - testscripts/agents/list.txtar (env-gated on ALGOLIA_AGENT_STUDIO_E2E=1): read-only smoke confirming the live backend still emits the wire format we parse against. Gated because the standard e2e test app doesn't necessarily have the gen_ai feature flag enabled — running this against the wrong app would 403 and fail the suite. To make the env-gating possible, registered an `envCondition` in e2e_test.go that exposes `[env:VAR]` and `[!env:VAR]` to all testscripts. The cli/go-internal testscript framework only ships GOOS/GOARCH conditions out of the box; this is a one-liner that treats "" and "0" as falsy (matches how callers actually set ALGOLIA_AGENT_STUDIO_E2E=1). Behavior fix discovered while writing dry-run.txtar: `--compatibility` was validated AFTER the dry-run short-circuit in `agents test` and `agents run`, so `--compatibility v9 --dry-run` silently produced a "valid-looking" preview for an invalid request. Validation now runs first; same rationale as rejecting --input + --message together at flag-parse time. The unit test that asserted "invalid value rejected" had been running in non-dry-run mode and thus already covered the new behavior; no test churn needed. Telemetry deliberately NOT added. The existing `Command Invoked` event from `root.PersistentPreRunE` carries `{command: cmd.CommandPath(), flags: [<changed flag names>]}` which already attributes per verb (`algolia agents create`) and per dry-run (flags contains "dry-run"). Adding bespoke `agents.<verb>.{start,success,error,dry_run}` events would be the first in-command telemetry calls in the entire codebase, diverging from convention for one feature only. Outcome (success/error) is a separate, all-commands refactor. Docs generator (`go run ./cmd/docs --app_data-path <dir>`) verified to serialize all 10 agents commands cleanly with the expected frontmatter / examples / usage strings. Output is consumed downstream by the docs site and is intentionally not committed. Live-CRUD testscript (create→get→update→delete chained on the same id) deferred — it needs an id-extraction helper in the testscript framework that we don't have yet. The shell smokes in tmp/qa/ already exercise that path against staging. Co-authored-by: Cursor <cursoragent@cursor.com>

User caught a real terminological collision in code review. The brief said "tools to test and debug their Agent Studio setups with --dry-run". I had read that as a literal flag spec on every relevant command. Cross-checking the backend disproved that read: - Strings `dry_run`/`dryRun`/`dry-run` in `algolia/conversational-ai@development`: zero hits. - The closest backend primitive is `AgentTestConfiguration` + the `agent_id="test"` route — a real completion against an in-memory configuration, nothing persisted. That route is what `agents test` (now renamed `agents try`) wraps. The whole command IS the conversational-ai-side "dry-run" — running unsaved config against the real backend. So `agents test --dry-run` read literally as "dry-run a dry-run", which is the awkward case the user pointed at. Resolution: 1. Rename `agents test` → `agents try`. "Try" is single-word (preserves the CLI-wide convention — zero hyphenated subcommands exist anywhere in the tree), reads as "experiment without commitment", and avoids the pytest/unit-test connotations of "test" that don't apply here. `try` is registered in `agents.go` via the `trycmd` import alias to avoid clashing with Go's `try` keyword candidate / common variable name. 2. Drop the `--dry-run` flag (and its branch in `runTryCmd`) from the renamed command. There's nothing useful to "preview without HTTP" — the command is already side-effect-free. Dropped the corresponding unit test and replaced it with a comment block explaining why the test no longer exists. 3. Keep `--dry-run` on `create / update / delete / run`. That flag is the standard CLI preview convention (matches `objects/update --dry-run`); it lives on commands that DO have side effects (`create`/`update`/`delete` mutate state, `run` invokes a persisted agent which logs conversations + analytics + tokens). No collision with the conversational-ai concept on any of those. 4. Updated `e2e/testscripts/agents/dry-run.txtar`: removed the `agents test --dry-run` cases, added a regression assertion that `agents try --dry-run` is rejected as `unknown flag` so a future contributor doesn't re-add it without seeing this rationale. New `try.txtar` covers the four flag-validation guards on `agents try` (missing --config, missing --input/-m, --input/-m together, invalid --compatibility) — none of which need network, all fire before client construction. 5. AGENTS.md grew an "On `--dry-run`" section spelling out the two distinct concepts and why they must stay separate. Future contributors see this before touching the agents tree. What I got wrong originally: I read the brief's "--dry-run" literally instead of as shorthand for "the test/debug capability the spec describes". Once `agents try` exists, the brief's intent is satisfied by the command itself; layering a CLI-convention `--dry-run` on top of it was duplicating the deliverable. The unit tests I'd written for `agents test --dry-run` had been passing because the dry-run branch was correctly implemented — not because it was the right thing to implement. Pattern noted: when a feature already satisfies an intent, double-check before adding a flag that "completes" it. Net diff: +28 / -378. The body-preview semantics are still available everywhere they make sense; only the redundant collision case is gone. Co-authored-by: Cursor <cursoragent@cursor.com>

Closes the four completion query-param gaps and the cache-invalidate endpoint flagged by the parity audit. Smallest delta of the planned phases — chosen to land first because it validates the per-phase-PR rhythm without introducing brand-new architectural surface. What's in: - `agents try` and `agents run` learn `--no-cache`, `--no-memory`, `--no-analytics`, and `--secure-user-token <jwt>`. Maps onto the backend's documented completion query params and the X-Algolia-Secure-User-Token header. - New `agents cache invalidate <agent-id> [--before YYYY-MM-DD]` wraps `DELETE /1/agents/{id}/cache`. Mirrors `agents delete`'s confirmation contract: TTY prompts, non-TTY refuses without `--confirm`, `--dry-run` bypasses both since it's non-destructive. Polarity calls (worth knowing for future contributors): - The CompletionOptions.No* fields are inverted from the wire on purpose. Backend defaults all three to true; only the negative is interesting to expose at the CLI surface, and `memory=true` would actually 422 (the schema is `anyOf [{const false}, {type null}]`). So the wire form omits the param when the No* field is false and sends `<param>=false` when true. Polarity is pinned end-to-end by a table-driven wire-mapping test in completions_test.go and smaller cmd-level guards in try/run. - Date validation on `--before` is deliberately the backend's job. Mirroring Pydantic's date parser in Go would create silent skew on minor backend bumps; the 422-detail surfacing already turns bad input into an actionable message verbatim. Architecture: introduces the first nested cobra group under `agents` (`agents cache <verb>`). Phase 6+ reuse this for providers / conversations / keys / domains. Per-verb sub-package wasn't worth it for one verb; if `cache stats` etc. land later, split then. Refactor on second use deferred deliberately: 4 new flags duplicate mechanically across try.go and run.go (8 lines per command). Could be extracted into a `RegisterCompletionFlags` helper, but try and run are exactly two consumers and the duplication is straight-line. If a third consumer appears, extract following the rule we've used since Phase 3 (`PrintDryRun`, `NormalizeCompatibility`). Coverage: - `api/agentstudio/`: 1 new client method (`InvalidateAgentCache`), extended `CompletionOptions`, table-driven wire-mapping test covering all four flag combinations + the secure-user-token header, four new cache-endpoint tests (no-before / with-before / 404→ErrNotFound / structured-422-detail surfaces verbatim). - `pkg/cmd/agents/cache/`: new package, 6 unit tests covering happy path, before-flag, dry-run with and without `--before`, missing agent-id, non-TTY-without-confirm, and 404 propagation. - `pkg/cmd/agents/try/` and `run/`: one targeted end-to-end test each, asserting cobra→opts→client wiring is intact for all four new flags (catches polarity transposition, missed forwarding). - e2e: new `cache.txtar` (5 contract checks); `dry-run.txtar` extended with a regression that the new `--no-*` flags are wire-only and don't leak into the dry-run body preview. Live verification deferred. Staging test app still has no LLM provider configured, so a green completion isn't possible — but cache invalidation doesn't depend on a provider, and the unit + e2e coverage is exhaustive for the wire shapes. A future Phase 6 commit that lands provider CRUD will revive the live-completion smoke. Strategy: per-phase PR from here on. This branch (`lab_week_3_phase5`) is cut from `lab_week_3` HEAD; the PR will target `cmarguta-alg:lab_week_3` so the diff is bounded to Phase 5 only. When PR algolia#212 merges upstream, this PR's base re-targets to `algolia:main`. Net diff: +390 / −21 across 10 files, 2 new files (cache package, cache.txtar). 13/13 agents packages green, all 4 e2e testscripts pass (try / cache / dry-run / list-skipped-as-gated), gofumpt clean, golangci-lint clean. Co-authored-by: Cursor <cursoragent@cursor.com>

Adds two new nested command groups under `algolia agents`, lifting the CLI from 9/41 to 19/41 documented Agent Studio endpoints (~46%): agents providers <list|get|create|update|delete|models> agents config <get|set> API client (api/agentstudio/): - providers.go: ListProviders/GetProvider/CreateProvider/UpdateProvider/ DeleteProvider + ListProviderModels (catalog) + ListModelsForProvider (configured-provider). Provider.Input is json.RawMessage to mirror the 6-way discriminated input union (openai/azure_openai/google_genai/ deepseek/openai_compatible/anthropic) — same rationale as Agent.Config. - configuration.go: GetConfiguration/UpdateConfiguration. Body kept as json.RawMessage for forward-compat even though today's schema is a single int (maxRetentionDays). CLI (pkg/cmd/agents/{providers,config}/): - providers.go: 6 verbs in one file extending the cache-group precedent (rather than per-verb subpackages). `models` is one command with two routes — no flag hits the static catalog, --provider-id hits the configured-provider listing. - mask.go: redacts `apiKey` to "***" by default in providers list/get/ create/update success output. --show-secret opts out for scripted exports. Sets the convention for Phase 8 (`agents keys`). - config.go: `set` accepts either `--retention-days N` or `-F file.json` (mutually exclusive). File path is forward-compat for future fields. Docs + tests: - AGENTS.md: updated sub-group listing, provider client conventions, and a new "Provider secret masking" section explaining the redaction contract. - 27 client tests + 16 cmd tests + 2 e2e testscripts (providers.txtar, config.txtar) covering flag validation, dry-run output shape, masking on/off, and route picking. - Live-verified against staging EU: config get returns the live retention default, providers list returns proper paginated shape on empty, models catalog returns ~50 entries across 4+ provider types. Co-authored-by: Cursor <cursoragent@cursor.com>

Refactor only — no behaviour changes. Both the API client layer and the providers command package now mirror the OpenAPI spec's tag structure (one source file per tag), which makes it easier to add a new endpoint or verb later: a single new file plus its sibling test file. api/agentstudio/ client.go 568 LOC monolith → 193 LOC infra-only (Config/Client/NewClient/setHeaders/checkResponse/ extractDetail/sentinelFor) agents.go NEW — Agents tag (List/Get/Create/Update/Delete/ Publish/Unpublish/Duplicate/InvalidateAgentCache + doAgentMutation/doAgentLifecycle helpers) completions.go NEW — Completions tag (Completions + boolToWire) agents_test.go (renamed from mutations_test.go) absorbs GetAgent_* from get_agent_test.go (deleted) and the ListAgents happy-path tests previously in client_test.go. client_test.go now holds only the cross-cutting infra tests (NewClient validation, checkResponse error mapping, ctx cancellation, header omission). The error-mapping suite still uses ListAgents as a vehicle since it's the simplest GET shape; the file header explains why. pkg/cmd/agents/providers/ providers.go 738 LOC → 93 LOC (parent + readBody/ctxOrBackground/ relTimeOrDash helpers) list.go, get.go, create.go, update.go, delete.go, models.go NEW — one file per verb, same package so MaskInput and the helpers stay un-exported providers_test.go shrinks to the test harness (newClientForServer + writeTempJSON); each verb gets a sibling _test.go with its own scenarios. AGENTS.md gains a "File organisation" section documenting both layers' tag-aligned conventions and the explicit decision NOT to promote sub-group verbs to their own subpackages. All gates green: gofumpt, go vet, go test ./..., e2e TestAgents/{cache, try,dry-run,config,providers}, task build, golangci-lint on touched dirs. Co-authored-by: Cursor <cursoragent@cursor.com>

Adds 5 endpoints under `agents conversations <verb>`: list, get, delete, purge, export. Brings CLI parity to 24/41 documented Agent Studio endpoints (~59%, ~71% of publicly visible). Client (api/agentstudio/conversations.go): - ListConversations → typed PaginatedConversationsResponse - GetConversation → json.RawMessage (full shape has discriminated message-role union; passthrough is honest) - DeleteConversation → 204 sentinel-mapped - PurgeConversations → 204 (DELETE on the list endpoint) - ExportConversations → json.RawMessage (spec leaves shape unspecified) ListConversationsParams.FeedbackVote is *int because nil = no filter while 0 (downvote) is a meaningful value. Regression-tested. Commands (pkg/cmd/agents/conversations/, per-verb files following the providers/ pattern from the previous refactor): - list: table view + --output json passthrough; supports --start-date/--end-date/--include-feedback/--feedback-vote. --feedback-vote requires --include-feedback (backend constraint, rejected at CLI for a clearer error). - get: raw JSON passthrough (pretty-printed by default, structured via --output json), --include-feedback toggle. - delete: single-record; mirrors `agents delete` confirm contract (--confirm required non-TTY, --dry-run previews). - purge: bulk; two extra guardrails on top of the confirm contract: (a) dateless purge requires explicit --all (otherwise a typo on --start-date would silently turn a range purge into a wipe-all); (b) --all is mutually exclusive with --start-date/--end-date. - export: --output-file writes raw bytes (jq-friendly), stdout pretty-prints; --start-date/--end-date scope. All verbs take agent-id as the first positional argument, matching `agents publish/run/cache invalidate`. Tests: - 14 client unit tests (incl. *int FeedbackVote zero-value regression, feature-disabled propagation, ID validation, dateless-purge wire shape). - 16 cmd-level tests (incl. dateless-purge guardrail rejection, --all + date mutex rejection, feedback-vote-without-include rejection, dry-run URL+scope shapes). - e2e/testscripts/agents/conversations.txtar (no registration needed — TestAgents glob-runs everything in testscripts/agents/). AGENTS.md gains a "Conversations: purge vs delete" section documenting both guardrails and why they live at the cmd layer (the wire client mirrors the spec). Gates green: gofumpt, go vet, go test ./... agent dirs (16/16 packages), e2e TestAgents/{cache,try,dry-run,config,providers,conversations}, golangci-lint, task build. Co-authored-by: Cursor <cursoragent@cursor.com>

Spec-vs-reality mismatch caught by Anya's Phase 7 live vet against staging EU. OpenAPI declares both `startDate` and `endDate` on `DELETE /1/agents/{id}/conversations` as `required: false`, which reads as "dateless purge wipes everything." The live backend disagrees: dateless DELETE returns `400 "At least one filter is required"`. The previously-shipped `--all` flag was therefore non-functional — every invocation produced a request the backend always rejected. Changes: - Remove `--all` (and the dead `PurgeOptions.All` field). - Require at least one of `--start-date`/`--end-date` at the CLI; error message names the backend constraint so future spec drift stays debuggable. - Document the open-ended-bound workaround (--start-date 1970-01-01) in help text for genuine wipe-all needs. Verified live (exit 0). - Tests realigned: drop the obsolete --all/dateless-without-all cases, add Test_runPurgeCmd_RefusesDateless and Test_runPurgeCmd_AcceptsOpenEndedStart. - e2e/testscripts/agents/conversations.txtar updated to match. - AGENTS.md gains an explicit "Spec vs reality on purge" callout with the failing wire trace from the vet. Lesson noted in tmp/lab_week_3_plan.md and tmp/qa/anya_phase7_vet/ REPORT.md: trust live curl over the FastAPI OpenAPI spec, which reflects the function signature rather than the validation logic. Gates green post-fix: gofumpt, go vet, conversations unit tests, TestAgents/conversations e2e, golangci-lint on agents tree, task build. Co-authored-by: Cursor <cursoragent@cursor.com>

Adds 6 endpoints under `agents domains <verb>`: list, get, create, delete, bulk-insert, bulk-delete. Per-agent CORS allowlist; agent ID positional first arg as in conversations/cache. Client (api/agentstudio/domains.go): - Typed AllowedDomain + AllowedDomainListResponse (no opaque JSON, no pagination — list returns {domains:[]} flat). - Bulk insert wants {domains:[]string}, bulk delete wants {domainIds:[]string} as a request body (DELETE with body). Commands (pkg/cmd/agents/domains/): - list, get, create (--domain), delete (-y/--dry-run), bulk-insert (--domain repeated or -F file), bulk-delete (--domain-id repeated or -F file). Bulk verbs accept a JSON array of strings via -F for scripted use; --domain* and -F are mutually exclusive. - Live-probed against staging EU before coding; wire shapes match the spec exactly. Gates: gofumpt, vet, unit, e2e (TestAgents/domains), lint, build. Co-authored-by: Cursor <cursoragent@cursor.com>

… 8b) 5 endpoints under `agents keys <verb>`: list, get, create, update, delete. Per-app scope, optional per-agent allowlist via agentIds. Mutating verbs require an admin API key. Live-probed against staging EU: GET /1/secret-keys returns 200 with empty list, all mutating verbs return 403 {"message":"Admin API key required."} with a non-admin key (extractDetail already maps {message} envelopes — gives ErrForbidden + clear detail). Documented in command long-help. MaskInput moved from pkg/cmd/agents/providers/ to pkg/cmd/agents/shared/ since two consumers now redact secrets (provider input.apiKey, secret-key Value). Adds MaskString for top-level scalar secrets. Update semantics: - --name and --agent-id are independently set; nil pointer = leave field unchanged in the PATCH body (matches OpenAPI's anyOf-null). - Repeated --agent-id replaces (not merges) the allowlist. - --agent-id="" clears the allowlist (empty strings filtered out; if all values are empty the resulting [] still gets sent through). Output safety: - SecretKey.Value is redacted as "***" by default in list/get/ create/update output. - --show-secret opts in to raw value (e.g. for piping into a vault). Gates: gofumpt, vet, unit (api+all agents pkgs), e2e (TestAgents/keys), lint, build. Co-authored-by: Cursor <cursoragent@cursor.com>

Two single-tag sub-groups: agents feedback create POST /1/feedback agents user-data get GET /1/user-data/{user_token} agents user-data delete DELETE /1/user-data/{user_token} Live-probed against staging EU. Wire shapes match the spec; the feedback endpoint returns 404 with {message:...} envelope for unknown message IDs (already mapped via extractDetail). Notes: - feedback validates locally: vote ∈ {0,1}, tags ≤10 each ≤50 chars, notes ≤1000 chars (matches OpenAPI bounds; saves a round-trip on common mistakes). - user-data is the GDPR right-to-access (get) and right-to-be- forgotten (delete) endpoints; delete is irreversible and erases across every agent in the app, so the prompt is explicit and --confirm is enforced in non-interactive shells. - GetUserData returns the full envelope with []json.RawMessage inner items (conversations have a discriminated message-role union, memories have an unspecified shape — same convention as GetConversation in Phase 7). Gates: gofumpt, vet, unit (api+all agents pkgs), e2e (TestAgents/ feedback + TestAgents/userdata), lint, build. Co-authored-by: Cursor <cursoragent@cursor.com>

Closes 100% of the Agent Studio OpenAPI surface (41/41 endpoints). New top-level group (parent hidden, children visible): agents internal status GET /status agents internal memorize POST /1/agents/agents/{id}/memorize agents internal ponder POST /1/agents/agents/{id}/ponder agents internal consolidate POST /1/agents/agents/{id}/consolidate New providers verb: agents providers defaults GET /1/providers/models/defaults Path quirk verified live: memorize/ponder/consolidate live under the doubled prefix /1/agents/agents/ (the spec is correct, not a typo). The single-prefix equivalents 404 with FastAPI's standard {detail: "Not Found"} envelope; the doubled prefix routes to the actual handler. Pinned in the godoc on api/agentstudio/internal.go. Hidden semantics: parent has Hidden:true so it doesn't pollute `agents --help`, but children are visible — running `agents internal --help` still surfaces what's available to anyone who knows to look. Pinned by a unit test. Memory verbs (memorize/ponder/consolidate) take the agent ID positional and a JSON body via --body or -F ("-" for stdin), mirroring the agents try input contract. Body shapes are evolving (messages is a v4/v5 discriminated role union per the spec) so the client passes the body through as-is and returns the response as json.RawMessage — no premature typing. defaults: small read returning {provider_type → recommended_model}. Sibling of `agents providers models` — different question, different shape, didn't fit as a flag on the existing models verb. Gates: gofumpt, vet, unit (api+all agents pkgs), e2e (TestAgents/*), lint, build. Co-authored-by: Cursor <cursoragent@cursor.com>

…ya F-1) Anya's Phase 8 vet caught an asymmetric 404 from the staging gateway: `agents user-data get` and `delete` against any token containing "/" return `agent studio: 404 Not Found: Not Found` while every other unknown token (including ones with +, ?, #, &, =) returns a clean 200 empty / 204. Root cause is server-side — the gateway decodes %2F before routing so the path no longer matches /1/user-data/{token}. The CLI does the right thing with url.PathEscape; the symptom is a misleading 404 that an operator processing a GDPR deletion request would reasonably read as "data was never stored." Fix: refuse "/"-bearing tokens at the CLI layer with a precise error message that names the gateway behavior, on both verbs. One shared const, two two-line guards. No behavior change for tokens without "/". Live-verified: `./algolia agents user-data get "ab/cd"` now prints the explicit message instead of "404 Not Found." Gates: gofumpt, vet, unit (api+all agents pkgs), e2e (TestAgents/userdata), lint, build. Co-authored-by: Cursor <cursoragent@cursor.com>

Three small follow-ups to PR algolia#212. * `agents try` / `agents run`: TTY-attached stdout now renders a human-readable transcript (text-deltas inline, tool calls/results as dim annotations, errors red) instead of NDJSON. Non-TTY paths keep emitting NDJSON unchanged so existing pipelines (jq, log capture, scripts) are not broken. New `--ndjson` flag forces NDJSON on a TTY for users who want machine output on screen. * `agents domains bulk-delete`: now pre-fetches the existing list (TTY-only, one extra GET) so the success line can split requested IDs into removed-vs-already-absent. Non-TTY behavior unchanged. Backend returns 204 with no body, so this is the only signal we can give users about whether their delete actually did anything. * `agents keys list`: lock the table-render mask path with two new tests (default masks, --show-secret reveals). Existing tests only covered --output json — the table writes k.Value verbatim, so the mask must apply before the row is added. Co-authored-by: Cursor <cursoragent@cursor.com>

…ents Pulls the long-form Agent Studio rationale (file layout, auth, host resolution, streaming protocols, dry-run semantics, secret masking, backend gotchas) out of multi-paragraph in-source comments and out of AGENTS.md, into a single reference at docs/agents.md. Source files now keep one-line godoc comments only; gotchas that genuinely have to live next to the code (purge requires a date, slash in user-tokens, doubled memory-op path, memory polarity, ACL on configuration) are kept as 1-2 line hints with a pointer to docs/agents.md. AGENTS.md drops the agent-studio sections and links to the doc. Net: -945 / +219 lines across 25 files, no behavior change. Docs generator + full test sweep + gofumpt all clean. Co-authored-by: Cursor <cursoragent@cursor.com>

Five low-risk DRY passes across `pkg/cmd/agents/`: - `newClientForServer` × 14 test files → `sharedtest.NewClient` (new `pkg/cmd/agents/sharedtest/` package, kept out of production imports). `writeTempJSON` × 6 → `sharedtest.WriteTempJSON`. - `ctxOrBackground` × 7 sub-group files → `shared.OrBackground` and inline `if ctx == nil { ctx = context.Background() }` blocks in delete/cache/try/run. - `--confirm` / "non-TTY refuses without -y" pattern × 9 commands → `shared.AddConfirmFlag` + `shared.ResolveConfirm` + `shared.Confirm`. - File-reading helpers (`readBody`, `readJSONBody`, inline `cmdutil.ReadFile + TrimUTF8BOM + json.Valid` in config/update/create/domains) → `shared.ReadJSONFile`. - Completion-flag block (8 flags + the input/message mutex) duplicated in `try` and `run` → `shared.CompletionInputs` + `shared.RegisterCompletionFlags`. Net: 102 files changed, +587 / -770. Behavior unchanged; full suite green, golangci-lint clean. Co-authored-by: Cursor <cursoragent@cursor.com>

Allow common provider flows without -F: --name/--provider plus --api-key, --api-key-stdin, or --api-key-env for create (openai, anthropic, google_genai, deepseek); patches via --name, API key flags, and optional --base-url for update. Azure and openai_compatible still require JSON files. Keeps file and shortcut paths mutually exclusive. Updates agents e2e providers.txtar for the relaxed -F requirement. Co-authored-by: Cursor <cursoragent@cursor.com>

Adds docs/agents.md section on providers -F vs flags; links to docs/qa/arg_friendly_providers_SIGNOFF.md (Lab Week planning-team alignment + ordered test commands). Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

codacy-production · 2026-05-07T12:22:29Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 1479 complexity · 936 duplication

Metric Results

Complexity 1479

Duplication 936

View in Codacy

_{TIP This summary will be updated as you push new changes.}

Co-authored-by: Cursor <cursoragent@cursor.com>

tmp is cleaned by Taskfile and may contain local QA Go packages outside review; use issues.exclude-dirs instead of deprecated run.skip-dirs. Co-authored-by: Cursor <cursoragent@cursor.com>

cmarguta-alg · 2026-05-07T12:37:42Z

Retargeting stacked work: the arg-friendly provider flags (and follow-up lint/docs) now live in a fork PR with base lab_week_3 → head feat/agents-providers-arg-friendly, so the upstream diff is only those commits — not the whole Agent Studio branch vs main. Closing this PR; use the fork link in the next comment once opened.

cmarguta-alg · 2026-05-07T12:37:57Z

Continued as fork PR: cmarguta-alg#4 (lab_week_3 ← feat/agents-providers-arg-friendly).

cmarguta-alg and others added 22 commits May 6, 2026 15:23

docs: provider flag ergonomics + QA sign-off sheet for Anya

750f1d2

Adds docs/agents.md section on providers -F vs flags; links to docs/qa/arg_friendly_providers_SIGNOFF.md (Lab Week planning-team alignment + ordered test commands). Co-authored-by: Cursor <cursoragent@cursor.com>

docs(qa): keep Anya's vet report local under tmp/, not committed

ee6e906

Co-authored-by: Cursor <cursoragent@cursor.com>

cmarguta-alg mentioned this pull request May 7, 2026

feat(agents): providers create/update flag shortcuts (draft) cmarguta-alg/cli#3

Closed

cmarguta-alg and others added 2 commits May 7, 2026 15:25

style: gofumpt providers flag helpers and tests

abc737d

Co-authored-by: Cursor <cursoragent@cursor.com>

chore(lint): exclude tmp from golangci paths

ea5d644

tmp is cleaned by Taskfile and may contain local QA Go packages outside review; use issues.exclude-dirs instead of deprecated run.skip-dirs. Co-authored-by: Cursor <cursoragent@cursor.com>

cmarguta-alg mentioned this pull request May 7, 2026

feat(agents): providers create/update CLI flags (stacked on lab_week_3) cmarguta-alg/cli#4

Open

cmarguta-alg closed this May 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agents): Agent Studio command tree (+ provider CLI flags)#214

feat(agents): Agent Studio command tree (+ provider CLI flags)#214
cmarguta-alg wants to merge 24 commits intoalgolia:mainfrom
cmarguta-alg:feat/agents-providers-arg-friendly

cmarguta-alg commented May 7, 2026

Uh oh!

codacy-production Bot commented May 7, 2026 •

edited

Loading

Uh oh!

cmarguta-alg commented May 7, 2026

Uh oh!

cmarguta-alg commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

cmarguta-alg commented May 7, 2026

Summary

Notes

Test plan

Uh oh!

codacy-production Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

cmarguta-alg commented May 7, 2026

Uh oh!

cmarguta-alg commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codacy-production Bot commented May 7, 2026 •

edited

Loading