feat(agents): Agent Studio command tree (+ provider CLI flags) #214
Closed
cmarguta-alg wants to merge 24 commits into algolia:main from
Introduces a stand-alone, hand-written client in api/agentstudio/ that
later cmd/agent-studio/* commands will sit on top of.
- host.go: resolve base URL from --agent-studio-url override, region
(eu/us, staging only in eu), or {appID}.algolia.net/agent-studio
cluster-proxy fallback.
- types.go: Go structs for Agent and PaginatedAgentsResponse, matching
the camelCase wire format produced by the conversational-ai backend.
- errors.go: structured APIError + sentinels (ErrUnauthorized,
ErrForbidden, ErrFeatureDisabled, ErrNotFound, ErrServer) so callers
can branch without parsing strings.
- client.go: ListAgents end-to-end, with X-Algolia-Application-Id /
X-Algolia-API-Key / X-Algolia-User-ID headers and JSON-error parsing.
Tests cover region+env+override+fallback resolution, request/response
round-trips, pagination params, context cancellation, and every
documented HTTP error mapping. Verified live against a beta app on the
deployed conversational-ai backend; no factory wiring yet.
A TODO marks future replacement with a generated client once the
backend publishes one from specs/agent-studio/spec.yml.
Co-authored-by: Cursor <cursoragent@cursor.com>
Wires a new agentstudio client into the factory and exposes the first
two read-only commands. `list` supports --page, --per-page,
--provider-id, and --output (table on TTY, JSON for piping).
`get <id>` defaults to JSON.
Host resolution uses, in order:
1. ALGOLIA_AGENT_STUDIO_URL env var, or the profile's
"agent_studio_url" field
2. Build-time agentstudio.DefaultBaseURL (ldflag-injected from
.env's ALGOLIA_AGENT_STUDIO_URL — internal beta builds set
this to https://agent-studio.staging.eu.algolia.com)
3. Cluster-proxy fallback https://<appID>.algolia.net/agent-studio
(recommended for production end-users — the application's own
cluster routes the request to the right region, per the
conversational-ai README)
Verified live against the beta backend (staging EU) using only `task
build` defaults; fixes a regression where 422 validation errors were
surfaced with the generic "Input is invalid, see detail/body:" prefix
instead of the structured FastAPI detail.msg — extractDetail now
prefers structured detail[] over the generic message field.
Co-authored-by: Cursor <cursoragent@cursor.com>
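The three-step host resolution order listed in the commit above can be sketched as a single function. This is a hedged illustration: `resolveBaseURL` and its parameters are hypothetical names, and the env var and profile field are collapsed into one `override` argument because the message does not state which of the two wins.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// DefaultBaseURL stands in for the ldflag-injected build-time value;
// it is empty unless a build sets it.
var DefaultBaseURL = ""

// resolveBaseURL is a hypothetical helper applying the documented order:
// explicit override, then build-time default, then cluster-proxy fallback.
func resolveBaseURL(override, buildDefault, appID string) (string, error) {
	if u := strings.TrimSpace(override); u != "" {
		return strings.TrimRight(u, "/"), nil
	}
	if u := strings.TrimSpace(buildDefault); u != "" {
		return strings.TrimRight(u, "/"), nil
	}
	if appID == "" {
		return "", errors.New("cannot resolve Agent Studio URL: no override, build default, or application ID")
	}
	return fmt.Sprintf("https://%s.algolia.net/agent-studio", appID), nil
}

func main() {
	u, err := resolveBaseURL("", DefaultBaseURL, "MYAPPID")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```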
Six new mutating commands under `algolia agents`, completing parity
with the conversational-ai backend's `/v1/agents/*` surface.
Body for `create`/`update` is passed as opaque `json.RawMessage`. The
AgentConfigCreate schema is large, deeply validated, and evolves
frequently — the CLI is a pass-through; the backend validates; our
422-detail surfacing (Phase 2 fix) makes errors actionable. Mirroring
the schema in Go would lie about parity and force a CLI release on
every backend change.
`--dry-run` shows the resolved request body in human mode (more
useful than the existing `objects/update` count-only summary), and
emits a structured `{action, request, source, bytes, body, dryRun}`
envelope only when `--output` is *explicitly* set;
`WithDefaultOutput("json")` from the success path no longer silently
steals the human dry-run path.
`delete` follows `objects/delete` exactly for the confirmation
contract: `-y/--confirm` skips the prompt; non-TTY without `--confirm`
refuses with `cmdutil.FlagErrorf`; `--dry-run` GETs the agent and
prints "would DELETE" without calling DELETE. Pre-fetching lets us
show name + status both at the prompt and in dry-run output.
DELETE handles 204 No Content correctly (no body decode); the other
five mutating endpoints share `doAgentMutation` (POST/PATCH + JSON
Agent response).
Live-verified end-to-end against staging EU: 13/13 checks pass —
real create, get, update (rename), duplicate, delete; publish
correctly surfaces backend `409 Cannot publish an agent without a
model or provider` for a model-less smoke agent.
Co-authored-by: Cursor <cursoragent@cursor.com>
Two new commands close the developer loop on the agents tree:
`agents test` for iterating on an in-memory agent configuration
without persisting it (backend special-cases agent_id="test"), and
`agents run <id>` for invoking a persisted agent — the same call
shape downstream apps make in production.
The wire format is not always standard SSE. Recon caught this before code
was written: v5 IS standard SSE (`data: <json>\n\n` with a `[DONE]`
sentinel), but v4 is a Vercel-AI bespoke protocol — `<type>:<json>\n`
per line, no terminator, with 16 single-character type codes mapped
to names (`0`=text, `9`=tool-call, `d`=finish-message, …). The
`compatibilityMode` query parameter is mandatory server-side with
no default; the CLI defaults to v5 (standard, easier to parse
defensively) and exposes `--compatibility v4|v5`. The new
`api/agentstudio/sse.go` parser sniffs the line prefix and handles
both formats transparently — same approach the backend's own
acceptance test helpers use.
`client.Completions(...)` returns the raw `*http.Response` so the
caller decides streaming vs buffered based on `Content-Type`. One
method instead of two avoids duplicating request building for one
endpoint. The error path closes the body explicitly (the existing
`checkResponse` only drains a 64 KiB prefix).
Streaming output is NDJSON regardless of TTY — one parsed event per
line as `{"type":"...","data":{...}}`. Predictable; pipes cleanly
into `jq -r 'select(.type=="text-delta") | .data.delta'` for the
"just show me the text" use case. Rich TTY rendering (incremental
token print + dim tool-calls) is deferred — not worth growing this
PR another ~150 LOC for a UX nicety.
`--no-stream` returns the buffered JSON response verbatim.
`--dry-run` prints the resolved body and skips the HTTP call.
`--message` is a one-shot convenience that wraps as
`[{"role":"user","content":"<text>"}]`; `--input` reads a messages
JSON array from a file or `-` (stdin); cobra's
`MarkFlagsMutuallyExclusive` enforces exactly one.
`signal.NotifyContext(SIGINT)` is wired locally inside `test/run`,
not in `root.go`. Streaming commands are the only ones that need
mid-call cancellation; touching the global root context risked
silently regressing every other command's signal behavior.
The backend's discriminated `messages` union is pre-validated
client-side with a cheap `[]any` shape check, so the obvious "passed
an object instead of an array" case names the file in the error
instead of producing a 422.
Live-verified end-to-end against staging EU: 11/11 checks pass.
The test app has no LLM provider configured, so a real completion
can't run — but the 422s we get back ("Field required" from the
test endpoint, "Agent is not published" from `agents run` against
the existing draft agent) confirm URL routing, query params,
headers, body framing, and 422-detail extraction all land where
intended. The streaming parser path is unit-tested against a
synthetic SSE source (8 cases covering both wire formats, malformed
lines, large frames, [DONE], onEvent error propagation).
Other pieces:
- `NormalizeCompatibility` lives in `pkg/cmd/agents/shared/`
(extracted on second use, same rule as `printDryRun` in Phase 3).
- `BuildMessages` / `ReadJSONFile` / `MarshalCompletionBody` /
`RenderCompletion` are shared between the two commands; all are
unit-tested in isolation.
Co-authored-by: Cursor <cursoragent@cursor.com>
…idation order
Three pieces that round out the agents tree for review.
AGENTS.md grew an "Agent Studio" section so the next contributor
doesn't replay the recon: architecture, auth model, base-URL
resolution priority chain, SSE wire format gotchas (v4 IS NOT
standard SSE; `compatibilityMode` is mandatory; CLI defaults to
v5), dry-run convention (`cmd.Flags().Changed("output")` gating,
not `PrintFlags.HasStructuredOutput()`), and a brief on why no
bespoke telemetry events were added.
Two new e2e testscripts:
- testscripts/agents/dry-run.txtar (ungated): the contract test
that the dry-run paths still produce stable output through the
full cobra tree under the e2e binary. Exercises create/update
structured + human modes, test with --input/--message, run with
the agent id in the path, and the flag-validation guards.
Covers the create/update/test/run dry-run paths in one file
(delete dry-run is exercised in unit tests; it doesn't fit the
same shape because it pre-fetches the agent and so still needs
network).
- testscripts/agents/list.txtar (env-gated on
ALGOLIA_AGENT_STUDIO_E2E=1): read-only smoke confirming the
live backend still emits the wire format we parse against.
Gated because the standard e2e test app doesn't necessarily
have the gen_ai feature flag enabled — running this against
the wrong app would 403 and fail the suite.
To make the env-gating possible, registered an `envCondition`
in e2e_test.go that exposes `[env:VAR]` and `[!env:VAR]` to all
testscripts. The cli/go-internal testscript framework only ships
GOOS/GOARCH conditions out of the box; this is a one-liner that
treats "" and "0" as falsy (matches how callers actually set
ALGOLIA_AGENT_STUDIO_E2E=1).
Behavior fix discovered while writing dry-run.txtar:
`--compatibility` was validated AFTER the dry-run short-circuit
in `agents test` and `agents run`, so `--compatibility v9
--dry-run` silently produced a "valid-looking" preview for an
invalid request. Validation now runs first; same rationale as
rejecting --input + --message together at flag-parse time. The
unit test that asserted "invalid value rejected" had been
running in non-dry-run mode and thus already covered the new
behavior; no test churn needed.
Telemetry deliberately NOT added. The existing `Command Invoked`
event from `root.PersistentPreRunE` carries
`{command: cmd.CommandPath(), flags: [<changed flag names>]}`
which already attributes per verb (`algolia agents create`) and
per dry-run (flags contains "dry-run"). Adding bespoke
`agents.<verb>.{start,success,error,dry_run}` events would be
the first in-command telemetry calls in the entire codebase,
diverging from convention for one feature only. Outcome
(success/error) is a separate, all-commands refactor.
Docs generator (`go run ./cmd/docs --app_data-path <dir>`)
verified to serialize all 10 agents commands cleanly with the
expected frontmatter / examples / usage strings. Output is
consumed downstream by the docs site and is intentionally not
committed.
Live-CRUD testscript (create→get→update→delete chained on the
same id) deferred — it needs an id-extraction helper in the
testscript framework that we don't have yet. The shell smokes in
tmp/qa/ already exercise that path against staging.
Co-authored-by: Cursor <cursoragent@cursor.com>
User caught a real terminological collision in code review.
The brief said "tools to test and debug their Agent Studio setups
with --dry-run". I had read that as a literal flag spec on every
relevant command. Cross-checking the backend disproved that read:
- Strings `dry_run`/`dryRun`/`dry-run` in
`algolia/conversational-ai@development`: zero hits.
- The closest backend primitive is `AgentTestConfiguration` +
the `agent_id="test"` route — a real completion against an
in-memory configuration, nothing persisted.
That route is what `agents test` (now renamed `agents try`) wraps.
The whole command IS the conversational-ai-side "dry-run" — running
unsaved config against the real backend. So `agents test --dry-run`
read literally as "dry-run a dry-run", which is the awkward case
the user pointed at.
Resolution:
1. Rename `agents test` → `agents try`. "Try" is single-word
(preserves the CLI-wide convention — zero hyphenated subcommands
exist anywhere in the tree), reads as "experiment without
commitment", and avoids the pytest/unit-test connotations of
"test" that don't apply here. `try` is registered in
`agents.go` via the `trycmd` import alias to avoid clashing
with Go's `try` keyword candidate / common variable name.
2. Drop the `--dry-run` flag (and its branch in `runTryCmd`) from
the renamed command. There's nothing useful to "preview without
HTTP" — the command is already side-effect-free. Dropped the
corresponding unit test and replaced it with a comment block
explaining why the test no longer exists.
3. Keep `--dry-run` on `create / update / delete / run`. That
flag is the standard CLI preview convention (matches
`objects/update --dry-run`); it lives on commands that DO have
side effects (`create`/`update`/`delete` mutate state, `run`
invokes a persisted agent which logs conversations + analytics
+ tokens). No collision with the conversational-ai concept on
any of those.
4. Updated `e2e/testscripts/agents/dry-run.txtar`: removed the
`agents test --dry-run` cases, added a regression assertion
that `agents try --dry-run` is rejected as `unknown flag` so
a future contributor doesn't re-add it without seeing this
rationale. New `try.txtar` covers the four flag-validation
guards on `agents try` (missing --config, missing --input/-m,
--input/-m together, invalid --compatibility) — none of which
need network, all fire before client construction.
5. AGENTS.md grew an "On `--dry-run`" section spelling out the
two distinct concepts and why they must stay separate. Future
contributors see this before touching the agents tree.
What I got wrong originally: I read the brief's "--dry-run"
literally instead of as shorthand for "the test/debug capability
the spec describes". Once `agents try` exists, the brief's intent
is satisfied by the command itself; layering a CLI-convention
`--dry-run` on top of it was duplicating the deliverable. The
unit tests I'd written for `agents test --dry-run` had been
passing because the dry-run branch was correctly implemented —
not because it was the right thing to implement. Pattern noted:
when a feature already satisfies an intent, double-check before
adding a flag that "completes" it.
Net diff: +28 / -378. The body-preview semantics are still
available everywhere they make sense; only the redundant
collision case is gone.
Co-authored-by: Cursor <cursoragent@cursor.com>
Closes the four completion query-param gaps and the cache-invalidate
endpoint flagged by the parity audit. Smallest delta of the planned
phases — chosen to land first because it validates the per-phase-PR
rhythm without introducing brand-new architectural surface.
What's in:
- `agents try` and `agents run` learn `--no-cache`, `--no-memory`,
`--no-analytics`, and `--secure-user-token <jwt>`. Maps onto the
backend's documented completion query params and the
X-Algolia-Secure-User-Token header.
- New `agents cache invalidate <agent-id> [--before YYYY-MM-DD]`
wraps `DELETE /1/agents/{id}/cache`. Mirrors `agents delete`'s
confirmation contract: TTY prompts, non-TTY refuses without
`--confirm`, `--dry-run` bypasses both since it's non-destructive.
Polarity calls (worth knowing for future contributors):
- The CompletionOptions.No* fields are inverted from the wire on
purpose. Backend defaults all three to true; only the negative is
interesting to expose at the CLI surface, and `memory=true` would
actually 422 (the schema is `anyOf [{const false}, {type null}]`).
So the wire form omits the param when the No* field is false and
sends `<param>=false` when true. Polarity is pinned end-to-end by
a table-driven wire-mapping test in completions_test.go and
smaller cmd-level guards in try/run.
- Date validation on `--before` is deliberately the backend's job.
Mirroring Pydantic's date parser in Go would create silent skew
on minor backend bumps; the 422-detail surfacing already turns
bad input into an actionable message verbatim.
Architecture: introduces the first nested cobra group under `agents`
(`agents cache <verb>`). Phase 6+ reuse this for providers /
conversations / keys / domains. Per-verb sub-package wasn't worth it
for one verb; if `cache stats` etc. land later, split then.
Refactor on second use deferred deliberately: 4 new flags duplicate
mechanically across try.go and run.go (8 lines per command). Could
be extracted into a `RegisterCompletionFlags` helper, but try and run
are exactly two consumers and the duplication is straight-line. If a
third consumer appears, extract following the rule we've used since
Phase 3 (`PrintDryRun`, `NormalizeCompatibility`).
Coverage:
- `api/agentstudio/`: 1 new client method (`InvalidateAgentCache`),
extended `CompletionOptions`, table-driven wire-mapping test
covering all four flag combinations + the secure-user-token
header, four new cache-endpoint tests (no-before / with-before /
404→ErrNotFound / structured-422-detail surfaces verbatim).
- `pkg/cmd/agents/cache/`: new package, 6 unit tests covering happy
path, before-flag, dry-run with and without `--before`, missing
agent-id, non-TTY-without-confirm, and 404 propagation.
- `pkg/cmd/agents/try/` and `run/`: one targeted end-to-end test
each, asserting cobra→opts→client wiring is intact for all four
new flags (catches polarity transposition, missed forwarding).
- e2e: new `cache.txtar` (5 contract checks); `dry-run.txtar`
extended with a regression that the new `--no-*` flags are
wire-only and don't leak into the dry-run body preview.
Live verification deferred. Staging test app still has no LLM
provider configured, so a green completion isn't possible — but
cache invalidation doesn't depend on a provider, and the unit + e2e
coverage is exhaustive for the wire shapes. A future Phase 6 commit
that lands provider CRUD will revive the live-completion smoke.
Strategy: per-phase PR from here on. This branch
(`lab_week_3_phase5`) is cut from `lab_week_3` HEAD; the PR will
target `cmarguta-alg:lab_week_3` so the diff is bounded to Phase 5
only. When PR algolia#212 merges upstream, this PR's base re-targets to
`algolia:main`.
Net diff: +390 / −21 across 10 files, 2 new files (cache package,
cache.txtar). 13/13 agents packages green, all 4 e2e testscripts
pass (try / cache / dry-run / list-skipped-as-gated), gofumpt
clean, golangci-lint clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds two new nested command groups under `algolia agents`, lifting the
CLI from 9/41 to 19/41 documented Agent Studio endpoints (~46%):
agents providers <list|get|create|update|delete|models>
agents config <get|set>
API client (api/agentstudio/):
- providers.go: ListProviders/GetProvider/CreateProvider/UpdateProvider/
DeleteProvider + ListProviderModels (catalog) + ListModelsForProvider
(configured-provider). Provider.Input is json.RawMessage to mirror
the 6-way discriminated input union (openai/azure_openai/google_genai/
deepseek/openai_compatible/anthropic) — same rationale as Agent.Config.
- configuration.go: GetConfiguration/UpdateConfiguration. Body kept as
json.RawMessage for forward-compat even though today's schema is a
single int (maxRetentionDays).
CLI (pkg/cmd/agents/{providers,config}/):
- providers.go: 6 verbs in one file extending the cache-group precedent
(rather than per-verb subpackages). `models` is one command with two
routes — no flag hits the static catalog, --provider-id hits the
configured-provider listing.
- mask.go: redacts `apiKey` to "***" by default in providers list/get/
create/update success output. --show-secret opts out for scripted
exports. Sets the convention for Phase 8 (`agents keys`).
- config.go: `set` accepts either `--retention-days N` or `-F file.json`
(mutually exclusive). File path is forward-compat for future fields.
Docs + tests:
- AGENTS.md: updated sub-group listing, provider client conventions,
and a new "Provider secret masking" section explaining the redaction
contract.
- 27 client tests + 16 cmd tests + 2 e2e testscripts (providers.txtar,
config.txtar) covering flag validation, dry-run output shape, masking
on/off, and route picking.
- Live-verified against staging EU: config get returns the live
retention default, providers list returns proper paginated shape on
empty, models catalog returns ~50 entries across 4+ provider types.
Co-authored-by: Cursor <cursoragent@cursor.com>
Refactor only — no behaviour changes. Both the API client layer and the
providers command package now mirror the OpenAPI spec's tag structure
(one source file per tag), which makes it easier to add a new endpoint
or verb later: a single new file plus its sibling test file.
api/agentstudio/
client.go 568 LOC monolith → 193 LOC infra-only
(Config/Client/NewClient/setHeaders/checkResponse/
extractDetail/sentinelFor)
agents.go NEW — Agents tag (List/Get/Create/Update/Delete/
Publish/Unpublish/Duplicate/InvalidateAgentCache +
doAgentMutation/doAgentLifecycle helpers)
completions.go NEW — Completions tag (Completions + boolToWire)
agents_test.go (renamed from mutations_test.go) absorbs GetAgent_*
from get_agent_test.go (deleted) and the ListAgents
happy-path tests previously in client_test.go.
client_test.go now holds only the cross-cutting infra tests
(NewClient validation, checkResponse error mapping,
ctx cancellation, header omission). The error-mapping
suite still uses ListAgents as a vehicle since it's
the simplest GET shape; the file header explains why.
pkg/cmd/agents/providers/
providers.go 738 LOC → 93 LOC (parent + readBody/ctxOrBackground/
relTimeOrDash helpers)
list.go, get.go, create.go, update.go, delete.go, models.go
NEW — one file per verb, same package so MaskInput
and the helpers stay un-exported
providers_test.go shrinks to the test harness (newClientForServer +
writeTempJSON); each verb gets a sibling _test.go
with its own scenarios.
AGENTS.md gains a "File organisation" section documenting both layers'
tag-aligned conventions and the explicit decision NOT to promote
sub-group verbs to their own subpackages.
All gates green: gofumpt, go vet, go test ./..., e2e TestAgents/{cache,
try,dry-run,config,providers}, task build, golangci-lint on touched
dirs.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds 5 endpoints under `agents conversations <verb>`: list, get, delete,
purge, export. Brings CLI parity to 24/41 documented Agent Studio
endpoints (~59%, ~71% of publicly visible).
Client (api/agentstudio/conversations.go):
- ListConversations → typed PaginatedConversationsResponse
- GetConversation → json.RawMessage (full shape has discriminated
message-role union; passthrough is honest)
- DeleteConversation → 204 sentinel-mapped
- PurgeConversations → 204 (DELETE on the list endpoint)
- ExportConversations → json.RawMessage (spec leaves shape unspecified)
ListConversationsParams.FeedbackVote is *int because nil = no filter
while 0 (downvote) is a meaningful value. Regression-tested.
Commands (pkg/cmd/agents/conversations/, per-verb files following the
providers/ pattern from the previous refactor):
- list: table view + --output json passthrough; supports
--start-date/--end-date/--include-feedback/--feedback-vote.
--feedback-vote requires --include-feedback (backend
constraint, rejected at CLI for a clearer error).
- get: raw JSON passthrough (pretty-printed by default,
structured via --output json), --include-feedback toggle.
- delete: single-record; mirrors `agents delete` confirm contract
(--confirm required non-TTY, --dry-run previews).
- purge: bulk; two extra guardrails on top of the confirm contract:
(a) dateless purge requires explicit --all (otherwise a
typo on --start-date would silently turn a range purge
into a wipe-all);
(b) --all is mutually exclusive with --start-date/--end-date.
- export: --output-file writes raw bytes (jq-friendly), stdout
pretty-prints; --start-date/--end-date scope.
All verbs take agent-id as the first positional argument, matching
`agents publish/run/cache invalidate`.
Tests:
- 14 client unit tests (incl. *int FeedbackVote zero-value regression,
feature-disabled propagation, ID validation, dateless-purge wire
shape).
- 16 cmd-level tests (incl. dateless-purge guardrail rejection,
--all + date mutex rejection, feedback-vote-without-include
rejection, dry-run URL+scope shapes).
- e2e/testscripts/agents/conversations.txtar (no registration needed
— TestAgents glob-runs everything in testscripts/agents/).
AGENTS.md gains a "Conversations: purge vs delete" section documenting
both guardrails and why they live at the cmd layer (the wire client
mirrors the spec).
Gates green: gofumpt, go vet, go test ./... agent dirs (16/16 packages),
e2e TestAgents/{cache,try,dry-run,config,providers,conversations},
golangci-lint, task build.
Co-authored-by: Cursor <cursoragent@cursor.com>
Spec-vs-reality mismatch caught by Anya's Phase 7 live vet against
staging EU.
OpenAPI declares both `startDate` and `endDate` on
`DELETE /1/agents/{id}/conversations` as `required: false`, which
reads as "dateless purge wipes everything." The live backend
disagrees: dateless DELETE returns
`400 "At least one filter is required"`.
The previously-shipped `--all` flag was therefore non-functional —
every invocation produced a request the backend always rejected.
Changes:
- Remove `--all` (and the dead `PurgeOptions.All` field).
- Require at least one of `--start-date`/`--end-date` at the CLI;
error message names the backend constraint so future spec drift
stays debuggable.
- Document the open-ended-bound workaround (--start-date 1970-01-01)
in help text for genuine wipe-all needs. Verified live (exit 0).
- Tests realigned: drop the obsolete --all/dateless-without-all
cases, add Test_runPurgeCmd_RefusesDateless and
Test_runPurgeCmd_AcceptsOpenEndedStart.
- e2e/testscripts/agents/conversations.txtar updated to match.
- AGENTS.md gains an explicit "Spec vs reality on purge" callout
with the failing wire trace from the vet.
Lesson noted in tmp/lab_week_3_plan.md and tmp/qa/anya_phase7_vet/
REPORT.md: trust live curl over the FastAPI OpenAPI spec, which
reflects the function signature rather than the validation logic.
Gates green post-fix: gofumpt, go vet, conversations unit tests,
TestAgents/conversations e2e, golangci-lint on agents tree,
task build.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds 6 endpoints under `agents domains <verb>`: list, get, create,
delete, bulk-insert, bulk-delete. Per-agent CORS allowlist; agent ID
positional first arg as in conversations/cache.
Client (api/agentstudio/domains.go):
- Typed AllowedDomain + AllowedDomainListResponse (no opaque JSON,
no pagination — list returns {domains:[]} flat).
- Bulk insert wants {domains:[]string}, bulk delete wants
{domainIds:[]string} as a request body (DELETE with body).
Commands (pkg/cmd/agents/domains/):
- list, get, create (--domain), delete (-y/--dry-run),
bulk-insert (--domain repeated or -F file), bulk-delete
(--domain-id repeated or -F file). Bulk verbs accept a JSON array
of strings via -F for scripted use; --domain* and -F are mutually
exclusive.
- Live-probed against staging EU before coding; wire shapes match
the spec exactly.
Gates: gofumpt, vet, unit, e2e (TestAgents/domains), lint, build.
Co-authored-by: Cursor <cursoragent@cursor.com>
… 8b)
5 endpoints under `agents keys <verb>`: list, get, create, update,
delete. Per-app scope, optional per-agent allowlist via agentIds.
Mutating verbs require an admin API key. Live-probed against staging
EU: GET /1/secret-keys returns 200 with empty list, all mutating
verbs return 403 {"message":"Admin API key required."} with a
non-admin key (extractDetail already maps {message} envelopes — gives
ErrForbidden + clear detail). Documented in command long-help.
MaskInput moved from pkg/cmd/agents/providers/ to pkg/cmd/agents/shared/
since two consumers now redact secrets (provider input.apiKey,
secret-key Value). Adds MaskString for top-level scalar secrets.
Update semantics:
- --name and --agent-id are independently set; nil pointer = leave
field unchanged in the PATCH body (matches OpenAPI's anyOf-null).
- Repeated --agent-id replaces (not merges) the allowlist.
- --agent-id="" clears the allowlist (empty strings filtered out;
if all values are empty the resulting [] still gets sent through).
Output safety:
- SecretKey.Value is redacted as "***" by default in list/get/
create/update output.
- --show-secret opts in to raw value (e.g. for piping into a vault).
Gates: gofumpt, vet, unit (api+all agents pkgs), e2e (TestAgents/keys),
lint, build.
Co-authored-by: Cursor <cursoragent@cursor.com>
Two single-tag sub-groups:
agents feedback create POST /1/feedback
agents user-data get GET /1/user-data/{user_token}
agents user-data delete DELETE /1/user-data/{user_token}
Live-probed against staging EU. Wire shapes match the spec; the
feedback endpoint returns 404 with {message:...} envelope for unknown
message IDs (already mapped via extractDetail).
Notes:
- feedback validates locally: vote ∈ {0,1}, tags ≤10 each ≤50
chars, notes ≤1000 chars (matches OpenAPI bounds; saves a
round-trip on common mistakes).
- user-data covers the GDPR right-to-access (get) and
right-to-be-forgotten (delete) endpoints; delete is irreversible and
erases data across every agent in the app, so the prompt is explicit
and --confirm is enforced in non-interactive shells.
- GetUserData returns the full envelope with []json.RawMessage
inner items (conversations have a discriminated message-role
union, memories have an unspecified shape — same convention as
GetConversation in Phase 7).
Gates: gofumpt, vet, unit (api+all agents pkgs), e2e (TestAgents/
feedback + TestAgents/userdata), lint, build.
Co-authored-by: Cursor <cursoragent@cursor.com>
Closes 100% of the Agent Studio OpenAPI surface (41/41 endpoints).
New top-level group (parent hidden, children visible):
agents internal status GET /status
agents internal memorize POST /1/agents/agents/{id}/memorize
agents internal ponder POST /1/agents/agents/{id}/ponder
agents internal consolidate POST /1/agents/agents/{id}/consolidate
New providers verb:
agents providers defaults GET /1/providers/models/defaults
Path quirk verified live: memorize/ponder/consolidate live under the
doubled prefix /1/agents/agents/ (the spec is correct, not a typo).
The single-prefix equivalents 404 with FastAPI's standard {detail:
"Not Found"} envelope; the doubled prefix routes to the actual
handler. Pinned in the godoc on api/agentstudio/internal.go.
Hidden semantics: parent has Hidden:true so it doesn't pollute
`agents --help`, but children are visible — running
`agents internal --help` still surfaces what's available to anyone
who knows to look. Pinned by a unit test.
Memory verbs (memorize/ponder/consolidate) take the agent ID
positional and a JSON body via --body or -F ("-" for stdin),
mirroring the agents try input contract. Body shapes are evolving
(messages is a v4/v5 discriminated role union per the spec) so the
client passes the body through as-is and returns the response as
json.RawMessage — no premature typing.
defaults: small read returning {provider_type → recommended_model}.
Sibling of `agents providers models` — different question, different
shape, didn't fit as a flag on the existing models verb.
Gates: gofumpt, vet, unit (api+all agents pkgs), e2e (TestAgents/*),
lint, build.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ya F-1)
Anya's Phase 8 vet caught an asymmetric 404 from the staging gateway:
`agents user-data get` and `delete` against any token containing
"/" return `agent studio: 404 Not Found: Not Found` while every
other unknown token (including ones with +, ?, #, &, =) returns a
clean 200 empty / 204.
Root cause is server-side — the gateway decodes %2F before routing
so the path no longer matches /1/user-data/{token}. The CLI does the
right thing with url.PathEscape; the symptom is a misleading 404
that an operator processing a GDPR deletion request would reasonably
read as "data was never stored."
Fix: refuse "/"-bearing tokens at the CLI layer with a precise
error message that names the gateway behavior, on both verbs. One
shared const, two two-line guards. No behavior change for tokens
without "/".
Live-verified: `./algolia agents user-data get "ab/cd"` now
prints the explicit message instead of "404 Not Found."
Gates: gofumpt, vet, unit (api+all agents pkgs), e2e
(TestAgents/userdata), lint, build.
Co-authored-by: Cursor <cursoragent@cursor.com>
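The guard described in the user-data fix above can be sketched as follows. The helper name `userDataPath` and the error text are assumptions for illustration; only the `/1/user-data/{token}` path and the use of `url.PathEscape` come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// errSlashInToken is a hypothetical message; the CLI's actual wording may differ.
var errSlashInToken = errors.New(
	`user token contains "/": the gateway decodes %2F before routing, ` +
		`so the request would return a misleading 404`)

// userDataPath refuses tokens containing "/" and escapes everything else
// as a single path segment.
func userDataPath(token string) (string, error) {
	if strings.Contains(token, "/") {
		return "", errSlashInToken
	}
	return "/1/user-data/" + url.PathEscape(token), nil
}

func main() {
	for _, tok := range []string{"abcd", "ab?cd", "ab/cd"} {
		p, err := userDataPath(tok)
		if err != nil {
			fmt.Printf("%q: refused: %v\n", tok, err)
			continue
		}
		fmt.Printf("%q -> %s\n", tok, p)
	}
}
```

Rejecting the token before any request is sent avoids an operator mistaking the gateway's 404 for "no data stored".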
Three small follow-ups to PR algolia#212.
* `agents try` / `agents run`: TTY-attached stdout now renders a
  human-readable transcript (text-deltas inline, tool calls/results as
  dim annotations, errors red) instead of NDJSON. Non-TTY paths keep
  emitting NDJSON unchanged so existing pipelines (jq, log capture,
  scripts) are not broken. New `--ndjson` flag forces NDJSON on a TTY
  for users who want machine output on screen.
* `agents domains bulk-delete`: now pre-fetches the existing list
  (TTY-only, one extra GET) so the success line can split requested IDs
  into removed-vs-already-absent. Non-TTY behavior unchanged. Backend
  returns 204 with no body, so this is the only signal we can give users
  about whether their delete actually did anything.
* `agents keys list`: lock the table-render mask path with two new
  tests (default masks, --show-secret reveals). Existing tests only
  covered --output json; the table writes k.Value verbatim, so the mask
  must apply before the row is added.
Co-authored-by: Cursor <cursoragent@cursor.com>
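The removed-vs-already-absent split in the `bulk-delete` follow-up can be illustrated with a small set lookup. This is a sketch under the assumption that domains are identified by string IDs; `splitDeleted` is a hypothetical name, not the function used in the PR.

```go
package main

import "fmt"

// splitDeleted partitions the requested IDs into those found in the
// pre-fetched list (removed) and those that were not (already absent).
func splitDeleted(existing, requested []string) (removed, absent []string) {
	present := make(map[string]struct{}, len(existing))
	for _, id := range existing {
		present[id] = struct{}{}
	}
	for _, id := range requested {
		if _, ok := present[id]; ok {
			removed = append(removed, id)
		} else {
			absent = append(absent, id)
		}
	}
	return removed, absent
}

func main() {
	existing := []string{"d1", "d2", "d3"} // result of the extra GET
	requested := []string{"d2", "d9"}      // IDs passed to bulk-delete
	removed, absent := splitDeleted(existing, requested)
	fmt.Printf("removed %d, already absent %d\n", len(removed), len(absent))
}
```

Because the backend answers 204 with no body, the comparison has to happen against a list fetched before the delete is issued.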
…ents
Pulls the long-form Agent Studio rationale (file layout, auth, host
resolution, streaming protocols, dry-run semantics, secret masking,
backend gotchas) out of multi-paragraph in-source comments and out of
AGENTS.md, into a single reference at docs/agents.md.
Source files now keep one-line godoc comments only; gotchas that
genuinely have to live next to the code (purge requires a date, slash
in user-tokens, doubled memory-op path, memory polarity, ACL on
configuration) are kept as 1-2 line hints with a pointer to
docs/agents.md. AGENTS.md drops the agent-studio sections and links to
the doc.
Net: -945 / +219 lines across 25 files, no behavior change. Docs
generator + full test sweep + gofumpt all clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
Five low-risk DRY passes across `pkg/cmd/agents/`:
- `newClientForServer` × 14 test files → `sharedtest.NewClient`
(new `pkg/cmd/agents/sharedtest/` package, kept out of production
imports). `writeTempJSON` × 6 → `sharedtest.WriteTempJSON`.
- `ctxOrBackground` × 7 sub-group files → `shared.OrBackground` and
inline `if ctx == nil { ctx = context.Background() }` blocks in
delete/cache/try/run.
- `--confirm` / "non-TTY refuses without -y" pattern × 9 commands
→ `shared.AddConfirmFlag` + `shared.ResolveConfirm` +
`shared.Confirm`.
- File-reading helpers (`readBody`, `readJSONBody`, inline
`cmdutil.ReadFile + TrimUTF8BOM + json.Valid` in
config/update/create/domains) → `shared.ReadJSONFile`.
- Completion-flag block (8 flags + the input/message mutex)
duplicated in `try` and `run` → `shared.CompletionInputs` +
`shared.RegisterCompletionFlags`.
Net: 102 files changed, +587 / -770. Behavior unchanged; full suite
green, golangci-lint clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
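As an illustration of the "non-TTY refuses without -y" pattern consolidated above, here is a minimal sketch. The function name `resolveConfirm` and its signature are assumptions; the actual `shared.ResolveConfirm` may take different arguments.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveConfirm (hypothetical signature) decides how a destructive command
// proceeds: run immediately, prompt the user, or refuse.
func resolveConfirm(confirmed, isTTY bool) (prompt bool, err error) {
	switch {
	case confirmed:
		// Flag given: proceed without asking.
		return false, nil
	case !isTTY:
		// Nobody to ask: refuse rather than guess.
		return false, errors.New("refusing to continue without -y on a non-interactive terminal")
	default:
		// Interactive session: ask the user.
		return true, nil
	}
}

func main() {
	for _, c := range []struct{ confirmed, tty bool }{
		{true, false}, {false, true}, {false, false},
	} {
		prompt, err := resolveConfirm(c.confirmed, c.tty)
		fmt.Printf("confirmed=%v tty=%v -> prompt=%v err=%v\n", c.confirmed, c.tty, prompt, err)
	}
}
```

Centralizing this decision keeps all nine commands consistent about when a script may delete without a prompt.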
Allow common provider flows without -F:
- create: --name/--provider plus --api-key, --api-key-stdin, or
  --api-key-env (openai, anthropic, google_genai, deepseek).
- update: patches via --name, API key flags, and optional --base-url.
Azure and openai_compatible still require JSON files. Keeps file and
shortcut paths mutually exclusive. Updates agents e2e providers.txtar
for the relaxed -F requirement.
Co-authored-by: Cursor <cursoragent@cursor.com>
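The key-source rules in the commit message above might be resolved roughly like this. It is a sketch only: the `keyFlags` struct and `resolveAPIKey` are hypothetical, and it assumes that `--api-key-env` takes the name of an environment variable and that the three key flags exclude one another as well as `-F`.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// keyFlags is a hypothetical holder for the parsed flag values.
type keyFlags struct {
	APIKey      string // --api-key
	APIKeyStdin bool   // --api-key-stdin
	APIKeyEnv   string // --api-key-env (assumed: name of an env variable)
	File        string // -F
}

// resolveAPIKey returns the key from exactly one shortcut source, or "" when
// no shortcut flag was given and the caller should fall back to -F.
func resolveAPIKey(f keyFlags, stdin io.Reader, getenv func(string) string) (string, error) {
	sources := 0
	if f.APIKey != "" {
		sources++
	}
	if f.APIKeyStdin {
		sources++
	}
	if f.APIKeyEnv != "" {
		sources++
	}
	if f.File != "" && sources > 0 {
		return "", errors.New("-F cannot be combined with the API key flags")
	}
	if sources > 1 {
		return "", errors.New("use only one of --api-key, --api-key-stdin, --api-key-env")
	}
	switch {
	case f.APIKey != "":
		return f.APIKey, nil
	case f.APIKeyStdin:
		line, err := bufio.NewReader(stdin).ReadString('\n')
		if err != nil && err != io.EOF {
			return "", fmt.Errorf("read API key from stdin: %w", err)
		}
		return strings.TrimSpace(line), nil
	case f.APIKeyEnv != "":
		key := getenv(f.APIKeyEnv)
		if key == "" {
			return "", fmt.Errorf("environment variable %s is empty or unset", f.APIKeyEnv)
		}
		return key, nil
	}
	return "", nil
}

func main() {
	// Simulated stdin; a real command would pass os.Stdin.
	key, err := resolveAPIKey(keyFlags{APIKeyStdin: true}, strings.NewReader("sk-example\n"), os.Getenv)
	fmt.Println(key, err)
}
```

Reading the key from stdin or the environment keeps it out of shell history, which is the usual reason to offer those alternatives next to a plain `--api-key` flag.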
Adds docs/agents.md section on providers -F vs flags; links to docs/qa/arg_friendly_providers_SIGNOFF.md (Lab Week planning-team alignment + ordered test commands). Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | 1479 |
| Duplication | 936 |
TIP This summary will be updated as you push new changes.
Co-authored-by: Cursor <cursoragent@cursor.com>
tmp is cleaned by Taskfile and may contain local QA Go packages outside review; use issues.exclude-dirs instead of deprecated run.skip-dirs. Co-authored-by: Cursor <cursoragent@cursor.com>
Author
Retargeting stacked work: the arg-friendly provider flags (and follow-up lint/docs) now live in a fork PR with base |
Author
Continued as fork PR: cmarguta-alg#4 ( |
Summary
Full Agent Studio CLI surface (`algolia agents`, `api/agentstudio/`) as developed on `lab_week_3`, plus provider flag shortcuts for common `agents providers create|update` flows (see `docs/agents.md` and `docs/qa/arg_friendly_providers_SIGNOFF.md`).
Notes
- `tmp/` only (not committed).
Test plan
- `task test` / `go test ./pkg/cmd/agents/...` and `go test ./api/agentstudio/...`
- `go test ./e2e -tags=e2e -run TestAgents` with `ALGOLIA_APPLICATION_ID` + `ALGOLIA_API_KEY`