Agents plugin: supervisor API adapter #345

Open
hubertzub-db wants to merge 39 commits into databricks:main from hubertzub-db:agent/v2/sa/1-adapter

hubertzub-db commented May 5, 2026

Adds a fourth AgentAdapter that targets the Databricks AI Gateway Responses API, so AppKit apps can host server-side-orchestrated agents (Genie, Knowledge Assistants, UC functions, custom apps, UC connections) without managing tool execution locally.

```ts
import { createAgent } from "@databricks/appkit";
import { fromSupervisorApi, supervisorTools } from "@databricks/appkit/agents/supervisor-api";

const supervisor = createAgent({
  instructions: "You are an assistant powered by the Databricks Supervisor API.",
  // fromSupervisorApi is awaited: it resolves to an AgentAdapter.
  model: await fromSupervisorApi({
    model: "databricks-claude-sonnet-4-5",
    // Each hosted tool takes an identifier plus a required description.
    tools: [
      supervisorTools.genieSpace("01ABCDEF12345678", "NYC taxi trip records and zones"),
      supervisorTools.ucFunction("main.default.add", "Adds two integers and returns the sum."),
    ],
  }),
});
```

What's new

  • `fromSupervisorApi({...})`: factory that returns an `AgentAdapter`. Uses the SDK's default credential chain (env, profiles, OAuth, OIDC), just like `DatabricksAdapter.fromModelServing`.
  • `supervisorTools.*`: concise factories for the five hosted tool types (`genieSpace`, `ucFunction`, `knowledgeAssistant`, `app`, `ucConnection`). `description` is required on every one: it is validated, and it is the routing hint the LLM uses to choose between tools.
  • `@databricks/appkit/agents/supervisor-api`: new subpath export so consumers pick only the adapter they want; mirrors the existing `./agents/{databricks,vercel-ai,langchain}` pattern.

Reference app

Demo supervisor agent in apps/dev-playground/server/index.ts. Tools are commented out by default — uncomment any supervisorTools.* entry to give the model real powers.

Test plan

  • Manual tests
  • 39 new tests (13 SSE reader + 26 adapter / factory).
  • Full appkit vitest suite passes; typecheck clean.

Foundation layer for the agents feature. Adds the portable type surface
that every downstream layer builds on, plus three LLM adapter
implementations so the agents plugin (later PR) can target whatever the
user has.

### Shared agent types

`packages/shared/src/agent.ts` — no behavior, just the type vocabulary:
`AgentAdapter`, `AgentEvent`, `AgentInput`, `AgentRunContext`,
`AgentToolDefinition`, `Message`, `Thread`, `ThreadStore`, `ToolAnnotations`,
`ToolCall`, `ToolProvider`, `ResponseStreamEvent`. Exported from the
shared barrel.

### Adapters

- `packages/appkit/src/agents/databricks.ts` — `DatabricksAdapter`:
  streams OpenAI-compatible completions against a Databricks Model
  Serving endpoint (raw fetch + SSE, no vendor SDKs).
- `packages/appkit/src/agents/vercel-ai.ts` — `VercelAIAdapter`:
  wraps any Vercel AI SDK `streamText` call. Maps Vercel SDK events to
  AppKit `AgentEvent`s and tool calls.
- `packages/appkit/src/agents/langchain.ts` — `LangChainAdapter`:
  wraps any LangChain `Runnable` (AgentExecutor, compiled LangGraph,
  etc.). Subscribes to `streamEvents(v2)` and maps to `AgentEvent`s.

Each adapter is self-contained and independently testable.

### Package plumbing

- Subpath exports `@databricks/appkit/agents/{databricks,vercel-ai,langchain}`
  so consumers pick only the adapter they want.
- `@langchain/core` and `ai` declared as optional peer dependencies.
- `@ai-sdk/openai`, `@langchain/core`, `ai` added as devDeps for tests.
- `tsdown.config.ts` emits the three adapter entry points alongside the
  main bundle.

### Test plan

- 24 adapter tests (Databricks: 16, Vercel AI: 4, LangChain: 4) passing
- Full appkit vitest suite: 1154 tests passing
- Typecheck clean
- Build clean, publint clean

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

### MVP polish

- **LangChain adapter `callId` correlation fix.** The previous
  implementation emitted `tool_call` with the LLM-provided
  `tc.id ?? tc.name` and `tool_result` with LangChain's internal
  `event.run_id` — these never matched, breaking every Responses-API
  client that pairs calls by `call_id`. The adapter now records a
  `run_id → callId` mapping at `on_tool_start` (matching the
  accumulated tool_call by name) and resolves it at `on_tool_end`. A
  deterministic `lc_<name>_<idx>_<counter>` fallback id prevents
  collisions when the same tool is called multiple times in one turn
  without a model-provided id. Adds three regression tests covering
  happy-path correlation, duplicate-name disambiguation, and the
  no-accumulator-match fallback.

- **Adapter docstring cleanup.** The four `@example` blocks in
  `databricks.ts`, `langchain.ts`, and `vercel-ai.ts` referenced a
  fictional `appkit.agent.registerAgent("assistant", adapter)` API
  that has never existed. Replaced with real usage via
  `createApp({ plugins: [agents({ agents: { assistant: createAgent(
  { model: adapter }) } })] })`.
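
The `callId` correlation fix described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: `CallIdCorrelator`, its method names, and the meaning of `<idx>` in the fallback id (taken here to be the index into the accumulated calls) are all assumptions.

```ts
// Hypothetical, simplified shapes; the real adapter's types may differ.
interface PendingCall {
  name: string;
  callId?: string;
  claimed: boolean;
}

class CallIdCorrelator {
  private pending: PendingCall[] = [];
  private byRunId = new Map<string, string>();
  private counter = 0;

  // Called when the model emits a tool call; the model-provided id may be missing.
  recordToolCall(name: string, modelId?: string): void {
    this.pending.push({ name, callId: modelId, claimed: false });
  }

  // on_tool_start: bind the run_id to the first unclaimed accumulated call with this name.
  onToolStart(runId: string, name: string): string {
    const idx = this.pending.findIndex((p) => !p.claimed && p.name === name);
    const match = idx >= 0 ? this.pending[idx] : undefined;
    if (match) match.claimed = true;
    // Fallback id keeps repeated calls to the same tool distinct.
    const callId = match?.callId ?? `lc_${name}_${Math.max(idx, 0)}_${this.counter++}`;
    this.byRunId.set(runId, callId);
    return callId;
  }

  // on_tool_end: return the id recorded at start so call and result pair up.
  onToolEnd(runId: string): string | undefined {
    const callId = this.byRunId.get(runId);
    this.byRunId.delete(runId);
    return callId;
  }
}
```

The point of the mapping is that `tool_result` carries whatever id `tool_call` carried, regardless of what LangChain uses internally as `run_id`.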

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Second layer of the agents feature. Adds the primitives for defining
agent tools and implements them on every core ToolProvider plugin.

- `tool(config)` — inline function tools backed by a Zod schema. Auto-
  generates JSON Schema for the LLM via `z.toJSONSchema()` (stripping
  the top-level `$schema` annotation that Gemini rejects), runtime-
  validates tool-call arguments, returns an LLM-friendly error string
  on validation failure so the model can self-correct.
- `mcpServer(name, url)` — tiny factory for hosted custom MCP server
  configs. Replaces the verbose
  `{ type: "custom_mcp_server", custom_mcp_server: { app_name, app_url } }`
  wrapper.
- `FunctionTool` / `HostedTool` types + `isFunctionTool` / `isHostedTool`
  type guards. `HostedTool` is a union of Genie, VectorSearch, custom
  MCP, and external-connection configs.
- `ToolkitEntry` + `ToolkitOptions` types + `isToolkitEntry` guard.
  `AgentTool = FunctionTool | HostedTool | ToolkitEntry` is the canonical
  union later PRs spread into agent definitions.

- `defineTool(config)` + `ToolRegistry` — plugin authors' internal shape
  for declaring a keyed set of tools with Zod-typed handlers.
- `toolsFromRegistry()` — produces the `AgentToolDefinition[]` exposed
  via `ToolProvider.getAgentTools()`.
- `executeFromRegistry()` — validates args then dispatches to the
  handler. Returns LLM-friendly errors on bad args.
- `toToolJSONSchema()` — shared helper at
  `packages/appkit/src/plugins/agents/tools/json-schema.ts` that wraps
  `toJSONSchema()` and strips `$schema`. Used by `tool()`,
  `toolsFromRegistry()`, and `buildToolkitEntries()`.
- `buildToolkitEntries(pluginName, registry, opts?)` — converts a
  plugin's internal `ToolRegistry` into a keyed record of `ToolkitEntry`
  markers, honoring `prefix` / `only` / `except` / `rename`.
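
As a rough illustration of the two behaviours described above (stripping the top-level `$schema`, and validating arguments before dispatch with an LLM-friendly error on failure), here is a dependency-free sketch. It does not use Zod; `ToolEntrySketch`, `executeFromRegistrySketch`, and `stripTopLevelSchema` are hypothetical stand-ins, and the error wording is an assumption.

```ts
type JsonSchema = Record<string, unknown>;

// Returns a shallow copy without the top-level `$schema` key; nested keys are untouched.
function stripTopLevelSchema(schema: JsonSchema): JsonSchema {
  const copy: JsonSchema = { ...schema };
  delete copy["$schema"];
  return copy;
}

type ParseResult<A> = { ok: true; value: A } | { ok: false; error: string };

// Stand-in for a Zod-typed registry entry: `parse` plays the role of safeParse.
interface ToolEntrySketch<A> {
  parse(args: unknown): ParseResult<A>;
  handler(args: A): string;
}

// Validate first, then dispatch. Failures come back as strings the model can read.
function executeFromRegistrySketch<A>(
  registry: Record<string, ToolEntrySketch<A>>,
  name: string,
  args: unknown,
): string {
  const entry = registry[name];
  if (!entry) return `Error: unknown tool "${name}"`;
  const parsed = entry.parse(args);
  if (!parsed.ok) return `Error: invalid arguments for "${name}": ${parsed.error}`;
  return entry.handler(parsed.value);
}

// Example tool: adds two numbers.
const addTool: ToolEntrySketch<{ a: number; b: number }> = {
  parse(args) {
    const r = args as { a?: unknown; b?: unknown } | null;
    if (r && typeof r.a === "number" && typeof r.b === "number") {
      return { ok: true, value: { a: r.a, b: r.b } };
    }
    return { ok: false, error: "expected numeric fields `a` and `b`" };
  },
  handler: ({ a, b }) => String(a + b),
};
```

Returning a string instead of throwing lets the model see what went wrong and retry with corrected arguments.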

- `AppKitMcpClient` — minimal JSON-RPC 2.0 client over SSE, zero deps.
  Handles auth refresh, per-server connection pooling, and tool
  definition aggregation.
- `resolveHostedTools()` — maps `HostedTool` configs to Databricks MCP
  endpoint URLs.

- **analytics** — `query` tool (Zod-typed, asUser dispatch)
- **files** — per-volume tool family:
  `${volumeKey}.{list,read,exists,metadata,upload,delete}`
  (dynamically named from the plugin's volume config)
- **genie** — per-space tool family:
  `${alias}.{sendMessage,getConversation}`
  (dynamically named from the plugin's spaces config)
- **lakebase** — `query` tool

Each plugin gains `getAgentTools()` + `executeAgentTool()` satisfying
the `ToolProvider` interface, plus a `.toolkit(opts?)` method that
returns a record of `ToolkitEntry` markers for later spread into agent
definitions.

- 58 new tests across tool primitives + plugin ToolProvider surfaces
- Full appkit vitest suite: 1212 tests passing
- Typecheck clean
- Build clean, publint clean

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

New `mcp-host-policy.ts` module enforces an allowlist on every MCP URL
before the first byte is sent. Same-origin Databricks workspace URLs
are admitted by default; any other host must be explicitly trusted
via the new `AgentsPluginConfig.mcp.trustedHosts` field (added in a
subsequent stack layer).

- Rejects non-`http(s)` schemes and plaintext `http://` outside of
  localhost-in-dev.
- Blocks link-local (`169.254/16` — cloud metadata), RFC1918, CGNAT,
  loopback (unless `allowLocalhost`), ULA, multicast, and IPv4-mapped
  IPv6 equivalents at DNS-resolve time. IP-literal URLs in these
  ranges are rejected without a DNS lookup. Malformed IPs fail-closed.
- `AppKitMcpClient` constructor now takes the policy as a third arg.
  Workspace credentials (SP on `initialize`/`tools/list`; caller-
  supplied OBO on `tools/call`) are never attached to non-workspace
  hosts — `callTool` drops caller OBO overrides for external
  destinations, and `sendRpc`/`sendNotification` never invoke
  `authenticate()` when `forwardWorkspaceAuth` is false.
- Constructor accepts optional `{ dnsLookup, fetchImpl }` for test DI.
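
To make the IPv4 part of the blocklist concrete, here is a minimal sketch covering the ranges named above. `parseIpv4` and `isBlockedIpv4Sketch` are hypothetical names; the real module also handles IPv6, DNS resolution, and host allowlisting, none of which is shown.

```ts
// Parses dotted-quad IPv4 into four octets, or returns null when malformed.
function parseIpv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const n = Number(part);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

function isBlockedIpv4Sketch(ip: string, allowLocalhost = false): boolean {
  const octets = parseIpv4(ip);
  if (octets === null) return true; // malformed input fails closed
  const a = octets[0] ?? 0;
  const b = octets[1] ?? 0;
  if (a === 169 && b === 254) return true; // link-local, includes cloud metadata
  if (a === 10) return true; // RFC1918 10/8
  if (a === 172 && b >= 16 && b <= 31) return true; // RFC1918 172.16/12
  if (a === 192 && b === 168) return true; // RFC1918 192.168/16
  if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT 100.64/10
  if (a === 127) return !allowLocalhost; // loopback
  if (a >= 224 && a <= 239) return true; // multicast 224/4
  return false;
}
```

For example, `isBlockedIpv4Sketch("169.254.169.254")` is `true`, while a public address such as `8.8.8.8` passes.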

New tests:
- `mcp-host-policy.test.ts` (42 tests): config builder, URL check,
  IP blocklist, DNS-backed resolution with split-DNS defense.
- `mcp-client.test.ts` (8 tests): integrated client with recording
  fetch — verifies no fetch + no `authenticate()` call when URL is
  rejected, and that auth headers are scoped correctly per-destination.

- `json-schema.ts`: biome formatting fix (pre-existing drift).
- `packages/appkit/src/index.ts`: biome organizeImports fix
  (pre-existing sort order drift).

Full appkit vitest suite: 1262 tests passing (+50 from security).

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

New `sql-policy.ts` module provides `classifyReadOnly(sql)` and
`assertReadOnlySql(sql)` — a dependency-free tokenizer-based classifier
that rejects any statement outside `SELECT | WITH | SHOW | EXPLAIN |
DESCRIBE` at execution time. Also exports
`wrapInReadOnlyTransaction(stmt)` which produces a `BEGIN READ ONLY …
ROLLBACK` envelope for belt-and-suspenders enforcement on PostgreSQL.

Why a hand-rolled tokenizer rather than `node-sql-parser` or
`pgsql-parser`:
- `node-sql-parser`'s Hive/Spark dialect coverage rejects common
  Databricks SQL patterns (three-part names, `SHOW TABLES IN`,
  `DESCRIBE EXTENDED`, `EXPLAIN`); its PostgreSQL grammar rejects the
  same meta-commands.
- `pgsql-parser` (libpg_query) is a native binding and fails to install
  cleanly on Databricks App runtimes.
- We only need statement-type classification, not full parsing.

The tokenizer handles line/block comments (nested), single- and
double-quoted literals, ANSI/backtick identifiers, PostgreSQL
dollar-quoted strings, `E'..'` escape strings, and reports
unterminated literals as fail-closed. 62 tests exercise evasion
vectors (stacked writes, quoted keywords, comment-hidden writes,
mismatched dollar-quote tags, unterminated strings).
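
A heavily simplified sketch of this kind of statement-type classifier is shown below. It is not the `sql-policy.ts` implementation: `classifyReadOnlySketch` is a hypothetical name, and the sketch deliberately omits nested block comments, dollar-quoted strings, and `E'..'` escapes. It only illustrates the approach of skipping comments and quoted tokens, splitting on top-level semicolons, and checking each statement's leading keyword, failing closed on anything unterminated.

```ts
const READ_ONLY_KEYWORDS = new Set(["SELECT", "WITH", "SHOW", "EXPLAIN", "DESCRIBE"]);

type Classification = { readOnly: true } | { readOnly: false; reason: string };

function classifyReadOnlySketch(sql: string): Classification {
  const statements: string[] = [];
  let current = "";
  let i = 0;
  while (i < sql.length) {
    const ch = sql.charAt(i);
    const next = sql.charAt(i + 1);
    if (ch === "-" && next === "-") {
      // Line comment: skip to end of line (the newline itself is kept as whitespace).
      const end = sql.indexOf("\n", i);
      i = end === -1 ? sql.length : end;
    } else if (ch === "/" && next === "*") {
      // Block comment (non-nested in this sketch).
      const end = sql.indexOf("*/", i + 2);
      if (end === -1) return { readOnly: false, reason: "unterminated block comment" };
      current += " ";
      i = end + 2;
    } else if (ch === "'" || ch === '"' || ch === "`") {
      // Quoted literal or identifier; a doubled quote character is an escape.
      let j = i + 1;
      let closed = false;
      while (j < sql.length) {
        if (sql.charAt(j) === ch) {
          if (sql.charAt(j + 1) === ch) {
            j += 2;
            continue;
          }
          closed = true;
          break;
        }
        j++;
      }
      if (!closed) return { readOnly: false, reason: "unterminated quoted token" };
      current += " ? "; // placeholder so quoted keywords never count
      i = j + 1;
    } else if (ch === ";") {
      statements.push(current);
      current = "";
      i++;
    } else {
      current += ch;
      i++;
    }
  }
  statements.push(current);
  const nonEmpty = statements.map((s) => s.trim()).filter((s) => s.length > 0);
  if (nonEmpty.length === 0) return { readOnly: false, reason: "empty statement" };
  for (const stmt of nonEmpty) {
    const keyword = (/^[A-Za-z]+/.exec(stmt)?.[0] ?? "").toUpperCase();
    if (!READ_ONLY_KEYWORDS.has(keyword)) {
      return { readOnly: false, reason: `statement starts with "${keyword}"` };
    }
  }
  return { readOnly: true };
}
```

Checking every statement rather than only the first is what defeats stacked writes such as `SELECT 1; DROP TABLE t`.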

`analytics.query` was annotated `{ readOnly: true,
requiresUserContext: true }` but the annotation was a claim only. A
prompt-injected LLM could send `UPDATE`, `DELETE`, or `DROP` and the
warehouse would run it subject to the end user's SQL grants.

The tool now calls `assertReadOnlySql` before reaching
`this.query()`. A rejection surfaces an LLM-friendly error the model
can self-correct on; tests verify writes never reach the warehouse.
Public `AppKit.analytics.query(...)` continues to accept arbitrary
SQL — app authors use it intentionally; LLMs do not.

`lakebase.query` previously shipped as an always-on agent tool with
`{ readOnly: false, destructive: false, idempotent: false }`
(`destructive: false` was an outright lie) and executed arbitrary LLM
SQL against the SP-scoped pool, auto-inherited by every markdown
agent.

The plugin now registers **no** agent tool by default. Opt-in via:

```ts
lakebase({
  exposeAsAgentTool: {
    iUnderstandRunsAsServicePrincipal: true,
    readOnly: true, // default
  },
});
```

The acknowledgement flag is required because the pool is bound to the
service principal regardless of which end user invokes the tool —
enabling the tool is a deliberate privilege grant.

When opted in with `readOnly: true` (default):
- Statement classified by `classifyReadOnly` (anything outside the
  read-only statement set is rejected with an LLM-friendly error).
- Remaining statement executed inside `BEGIN READ ONLY; …; ROLLBACK`
  so PostgreSQL enforces server-side even if a side-effecting function
  slips past the classifier.
- Annotations: `{ readOnly: true, destructive: false, idempotent: false }`.

When opted in with `readOnly: false`:
- Statement passed through unchanged.
- Annotations: `{ readOnly: false, destructive: true, idempotent: false }`.
  The `destructive: true` signal will be honored by the agents plugin's
  HITL approval gate in PR databricks#304.

`LakebasePlugin` is now `export class` so tests can construct it
directly. New test file `lakebase-agent-tool.test.ts` (9 tests)
verifies defaults, opt-in, acknowledgement enforcement, readOnly
rejection + wrap, and destructive pass-through.

Full appkit vitest suite: 1340 tests passing (+78 from S-2 Layer 1+2).

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Groundwork for flipping the unsafe `autoInheritTools: { file: true }`
default into opt-in auto-inherit gated by per-tool consent. Adds an
`autoInheritable?: boolean` field to `ToolEntry` (defined via
`defineTool`) and propagates it through `buildToolkitEntries` onto the
resulting `ToolkitEntry`. The agents plugin consumes this flag in a
subsequent stack layer to filter auto-inherited tools — any tool whose
plugin author has not explicitly marked `autoInheritable: true` never
spreads into agents that enable auto-inherit.

Default is `false` for defense-in-depth. Plugin authors must
consciously mark tools that are genuinely safe for unscoped exposure.

- `analytics.query`: `autoInheritable: true`. The tool is OBO-scoped
  and `readOnly: true` is enforced at runtime (S2).
- `files.list` / `files.read` / `files.exists` / `files.metadata`:
  `autoInheritable: true`. Pure read operations, OBO-scoped.
- `files.upload` / `files.delete`: NOT auto-inheritable. Destructive;
  must be explicitly wired by the agent author.
- `genie.getConversation`: `autoInheritable: true` (read-only history).
- `genie.sendMessage`: NOT auto-inheritable. State-mutating Genie
  conversation; user wires explicitly if desired.
- `lakebase.query`: NOT auto-inheritable. The tool is already gated by
  the explicit `exposeAsAgentTool` acknowledgement (S2); auto-inherit
  remains closed as defense-in-depth.

- `build-toolkit.test.ts` gains a case verifying propagation: explicit
  `true`, explicit `false`, and omitted (undefined) all flow through to
  the `ToolkitEntry` unchanged.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

- **MCP caller abort signal composition.** `callTool` now accepts an
  optional `callerSignal` and composes it with the existing 30 s
  timeout via `AbortSignal.any([...])`. The agents plugin threads its
  stream signal through in a subsequent stack layer so that a
  `POST /cancel` or agent-run shutdown immediately propagates to the
  MCP fetch, rather than leaving in-flight MCP calls running on the
  remote server until they complete. Adds a regression test that
  aborts the caller signal mid-fetch and asserts the fetch rejects
  with `AbortError`.
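
The signal composition described above might look like the following. This is a sketch under assumptions: `composeSignals` is a hypothetical helper name, and it relies on `AbortSignal.any` and `AbortSignal.timeout` being available in the runtime (recent Node.js 20 releases and current browsers).

```ts
// Combines a fixed timeout with an optional caller-supplied signal.
// The returned signal aborts as soon as either source aborts.
function composeSignals(timeoutMs: number, callerSignal?: AbortSignal): AbortSignal {
  const timeout = AbortSignal.timeout(timeoutMs);
  return callerSignal ? AbortSignal.any([callerSignal, timeout]) : timeout;
}

// Usage: pass the composed signal to fetch so a caller-side cancel reaches the request.
// const signal = composeSignals(30_000, callerSignal);
// await fetch(url, { method: "POST", body, signal });
```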

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Three issues flagged by the re-review pass (one correctness HIGH, two
security MEDIUMs).

- **Lakebase read-only path no longer emits a multi-statement batch.**
  `wrapInReadOnlyTransaction` returned
  `"BEGIN READ ONLY;\n<stmt>;\nROLLBACK;"` as one string passed to
  `pool.query(text, values)`. As
  soon as the agent supplied `values`, pg switched to the Extended
  Query protocol and PostgreSQL rejected the batch with `cannot
  insert multiple commands into a prepared statement`, silently
  breaking every parameterized read-only tool call in production.
  The mocked lakebase test concealed this.

  The helper is removed; `LakebasePlugin.runReadOnlyStatement` now
  acquires a dedicated client from the pool and runs three separate
  `client.query` calls on the same connection (`BEGIN READ ONLY`,
  user statement with values, `ROLLBACK`), with a `finally` that
  rolls back and releases the client even when the user statement
  throws. Four tests cover the new flow: dispatch-time rejection,
  statement ordering, parameter forwarding, and release-on-error.

- **`isBlockedIpv6` link-local `fe80::/10` now covers the full range.**
  The previous prefix check `startsWith("fe80:") || startsWith("fe9")`
  only matched `fe80` and `fe90`–`fe9f`, leaving `fe81`–`fe8f` and
  `fea0`–`febf` (valid link-local) passable. Replaced with
  `/^fe[89ab][0-9a-f]:/.test(lowered)` so the third hex digit is
  checked against `8..b`.

- **`::ffff:<hex>:<hex>` IPv4-mapped IPv6 is now normalised.** The
  colon-hex form of an IPv4-mapped address (`::ffff:a9fe:a9fe` =
  169.254.169.254) previously bypassed the IPv4 blocklist because
  `isIPv4("a9fe:a9fe")` is false. `hexPairToDottedIpv4` reassembles
  the trailing two hex groups into dotted form and delegates to
  `isBlockedIpv4`. Regression tests cover the metadata, 10/8, and
  192.168/16 equivalents; a public IPv4 mapped to colon-hex still
  passes through.
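
The two IPv6 fixes above can be illustrated with a short sketch. The link-local regex is the one quoted in the commit message; `isLinkLocalIpv6`, `hexPairToDottedIpv4Sketch`, and `mappedIpv4` are hypothetical names, and the real helpers may differ in signature.

```ts
// fe80::/10 spans fe80 through febf: the third hex digit must be 8, 9, a, or b.
function isLinkLocalIpv6(addr: string): boolean {
  return /^fe[89ab][0-9a-f]:/.test(addr.toLowerCase());
}

// Reassembles two 16-bit hex groups into dotted IPv4, e.g. a9fe + a9fe.
function hexPairToDottedIpv4Sketch(hi: string, lo: string): string | null {
  const group = /^[0-9a-f]{1,4}$/i;
  if (!group.test(hi) || !group.test(lo)) return null;
  const h = parseInt(hi, 16);
  const l = parseInt(lo, 16);
  return [h >> 8, h & 0xff, l >> 8, l & 0xff].join(".");
}

// Extracts the embedded IPv4 from the colon-hex form of an IPv4-mapped address.
function mappedIpv4(addr: string): string | null {
  const m = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/i.exec(addr);
  return m ? hexPairToDottedIpv4Sketch(m[1] ?? "", m[2] ?? "") : null;
}
```

Once the dotted form is recovered, it can be handed to the existing IPv4 blocklist, so `::ffff:a9fe:a9fe` is treated the same as `169.254.169.254`.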

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Add a file-level rationale (policy/auth, narrow scope, zero extra deps) and
point the class JSDoc at it to avoid duplicating the same story in two places.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Single pass over volumes: connectors, toolkit tools, and policy warnings.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…nContext mediator

Third layer: the substrate every downstream PR relies on. No user-
facing API changes here; the surface for this PR is the mediator
pattern, lifecycle semantics, and factory stamping.

`Plugin` constructors become pure — no `CacheManager.getInstanceSync()`,
no `TelemetryManager.getProvider()`, no `PluginContext` wiring inside
`constructor()`. That work moves to a new lifecycle method:

```ts
interface BasePlugin {
  attachContext?(deps: {
    context?: unknown;
    telemetryConfig?: TelemetryOptions;
  }): void;
}
```

`createApp` calls `attachContext()` on every plugin after all
constructors have run, before `setup()`. This lets factories return
`PluginData` tuples at module scope without pulling core services into
the import graph — a prerequisite for later PRs that construct agent
definitions before `createApp`.

`packages/appkit/src/core/plugin-context.ts` — new class that mediates
all inter-plugin communication:

- **Route buffering**: `addRoute()` / `addMiddleware()` buffer until
  the server plugin calls `registerAsRouteTarget()`, then flush via
  `addExtension()`. Eliminates plugin-ordering fragility.
- **ToolProvider registry**: `registerToolProvider(name, plugin)` +
  live `getToolProviders()`. Typed discovery of tool-exposing plugins.
- **User-scoped tool execution**: `executeTool(req, pluginName,
  localName, args, signal?)` resolves the provider, wraps in
  `asUser(req)` for OBO, opens a telemetry span, applies a 30s
  timeout, dispatches, returns.
- **Lifecycle hooks**: `onLifecycle('setup:complete' | 'server:ready'
  | 'shutdown', cb)` + `emitLifecycle(event)`. Callback errors don't
  block siblings.
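
The route-buffering behaviour can be sketched as below. This is an illustration only: `RouteBuffer`, `Route`, and `RouteTarget` are hypothetical, simplified shapes, and the real `PluginContext` also buffers middleware and carries the other responsibilities listed above.

```ts
type Route = { method: string; path: string };

// Stand-in for the server plugin that eventually receives routes.
type RouteTarget = { addExtension(route: Route): void };

class RouteBuffer {
  private buffered: Route[] = [];
  private target: RouteTarget | null = null;

  // Before a target exists, routes are held; afterwards they pass straight through.
  addRoute(route: Route): void {
    if (this.target) this.target.addExtension(route);
    else this.buffered.push(route);
  }

  // Flushes everything buffered so far, in registration order.
  registerAsRouteTarget(target: RouteTarget): void {
    this.target = target;
    for (const route of this.buffered) target.addExtension(route);
    this.buffered = [];
  }
}
```

Because routes are held until the target appears, a plugin that registers routes before the server plugin is ready no longer depends on plugin ordering.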

`packages/appkit/src/plugin/to-plugin.ts` — the factory now attaches a
read-only `pluginName` property to the returned function. Later PRs'
`fromPlugin(factory)` reads it to identify which plugin a factory
refers to without needing to construct an instance. `NamedPluginFactory`
type exported for consumers who want to type-constrain factories.

`ServerPlugin.setup()` no longer calls `extendRoutes()` synchronously.
It subscribes to the `setup:complete` lifecycle event via
`PluginContext` and starts the HTTP server there. This ensures that
any deferred-phase plugin (agents plugin in a later PR) has had a
chance to register routes via `PluginContext.addRoute()` before the
server binds. Removes the `plugins` field from `ServerConfig` (routes
are now discovered via the context, not a config snapshot).

- 25 new PluginContext tests (route buffering, tool provider registry,
  executeTool paths, lifecycle hooks, plugin metadata)
- Updated AppKit lifecycle tests to inject `context` instead of
  `plugins`
- Full appkit vitest suite: 1237 tests passing
- Typecheck clean across all 8 workspace projects

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…agents

The main product layer. Turns an AppKit app into an AI-agent host with
markdown-driven agent discovery, code-defined agents, sub-agents, and
a standalone run-without-HTTP executor.

`packages/appkit/src/core/create-agent-def.ts`. Returns the passed-in
definition after cycle-detecting the sub-agent graph. No adapter
construction, no side effects — safe at module top-level. The returned
`AgentDefinition` is plain data, consumable by either `agents({ agents
})` or `runAgent(def, input)`.
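
One way to cycle-detect a sub-agent graph is a depth-first walk that tracks the definitions on the current path. The sketch below assumes cycles are identified by object identity and that sub-agents live under an `agents` record; `AgentDefSketch` and `assertNoSubAgentCycle` are hypothetical names, not the actual `create-agent-def.ts` code.

```ts
interface AgentDefSketch {
  instructions: string;
  agents?: Record<string, AgentDefSketch>;
}

// Throws if any definition is reachable from itself through `agents`.
function assertNoSubAgentCycle(root: AgentDefSketch): void {
  const onPath = new Set<AgentDefSketch>(); // definitions on the current DFS path
  const done = new Set<AgentDefSketch>(); // definitions already fully explored

  const visit = (def: AgentDefSketch, trail: string[]): void => {
    if (onPath.has(def)) {
      throw new Error(`Sub-agent cycle detected: ${trail.join(" -> ")}`);
    }
    if (done.has(def)) return; // shared sub-agents are fine, only cycles are not
    onPath.add(def);
    for (const [key, child] of Object.entries(def.agents ?? {})) {
      visit(child, [...trail, key]);
    }
    onPath.delete(def);
    done.add(def);
  };

  visit(root, ["<root>"]);
}
```

Two parents sharing the same sub-agent is accepted; only a definition that reaches itself is rejected.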

`packages/appkit/src/plugins/agents/agents.ts`. `AgentsPlugin` class:

- Loads markdown agents from `config/agents/*.md` (configurable dir)
  via real YAML frontmatter parsing (`js-yaml`). Frontmatter schema:
  `endpoint`, `model`, `toolkits`, `tools`, `default`, `maxSteps`,
  `maxTokens`, `baseSystemPrompt`. Unknown keys logged, invalid YAML
  throws at boot.
- Merges code-defined agents passed via `agents({ agents: { name: def
  } })`. Code wins on key collision.
- For each agent, builds a per-agent tool index from:
  1. Sub-agents (`agents: {...}`) — synthesized as `agent-<key>`
     tools on the parent.
  2. Explicit tool record entries — `ToolkitEntry`s, inline
     `FunctionTool`s, or `HostedTool`s.
  3. Auto-inherit (if nothing explicit) — pulls every registered
     `ToolProvider` plugin's tools. Asymmetric default: markdown
     agents inherit (`file: true`), code-defined agents don't (`code:
     false`).
- Mounts `POST /invocations` (OpenAI Responses compatible) + `POST
  /chat`, `POST /cancel`, `GET /threads/:id`, `DELETE /threads/:id`,
  `GET /info`.
- SSE streaming via `executeStream`. Tool calls dispatch through
  `PluginContext.executeTool(req, pluginName, localName, args, signal)`
  for OBO, telemetry, and timeout.
- Exposes `appkit.agent.{register, list, get, reload, getDefault,
  getThreads}` runtime helpers.

`packages/appkit/src/core/run-agent.ts`. Runs an `AgentDefinition`
without `createApp` or HTTP. Drives the adapter's event stream to
completion, executing inline tools + sub-agents along the way.
Aggregates events into `{ text, events }`. Useful for tests, CLI
scripts, and offline pipelines. Hosted/MCP tools and plugin toolkits
require the agents plugin and throw clear errors with guidance.

- `AgentEventTranslator` — stateful converter from internal
  `AgentEvent`s to OpenAI Responses API `ResponseStreamEvent`s with
  sequence numbers and output indices.
- `InMemoryThreadStore` — per-user conversation persistence. Nested
  `Map<userId, Map<threadId, Thread>>`. Implements `ThreadStore` from
  shared types.
- `buildBaseSystemPrompt` + `composeSystemPrompt` — formats the
  AppKit base prompt (with plugin names and tool names) and layers
  the agent's instructions on top.

`load-agents.ts` — reads `*.md` files, parses YAML frontmatter with
`js-yaml`, resolves `toolkits: [...]` entries against the plugin
provider index at load time, wraps ambient tools (from `agents({
tools: {...} })`) for `tools: [...]` frontmatter references.

- Adds `js-yaml` + `@types/js-yaml` deps.
- Manifest mounts routes at `/api/agent/*` (singular — matches
  `appkit.agent.*` runtime handle).
- Exports from the main barrel: `agents`, `createAgent`, `runAgent`,
  `AgentDefinition`, `AgentsPluginConfig`, `AgentTool`, `ToolkitEntry`,
  `ToolkitOptions`, `BaseSystemPromptOption`, `PromptContext`,
  `isToolkitEntry`, `loadAgentFromFile`, `loadAgentsFromDir`.

- 60 new tests: agents plugin lifecycle, markdown loading, code-agent
  registration, auto-inherit asymmetry, sub-agent tool synthesis,
  cycle detection, event translator, thread store, system prompt
  composition, standalone `runAgent`.
- Full appkit vitest suite: 1297 tests passing.
- Typecheck clean across all 8 workspace projects.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

`connectHostedTools` now builds an `McpHostPolicy` from the new
`config.mcp` field (`trustedHosts`, `allowLocalhost`) and passes it to
`AppKitMcpClient`. Same-origin workspace URLs are admitted with
workspace auth; all other hosts must be explicitly trusted, and
workspace credentials are never forwarded to them.

- `AgentsPluginConfig.mcp?: McpHostPolicyConfig` added to the public
  config surface. See PR databricks#302 for the policy definition and tests.
- No behavioural change for the default case (same-origin Databricks
  workspace URLs): those continue to receive SP on setup and caller
  OBO on `tools/call`.

- `knip.json`: ignore `packages/appkit/src/plugin/to-plugin.ts`. The
  `NamedPluginFactory` export introduced by this layer is consumed in
  a later stack layer; knip flags it as unused in the intermediate
  state.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Adds a secure-by-default HITL approval gate for any tool annotated
`destructive: true`. Before executing such a tool the agents plugin:

1. Emits a new `appkit.approval_pending` SSE event carrying the
   `approval_id`, `stream_id`, `tool_name`, `args`, and `annotations`.
2. Awaits a matching `POST /chat/approve` from the same user who
   initiated the stream.
3. Auto-denies after the configurable `timeoutMs` (default 60 s).

A denial returns a short denial string to the adapter as the tool
output so the LLM can apologise / replan instead of crashing.

```ts
agents({
  approval: {
    requireForDestructive?: boolean, // default: true
    timeoutMs?: number,              // default: 60_000
  },
});
```

- `event-channel.ts`: single-producer / single-consumer async queue
  used to merge adapter events with out-of-band events emitted by
  `executeTool` (same SSE stream, single `executeStream` sink).
- `tool-approval-gate.ts`: state machine keyed by `approvalId`.
  Owns the pending promise + timeout, enforces ownership on submit,
  exposes `abortStream(streamId)` + `abortAll()` for clean teardown.
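
A minimal sketch of an approval gate along these lines follows. It is not the `tool-approval-gate.ts` implementation: `ApprovalGateSketch`, its method signatures, the result strings, and the choice to resolve aborted or timed-out gates as `"deny"` are all assumptions made for illustration.

```ts
type Decision = "approve" | "deny";
type SubmitResult = "ok" | "forbidden" | "not_found";

interface PendingApproval {
  userId: string;
  streamId: string;
  resolve: (decision: Decision) => void;
  timer: ReturnType<typeof setTimeout>;
}

class ApprovalGateSketch {
  private pending = new Map<string, PendingApproval>();

  // Registers a pending approval and resolves when it is decided or times out.
  wait(approvalId: string, streamId: string, userId: string, timeoutMs = 60_000): Promise<Decision> {
    return new Promise<Decision>((resolve) => {
      const timer = setTimeout(() => this.settle(approvalId, "deny"), timeoutMs);
      this.pending.set(approvalId, { userId, streamId, resolve, timer });
    });
  }

  // Only the user who started the stream may decide; late submits are a no-op.
  submit(approvalId: string, userId: string, decision: Decision): SubmitResult {
    const entry = this.pending.get(approvalId);
    if (!entry) return "not_found";
    if (entry.userId !== userId) return "forbidden";
    this.settle(approvalId, decision);
    return "ok";
  }

  // Denies every pending approval that belongs to the given stream.
  abortStream(streamId: string): void {
    this.pending.forEach((entry, id) => {
      if (entry.streamId === streamId) this.settle(id, "deny");
    });
  }

  private settle(approvalId: string, decision: Decision): void {
    const entry = this.pending.get(approvalId);
    if (!entry) return;
    clearTimeout(entry.timer);
    this.pending.delete(approvalId);
    entry.resolve(decision);
  }
}
```

Clearing the timer in `settle` keeps a decided approval from being auto-denied later, and deleting the entry is what makes a second submit return `"not_found"`.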

- `AgentEvent` gains an `approval_pending` variant.
- `ResponseStreamEvent` gains `AppKitApprovalPendingEvent`.
- `AgentEventTranslator.translate()` handles both.

- `POST /approve` route with zod validation, ownership check, and
  404 / 403 / 200 semantics.
- `POST /cancel` now enforces the same ownership invariant
  (`resolveUserId(req) === stream.userId`) and aborts any pending
  approval gates on the stream.

- `event-channel.test.ts` (7): ordering, buffered-before-iter,
  close semantics, close-with-error rejection, interleave.
- `tool-approval-gate.test.ts` (8): approve / deny / timeout /
  ownership / abortStream / abortAll / late-submit no-op.
- `approval-route.test.ts` (8): schema validation, unknown stream,
  ownership refusal, unknown approvalId, approve happy path, deny
  happy path, cancel clears pending gates, cancel ownership refusal.

Full appkit suite: 1448 tests passing (+23 from Layer 3).

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

`autoInheritTools` now defaults to `{ file: false, code: false }`.
Markdown and code-defined agents with no explicit `tools:` declaration
receive an empty tool index unless the developer explicitly opts in.

When opted in (`autoInheritTools: { file: true }` or the boolean
shorthand), `applyAutoInherit` now filters the spread strictly by each
`ToolkitEntry.autoInheritable` flag (set on the source `ToolEntry` in
PR databricks#302). Any tool not marked `autoInheritable: true` is skipped and
logged so the operator can see exactly what the safe default omits.

Providers exposing tools only via `getAgentTools()` (no `toolkit()`)
cannot be filtered per tool, so their entries are conservatively
skipped during auto-inherit and must be wired explicitly via
`tools:`. This removes the silent privilege-amplification path where
registering a plugin implicitly granted its entire tool surface to
every markdown agent.
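
The per-tool filter can be pictured as follows. This is a sketch with assumed shapes: `ToolkitEntrySketch` and `applyAutoInheritSketch` are hypothetical, and the log message text is invented for illustration.

```ts
interface ToolkitEntrySketch {
  pluginName: string;
  localName: string;
  autoInheritable?: boolean;
}

// Keeps only entries explicitly marked autoInheritable: true; logs everything else.
function applyAutoInheritSketch(
  entries: Record<string, ToolkitEntrySketch>,
  log: (message: string) => void = console.warn,
): Record<string, ToolkitEntrySketch> {
  const inherited: Record<string, ToolkitEntrySketch> = {};
  for (const [key, entry] of Object.entries(entries)) {
    if (entry.autoInheritable === true) {
      inherited[key] = entry;
    } else {
      log(`auto-inherit skipped "${key}" (not marked autoInheritable)`);
    }
  }
  return inherited;
}
```

An omitted flag is treated the same as `false`, which is what makes the default closed rather than open.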

- New: safe default produces an empty tool index for both file and
  code agents even when an `autoInheritable: true` tool exists.
- New: `autoInheritTools: { file: true }` spreads only the tool
  marked `autoInheritable: true`; an adjacent unmarked tool is
  skipped.
- New: `autoInheritTools: true` (boolean shorthand) enables both
  origins and still filters by `autoInheritable`.
- Updated: the prior "asymmetric default" test now validates the new
  safe-default semantics (empty index on both origins).

Full appkit suite: 1451 tests passing (+3 from S-3 Layer 2).

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Five small correctness + DX fixes rounding out the MVP blocker list.

- **Plugin name `agent` → `agents`.** The manifest name now matches
  the public runtime key (`appkit.agents.*`) and the factory export.
  Previously the cast in the reference app masked a real typing gap.

- **`maxSteps` / `maxTokens` frontmatter / AgentDefinition fields
  now flow into the adapter.** Previously `resolveAdapter` built
  `DatabricksAdapter.fromModelServing(source)` without passing either
  value, so the documented knobs were silent no-ops. Now threaded
  through as `adapterOptions` when AppKit constructs the adapter
  itself (string model or omitted model); user-supplied `AgentAdapter`
  instances own their own settings as before.

- **Single-`message` assistant turns are now persisted to the thread
  store.** The stream accumulator previously only handled
  `message_delta`; any adapter that yields a single final `message`
  (notably LangChain's `on_chain_end` path) silently dropped the
  assistant turn from thread history, so multi-turn LangChain
  conversations lost context. The loop now accumulates both kinds,
  with `message` replacing previously-accumulated deltas.

- **Void-tool return no longer coerced into a fake error.** A tool
  handler that legitimately returns `undefined` (side-effecting
  fire-and-forget tools) was being reported to the LLM as
  `Error: Tool "x" execution failed`. Now returns `""` so the model
  sees a successful-but-empty result.

- **Default `InMemoryThreadStore` is now loud about being dev-only.**
  Constructor logs an INFO in development and a WARN in production
  when no `threadStore` is supplied. Docstring rewritten to state
  unambiguously that the default is intended for local development /
  demos only, and points at a follow-up for a capped variant. Real
  caps + a persistent implementation are tracked as follow-ups.
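
The accumulator change for single-`message` turns amounts to something like the sketch below. The event field names (`delta`, `content`) and the function name are assumptions, not the plugin's actual types.

```ts
type TextEvent =
  | { type: "message_delta"; delta: string }
  | { type: "message"; content: string };

// Builds the assistant text to persist: deltas append, a full message replaces.
function accumulateAssistantText(events: readonly TextEvent[]): string {
  let text = "";
  for (const ev of events) {
    if (ev.type === "message_delta") text += ev.delta;
    else text = ev.content;
  }
  return text;
}
```

With this rule, an adapter that streams deltas and one that yields a single final message both leave the same text in thread history.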

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

New `ephemeral?: boolean` field on `AgentDefinition` and the
`ephemeral` markdown frontmatter key. When set, the thread created
for a chat request against that agent is deleted from `ThreadStore`
in the stream's `finally`. Intended for stateless one-shot agents
(e.g. autocomplete) where each invocation is independent and
retaining history would both poison future calls and accumulate
unbounded state in the default `InMemoryThreadStore`.

This closes the "autocomplete agent creates orphan thread per
keystroke" regression flagged in the performance re-review (R1),
which otherwise would have put an in-tree memory-leak demonstrator
against the one perf finding S-2/S-3 consciously deferred. Cleanup
errors in the finally block are logged at warn level so a late
delete never masks the real response.

`RegisteredAgent` mirrors the flag. `load-agents.ts` adds
`ephemeral` to `ALLOWED_KEYS`.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Rewrote `event-translator.ts` to allocate the message's `output_index`
lazily on the first `message_delta` or `message` and to close any open
message before emitting a subsequent `tool_call` / `tool_result`
item. The previous implementation hardcoded `output_index: 0` for
messages and incremented a counter starting at 1 for tool items, so
any ReAct-style flow (tool call before text) produced
`output_item.added` at index 1 followed by `output_item.added` at
index 0, violating the monotonic ordering that OpenAI's own
Responses-API SDK parsers enforce.

Also fixed the companion bug from the original review: `message`
after preceding `message_delta`s no longer double-emits
`output_item.added` (it just emits the `done`), and
`handleToolResult` coalesces `undefined` to `""` at the translator
layer so the wire shape is always a string for every adapter (not
just the ones that funnel through `agents.ts` executeTool).

Four new regression tests pin the invariants: tool_call → text
ordering, message-interrupted-by-tool, no duplicate added on full-
message, undefined-tool-result → empty-string output.
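
A rough sketch of the lazy-allocation idea, assuming simplified event shapes: `AdapterEvent`, `WireEvent`, and `OutputIndexTranslator` are illustrative names, not the contents of `event-translator.ts`, and the wire event names are shortened the same way as in the text above.

```ts
type AdapterEvent =
  | { type: "message_delta"; text: string }
  | { type: "message"; text: string }
  | { type: "tool_call"; name: string }
  | { type: "tool_result"; result: string | undefined };

interface WireEvent {
  type: "output_item.added" | "output_item.done";
  output_index: number;
  kind: "message" | "tool_call" | "tool_result";
}

class OutputIndexTranslator {
  private nextIndex = 0;
  private openMessageIndex: number | null = null;

  translate(ev: AdapterEvent): WireEvent[] {
    const out: WireEvent[] = [];
    if (ev.type === "message_delta" || ev.type === "message") {
      // Allocate the message index lazily, on the first delta or full message.
      if (this.openMessageIndex === null) {
        this.openMessageIndex = this.nextIndex++;
        out.push({
          type: "output_item.added",
          output_index: this.openMessageIndex,
          kind: "message",
        });
      }
      // A full message after deltas only closes the item; no second "added".
      if (ev.type === "message") out.push(...this.closeOpenMessage());
      return out;
    }
    // Tool items close any open message first, then take the next index.
    out.push(...this.closeOpenMessage());
    const index = this.nextIndex++;
    out.push({ type: "output_item.added", output_index: index, kind: ev.type });
    out.push({ type: "output_item.done", output_index: index, kind: ev.type });
    return out;
  }

  private closeOpenMessage(): WireEvent[] {
    if (this.openMessageIndex === null) return [];
    const index = this.openMessageIndex;
    this.openMessageIndex = null;
    return [{ type: "output_item.done", output_index: index, kind: "message" }];
  }
}
```

Because every item draws from the same counter at the moment it first appears, `output_index` values on `added` events can only increase, whatever order tools and text arrive in.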

The one remaining HIGH security item from the prior review is now
closed. Minimal, static caps at the schema layer; configurable
per-deployment caps at runtime.

Schemas:
- `chatRequestSchema.message`: `.max(64_000)` — ~16k tokens, well
  above any legitimate chat turn.
- `invocationsRequestSchema.input`: string `.max(64_000)`, array
  `.max(100)` items, per-item `content` string `.max(64_000)` or
  array `.max(100)` items.

Runtime limits (new `AgentsPluginConfig.limits`):
- `maxConcurrentStreamsPerUser` (default 5): `_handleChat` counts
  the user's active streams before admitting and returns HTTP 429
  + `Retry-After: 5` when at-limit. Per-user, not global.
- `maxToolCalls` (default 50): per-run budget tracked in the
  `executeTool` closure across the top-level adapter and any
  sub-agent delegations. Exceeding aborts the stream.
- `maxSubAgentDepth` (default 3): `runSubAgent` rejects before any
  adapter work when the recursion depth exceeds the limit. Protects
  against a prompt-injected agent that delegates to itself
  transitively.
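
As an illustration of the per-user admission check, here is a minimal sketch. `StreamLimiter` and `admit` are hypothetical names; only the default of 5, the HTTP 429 status, and the `Retry-After: 5` header come from the description above.

```ts
// Counts active streams per user; limits are per-user, not global.
class StreamLimiter {
  private active = new Map<string, number>();
  private maxPerUser: number;

  constructor(maxPerUser: number = 5) {
    this.maxPerUser = maxPerUser;
  }

  // Returns a release callback when admitted, or null when at the limit.
  tryAcquire(userId: string): (() => void) | null {
    const current = this.active.get(userId) ?? 0;
    if (current >= this.maxPerUser) return null;
    this.active.set(userId, current + 1);
    let released = false;
    return () => {
      if (released) return; // make release idempotent
      released = true;
      const remaining = (this.active.get(userId) ?? 1) - 1;
      if (remaining <= 0) this.active.delete(userId);
      else this.active.set(userId, remaining);
    };
  }
}

type Admission =
  | { status: 200; release: () => void }
  | { status: 429; headers: { "Retry-After": string } };

function admit(limiter: StreamLimiter, userId: string): Admission {
  const release = limiter.tryAcquire(userId);
  if (release === null) {
    return { status: 429, headers: { "Retry-After": "5" } };
  }
  return { status: 200, release };
}
```

The caller would invoke `release` when the stream ends (for example in a `finally`), so an aborted stream does not permanently consume a slot.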

15 new tests exercise body caps (6), per-user limit with and
without override (3), defaults and overrides on `resolvedLimits`
(2), sub-agent depth boundary + violation (2), plus the remaining
schema checks (2).

Full appkit vitest suite: 1475 tests passing (+19 from this pass).

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Tool-agnostic guidelines instead of SQL/files-specific defaults; accept full
PromptContext in buildBaseSystemPrompt for parity with custom callbacks.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Register DATABRICKS_SERVING_ENDPOINT_NAME as optional CAN_QUERY so apps using
Databricks-hosted agent models get resource wiring; optional when agents use
only external adapters. Sync template/appkit.plugins.json.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Align optional serving resource with `DatabricksAdapter.fromModelServing()`, which
reads `DATABRICKS_AGENT_ENDPOINT` — not `DATABRICKS_SERVING_ENDPOINT_NAME`
(serving plugin). Sync template.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

BREAKING CHANGE: top-level config/agents/*.md is no longer loaded. Use
<agentId>/agent.md. The skills directory name is reserved and skipped.
Orphan top-level .md files error at load; subdirs without agent.md error.

Export agentIdFromMarkdownPath for path-based id resolution.

The MCP transport client and host policy aren't agents-specific; they are
HTTP + JSON-RPC transport with URL/DNS allowlisting. Move them under
packages/appkit/src/connectors/mcp/ so they sit alongside the other
transport-layer modules (serving, genie, sql-warehouse, lakebase, …) and
stop being reachable only through the agents plugin.

- Move mcp-client.ts          -> connectors/mcp/client.ts
- Move mcp-host-policy.ts     -> connectors/mcp/host-policy.ts
- Move McpEndpointConfig type -> connectors/mcp/types.ts
- Add connectors/mcp/index.ts barrel; re-export from connectors/index.ts
- Move mcp-client / mcp-host-policy tests to connectors/mcp/tests/
- Agents plugin keeps hosted-tools.ts (HostedTool sugar + resolve) and
  imports connector types from ../../connectors/mcp.
- tools/ barrel no longer re-exports AppKitMcpClient (never was public).

No behaviour change. All existing tests pass against the new paths.

Two small cleanups to AgentsPlugin.connectHostedTools():

- Replace the dynamic `await import("../../context")` with a top-level
  `import { getWorkspaceClient } from "../../context"`, matching every
  other plugin (genie, serving, analytics, files, vector-search).
- Drop the ad-hoc env-var fallback (DATABRICKS_HOST + DATABRICKS_TOKEN,
  PAT only). When ServiceContext is not initialized (test rigs, manual
  embeds) construct a bare `new WorkspaceClient({})` and let the SDK
  walk its own credential chain — env, ~/.databrickscfg profiles, DAB
  auth, OAuth, metadata service — before calling config.authenticate().

No behaviour change on the normal createApp path. The fallback branch
now supports every SDK auth type instead of PAT only, and tells the
user which setting to fix when no host can be resolved.
…dispatchToolCall

Three small helpers pulled out of the AgentsPlugin streaming path to cut
duplication and shrink the two large methods.

- normalize-result.ts: void->"", JSON-stringify, 50K truncation with a
  human-readable marker. Unit-testable (previously covered only via the
  HTTP path).
- consume-adapter-stream.ts: the 'message_delta' + 'message' accumulation
  loop shared between _streamAgent and runSubAgent. Accepts an optional
  signal and per-event side-effect callback (for SSE translation).
- tool-dispatch.ts: one place that fans out toolkit/function/mcp/subagent
  entries. 'never'-typed default forces exhaustiveness: adding a fifth
  source is now a compile error at every call site.
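
A minimal sketch of what the result normalisation could look like. The 50K limit is treated here as a character count and the marker text is invented for illustration; neither detail is confirmed by the commit message, and circular or BigInt values (which make `JSON.stringify` throw) are not handled.

```ts
const MAX_RESULT_CHARS = 50_000; // assumption: the "50K" cap counts characters
const TRUNCATION_MARKER = "\n[truncated: tool result exceeded limit]"; // hypothetical wording

function normalizeToolResult(
  result: unknown,
  maxChars: number = MAX_RESULT_CHARS,
): string {
  // A void tool is a successful-but-empty result, not an error.
  if (result === undefined) return "";

  let text: string;
  if (typeof result === "string") {
    text = result;
  } else {
    // JSON.stringify returns undefined for functions and symbols.
    const json: string | undefined = JSON.stringify(result);
    text = json ?? String(result);
  }

  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + TRUNCATION_MARKER;
}
```

Keeping this as a pure function is what makes it unit-testable without going through the HTTP path.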

_streamAgent: executeTool closure shrinks from ~60 lines of dispatch +
normalize to a single dispatchToolCall + normalizeToolResult call.
Stream consumption collapses to consumeAdapterStream.

runSubAgent: childExecute shrinks from ~30 lines of if/else dispatch to
one dispatchToolCall call. Adapter loop collapses to consumeAdapterStream.

Behaviour change (minor): childExecute previously silently fell through to
'Unsupported sub-agent tool source' when mcpClient or PluginContext was
missing; now it throws the same specific error as the main stream. Matches
the main-path behaviour.
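
The exhaustiveness trick mentioned for `tool-dispatch.ts` can be sketched like this. The `ToolEntry` union and `describeDispatch` are illustrative stand-ins with made-up fields; only the four source kinds and the `never`-typed default come from the description above.

```ts
type ToolEntry =
  | { source: "toolkit"; pluginName: string; localName: string }
  | { source: "function"; name: string }
  | { source: "mcp"; server: string; tool: string }
  | { source: "subagent"; agentId: string };

function describeDispatch(entry: ToolEntry): string {
  switch (entry.source) {
    case "toolkit":
      return `toolkit:${entry.pluginName}.${entry.localName}`;
    case "function":
      return `function:${entry.name}`;
    case "mcp":
      return `mcp:${entry.server}/${entry.tool}`;
    case "subagent":
      return `subagent:${entry.agentId}`;
    default: {
      // If a fifth variant is added to ToolEntry, `entry` is no longer
      // `never` here and this assignment fails to compile.
      const unreachable: never = entry;
      throw new Error(`Unsupported tool source: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The runtime `throw` still guards against values that bypass the type system, such as entries built from untyped JSON.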

Tests: 15 new unit tests for normalizeToolResult + consumeAdapterStream.
dispatchToolCall is exercised transitively through the full agent suite
(288 existing tests still pass, 303 total on this branch).
… → def

The `annotations` field (notably `destructive: true`) was silently dropped
as tools flowed from `tool({...})` into the resolved `AgentToolDefinition`,
so user-defined destructive tools never triggered the approval gate.

- `ToolConfig` now accepts `annotations?: ToolAnnotations`.
- `tool()` forwards it to the returned `FunctionTool`.
- `FunctionTool` exposes `annotations` and `functionToolToDefinition`
  preserves it on the definition it builds.
- `AgentsPlugin` reads the flag via `isDestructiveToolEntry()` (falls back
  to `functionTool.annotations` so a future divergence between def and
  function cannot re-introduce the bug) and emits the merged annotations
  via `combinedToolAnnotations()` on the `approval_pending` SSE payload.

Covered by `tests/tool-approval-gate.test.ts` and
`tests/function-tool.test.ts`.

ToolAnnotations.destructive is binary and has started to mislead:
"save_view" captures a screenshot and creates a new file, which is
nothing like deleting a dashboard, yet both trip the same red
"destructive" approval card. This adds a semantic `effect` enum with
four tiers — `read`, `write`, `update`, `destructive` — so tool
authors can tell the UI what blast radius they actually have. The
approval gate fires for any mutating effect (`write`/`update`/
`destructive`) and continues to honour the legacy `destructive: true`
flag so existing tools keep their current red treatment without
migration. Callers consuming `annotations` over the wire (MCP clients,
approval UIs) can now differentiate; the playground will ship a
tiered approval card as a follow-up.
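
The gating rule described above could be expressed roughly as follows. `requiresApproval` is a hypothetical helper name, and treating a tool with no annotations as non-mutating is an assumption made for this sketch.

```ts
type ToolEffect = "read" | "write" | "update" | "destructive";

interface ToolAnnotations {
  effect?: ToolEffect;
  destructive?: boolean; // legacy binary flag, still honoured
}

function requiresApproval(annotations?: ToolAnnotations): boolean {
  if (!annotations) return false; // assumption: unannotated tools are not gated
  if (annotations.destructive === true) return true; // legacy path
  // Any mutating effect trips the gate; only "read" passes freely.
  return annotations.effect !== undefined && annotations.effect !== "read";
}
```

With this shape, `save_view` could declare `effect: "write"` and still be gated, while a UI is free to render it differently from a `destructive` tool.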
…esolver

DX centerpiece. Introduces the symbol-marker pattern that collapses
plugin tool references in code-defined agents from a three-touch dance
to a single line, and extracts the shared resolver that the agents
plugin, auto-inherit, and standalone runAgent all now go through.

`packages/appkit/src/plugins/agents/from-plugin.ts`. Returns a spread-
friendly `{ [Symbol()]: FromPluginMarker }` record. The symbol key is
freshly generated per call, so multiple spreads of the same plugin
coexist safely. The marker's brand is a globally-interned
`Symbol.for("@databricks/appkit.fromPluginMarker")` — stable across
module boundaries.
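
To make the symbol-marker pattern concrete, here is a simplified sketch. It takes a plugin name string instead of a plugin factory, and the marker's field names (`brand`, `pluginName`) are assumptions; only the fresh `Symbol()` key per call and the `Symbol.for(...)` brand string are taken from the description above.

```ts
const FROM_PLUGIN_BRAND: symbol = Symbol.for("@databricks/appkit.fromPluginMarker");

interface FromPluginMarker {
  readonly brand: symbol;
  readonly pluginName: string;
}

// Simplified: the real fromPlugin() is described as taking a plugin, not a name.
function fromPlugin(pluginName: string): Record<symbol, FromPluginMarker> {
  const record: Record<symbol, FromPluginMarker> = {};
  // A fresh symbol per call means two spreads never overwrite each other.
  record[Symbol(`fromPlugin:${pluginName}`)] = {
    brand: FROM_PLUGIN_BRAND,
    pluginName,
  };
  return record;
}

function isFromPluginMarker(value: unknown): value is FromPluginMarker {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { brand?: unknown }).brand === FROM_PLUGIN_BRAND &&
    typeof (value as { pluginName?: unknown }).pluginName === "string"
  );
}

// Walks symbol keys only; string-keyed tools are left for the normal pass.
function collectMarkers(tools: object): FromPluginMarker[] {
  const bag = tools as Record<symbol, unknown>;
  return Object.getOwnPropertySymbols(tools)
    .map((key) => bag[key])
    .filter(isFromPluginMarker);
}
```

Object spread copies own enumerable symbol properties, which is why `{ ...fromPlugin("a"), ...fromPlugin("b"), myTool }` keeps both markers alongside the string-keyed tool.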

`packages/appkit/src/plugins/agents/toolkit-resolver.ts`. Single source
of truth for "turn a ToolProvider into a keyed record of `ToolkitEntry`
markers". Prefers `provider.toolkit(opts)` when available (core plugins
implement it), falls back to walking `getAgentTools()` and synthesizing
namespaced keys (`${pluginName}.${localName}`) for third-party
providers, honoring `only` / `except` / `rename` / `prefix` the same
way.

Used by three call sites, previously all copy-pasted:
1. `AgentsPlugin.buildToolIndex` — fromPlugin marker resolution pass
2. `AgentsPlugin.applyAutoInherit` — markdown auto-inherit path
3. `runAgent` — standalone-mode plugin tool dispatch

Before the existing string-key iteration, `buildToolIndex` now walks
`Object.getOwnPropertySymbols(def.tools)`. For each `FromPluginMarker`,
it looks up the plugin by name in `PluginContext.getToolProviders()`,
calls `resolveToolkitFromProvider`, and merges the resulting entries
into the per-agent index. Missing plugins throw at setup time with a
clear `Available: ...` listing — wiring errors surface on boot, not
mid-request.

`hasExplicitTools` now counts symbol keys too, so a
`tools: { ...fromPlugin(x) }` record correctly disables auto-inherit
on code-defined agents.

- `AgentTools` type: `{ [key: string]: AgentTool } & { [key: symbol]:
  FromPluginMarker }`. Preserves string-key autocomplete while
  accepting marker spreads under strict TS.
- `AgentDefinition.tools` switched to `AgentTools`.

`packages/appkit/src/core/run-agent.ts`. When an agent def contains
`fromPlugin` markers, the caller passes plugins via
`RunAgentInput.plugins`. A local provider cache constructs each plugin
and dispatches tool calls via `provider.executeAgentTool()`. Runs as
service principal (no OBO — there's no HTTP request). If a def
contains markers but `plugins` is absent, throws with guidance.

`fromPlugin`, `FromPluginMarker`, `isFromPluginMarker`, `AgentTools`
added to the main barrel.

- 14 new tests: marker shape, symbol uniqueness, type guard,
  factory-without-pluginName error, fromPlugin marker resolution in
  AgentsPlugin, fallback to getAgentTools for providers without
  .toolkit(), symbol-only tools disables auto-inherit, runAgent
  standalone marker resolution via `plugins` arg, guidance error when
  missing.
- Full appkit vitest suite: 1311 tests passing.
- Typecheck clean.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

runAgent()'s adapter-consumption loop is now the same consumeAdapterStream
helper introduced in the agents-plugin layer. One loop covers all three
execution paths: HTTP streaming (_streamAgent), sub-agents (runSubAgent),
and standalone runAgent. The message_delta + message accumulation rule
(with its LangChain on_chain_end quirk) lives in exactly one place.
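
The accumulation rule itself is small. The sketch below expresses it over a synchronous iterable for brevity; the real helper consumes an adapter stream, and the `StreamEvent` shape and the `accumulateAssistantText` name are assumptions for illustration.

```ts
type StreamEvent =
  | { type: "message_delta"; content: string }
  | { type: "message"; content: string }
  | { type: "tool_call"; name: string };

function accumulateAssistantText(events: Iterable<StreamEvent>): string {
  let text = "";
  for (const ev of events) {
    if (ev.type === "message_delta") {
      text += ev.content; // deltas append
    } else if (ev.type === "message") {
      text = ev.content; // a full message replaces accumulated deltas
    }
    // Other event kinds do not affect the assistant turn.
  }
  return text;
}
```

An adapter that emits only a final `message` and one that emits only deltas both end up with the assistant turn recorded, which is the multi-turn context the earlier fix restored.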
…template

Final layer of the agents feature stack. Everything needed to
exercise, demonstrate, and learn the feature.

`apps/agent-app/` — a standalone app purpose-built around the agents
feature. Ships with:

- `server.ts` — full example of code-defined agents via `fromPlugin`:
  ```ts
  const support = createAgent({
    instructions: "…",
    tools: {
      ...fromPlugin(analytics),
      ...fromPlugin(files),
      get_weather,
      "mcp.vector-search": mcpServer("vector-search", "https://…"),
    },
  });

  await createApp({
    plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })],
  });
  ```
- `config/agents/assistant.md` — markdown-driven agent alongside the
  code-defined one, showing the asymmetric auto-inherit default.
- Vite + React 19 + TailwindCSS frontend with a chat UI.
- Databricks deployment config (`databricks.yml`, `app.yaml`) and
  deploy scripts.

`apps/dev-playground/client/src/routes/agent.route.tsx` — chat UI with
inline autocomplete (hits the `autocomplete` markdown agent) and a
full threaded conversation panel (hits the default agent).

`apps/dev-playground/server/index.ts` — adds a code-defined `helper`
agent using `fromPlugin(analytics)` alongside the markdown-driven
`autocomplete` agent in `config/agents/`. Exercises the mixed-style
setup (markdown + code) against the same plugin list.

`apps/dev-playground/config/agents/*.md` — both agents defined with
valid YAML frontmatter.

`docs/docs/plugins/agents.md` — progressive five-level guide:

1. Drop a markdown file → it just works.
2. Scope tools via `toolkits:` / `tools:` frontmatter.
3. Code-defined agents with `fromPlugin()`.
4. Sub-agents.
5. Standalone `runAgent()` (no `createApp` or HTTP).

Plus a configuration reference, runtime API reference, and frontmatter
schema table.

`docs/docs/api/appkit/` — regenerated typedoc for the new public
surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig,
ToolkitEntry, ToolkitOptions, all adapter types, and the agents
plugin factory).

`template/appkit.plugins.json` — adds the `agent` plugin entry so
`npx @databricks/appkit init --features agent` scaffolds the plugin
correctly.

- Full appkit vitest suite: 1311 tests passing
- Typecheck clean across all 8 workspace projects
- `pnpm docs:build` clean (no broken links)
- `pnpm --filter=@databricks/appkit build:package` clean, publint
  clean

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Documents the new `mcp` configuration block and the rules it enforces:
same-origin-only by default, explicit `trustedHosts` for external MCP
servers, plaintext `http://` refused outside localhost-in-dev, and
DNS-level blocking of private / link-local IP ranges (covers cloud
metadata services). See PR databricks#302 for the policy implementation and
PR databricks#304 for the `AgentsPluginConfig.mcp` wiring.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

- `docs/docs/plugins/agents.md`: new "SQL agent tools" subsection
  covering `analytics.query` readOnly enforcement, `lakebase.query`
  opt-in via `exposeAsAgentTool`, and the approval flow. New
  "Human-in-the-loop approval for destructive tools" subsection
  documents the config, SSE event shape, and `POST /chat/approve`
  contract.

- `apps/agent-app`: approval-card component rendered inline in the
  chat stream whenever an `appkit.approval_pending` event arrives.
  Destructive badge + Approve/Deny buttons POST to
  `/api/agent/approve` with the carried `streamId`/`approvalId`.

- `apps/dev-playground/client`: matching approval-card on the agent
  route, using the existing appkit-ui `Button` component and
  Tailwind utility classes.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Updates `docs/docs/plugins/agents.md` to document the new
two-key auto-inherit model introduced in PR databricks#302 (per-tool
`autoInheritable` flag) and PR databricks#304 (safe-by-default
`autoInheritTools: { file: false, code: false }`). Adds an
"Auto-inherit posture" subsection explaining that the developer
must opt into `autoInheritTools` AND the plugin author must mark
each tool `autoInheritable: true` for a tool to spread without
explicit wiring.

Includes a table documenting the `autoInheritable` marking on each
core plugin tool, plus an example of the setup-time audit log so
operators can see exactly what's inherited vs. skipped.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

- **Reference app no longer ships hardcoded dogfood URLs.** The three
  `https://e2-dogfood.staging.cloud.databricks.com/...` and
  `https://mario-mcp-hello-*.staging.aws.databricksapps.com/...` MCP
  URLs in `apps/agent-app/server.ts` are replaced with optional
  env-driven `VECTOR_SEARCH_MCP_URL` / `CUSTOM_MCP_URL` config. When
  set, their hostnames are auto-added to `agents({ mcp: { trustedHosts
  } })`. `.env.example` uses placeholder values the reader can replace
  instead of another team's workspace.

- **`appkit.agent` → `appkit.agents` in the reference app.** The
  prior `appkit.agent as { list, getDefault }` cast papered over the
  plugin-name mismatch fixed in PR databricks#304. The runtime key now matches
  the docs, the manifest, and the factory name; the cast is gone.

- **Auto-inherit opt-in added to the reference config.** Since the
  defaults flipped to `{ file: false, code: false }` (PR databricks#304, S-3),
  the reference now explicitly enables `autoInheritTools: { file:
  true }` so the markdown agents that ship alongside the code-defined
  one still pick up the analytics / files read-only tools. This is the
  pattern a real deployment should follow — opt in deliberately.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

- `apps/dev-playground/config/agents/autocomplete.md` sets
  `ephemeral: true`. Each debounced autocomplete keystroke no longer
  leaves an orphan thread in `InMemoryThreadStore` — the server now
  deletes the thread in the stream's `finally` (PR databricks#304). Closes R1
  from the MVP re-review.
- `docs/docs/plugins/agents.md` documents the new `ephemeral`
  frontmatter key alongside the other AgentDefinition knobs.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Documents the MVP resource caps landed in PR databricks#304: the static
request-body caps (enforced by the Zod schemas) and the three
configurable runtime limits (`maxConcurrentStreamsPerUser`,
`maxToolCalls`, `maxSubAgentDepth`). Includes the config-block
shape in the main reference and a new "Resource limits" subsection
under the Configuration section explaining the intent and per-user
semantics of each cap.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

The agents plugin's manifest `name` is `agents` (plural), so routes mount
at `/api/agents/*` and its client config is keyed as `agents` — but three
call sites still referenced the old singular `agent`:

- apps/agent-app/src/App.tsx: /api/agent/{info,chat,approve} returned an
  Express 404 HTML page, which the client then tried to JSON.parse,
  producing "Unexpected token '<', <!DOCTYPE ...". Swap to /api/agents/*.
- apps/dev-playground/client/src/routes/agent.route.tsx: same three
  paths, plus getPluginClientConfig("agent") returned {} so
  hasAutocomplete was false and the autocomplete hook short-circuited
  before ever firing a request. Swap the lookup key to "agents".
- template/appkit.plugins.json: the scaffolded plugin descriptor still
  used the singular name/key, which would have broken fresh apps the
  same way. Align with the plugin's real manifest name.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…ersions

Pin exact versions instead of ^ for reproducible manifests.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Move reference apps to config/agents/<id>/agent.md; document migration and
reserved skills folder; align generated API snippets and CHANGELOG.

typedoc picked up JSDoc changes from agent/v2/4-agents-plugin:

- New public export `agentIdFromMarkdownPath` (helper for path-based id
  resolution used by `loadAgentFromFile`).
- `loadAgentsFromDir` description/body now reflects the folder layout
  (`<id>/agent.md`, orphan `*.md` rejected, reserved `skills/` dir).

Generated by docusaurus-plugin-typedoc during pnpm --filter=docs build.
… retire agent-app

Stage 0 of the smart-dashboard-demo plan. Ports the prototype Smart
Dashboard (NYC Taxi analytics) from the p3ju worktree into dev-playground
as a new route, migrates its markdown agents to the folder layout, and
deletes apps/agent-app — which is superseded by this demo as the
integration test of the entire v2 agents stack.

Client:

- New route at client/src/routes/smart-dashboard.route.tsx with
  its own subdirectory for components/ and hooks/.
- Ported 8 components (ActiveFilters, AgentSidebar, AnomalyCard,
  FareChart, InsightCard, KPICards, QuerySection, TripChart) and
  4 hooks (useActionDispatcher, useAgentStream, useChartColors,
  useDashboardData) as-is. Relative imports preserved.
- Nav link added in __root.tsx.
- TanStack routeTree.gen.ts auto-regenerated.

Server:

- Ports apply_filter and highlight_period inline tools.
- Adds sql_analyst (code-defined: fromPlugin(analytics)) and
  dashboard_pilot (code-defined: apply_filter + highlight_period)
  per the plan's Q2 = option B decision.
- Adds query markdown dispatcher in config/agents/query/agent.md
  delegating to both specialists via the agents: frontmatter.
- Ports insights and anomaly ephemeral markdown agents.

Config:

- Ports 4 SQL queries into config/queries/dashboard_*.sql.
- Note: shared/appkit-types/analytics.d.ts not regenerated in this
  commit; useAnalyticsQuery("dashboard_*", ...) uses explicit as
  casts and works at runtime. Regenerate with
  'npx @databricks/appkit generate-types' locally when convenient.

Cleanup:

- apps/agent-app/ removed in full. No references outside
  pnpm-lock.yaml (regenerated).
- plans/smart-dashboard-demo.md added with the full staged plan.

Verification:

- pnpm --filter=dev-playground client typecheck: clean.
- pnpm --filter=dev-playground client vite build: clean.
- Server typecheck: same pre-existing errors as main (files plugin
  union type, telemetry CacheManager, playwright DOM lib) — no new
  regressions.

Next stages (1-6, per the plan): dispatcher integration verified,
save_view + approval card, dashboard-context injection + focus_chart,
Stream Inspector, polish, demo script.

Stage 0 ported the dashboard shell verbatim from the prototype; this
commit layers every v2-stack feature on top, moves the feature dir out
of routes/ (TanStack was flagging files as stray routes), rewrites the
agent -> UI action pipeline for correctness, and adds discoverability
for the HITL flow.

Server (apps/dev-playground/server/index.ts)

- Split the polymorphic apply_filter into four narrower tools:
  filter_by_date_range, filter_by_pickup_zip, filter_by_fare,
  clear_filters. Each has exactly one client-side effect; removes the
  whole class of 'agent said it worked but nothing moved' bugs.
- Add clear_highlights, focus_chart, save_view (destructive; triggers
  the approval gate).
- dashboard_pilot instructions rewritten with a compact verb-per-line
  reference so the LLM picks the right single tool for each intent.

Client - moved out of routes/

- Feature code relocates to client/src/features/smart-dashboard/
  (components/, hooks/, lib/). TanStack Router was warning that every
  non-route file under routes/ 'does not contain any route piece.'
- smart-dashboard.route.tsx uses @/features/ aliases; the route file
  is now the only thing under routes/.

Client - correctness fixes in the action dispatcher

- Act only on response.output_item.done (never .added, which fires
  with partial arguments and caused double-applied highlights plus
  silent JSON-parse races).
- Dedupe by call_id with a bounded LRU; reset on appkit.metadata
  (new-run signal).
- Use updater callbacks (onFilterUpdate(prev => ...)) instead of a
  currentFilters prop to eliminate stale-closure bugs when the agent
  fires multiple tool calls in one render cycle.
- Validate arg shapes per tool; anything malformed or unrecognized
  surfaces through onUnknownTool (route renders as a red banner +
  console.warn). Silent failure was the worst failure mode.
- Emit a human-readable summary for every applied action (onAction).
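
The first two dispatcher fixes (act only on `.done`, dedupe by `call_id`) can be sketched as below. `BoundedSeenSet` and `shouldApply` are hypothetical names, and the flat `call_id` field on the event is an assumption about the payload shape.

```ts
// Insertion-ordered Map doubles as a small LRU: oldest key is first.
class BoundedSeenSet {
  private seen = new Map<string, true>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns true only the first time an id is seen (while it is retained).
  markIfNew(id: string): boolean {
    if (this.seen.has(id)) {
      this.seen.delete(id); // refresh recency
      this.seen.set(id, true);
      return false;
    }
    this.seen.set(id, true);
    if (this.seen.size > this.capacity) {
      const oldest = this.seen.keys().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }

  reset(): void {
    this.seen.clear();
  }
}

interface SseEvent {
  type: string;
  call_id?: string;
}

function shouldApply(ev: SseEvent, seen: BoundedSeenSet): boolean {
  if (ev.type === "appkit.metadata") {
    seen.reset(); // new-run signal
    return false;
  }
  // Never act on `.added`: arguments may still be partial at that point.
  if (ev.type !== "response.output_item.done") return false;
  if (typeof ev.call_id !== "string") return false;
  return seen.markIfNew(ev.call_id);
}
```

Bounding the set keeps memory flat across long sessions, while the reset on the metadata event covers the case where a new run reuses ids.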

Client - discoverability / HITL

- New QuickActionsBar with Save view... (inline name input), Clear
  filters, Clear highlights. Each dispatches through the chat
  pipeline so the agent still reasons and the approval gate still
  fires for destructive actions - the bar just saves typing.
- ActionToast (bottom-left) confirms every dispatcher-applied action
  for ~3s. Answers 'did it work?' without opening the inspector.
- QuerySection refactored into a view: content/isLoading/onSend come
  from the route. Lifting useAgentStream one level up lets the Quick
  Actions bar and the chat input share a single agent stream.
- QuerySection example queries refreshed to cover the new tools.

Client - stream-inspector wiring

- SSEEvent extended with approval_pending payload fields.
- use-stream-inspector threaded through so every run's events flow
  into the inspector's module-level store.
- FocusableChart renamed its 'id' prop to 'chartId' (logical
  registry key, not a DOM id - biome was right to complain).

Verification

- pnpm --filter=dev-playground client tsc --noEmit: clean.
- pnpm --filter=dev-playground client vite build: clean.
- Server typecheck: same pre-existing errors as main; no new
  regressions.
- apps/dev-playground/shared/appkit-types/analytics.d.ts regenerated
  by vite build to register the four dashboard_* queries; kept in
  the commit so CI and downstream consumers have typed
  useAnalyticsQuery access out of the box.
…iews panel + floating chat

Four feature layers on top of the Smart Dashboard demo. Each is
independently useful; all land together because they share route
plumbing.

Phase A - Approval gate for sub-agent tool calls (packages/appkit)

The earlier event-forwarding fix for runSubAgent surfaced a second gap:
destructive tools invoked through a sub-agent (save_view via
dashboard_pilot) bypassed the approval gate entirely. The gate lived
only in _streamAgent.executeTool; childExecute called dispatchToolCall
directly.

Extract the approval flow (emit appkit.approval_pending, await gate,
handle deny) into a closure the parent stream builds once and passes
down to runSubAgent. runSubAgent propagates it to nested sub-agents
too. Now every destructive tool fires the gate regardless of which
agent in the delegation chain invoked it.

No public API change; runSubAgent's new checkApproval parameter is
private.

Phase B - save_view uploads a dashboard snapshot to a UC volume

Previously the save_view tool was a stub that logged to the console.
Now it actually persists:

- Add 'files' volume binding backed by DATABRICKS_VOLUME_FILES.
- Client captures the dashboard body via html2canvas at scale 0.6,
  JPEG quality 0.75 - sized to fit under express.json's default
  100kb limit.
- ApprovalCard intercepts 'approve' for save_view: capture -> POST
  to /api/dashboard/save-view with name + description + filters +
  highlights + pngBase64 -> then POST to /api/agents/approve.
- Server writes <timestamp>_<slug>.png and sidecar .json into the
  'files' volume under saved-views/. Uses appkit.files(...).asUser(req)
  so OBO scoping works.
- Approval card shows a preview thumbnail + the final volume path.

Phase C - Saved views panel + load_view tool

- New /api/dashboard/saved-views route lists paired .png/.json entries
  in the volume, parses metadata sidecars.
- New /api/dashboard/saved-view-png streams PNG bytes so thumbnail
  <img src> tags work without a general file-download endpoint.
- New 'load_view' tool on dashboard_pilot. Tool arguments carry the
  resolved filters + highlights; use-action-dispatcher applies them
  in one shot. No extra round trip needed since state is in the
  tool-call JSON.
- SavedViewsPanel renders a horizontal strip of thumbnail cards;
  clicking a card dispatches 'Load the saved view X' through the
  chat so the agent loop stays consistent.

Phase D - Floating chat drawer with multi-turn history

The previous QuerySection wiped the UI on every send. Replaced with
ChatDrawer:

- Multi-turn messages accumulate in route state; previous user +
  assistant turns stay visible as the user iterates.
- Streaming assistant content updates the in-progress turn in place.
- Toggled by CMD+J, opens automatically when a new approval arrives
  so users don't miss destructive gates.
- Approval cards render inline in the conversation, pinned to the
  user turn that triggered them.

Extras / hygiene

- query dispatcher markdown prompt expanded: 'save' was missing from
  the pilot verb list, which caused the LLM to respond directly
  from its empty toolset. Added save/clear + explicit delegation
  rule.
- dashboard_pilot tool list includes save_view + load_view.
- QuerySection component removed (superseded by ChatDrawer).

Verification

- pnpm --filter=appkit typecheck clean.
- pnpm --filter=appkit test suite: 264 tests pass.
- dev-playground client tsc + vite build: clean.

Fresh UC volumes don't have a saved-views/ subdirectory until the first
save; the SDK throws FILES_API_DIRECTORY_IS_NOT_FOUND on list. The
route was propagating that as a 500 which rendered as a red error
banner in the SavedViewsPanel on first load.

Catch the error explicitly, return { views: [] }, let the panel render
its 'no saved views yet' empty state cleanly. Uploads still work the
first time because the SDK auto-creates parent dirs on upload.
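
A minimal sketch of the fallback, assuming the error code appears in the thrown error's message. How the SDK actually exposes `FILES_API_DIRECTORY_IS_NOT_FOUND` (message text, a code field, or a dedicated class) is not stated above, so `isDirectoryNotFound` is a guess, and `listSavedViews` takes the listing call as a parameter rather than calling the SDK.

```ts
interface SavedView {
  name: string;
}

// Assumption: the code is present in the error message.
function isDirectoryNotFound(err: unknown): boolean {
  return (
    err instanceof Error &&
    err.message.includes("FILES_API_DIRECTORY_IS_NOT_FOUND")
  );
}

async function listSavedViews(
  list: () => Promise<SavedView[]>,
): Promise<{ views: SavedView[] }> {
  try {
    return { views: await list() };
  } catch (err) {
    // A missing directory just means nothing has been saved yet.
    if (isDirectoryNotFound(err)) return { views: [] };
    throw err; // every other failure still surfaces
  }
}
```

Matching only the one known code keeps genuine failures, such as permission errors, visible instead of hiding them behind an empty list.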
Previously forwardSubAgentToolEvent allow-listed only tool_call and
tool_result. That was overcautious: it meant the sub-agent's
message_delta / message / thinking / status events never reached the
client, so users stared at 'thinking…' for the entire sub-agent
run and only saw text once the whole run completed.

Now forward everything except metadata. Metadata carries the
sub-agent's own threadId and would overwrite the parent thread on
the client, breaking multi-turn continuity — that's the one thing
still worth filtering.

Also update the query dispatcher prompt so the parent no longer
echoes what the specialist already said. The specialist text streams
through as-is; the dispatcher speaks only when it needs to route,
combine, or add context.
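
The forwarding rule reduces to a deny-list of one. In this sketch the event `type` string `"metadata"` and the `SubAgentEvent` shape are assumptions; the function name is hypothetical.

```ts
interface SubAgentEvent {
  type: string;
  [key: string]: unknown;
}

// Forward everything except metadata, whose threadId would otherwise
// overwrite the parent thread on the client.
function shouldForwardSubAgentEvent(ev: SubAgentEvent): boolean {
  return ev.type !== "metadata";
}

function forwardable(events: SubAgentEvent[]): SubAgentEvent[] {
  return events.filter(shouldForwardSubAgentEvent);
}
```

A deny-list means new event kinds stream through by default, which is the opposite trade-off from the earlier allow-list.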
html2canvas 1.x throws on `oklch()` color values, which Tailwind v4 emits
everywhere in computed styles. Swap to the maintained html2canvas-pro
fork (drop-in API) so dashboard captures render without
"Attempting to parse an unsupported color function 'oklch'" errors in
the approval card. Keeps html2canvas pinned so types still resolve.

Databricks SDK `volume.download(path)` returns a wrapper
`{ contents: ReadableStream, "content-type": string }`, not the stream
itself. The previous handler tried to write the wrapper directly, which
produced an empty body and broke thumbnails in the saved-views panel.
Now we read `.contents`, drain the stream, and respond with the
server-reported content-type (falling back to `image/png`).

Also drops a couple of noisy console.logs left over from the debugging
session.
… click

Clicking a saved-view thumbnail was sending a chat prompt like "Load the
saved view 'january'" and letting the agent reconstruct filters from the
view name. That dropped the highlights (agent had no tool to fetch the
stored metadata) so January-with-focus-on-week-1 came back as just
January-wide.

Since the client already holds the full authoritative metadata for the
clicked thumbnail, bypass the agent and apply `meta.filters` and
`meta.highlights` directly to local state, with a toast summarising
what was restored.

Also hardens the `appkit.approval_pending` handler: it now accepts
both snake_case and camelCase fields and validates that
approval_id/tool_name/stream_id are non-empty strings before enqueuing,
so a malformed event can't push a broken approval card.
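
The hardened handler might look roughly like this. The three snake_case field names come from the description above; the camelCase spellings, the `ApprovalPending` shape, and `parseApprovalPending` are assumptions for illustration.

```ts
interface ApprovalPending {
  approvalId: string;
  toolName: string;
  streamId: string;
}

function parseApprovalPending(raw: unknown): ApprovalPending | null {
  if (typeof raw !== "object" || raw === null) return null;
  const rec = raw as Record<string, unknown>;

  // Returns the first candidate key holding a non-empty string.
  const pick = (...keys: string[]): string | null => {
    for (const key of keys) {
      const value = rec[key];
      if (typeof value === "string" && value.length > 0) return value;
    }
    return null;
  };

  const approvalId = pick("approval_id", "approvalId");
  const toolName = pick("tool_name", "toolName");
  const streamId = pick("stream_id", "streamId");
  if (approvalId === null || toolName === null || streamId === null) {
    return null; // malformed: do not enqueue an approval card
  }
  return { approvalId, toolName, streamId };
}
```

Returning `null` rather than a partially filled object lets the caller skip the event in one check before anything reaches the UI queue.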
MarioCadenas and others added 9 commits April 27, 2026 15:27
Picks up the new `annotations?: ToolAnnotations` field on `ToolConfig`
and `FunctionTool` introduced upstream in the annotations-propagation
fix.
…nable agent feed

Reshapes the Smart Dashboard demo from a sparse 2-chart layout into a 2x2
chart grid with a right-rail agent feed, and turns the previously
read-only insights/anomaly cards into clickable actions that drive the
dashboard directly.

New visualisations:
- HourlyHeatmap: day-of-week × hour-of-day grid, click a cell to ask the
  agent to investigate that slot.
- TopZonesChart: hand-rolled horizontal bar leaderboard with click-to-
  filter and a `highlight_zone` ring driven by the agent.
- KPI sparklines: inline 7-day micro-charts with windowed trend deltas
  baked into each KPI card.

Agent feed becomes interactive:
- `feed-actions.ts` defines a structured action schema (filter_date,
  filter_zip, filter_fare, highlight_period, highlight_zone, focus_chart,
  ask) and a parser. The `insights` and `anomaly` ephemeral agents now
  emit JSON matching that schema.
- `ActionableCard` renders insights/anomalies with action chips that
  invoke `useActionDispatcher.dispatch` directly — same code path the
  SSE function-call handler uses, so UI clicks and agent tool calls
  behave identically.
- The feed re-runs (debounced) whenever filters or highlights change.
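
The action schema and parser in `feed-actions.ts` could look roughly like the sketch below. Only the seven action names come from the commit message; the `type` and `args` field names, the JSON-array envelope, and the drop-invalid-entries policy are assumptions for illustration.

```typescript
// The seven action names listed in the commit message.
const FEED_ACTION_TYPES = [
  "filter_date",
  "filter_zip",
  "filter_fare",
  "highlight_period",
  "highlight_zone",
  "focus_chart",
  "ask",
] as const;

type FeedActionType = (typeof FEED_ACTION_TYPES)[number];

// Hypothetical envelope: `type` and `args` are assumed field names.
interface FeedAction {
  type: FeedActionType;
  args: Record<string, unknown>;
}

function isFeedActionType(value: unknown): value is FeedActionType {
  return (
    typeof value === "string" &&
    (FEED_ACTION_TYPES as readonly string[]).includes(value)
  );
}

// Parse agent output into actions, silently dropping anything that does not
// match the schema so a bad entry never reaches the dispatcher.
function parseFeedActions(raw: string): FeedAction[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];

  const actions: FeedAction[] = [];
  for (const item of data) {
    if (typeof item !== "object" || item === null) continue;
    const { type, args } = item as Record<string, unknown>;
    if (!isFeedActionType(type)) continue;
    const safeArgs =
      typeof args === "object" && args !== null && !Array.isArray(args)
        ? (args as Record<string, unknown>)
        : {};
    actions.push({ type, args: safeArgs });
  }
  return actions;
}
```

Each parsed action can then be handed to the same dispatcher the SSE function-call handler uses, which is what keeps chip clicks and agent tool calls on one code path.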

Server-side wiring:
- Adds `highlight_zone` and `clear_zone_highlights` tools.
- Extends the `focus_chart` enum with `hourly_heatmap` and `top_zones`.
- Updates `dashboard_pilot` instructions to prefer `highlight_zone` over
  `filter_by_pickup_zip` when calling out a single ZIP.
- Adds three SQL queries: `dashboard_hourly_heatmap`,
  `dashboard_top_zones`, `dashboard_kpi_sparklines`. The top-zones query
  casts `pickup_zip` (an INT in samples.nyctaxi.trips) to STRING so the
  client's highlight Map keys, the agent's `highlight_zone` arg, and the
  filter parameter all speak the same type.
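
The reason for the STRING cast is that JavaScript `Map` lookups compare keys without type coercion, so a numeric ZIP and its string form are different keys. The sketch below illustrates the mismatch and one way to normalise on the client; `zipKey` is a hypothetical helper, not code from this PR.

```typescript
// Hypothetical helper: normalise any ZIP representation to a string key.
function zipKey(value: string | number): string {
  return String(value).trim();
}

// Without normalisation, an INT from SQL and a string from the agent miss
// each other, because Map key comparison does not coerce types.
const rawHighlights = new Map<unknown, string>();
rawHighlights.set(10001, "ring");
const missed = rawHighlights.get("10001"); // undefined: number key, string lookup

// With every producer speaking strings, the lookup succeeds.
const highlights = new Map<string, string>();
highlights.set(zipKey(10001), "ring");
const found = highlights.get(zipKey("10001")); // "ring"
```

Casting in the query means the client never sees the numeric form in the first place, so a helper like this is only a second line of defence.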

Polish & defensive fixes:
- Defensive `Number()` coercion in `kpi-cards.tsx` for sparkline values
  so trend math doesn't render `NaN%` or string-concatenated revenue
  totals if a driver hands back DECIMAL-as-string.
- `Sparkline` reserves vertical space for intentionally-empty series
  (e.g. the categorical "Top Pickup Zone" KPI) instead of rendering a
  loading-style placeholder.
- 2x2 chart grid uses `items-start` + `auto-rows-min content-start` so
  the rail no longer stretches the chart column and creates dead space.
- `ChatDrawer` becomes a controlled component (`open` + `onOpenChange`)
  so any agent-triggering UI action can auto-open the chat — the user
  always sees the agent's response without manual disclosure.
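
The `Number()` coercion in the polish list above guards against a common failure: if values arrive as strings, `+` concatenates instead of adding. A minimal sketch under stated assumptions follows; the helper names, the fall-back-to-zero policy, and the first-versus-last trend formula are illustrative and not taken from `kpi-cards.tsx`.

```typescript
// Coerce a possibly-string value (e.g. DECIMAL serialised as "12.50") to a
// finite number. Falling back to 0 is an assumption made for this sketch.
function toNumber(value: unknown): number {
  const n = typeof value === "number" ? value : Number(value);
  return Number.isFinite(n) ? n : 0;
}

// Sum with coercion, so "1.50" + "2.25" yields 3.75 rather than "1.502.25".
function sumValues(values: unknown[]): number {
  return values.reduce<number>((acc, v) => acc + toNumber(v), 0);
}

// Percentage change from the first to the last point of the window.
// Returns null when there is no meaningful baseline, so the UI can hide the
// delta instead of rendering NaN%.
function trendPercent(values: unknown[]): number | null {
  if (values.length < 2) return null;
  const first = toNumber(values[0]);
  const last = toNumber(values[values.length - 1]);
  if (first === 0) return null;
  return ((last - first) / first) * 100;
}
```

Returning `null` for a zero or missing baseline is a design choice: it pushes the decision about what to display to the card rather than encoding a sentinel number.
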
The playground header did not scale: 14 demo links rendered as
side-by-side buttons that overflowed on narrow screens, and the home
page maintained a parallel hand-curated grid that had already drifted
(missing Smart Dashboard, Chart Inference, Vector Search, Policy Matrix,
and Serving — ~30% of the catalog).

Introduces `client/src/lib/nav.ts` as the single source of truth: each
demo declares its label, one-line description, lucide icon, and category
group. Both surfaces now read from the same list, so adding a demo is a
one-line change and they can no longer drift.
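
A catalog of this kind might be shaped as in the sketch below. The field names, routes, descriptions, and category assignments are hypothetical, and the icon is represented by name as a plain string to keep the example self-contained (the commit describes a lucide icon per demo).

```typescript
type NavCategory = "Data" | "AI" | "Platform";

// Hypothetical entry shape; field names are assumptions.
interface NavItem {
  to: string;
  label: string;
  description: string;
  icon: string; // icon name; the real catalog may hold a component instead
  category: NavCategory;
  featured?: boolean;
}

// Illustrative entries only. Labels come from the commit message; routes,
// descriptions, and categories are made up for this sketch.
const NAV_ITEMS: readonly NavItem[] = [
  { to: "/smart-dashboard", label: "Smart Dashboard", description: "Agent-driven dashboard", icon: "layout-dashboard", category: "AI", featured: true },
  { to: "/vector-search", label: "Vector Search", description: "Similarity search demo", icon: "search", category: "Data" },
  { to: "/serving", label: "Serving", description: "Model serving demo", icon: "server", category: "Platform" },
];

// Header dropdown: group demos by category, preserving catalog order.
function groupByCategory(items: readonly NavItem[]): Map<NavCategory, NavItem[]> {
  const groups = new Map<NavCategory, NavItem[]>();
  for (const item of items) {
    const bucket = groups.get(item.category) ?? [];
    bucket.push(item);
    groups.set(item.category, bucket);
  }
  return groups;
}

// Home-page footer: counts derived from the same list, so they cannot drift.
function catalogCounts(items: readonly NavItem[]): { demos: number; categories: number } {
  return {
    demos: items.length,
    categories: new Set(items.map((item) => item.category)).size,
  };
}
```

Because both the dropdown and the home page derive everything from `NAV_ITEMS`, adding a demo means appending one entry.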

Header (`__root.tsx`):
- Replaces the button wall with a single "Menu" hamburger dropdown
  grouping demos by purpose (Data / AI / Platform).
- Active route is highlighted inside the dropdown and shown breadcrumb-
  style next to the brand, so the user always knows where they are.
- Caps dropdown height at viewport-minus-header with overflow scroll, so
  adding more demos won't break the layout.

Home page (`index.tsx`):
- Restrained hero with a soft dual-radial gradient wash (~6-8% opacity,
  primary + accent) — depth without saturation.
- Featured card for the Smart Dashboard flagship demo: gradient accent,
  icon tile, eyebrow badge, animated CTA. The featured demo also appears
  in its category grid, de-emphasised with a "Featured above" note.
- Three category sections with one-line taglines, rendered as a 1/2/3-col
  responsive grid of icon + title + description cards. Each card is a
  real `<Link>` (not a button inside a decorative `<Card>`), so the whole
  surface is keyboard-accessible.
- Footer shows live demo and category counts driven by the catalog.
…tive

Retag save_view as effect: "write" (it creates a PNG; it doesn't
delete anything) and teach the approval card to render three distinct
tiers. Capturing a screenshot no longer masquerades as deletion:
writes get a calm blue card with a plus-circle icon, updates get a
warning-amber card with a pencil, and real destructive actions retain
the red shield-alert. Legacy destructive: true still maps to the red
tier, so tools that haven't migrated keep their current look.
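
The tier resolution described above can be sketched as a small pure function. The `"write"` effect value and the legacy `destructive: true` flag come from the commit message; the `"update"` and `"destructive"` effect strings, the type names, and the precedence of `effect` over the legacy flag are assumptions.

```typescript
type ApprovalTier = "write" | "update" | "destructive";

// Hypothetical subset of a tool's annotations relevant to the approval card.
interface ToolAnnotationsLike {
  effect?: string;
  destructive?: boolean; // legacy flag
}

interface TierStyle {
  color: "blue" | "amber" | "red";
  icon: "plus-circle" | "pencil" | "shield-alert";
}

// Visual treatment per tier, as described in the commit message.
const TIER_STYLES: Record<ApprovalTier, TierStyle> = {
  write: { color: "blue", icon: "plus-circle" },
  update: { color: "amber", icon: "pencil" },
  destructive: { color: "red", icon: "shield-alert" },
};

// Assumption: an explicit effect takes precedence; the legacy flag is only
// consulted when no recognised effect is present.
function resolveTier(annotations: ToolAnnotationsLike): ApprovalTier | null {
  switch (annotations.effect) {
    case "write":
      return "write";
    case "update":
      return "update";
    case "destructive":
      return "destructive";
  }
  if (annotations.destructive === true) return "destructive";
  return null;
}
```

Under this sketch, `save_view` with `effect: "write"` resolves to the blue tier, while an unmigrated tool carrying only `destructive: true` still resolves to red.
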
Tailwind v4 compiles `bg-blue-50/50` to a two-layer rule: an sRGB hex
fallback plus an `@supports (color-mix)` override that mixes the oklch
palette token with transparent in oklab. Browsers with color-mix support
(recent Chrome/Arc) take the oklab path; older embedded Chromiums (e.g.
Cursor's built-in browser) fall through to the sRGB hex. Those two paths
produce visibly different tints against the dark `--card` token, which
is why the agent-feed cards rendered inconsistently across Chrome, Arc,
and Cursor's browser.

Pin the four insight/anomaly-tier backgrounds to arbitrary 8-digit hex
(`bg-[#eff6ff80]` etc.) so every browser lands on the same sRGB path.
Values taken from Tailwind's own fallback output to preserve the
intended look on color-mix-capable browsers.
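
For reference, an 8-digit hex colour is the 6-digit sRGB value followed by the alpha channel as one byte. The sketch below shows that arithmetic for the `/50` modifier; `withAlphaHex` is a hypothetical helper, and Tailwind's own rounding is not asserted to match it (the commit takes the values from Tailwind's fallback output directly).

```typescript
// Append an alpha byte to a 6-digit hex colour, e.g. 50% opacity -> "80".
function withAlphaHex(hex: string, alpha: number): string {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`expected a 6-digit hex colour, got ${hex}`);
  }
  if (!(alpha >= 0 && alpha <= 1)) {
    throw new RangeError("alpha must be between 0 and 1");
  }
  const alphaByte = Math.round(alpha * 255); // 0.5 * 255 = 127.5 -> 128 = 0x80
  return hex.toLowerCase() + alphaByte.toString(16).padStart(2, "0");
}

// "#eff6ff" at 50% opacity gives the value used in `bg-[#eff6ff80]`.
const pinned = withAlphaHex("#eff6ff", 0.5);
```

Because the result is a literal sRGB value, no `color-mix` or `oklch` evaluation is involved when the browser paints it.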
appkit-ui's globals.css already defines dark-theme tokens via two paths
— an explicit `.dark` class on <html>, and `@media (prefers-color-scheme:
dark)` guarded by `:root:not(.light)` so an explicit `.light` class
wins. Tailwind v4's default `dark:` variant, however, is purely
media-driven. That mismatch shows up when the user forces light via the
playground's theme selector while their OS is in dark mode: the
bootstrap script sets `<html class="light">`, --card/--background
correctly resolve to light, but every `dark:*` utility keeps firing
under the media query — cards end up painted with dark-mode
backgrounds layered under light-mode chrome.

Declare a playground-local `@custom-variant dark` that mirrors the
token logic exactly: fire when the element is (or descends from)
`.dark`, or when `prefers-color-scheme: dark` matches and no `.light`
ancestor is present. This rebinds every `dark:*` utility to respect
the theme selector's forced choice, keeping the rest of appkit-ui's
consumers — which don't ship the bootstrap script — on the existing
media-only behaviour.
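
The selector logic can be restated as a boolean predicate, which makes the forced-light case easy to check. This is a model of the rule described above, not the CSS itself; the `ThemeContext` fields and function names are hypothetical.

```typescript
// Hypothetical description of an element's theming context.
interface ThemeContext {
  hasDarkAncestor: boolean; // element is, or descends from, `.dark`
  hasLightAncestor: boolean; // a `.light` ancestor (or self) is present
  prefersDark: boolean; // `prefers-color-scheme: dark` matches
}

// Media-only behaviour: the OS preference alone decides.
function mediaOnlyDark(ctx: ThemeContext): boolean {
  return ctx.prefersDark;
}

// Custom variant as described: explicit `.dark` wins, otherwise follow the
// OS preference unless `.light` has been forced.
function customVariantDark(ctx: ThemeContext): boolean {
  return ctx.hasDarkAncestor || (ctx.prefersDark && !ctx.hasLightAncestor);
}

// The problem case: user forces light while the OS is in dark mode.
const forcedLightOnDarkOs: ThemeContext = {
  hasDarkAncestor: false,
  hasLightAncestor: true,
  prefersDark: true,
};
```

For `forcedLightOnDarkOs`, `mediaOnlyDark` returns `true` (the mismatch) while `customVariantDark` returns `false`, matching the light tokens.
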
The streaming-message bubble in the smart-dashboard chat drawer used
`animate-pulse` while tokens arrived. The constant fade in/out reads
as visual noise when the agent is mid-stream — especially with longer
replies where it pulses for many seconds. Drop the animation; the
ellipsis placeholder still communicates the loading state for empty
streaming bubbles.
`server({ autoStart: false }).then(appkit => appkit.server.extend(...).start())`
is gone — `createApp` now orchestrates server start itself, with the
post-setup hook surfaced as the `onPluginsReady` config callback.

Drop `autoStart: false`, hoist the `extend` block from the trailing
`.then` chain into `onPluginsReady`, and replace the dangling promise
with `.catch(console.error)` so unhandled rejections still surface.

Tracks databricks#280 / databricks#291 (autoStart removal + on-plugins-ready codemod).
@hubertzub-db hubertzub-db requested a review from a team as a code owner May 5, 2026 13:30
@hubertzub-db hubertzub-db requested a review from pkosiec May 5, 2026 13:30
@hubertzub-db hubertzub-db changed the title Agent/v2/sa/1 adapter Agents plugin: supervisor API adapter May 6, 2026