505 changes: 505 additions & 0 deletions .claude/skills/supervisor-api-background-mode/SKILL.md
Contributor

It seems the sync-skills.py file wasn't updated with the new skill?

Contributor

Do you think it might be useful to reference these skills in the base AGENTS.md to improve discovery? Perhaps we could also mention there what the Supervisor API is and when the agent should suggest it to the user. This would help customers discover this new offering as well.

Large diffs are not rendered by default.

220 changes: 220 additions & 0 deletions .claude/skills/supervisor-api/SKILL.md
@@ -0,0 +1,220 @@
---
name: supervisor-api
description: "Replace the client-side agent loop with Databricks Supervisor API (hosted tools). Use when: (1) User asks about Supervisor API, (2) User wants Databricks to run the agent loop server-side, (3) Connecting Genie spaces, UC functions, agent endpoints, or MCP servers as hosted tools."
---

# Use the Databricks Supervisor API

The Supervisor API lets Databricks run the tool-selection and synthesis loop server-side. Instead of your agent managing tool calls and looping, you declare hosted tools and call `responses.create()` — Databricks handles the rest.

## When to Use

Use the Supervisor API when you want Databricks to manage the full agent loop for hosted tools: Genie spaces, UC functions, KA (Knowledge Assistant) agent endpoints, or MCP servers via UC connections.

**Limitations:**
- Cannot mix hosted tools with client-side function tools in the same request
- Inference parameters (e.g., `temperature`, `top_p`) are not supported when tools are passed

## Step 1: Install `databricks-openai`

Add to `pyproject.toml` if not already present:

```toml
[project]
dependencies = [
    ...
    "databricks-openai>=0.14.0",
    "databricks-sdk>=0.55.0",
]
```

Then run `uv sync`.

## Step 2: Declare Hosted Tools

Define your tools as a list of dicts. Run `uv run discover-tools` to find available resources in your workspace.

```python
TOOLS = [
    # Review comment (Contributor): can we change these? We renamed stuff to
    # this tool spec: https://openapi.dev.databricks.com/pr-1737623/api/workspace/supervisoragents/createtool
    # Genie space: natural language queries over structured data
    {
        "type": "genie_space",
        "genie_space": {
            "description": "Query sales data using natural language",
            "id": "<genie-space-id>",
        },
    },
    # UC function: SQL or Python UDF
    {
        "type": "uc_function",
        "uc_function": {
            "name": "<catalog>.<schema>.<function_name>",
            "description": "Executes a custom UC function",
        },
    },
    # Knowledge Assistant agent endpoint
    {
        "type": "knowledge_assistant",
        "knowledge_assistant": {
            "description": "A Knowledge Assistant agent",
            "knowledge_assistant_id": "<ka-id>",
        },
    },
    # MCP server via UC connection
    {
        "type": "connection",
        "connection": {
            "description": "An MCP server via UC connection",
            "name": "<uc-connection-name>",
        },
    },
    # Databricks Apps MCP server
    {
        "type": "app",
        "app": {
            "description": "An MCP server running as a Databricks App",
            "name": "<databricks-app-name>",
        },
    },
]
```
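Every hosted tool above has the same shape: a `type` key plus a nested dict under the same name. A small helper can reduce typos when declaring several tools. This is a minimal sketch, assuming the tool spec shown in this step (which the review comment above suggests may be renamed); `hosted_tool` is a hypothetical helper, not part of `databricks-openai`.

```python
from typing import Any


def hosted_tool(tool_type: str, description: str, **fields: Any) -> dict[str, Any]:
    """Build one hosted-tool dict: {"type": T, T: {"description": ..., **fields}}.

    Hypothetical convenience helper; it only mirrors the dict layout in this skill.
    """
    return {"type": tool_type, tool_type: {"description": description, **fields}}


TOOLS = [
    hosted_tool(
        "genie_space",
        "Query sales data using natural language",
        id="<genie-space-id>",
    ),
    hosted_tool(
        "uc_function",
        "Executes a custom UC function",
        name="<catalog>.<schema>.<function_name>",
    ),
]
```

If the spec changes, only the helper body needs updating rather than every entry in `TOOLS`.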

## Step 3: Update `agent_server/agent.py`

Replace your existing invoke/stream handlers with the Supervisor API pattern. Remove any MCP client setup, LangGraph agents, or OpenAI Agents SDK runner code — the Supervisor API replaces the client-side loop entirely.

`use_ai_gateway=True` automatically resolves the correct AI Gateway endpoint for the workspace.

When deployed on Databricks Apps, the platform forwards the authenticated user's token via `x-forwarded-access-token`. Pass this to the Supervisor API so tool calls (e.g., Genie queries) run on behalf of the user rather than the app's service principal.

```python
import mlflow
from databricks.sdk import WorkspaceClient
from databricks.sdk.config import Config
from databricks_openai import DatabricksOpenAI
from mlflow.genai.agent_server import invoke, stream
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
)

mlflow.openai.autolog()
# Review comment (Contributor): noticed a couple of spots where we hardcode
# openai like this. Is it worth having a separate langchain example?

MODEL = "databricks-claude-sonnet-4-5"
TOOLS = [...] # From Step 2

# Resolve and cache the AI Gateway URL once at module load
_wc = WorkspaceClient()
_client = DatabricksOpenAI(workspace_client=_wc, use_ai_gateway=True)
_ai_gateway_base_url = str(_client.base_url)


def _get_client(obo_token: str | None = None) -> DatabricksOpenAI:
    """Return a client using the OBO token if provided, else service principal."""
    if obo_token:
        obo_wc = WorkspaceClient(
            config=Config(host=_wc.config.host, token=obo_token)
        )
        return DatabricksOpenAI(workspace_client=obo_wc, base_url=_ai_gateway_base_url)
    return _client


def _obo_token(request: ResponsesAgentRequest) -> str | None:
    return (request.custom_inputs or {}).get("x-forwarded-access-token")


@invoke()
def invoke_handler(request: ResponsesAgentRequest) -> ResponsesAgentResponse:
    mlflow.update_current_trace(
        metadata={"mlflow.trace.session": request.context.conversation_id}
    )
    response = _get_client(_obo_token(request)).responses.create(
        model=MODEL,
        input=[i.model_dump() for i in request.input],
        tools=TOOLS,
        stream=False,
    )
    return ResponsesAgentResponse(output=[item.model_dump() for item in response.output])


@stream()
def stream_handler(request: ResponsesAgentRequest):
    mlflow.update_current_trace(
        metadata={"mlflow.trace.session": request.context.conversation_id}
    )
    return _get_client(_obo_token(request)).responses.create(
        model=MODEL,
        input=[i.model_dump() for i in request.input],
        tools=TOOLS,
        stream=True,
    )
```

> **OBO note:** The `x-forwarded-access-token` is injected into `custom_inputs` by the app server middleware. No changes are needed to the client — the token arrives automatically when users call your deployed app.

## Step 4: Grant Permissions in `databricks.yml`

For each hosted tool, grant the corresponding resource access. See the **add-tools** skill for complete YAML examples.

| Tool type | Resource to grant |
|-----------|-------------------|
| `genie_space` | `genie_space` with `CAN_RUN` |
| `uc_function` | `uc_securable` (FUNCTION) with `EXECUTE` |
| `knowledge_assistant` | `serving_endpoint` with `CAN_QUERY` |
| `connection` | `uc_securable` (CONNECTION) with `USE_CONNECTION` |
| `app` | *(Databricks App access)* |

Also grant `CAN_QUERY` on the `MODEL` serving endpoint:

```yaml
- name: 'model-endpoint'
  serving_endpoint:
    name: 'databricks-claude-sonnet-4-5'
    permission: 'CAN_QUERY'
```
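As a cross-check against the table in this step, the tool-to-permission mapping can be written as a lookup and run over `TOOLS` to list what still needs granting. This is only a sketch: `REQUIRED_GRANTS` and `grants_for` are hypothetical names, the strings are copied from the table, and `app` is left out because the table does not name a specific permission for it.

```python
# Hypothetical lookup copied from the table: tool type -> (resource, permission).
REQUIRED_GRANTS = {
    "genie_space": ("genie_space", "CAN_RUN"),
    "uc_function": ("uc_securable (FUNCTION)", "EXECUTE"),
    "knowledge_assistant": ("serving_endpoint", "CAN_QUERY"),
    "connection": ("uc_securable (CONNECTION)", "USE_CONNECTION"),
}


def grants_for(tools: list[dict]) -> list[tuple[str, str, str]]:
    """Return (tool type, resource, permission) for each tool the table covers."""
    rows = []
    for tool in tools:
        tool_type = tool["type"]
        if tool_type in REQUIRED_GRANTS:
            resource, permission = REQUIRED_GRANTS[tool_type]
            rows.append((tool_type, resource, permission))
    return rows
```

Calling `grants_for(TOOLS)` returns one row per covered tool, which can be compared by hand against the resources declared in `databricks.yml`.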

## Step 5: Test and Deploy

```bash
uv run start-app # Test locally
databricks bundle deploy && databricks bundle run {{BUNDLE_NAME}} # Deploy
```

## MCP Server Tools: Multi-Turn Approval Flow

When using MCP server tools (`connection` or `app`), the Supervisor API does **not** execute the MCP tool call in a single request. Instead, it returns a `completed` response containing `mcp_approval_request` output items. To complete the tool call, your agent must handle a multi-turn flow:

1. **First request** — `responses.create()` → response completes with `mcp_approval_request` items in the output
2. **Return to frontend** — the `mcp_approval_request` item is returned to the chat UI so the user can approve the tool call
3. **Second request** — user approves → frontend sends a new request with the original input + `mcp_approval_request` + `mcp_approval_response` (with `approve: true`) appended to the input
4. **Result** — the second response completes with the actual `function_call_output` (tool result) and the final assistant `message`

No special backend handling is needed — the agent server simply returns all output items (including `mcp_approval_request`) to the frontend. The multi-turn flow is handled naturally through the conversation: each request/response is a separate `responses.create()` call.

**Example input for the follow-up request (step 3):**
```python
input = [
    # Original user message
    {"type": "message", "role": "user", "content": "Search for Databricks"},
    # The mcp_approval_request from the first response's output
    {"type": "mcp_approval_request", "id": "call_xxx", "name": "web-search",
     "server_label": "you_dot_com", "arguments": '{"query": "Databricks"}'},
    # The approval
    {"type": "mcp_approval_response", "id": "call_xxx",
     "approval_request_id": "call_xxx", "approve": True},
]
```
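The follow-up input in step 3 can also be assembled programmatically, for example by a frontend or a test harness. The sketch below works on plain dicts and assumes the item shapes shown in the example above; `build_approval_followup` is a hypothetical helper name, and reusing the request `id` as the response `id` simply mirrors that example.

```python
from typing import Any


def build_approval_followup(
    original_input: list[dict[str, Any]],
    first_output: list[dict[str, Any]],
    approve: bool = True,
) -> list[dict[str, Any]]:
    """Return original input plus each approval request and a matching response."""
    followup = list(original_input)  # copy so the caller's list is not mutated
    for item in first_output:
        if item.get("type") != "mcp_approval_request":
            continue  # other output items are not echoed back in this sketch
        followup.append(item)
        followup.append(
            {
                "type": "mcp_approval_response",
                "id": item["id"],  # mirrors the example above
                "approval_request_id": item["id"],
                "approve": approve,
            }
        )
    return followup
```

Passing `approve=False` produces the same structure with a rejection, which is useful when the user declines the tool call.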

## Troubleshooting

**"Please ensure AI Gateway V2 is enabled"** — AI Gateway must be enabled for the workspace. Contact your Databricks account team.

**"Cannot mix hosted and client-side tools"** — Remove any `function`-type tools (Python callables) from `TOOLS`. All tools must be hosted types (`genie_space`, `uc_function`, `knowledge_assistant`, `connection`, `app`).

**"Parameter not supported when tools are provided"** — Remove `temperature`, `top_p`, or other inference parameters from the `responses.create()` call.
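The last two errors can be caught locally before calling `responses.create()`. The following is a minimal sketch, assuming the hosted tool types listed above; `preflight` is a hypothetical helper, and it only checks the two inference parameters named in this skill, not every parameter the API might reject.

```python
# Tool types treated as hosted in this skill (an assumption based on Step 2).
HOSTED_TOOL_TYPES = {"genie_space", "uc_function", "knowledge_assistant", "connection", "app"}
# Only the parameters named in this skill; the real list may be longer.
UNSUPPORTED_WITH_TOOLS = {"temperature", "top_p"}


def preflight(tools: list[dict], **create_kwargs) -> list[str]:
    """Return human-readable problems; an empty list means no known issue."""
    problems = []
    for tool in tools:
        if tool.get("type") not in HOSTED_TOOL_TYPES:
            problems.append(f"non-hosted tool type: {tool.get('type')!r}")
    if tools:
        for name in sorted(UNSUPPORTED_WITH_TOOLS.intersection(create_kwargs)):
            problems.append(f"inference parameter not supported with tools: {name}")
    return problems
```

For example, `preflight(TOOLS, temperature=0.2)` reports the `temperature` argument so it can be removed before the request is sent.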

## Background Mode (Long-Running Tasks)

If your agent runs tasks that may exceed HTTP timeout limits (e.g., complex multi-tool workflows, large data analysis), you can enable **background mode**. Background mode submits the request asynchronously, polls for completion, and streams the result back to the frontend.

See the **supervisor-api-background-mode** skill for full implementation details.
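The polling step can be pictured as a generic loop with a deadline. This sketch does not show the actual background-mode calls, which live in the other skill: `poll_until_done`, `fetch`, and `is_done` are hypothetical stand-ins for whatever retrieves the request status and decides that it has finished.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
) -> Any:
    """Call fetch() until is_done(result) is true or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("background request did not finish in time")
        time.sleep(interval_s)
```

Keeping the status lookup behind a callable makes the loop easy to test with canned states and keeps it independent of any particular client API.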
4 changes: 4 additions & 0 deletions .scripts/sync-skills.py
@@ -101,6 +101,10 @@ def sync_template(template: str, config: dict):
    # Deploy skill (with substitution)
    copy_skill(SOURCE / "deploy", dest / "deploy", subs)

    # Supervisor API skills (with substitution for bundle name in deploy command)
    copy_skill(SOURCE / "supervisor-api", dest / "supervisor-api", subs)
    copy_skill(SOURCE / "supervisor-api-background-mode", dest / "supervisor-api-background-mode", subs)

    # SDK-specific skills (with substitution for bundle name references)
    if isinstance(sdk, list):
        # Multiple SDKs: copy skills for each, keeping SDK suffix in name