10 changes: 10 additions & 0 deletions .claude/skills/databricks-python-sdk/SKILL.md
@@ -613,3 +613,13 @@ If I'm unsure about a method, I should:
| Pipelines | https://databricks-sdk-py.readthedocs.io/en/latest/workspace/pipelines/pipelines.html |
| Secrets | https://databricks-sdk-py.readthedocs.io/en/latest/workspace/workspace/secrets.html |
| DBUtils | https://databricks-sdk-py.readthedocs.io/en/latest/dbutils.html |

## Related Skills

- **[databricks-config](../databricks-config/SKILL.md)** - profile and authentication setup
- **[databricks-bundles](../databricks-bundles/SKILL.md)** - deploying resources via DABs
- **[databricks-jobs](../databricks-jobs/SKILL.md)** - job orchestration patterns
- **[databricks-unity-catalog](../databricks-unity-catalog/SKILL.md)** - catalog governance
- **[databricks-model-serving](../databricks-model-serving/SKILL.md)** - serving endpoint management
- **[databricks-vector-search](../databricks-vector-search/SKILL.md)** - vector index operations
- **[databricks-lakebase-provisioned](../databricks-lakebase-provisioned/SKILL.md)** - managed PostgreSQL via SDK
3 changes: 3 additions & 0 deletions databricks-builder-app/.gitignore
@@ -32,5 +32,8 @@ client/.vite/
.vscode/
*.swp

# Local docs/notes
docs/TODO.md

# OS
.DS_Store
137 changes: 102 additions & 35 deletions databricks-builder-app/README.md
@@ -503,18 +503,21 @@ databricks auth login --host https://your-workspace.cloud.databricks.com
# 2. Create the app (first time only)
databricks apps create my-builder-app

# 3. Configure app.yaml (copy and edit the example)
cp app.yaml.example app.yaml
# Edit app.yaml — set LAKEBASE_ENDPOINT (autoscale) or LAKEBASE_INSTANCE_NAME (provisioned)

# 4. (Provisioned Lakebase only) Add Lakebase as an app resource
# Skip this step if using autoscale — it connects via OAuth directly.
databricks apps add-resource my-builder-app \
--resource-type database \
--resource-name lakebase \
--database-instance <your-lakebase-instance-name>

# 5. Deploy
./scripts/deploy.sh my-builder-app

# 6. Grant database permissions to the app's service principal (see Section 7)
```

### Step-by-Step Deployment Guide
@@ -564,6 +567,10 @@ The app requires a PostgreSQL database (Lakebase) for storing projects, conversa

#### 4. Add Lakebase as an App Resource

**Autoscale Lakebase**: Skip this step. Autoscale connects via OAuth using `LAKEBASE_ENDPOINT` — no app resource needed.

**Provisioned Lakebase**: Add the instance as an app resource:

```bash
databricks apps add-resource my-builder-app \
--resource-type database \
```

@@ -647,25 +654,89 @@ The deploy script will:

#### 7. Grant Database Permissions

After the first deployment, the app's service principal needs two things:
1. A **Lakebase OAuth role** (so it can authenticate via OAuth tokens)
2. **PostgreSQL grants** on the `builder_app` schema (so it can create/read/write tables)

##### Step 7a: Find the service principal's client ID

```bash
SP_CLIENT_ID=$(databricks apps get my-builder-app --output json | jq -r '.service_principal_client_id')
echo $SP_CLIENT_ID
```

##### Step 7b: Create a Lakebase OAuth role for the SP

> **Important**: Do NOT use PostgreSQL `CREATE ROLE` directly. Lakebase Autoscaling requires
> roles to be created through the Databricks API so the OAuth authentication layer recognizes them.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.postgres import Role, RoleRoleSpec, RoleAuthMethod, RoleIdentityType

w = WorkspaceClient()

# Replace with your branch path and SP client ID
branch = "projects/<project-id>/branches/<branch-id>"
sp_client_id = "<sp-client-id>"

w.postgres.create_role(
parent=branch,
role=Role(
spec=RoleRoleSpec(
postgres_role=sp_client_id,
auth_method=RoleAuthMethod.LAKEBASE_OAUTH_V1,
identity_type=RoleIdentityType.SERVICE_PRINCIPAL,
)
),
).wait()
```

Or via CLI:

```bash
databricks postgres create-role \
"projects/<project-id>/branches/<branch-id>" \
--json '{
"spec": {
"postgres_role": "<sp-client-id>",
"auth_method": "LAKEBASE_OAUTH_V1",
"identity_type": "SERVICE_PRINCIPAL"
}
}'
```

**Provisioned Lakebase**: This step is not needed — adding the instance as an app resource
(Step 4) automatically configures authentication.

##### Step 7c: Grant PostgreSQL permissions

Connect to your Lakebase database as your own user (via psql or a notebook) and run:

```sql
-- Replace <sp-client-id> with the service_principal_client_id

-- 1. Allow the SP to create the builder_app schema
GRANT CREATE ON DATABASE databricks_postgres TO "<sp-client-id>";

-- 2. Create the schema and grant full access
CREATE SCHEMA IF NOT EXISTS builder_app;
GRANT USAGE ON SCHEMA builder_app TO "<sp-client-id>";
GRANT ALL PRIVILEGES ON SCHEMA builder_app TO "<sp-client-id>";

-- 3. Grant access to any existing tables/sequences (needed if you ran migrations locally)
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA builder_app TO "<sp-client-id>";
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA builder_app TO "<sp-client-id>";

-- 4. Ensure the SP has access to future tables/sequences created by other users
ALTER DEFAULT PRIVILEGES IN SCHEMA builder_app
GRANT ALL ON TABLES TO "<sp-client-id>";
ALTER DEFAULT PRIVILEGES IN SCHEMA builder_app
GRANT ALL ON SEQUENCES TO "<sp-client-id>";
```

After granting permissions, redeploy the app so it can run migrations with the new role.
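
The statements above require pasting the client ID into each line by hand. A small helper can template them from the ID found in Step 7a (an illustrative sketch; `render_grants` is a hypothetical helper, not part of this repo):

```python
# Sketch: render the Step 7c GRANT statements for a given service principal
# client ID, so they can be pasted into psql without manual find/replace.
# render_grants is a hypothetical convenience helper, not part of the app.

GRANTS_TEMPLATE = """\
GRANT CREATE ON DATABASE databricks_postgres TO "{sp}";
CREATE SCHEMA IF NOT EXISTS builder_app;
GRANT USAGE ON SCHEMA builder_app TO "{sp}";
GRANT ALL PRIVILEGES ON SCHEMA builder_app TO "{sp}";
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA builder_app TO "{sp}";
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA builder_app TO "{sp}";
ALTER DEFAULT PRIVILEGES IN SCHEMA builder_app GRANT ALL ON TABLES TO "{sp}";
ALTER DEFAULT PRIVILEGES IN SCHEMA builder_app GRANT ALL ON SEQUENCES TO "{sp}";
"""

def render_grants(sp_client_id: str) -> str:
    """Fill the SP client ID into the grant script."""
    return GRANTS_TEMPLATE.format(sp=sp_client_id)

if __name__ == "__main__":
    print(render_grants("<sp-client-id>"))
```

The output can then be piped into psql, assuming you already have a working connection as your own user.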

#### 8. Access Your App

After successful deployment, the script will display your app URL:
@@ -706,33 +777,29 @@ Skills are copied from the sibling `databricks-skills/` directory. Ensure:
2. The skill name in `ENABLED_SKILLS` matches a directory in `databricks-skills/`
3. The skill directory contains a `SKILL.md` file

#### "password authentication failed" or "Permission denied for table projects"

See [Section 7: Grant Database Permissions](#7-grant-database-permissions) for the complete setup.

Common causes:

| Error | Cause | Fix |
|-------|-------|-----|
| `password authentication failed` | Lakebase OAuth role missing or created via SQL instead of API | Create the role via `w.postgres.create_role()` with `LAKEBASE_OAUTH_V1` auth (Step 7b) |
| `permission denied for table` | SP lacks PostgreSQL grants on schema/tables | Run the GRANT statements (Step 7c) |
| `schema "builder_app" does not exist` | SP lacks `CREATE` on the database | `GRANT CREATE ON DATABASE databricks_postgres TO "<sp-client-id>"` |
| `relation does not exist` | Migrations haven't run | Redeploy the app, or run `alembic upgrade head` locally |

> **Autoscale Lakebase pitfall**: Do NOT use `CREATE ROLE ... LOGIN` in PostgreSQL directly.
> Lakebase Autoscaling requires roles to be created through the Databricks API so that OAuth
> token authentication works. Manually created roles get `NO_LOGIN` auth and will fail with
> "password authentication failed".

#### App shows blank page or "Not Found"

Check the app logs in Databricks:
```bash
databricks apps logs my-builder-app
```

Common causes:
5 changes: 3 additions & 2 deletions databricks-builder-app/alembic/env.py
@@ -35,9 +35,10 @@
def get_url_and_connect_args():
"""Get database URL and connect_args from environment.

Supports three modes:
1. Static URL: Uses LAKEBASE_PG_URL directly
2. Autoscale OAuth: Builds URL from LAKEBASE_ENDPOINT + generates token via client.postgres
3. Provisioned OAuth: Builds URL from LAKEBASE_INSTANCE_NAME + generates token via client.database

Returns tuple of (url, connect_args) for psycopg2 driver.
"""
34 changes: 20 additions & 14 deletions databricks-builder-app/app.yaml.example
@@ -28,29 +28,35 @@ env:
# =============================================================================
# Skills Configuration
# =============================================================================
# Comma-separated list of skills to enable.
# AUTO-POPULATED by deploy.sh at deploy time based on installed skills.
# To override, set a specific list here before deploying.
- name: ENABLED_SKILLS
value: ""
- name: SKILLS_ONLY_MODE
value: "false"

# =============================================================================
# Database Configuration (Lakebase)
# =============================================================================
# Choose ONE of the two options below.
#
# --- Option A: Autoscale Lakebase (recommended) ---
# Scales to zero when idle. No add-resource step needed — connects via OAuth.
# Find endpoint name in: Catalog → Lakebase → your project → Branches → Endpoints
#
# - name: LAKEBASE_ENDPOINT
# value: "projects/<project-name>/branches/production/endpoints/<endpoint>"
# - name: LAKEBASE_DATABASE_NAME
# value: "databricks_postgres"
#
# --- Option B: Provisioned Lakebase ---
# Fixed-capacity instance. Must add as an app resource:
# databricks apps add-resource <app-name> \
# --resource-type database \
# --resource-name lakebase \
# --database-instance <your-lakebase-instance-name>
#
# You only need to specify the instance name for OAuth token generation:
- name: LAKEBASE_INSTANCE_NAME
value: "<your-lakebase-instance-name>"
- name: LAKEBASE_DATABASE_NAME
File renamed without changes.