24 changes: 12 additions & 12 deletions .github/workflows/deploy.yml
@@ -13,16 +13,16 @@ jobs:
- uses: actions/checkout@v4
- name: Install Databricks CLI
run: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
- name: Validate bundle (dev)
- name: Validate bundle (demo)
env:
DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
DOCINTEL_WAREHOUSE_ID: ${{ vars.DOCINTEL_WAREHOUSE_ID }}
run: databricks bundle validate --strict -t dev --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID"
run: databricks bundle validate --strict -t demo --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID"

deploy-dev:
deploy-demo:
# CI assumes steady-state: the first-ever bring-up of a workspace must be
# done locally via `./scripts/bootstrap-dev.sh`, which handles the
# done locally via `./scripts/bootstrap-demo.sh`, which handles the
# foundation/consumers staging and waits for Lakebase AVAILABLE. After
# that initial bring-up, every push to main runs a full bundle deploy
# against the now-existing resources — no temp-rename trick (DAB would
@@ -52,7 +52,7 @@ jobs:
# wait_for_kpis / log_and_register use. Without --var, the bundle
# falls back to its `lookup: warehouse: Serverless Starter Warehouse`
# default and silently picks a different ID.
run: databricks bundle deploy -t dev --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID"
run: databricks bundle deploy -t demo --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID"
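
To pick a value for the `DOCINTEL_WAREHOUSE_ID` repository variable in the first place, a quick local check is sketched below (the placeholder ID is hypothetical, not from this repo):

```bash
# List SQL warehouses, note the ID of the one wait_for_kpis / log_and_register
# should use, then confirm the bundle resolves it with an explicit --var override.
databricks warehouses list
databricks bundle validate --strict -t demo --var "warehouse_id=<id-from-the-list>"
```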

- name: Wait for Lakebase instance to be AVAILABLE
# Lakebase already exists in steady-state but a config change can
@@ -61,7 +61,7 @@
run: |
python -c "
import json, os, sys, time, subprocess
name = os.environ.get('LAKEBASE_NAME') or 'docintel-dev-state-v3'
name = os.environ.get('LAKEBASE_NAME') or 'docintel-demo-state-v1'
deadline = time.time() + 600
while True:
out = subprocess.run(['databricks','api','get','/api/2.0/database/instances','--output','json'],
@@ -79,7 +79,7 @@ while True:
time.sleep(15)
"
env:
LAKEBASE_NAME: ${{ vars.DOCINTEL_LAKEBASE_NAME || 'docintel-dev-state-v3' }}
LAKEBASE_NAME: ${{ vars.DOCINTEL_LAKEBASE_NAME || 'docintel-demo-state-v1' }}

- name: Refresh data — upload samples, run pipeline, register new model version
run: |
@@ -88,11 +88,11 @@ while True:
"dbfs:/Volumes/${DOCINTEL_CATALOG}/${DOCINTEL_SCHEMA}/raw_filings/" \
--overwrite
done
databricks bundle run -t dev --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID" doc_intel_pipeline
databricks bundle run -t demo --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID" doc_intel_pipeline
python scripts/wait_for_kpis.py --min-rows 3 --timeout 900
# --serving-endpoint repoints the existing endpoint to the new
# model version in-place (steady-state idempotent operation).
python agent/log_and_register.py --target dev --serving-endpoint analyst-agent-dev
python agent/log_and_register.py --target demo --serving-endpoint analyst-agent-demo

- name: Apply UC grants (catalog + schema; not DAB-supported)
# UC requires the full chain: USE_CATALOG → USE_SCHEMA → SELECT/EXECUTE.
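
The grant commands themselves sit outside this hunk. A hypothetical sketch of the chain, assuming the current Databricks CLI `grants` command; the `data-analysts` principal is a placeholder, not a name from this repo:

```bash
# Sketch only: USE_CATALOG on the catalog, then USE_SCHEMA plus SELECT/EXECUTE
# on the schema, for an assumed "data-analysts" group.
databricks grants update catalog "$DOCINTEL_CATALOG" \
  --json '{"changes":[{"principal":"data-analysts","add":["USE_CATALOG"]}]}'
databricks grants update schema "$DOCINTEL_CATALOG.$DOCINTEL_SCHEMA" \
  --json '{"changes":[{"principal":"data-analysts","add":["USE_SCHEMA","SELECT","EXECUTE"]}]}'
```
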
@@ -112,15 +112,15 @@ while True:
# Databricks Apps deploy docs:
# https://docs.databricks.com/aws/en/dev-tools/databricks-apps/deploy
# `bundle deploy` alone uploads code but doesn't apply config/restart.
run: databricks bundle run -t dev --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID" analyst_app
run: databricks bundle run -t demo --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID" analyst_app

- name: Verify OBO scopes survived deploy
# `bundle run` may wipe user_api_scopes (documented destructive-update
# behavior). Fail loudly so we re-apply. Skipped when user_api_scopes
# are not declared (workspace feature off).
run: |
if grep -q '^ user_api_scopes:' resources/consumers/analyst.app.yml; then
databricks apps get doc-intel-analyst-dev --output json > /tmp/app.json
databricks apps get doc-intel-analyst-demo --output json > /tmp/app.json
python -c "
import json
app = json.load(open('/tmp/app.json'))
@@ -134,4 +134,4 @@ assert not missing, f'OBO scopes missing: {sorted(missing)} (got {sorted(scopes)
fi

- name: CLEARS evaluation gate
run: python evals/clears_eval.py --endpoint analyst-agent-dev --dataset evals/dataset.jsonl
run: python evals/clears_eval.py --endpoint analyst-agent-demo --dataset evals/dataset.jsonl
24 changes: 12 additions & 12 deletions CLAUDE.md
@@ -14,17 +14,17 @@ For an end-to-end overview written for humans, read [`README.md`](./README.md).

## Critical: deploy ordering hazard (READ FIRST before touching deploys)

The bundle has three chicken-egg dependencies that prevent a single `databricks bundle deploy -t dev` from succeeding on a fresh workspace:
The bundle has three chicken-egg dependencies that prevent a single `databricks bundle deploy -t demo` from succeeding on a fresh workspace:

1. **Model Serving endpoint** references a registered model version that doesn't exist until `agent/log_and_register.py` runs.
2. **Lakehouse Monitor** (`resources/consumers/kpi_drift.yml`) attaches to `gold_filing_kpis`, which doesn't exist until the pipeline runs once.
3. **Lakebase database_catalog + Databricks App** race the `database_instance` provisioning.

**Canonical fix**: Run `./scripts/bootstrap-dev.sh` for fresh stand-ups; plain `databricks bundle deploy -t dev` for steady-state. The script does a **staged deploy** — `resources/` is split into `foundation/` (no data deps) and `consumers/` (need data). Stage 1 temporarily renames consumer YAMLs to `*.yml.skip` so the bundle glob skips them; stage 2 produces data and then runs full `bundle deploy`. **Both deploys succeed cleanly** — no "errors tolerated" hand-waving, no orphans to clean up on retry.
**Canonical fix**: Run `./scripts/bootstrap-demo.sh` for fresh stand-ups; plain `databricks bundle deploy -t demo` for steady-state. The script does a **staged deploy** — `resources/` is split into `foundation/` (no data deps) and `consumers/` (need data). Stage 1 temporarily renames consumer YAMLs to `*.yml.skip` so the bundle glob skips them; stage 2 produces data and then runs full `bundle deploy`. **Both deploys succeed cleanly** — no "errors tolerated" hand-waving, no orphans to clean up on retry.
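
For orientation, a minimal sketch of the staged bring-up follows; it approximates `scripts/bootstrap-demo.sh` rather than copying it, and the exact flags passed to `wait_for_kpis.py` and `log_and_register.py` are assumptions (the real script also prefers `.venv/bin/python` over `python3`):

```bash
# Stage 1: hide consumer resources so the bundle glob skips them, then deploy the foundation.
for f in resources/consumers/*.yml; do mv "$f" "$f.skip"; done
databricks bundle deploy -t demo --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID"

# Stage 2: produce the data the consumers need, restore the YAMLs, deploy everything.
databricks bundle run -t demo --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID" doc_intel_pipeline
python3 scripts/wait_for_kpis.py --min-rows 3 --timeout 900
PYTHONPATH="$PWD" python3 agent/log_and_register.py --target demo
for f in resources/consumers/*.yml.skip; do mv "$f" "${f%.skip}"; done
databricks bundle deploy -t demo --var "warehouse_id=$DOCINTEL_WAREHOUSE_ID"
```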

**Do NOT try to "fix" these by:**
- Adding `depends_on` between heterogeneous DAB resource types — DAB doesn't reliably honor it across instance↔catalog↔app.
- Switching `resources/consumers/agent.serving.yml` to UC alias syntax (`@dev`) — DAB serving config may reject alias syntax; that's why `_promote_serving_endpoint` exists in `agent/log_and_register.py`.
- Switching `resources/consumers/agent.serving.yml` to UC alias syntax (`@demo`) — DAB serving config may reject alias syntax; that's why `_promote_serving_endpoint` exists in `agent/log_and_register.py`.
- Splitting monitors into a separate target overlay — adds complexity for a one-time concern.

Full breakdown lives in [`docs/runbook.md`](./docs/runbook.md) §"Known deploy ordering gaps".
@@ -38,8 +38,8 @@ app/ Streamlit on Databricks Apps + Lakebase psycopg client
evals/ MLflow CLEARS gate (clears_eval.py + dataset.jsonl)
jobs/ Lakeflow Jobs Python tasks (retention, index_refresh)
resources/foundation/ DAB resources with no data deps: catalog/schema/volume, pipeline, retention job, Lakebase instance
resources/consumers/ DAB resources that depend on foundation data: serving endpoint, monitor, VS endpoint, index-refresh job, app, dashboard, Lakebase catalog
scripts/ Operational scripts (bootstrap-dev.sh, wait_for_kpis.py)
resources/consumers/ DAB resources that depend on foundation data: serving endpoint, monitor, index-refresh job, app, dashboard, Lakebase catalog
scripts/ Operational scripts (bootstrap-demo.sh, wait_for_kpis.py)
samples/ Synthetic 10-K PDFs (regenerable via synthesize.py)
specs/001-… Spec-Kit artifacts (spec, plan, tasks, research, data-model, contracts, quickstart)
docs/runbook.md Day-2 ops + bring-up workflow
@@ -48,16 +48,16 @@ docs/runbook.md Day-2 ops + bring-up workflow

## Build & deploy

- Validate: `databricks bundle validate -t dev`
- Fresh stand-up: `./scripts/bootstrap-dev.sh` (requires `DOCINTEL_CATALOG`, `DOCINTEL_SCHEMA`, `DOCINTEL_WAREHOUSE_ID`)
- Steady-state deploy: `databricks bundle deploy -t dev`
- Run pipeline: `databricks bundle run -t dev doc_intel_pipeline`
- Run eval: `python evals/clears_eval.py --endpoint analyst-agent-dev --dataset evals/dataset.jsonl`
- Validate: `databricks bundle validate -t demo`
- Fresh stand-up: `./scripts/bootstrap-demo.sh` (requires `DOCINTEL_CATALOG`, `DOCINTEL_SCHEMA`, `DOCINTEL_WAREHOUSE_ID`)
- Steady-state deploy: `databricks bundle deploy -t demo`
- Run pipeline: `databricks bundle run -t demo doc_intel_pipeline`
- Run eval: `python evals/clears_eval.py --endpoint analyst-agent-demo --dataset evals/dataset.jsonl`

## Tests & validation

- `pytest agent/tests/` — unit tests for retrieval, agent routing, supervisor
- `databricks bundle validate -t dev` and `-t prod` — schema check both targets before merging
- `databricks bundle validate -t demo` and `-t prod` — schema check both targets before merging
- The CLEARS eval is the deploy gate; principle V says no agent ships without it passing

## Working with this codebase — gotchas Claude has learned
@@ -69,7 +69,7 @@ These were discovered the painful way during the 2026-04-25 bring-up. Future ses
- **Section explosion fallback**: `pipelines/sql/03_gold_classify_extract.sql` POSEXPLODES `parsed:sections[*]` and falls back to a single `full_document` row when the VARIANT lacks `$.sections` so we never lose a filing.
- **MLflow + UC requires both inputs AND outputs in signatures**: an inputs-only signature is rejected at registration. For variable-shape fields like `citations` (array of dicts), use `mlflow.types.schema.AnyType()` to avoid serving-time truncation. Reference: `agent/log_and_register.py:_signature`.
- **`lakebase_stopped: true` is rejected on instance creation**: the API doesn't allow creating a database_instance directly into stopped state. Default is `false`; flip to `true` only after the instance exists. Reference: `databricks.yml` variable description.
- **macOS doesn't ship `python`**: scripts must prefer `.venv/bin/python` then fall back to `python3`. Reference: `scripts/bootstrap-dev.sh`.
- **macOS doesn't ship `python`**: scripts must prefer `.venv/bin/python` then fall back to `python3`. Reference: `scripts/bootstrap-demo.sh`.
- **`agent/log_and_register.py` needs `PYTHONPATH`**: it imports the `agent` package; run with `PYTHONPATH=$REPO_ROOT` or use the bootstrap script which exports it.
- **Serving endpoint version drifts from YAML**: `resources/consumers/agent.serving.yml` pins `entity_version: "1"` as the bootstrap value. Steady-state CI re-registers new versions and uses `_promote_serving_endpoint` to update the served entity in-place. The YAML and the live endpoint diverge over time — that's intentional, not drift.
- **Streamlit on Databricks Apps requires CORS+XSRF off via env vars**: not flags. `STREAMLIT_SERVER_ENABLE_CORS=false` and `STREAMLIT_SERVER_ENABLE_XSRF_PROTECTION=false` in `app/app.yaml`. Databricks Apps runtime config: https://docs.databricks.com/aws/en/dev-tools/databricks-apps/app-runtime.
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -21,11 +21,11 @@ python -m venv .venv

```bash
.venv/bin/python -m pytest agent/tests/ -q # 18 unit tests
databricks bundle validate --strict -t dev # YAML schema + interpolation
bash -n scripts/bootstrap-dev.sh # bash syntax
databricks bundle validate --strict -t demo # YAML schema + interpolation
bash -n scripts/bootstrap-demo.sh # bash syntax
```

End-to-end is exercised by `./scripts/bootstrap-dev.sh` against a real Databricks workspace; see [`specs/001-doc-intel-10k/quickstart.md`](./specs/001-doc-intel-10k/quickstart.md).
End-to-end is exercised by `./scripts/bootstrap-demo.sh` against a real Databricks workspace; see [`specs/001-doc-intel-10k/quickstart.md`](./specs/001-doc-intel-10k/quickstart.md).

## Working with the spec-kit
