Add vector_search_indexes resource (direct engine) by janniklasrose · Pull Request #5123 · databricks/cli

janniklasrose · 2026-04-29T13:11:45Z

Changes

Adds vector_search_indexes as a first-class DABs resource on the direct engine, alongside the existing vector_search_endpoints. Direct engine only — vector search has no Terraform provider.

resources:
  vector_search_endpoints:
    my_endpoint:
      name: my-endpoint
      endpoint_type: STANDARD
  vector_search_indexes:
    my_index:
      name: main.default.my_index
      endpoint_name: ${resources.vector_search_endpoints.my_endpoint.name}
      primary_key: id
      index_type: DELTA_SYNC
      delta_sync_index_spec:
        source_table: main.default.source
        pipeline_type: TRIGGERED
      grants:
        - principal: data-engineers
          privileges: [SELECT]

What's included:

Resource model in bundle/config/resources/vector_search_index.go (with grants) and bundle/direct/dresources/vector_search_index.go (state, lifecycle, drift classification). RemapState round-trips index_subtype so a populated remote subtype isn't classified as drift on the next plan.
UC grants wired through the generic grants path with securable type table.
recreate_on_changes for immutable spec fields (name, endpoint_name, index_type, index_subtype, primary_key, delta_sync_index_spec, direct_access_index_spec); delta_sync_index_spec.columns_to_sync marked ignore_remote_changes (request-only field — see follow-up note below). The index API has no rename or update path, so any config-side change has to round-trip through delete + create.
Index orphaning detection: index state persists the endpoint_uuid of the endpoint it was created against. DoRead looks up the current endpoint UUID by name; if the endpoint was deleted out-of-band the lookup returns "" and OverrideChangeDesc classifies the saved-vs-remote mismatch as Recreate. Builds on the endpoint UUID persistence merged in Persist endpoint UUID for vector_search_endpoints drift detection #5127.
Async delete handling: new optional WaitAfterDelete adapter method (sibling to WaitAfterCreate / WaitAfterUpdate). For VS indexes it polls GetIndex until 404 (15-minute cap). apply.Recreate runs DoDelete → DeleteState → WaitAfterDelete → DoCreate → SaveState → WaitAfterCreate, so a wait-time failure leaves the bundle consistent. Replaces the prior SaveState("", nil, nil) placeholder that produced invalid state: empty id planning failures on partial recreate.
Destructive-action prompt for VS indexes in bundle/phases/. The message intentionally covers both Delta Sync ("re-runs the embedding pipeline") and Direct Access ("upserted vectors lost") in one paragraph — picking a type-specific message from the bundle config would be wrong on type changes (DELTA_SYNC → DIRECT_ACCESS recreates would describe the destination type while the actual teardown is of the source type).
Dev-mode name prefixing for indexes prefixes only the leaf component of catalog.schema.name, since catalog and schema are external references (the previous behavior produced invalid names like dev_jan_main.default.my_index). The mutator skips names that still carry literal ${...} tokens, since the leaf split would otherwise inject the prefix inside the trailing ref expression itself.
Testserver enforces endpoint existence on index create. Index status returns Ready: true immediately, matching the convention used by every other slow resource the testserver fakes (endpoints → ONLINE, database instances → AVAILABLE, apps → RUNNING).

index_type / spec-block consistency is intentionally not validated client-side — the CreateIndex API rejects mismatched combinations at deploy time, and replicating that check in DABs would just duplicate backend logic.

Why

The direct engine recently gained vector_search_endpoints (#4887). This PR extends the support to indexes, which were the missing half. Along the way it surfaces and fixes a number of issues:

Without persisted endpoint UUIDs, identity drift was undetectable. An index pointing at a deleted-and-recreated endpoint would appear live by name but its backing endpoint was gone, leading to confusing "index already exists" errors on subsequent deploys. Persist endpoint UUID for vector_search_endpoints drift detection #5127 added the same UUID tracking on the endpoint side; this PR mirrors it on the index side so the orphan is caught.
The async deletion model isn't documented in the SDK, but recreate deploys hit it every time. Without a wait, every recreate failed on the immediate Create.
apply.Recreate was writing a malformed empty-ID state entry as its "delete state" step, which then poisoned the next plan with invalid state: empty id.
Recreating a VS index is genuinely expensive — Delta Sync re-runs the full embedding pipeline; Direct Access loses every upserted vector. The destructive-action prompt now reflects that.

Follow-ups

delta_sync_index_spec.columns_to_sync is request-only in the SDK today: the field is accepted on Create but the Get response doesn't echo it back, which is why we mark it ignore_remote_changes here. There's an open backend PR to expose columns_to_sync on the read path; once the SDK is regenerated against that, we can drop the ignore_remote_changes entry and let normal drift detection handle the field.
vector_search_endpoints.budget_policy_id drift (effective vs. requested) and the SDK doc-comment for vector_search_endpoints.usage_policy_id are intentionally not in this PR — both will be addressed by the next SDK bump and the corresponding ./task generate-schema regen.

Tests

./task fmt, ./task checks, ./task lint — all clean.
./task test — unit tests green across bundle/....
New unit test TestVectorSearchIndexNameWithUnresolvedRefsLeftAlone in apply_target_mode_test.go exercises the leaf-prefix skip on ${var.catalog}.${var.schema}.${var.index}.
New acceptance directories under acceptance/bundle/resources/vector_search_indexes/: basic, drift/columns_to_sync, drift/deleted_remotely, drift/orphaned_endpoint, recreate/index_type, recreate/mixed_types, grants/select.
The recreate request log (recreate/index_type/out.requests.recreate.direct.json) captures GET → DELETE → GET → POST with --get enabled in print_requests.py. The middle GET is the WaitAfterDelete poll; if a future change drops the wait the regenerated capture loses that line and the test fails.
acceptance/bundle/validate/presets_name_prefix covers the leaf-only name prefix on a 3-part index name.
acceptance/bundle/invariant/configs/vector_search_index.yml.tmpl exercises the resource through the invariant matrix; the testserver enforces endpoint existence on index create.
Live tested with --profile tmp against staging across initial deploy / drift / recreate / destroy.

This PR was written by Claude Code.

github-actions · 2026-04-29T13:12:30Z

Approval status: pending

`/acceptance/bundle/` - needs approval

64 files changed
Suggested: @denik
Also eligible: @andrewnester, @shreyas-goenka, @pietern, @anton-107, @lennartkats-db

`/bundle/` - needs approval

31 files changed
Suggested: @denik
Also eligible: @andrewnester, @shreyas-goenka, @pietern, @anton-107, @lennartkats-db

General files (require maintainer)

4 files changed
Based on git history:

@denik -- recent work in bundle/direct/dresources/, bundle/direct/, libs/testserver/

_{Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.

See OWNERS for ownership rules.}

) ## Changes Persist `endpoint_uuid` in state and detect identity drift on `vector_search_endpoints`. The endpoint name is stable but its UUID changes if the endpoint is deleted and recreated by name (e.g. via the workspace UI). Without persisting the UUID: - The bundle silently rebound permissions to a different backing endpoint without recreating the endpoint resource. - Anything else referencing `endpoint_uuid` (most importantly the permissions object_id, but also indexes added on top in the next PR) raced the recreate. `VectorSearchEndpointState` now embeds `vectorsearch.CreateEndpoint` and adds `EndpointUuid`. `DoCreate` records the UUID from the create response; `DoUpdate` copies it from `entry.RemoteState` so unrelated updates (e.g. `min_qps`) don't blank it out. `OverrideChangeDesc` classifies `endpoint_uuid` drift as `Recreate` when saved differs from remote, `Skip` otherwise. `drift/recreated_same_name` flips from a "badness snapshot" (which captured the old behavior of permissions silently rebinding) to the recreate behavior, with a permissions block on the endpoint to verify the cascade rebinds correctly. `drift/min_qps/out.plan.direct.json` regenerates to include the new `endpoint_uuid` skip entry in the detailed plan. ## Why Splitting this out of the larger `vector_search_indexes` PR ([#5123](#5123)) so it can land independently. The index PR builds on the persisted UUID for orphan detection, but the endpoint UUID work stands on its own and is useful regardless. ## Tests - `make fmtfull`, `make checks`, `make lintfull` — clean. - `make test` — green (`libs/apps/runlocal` needed `NODE_OPTIONS=` for the harness leak; unrelated). `bundle/internal/schema TestRequiredAnnotationsForNewFields` panics, which is failing on `main` for unrelated reasons. - `go test ./acceptance -run 'TestAccept/bundle/resources/vector_search_endpoints'` — all green, including the flipped `drift/recreated_same_name`. _This PR was written by Claude Code._

- invariant vector_search_index config now creates the required endpoint in-bundle so the testserver's endpoint-existence check passes - drift/budget_policy now asserts that the remote drift is ignored (matches ignore_remote_changes for budget_policy_id: effective value may include inherited workspace policy, not user-set) - vector_search_indexes/basic URL updated to match the current catalog.schema.name-based InitializeURL behavior Co-authored-by: Isaac

Covers the case where an index is deleted outside the bundle: the next plan should propose creation and the next deploy restores it. Mirrors acceptance/bundle/resources/volumes/remote-delete and closes the missing drift coverage for indexes (endpoints have drift/budget_policy and drift/min_qps; indexes had none). Co-authored-by: Isaac

…ectorIndexRequest make lint surfaces exhaustruct issues on changed packages. Explicitly initialize the zero-value fields introduced by the prior two commits (EndpointUuid on the new state type, IndexSubtype on the existing RemapState literal). Co-authored-by: Isaac

The name is a 3-part UC identifier (catalog.schema.index) where catalog and schema are external references the bundle does not own. Prefixing the whole name turned "main.default.my_index" into "dev_user_main.default.my_index", which is not a valid catalog. Prefix only the leaf component, matching how references work: "main.default.my_index" -> "main.default.dev_user_my_index". Co-authored-by: Isaac

Exercises the fix from the previous commit: for vector_search_indexes.vs_index.name = my_catalog.${resources.schemas.my_schema.name}.my_index only the leaf (my_index) is prefixed, the catalog is preserved as-is, and the schema reference stays intact — it resolves at deploy time to the already-prefixed schema, yielding the expected 3-part name. Co-authored-by: Isaac

Indexes are UC-only (delta sync against a UC table, UC grants as ACL). Mirrors the flag already set on vector_search_endpoints so these tests are skipped on non-UC cloud runs instead of failing on missing catalog. Co-authored-by: Isaac

If the endpoint an index is attached to is deleted out-of-band, the index still exists by name but its backing endpoint is gone. The planner saw the index's remote state as valid and produced "skip", then deploy recreated the endpoint (new UUID) and left the index orphaned against the old UUID — on the next deploy attempt the user saw "index already exists" with no way to resolve short of manual deletion. VectorSearchIndexState now persists endpoint_uuid alongside the create request. DoRead looks it up from the endpoint service (the index API doesn't return it). OverrideChangeDesc classifies saved vs remote endpoint UUID drift as Recreate so the plan correctly shows "recreate vector_search_indexes.*" when the endpoint has been replaced, and the apply path delete+creates the index against the new endpoint. Covered by acceptance/.../vector_search_indexes/drift/orphaned_endpoint. Co-authored-by: Isaac

Recreate of a Vector Search index re-runs the full embedding pipeline (Delta Sync) or drops every upserted vector (Direct Access). Both can take significant time and cost. Bring vector_search_indexes into the same destructive-action prompt path used for schemas, pipelines, volumes, dashboards, and the Lakebase resources, with a message explaining the cost. Acceptance scripts that recreate indexes now pass --auto-approve, and all VS index test outputs pick up the prompt message at destroy time. Co-authored-by: Isaac

DELETE on a Vector Search index is asynchronous: the backend tears down the embedding pipeline over a few minutes, and any operation on the same name during that window returns "Index ... is currently pending deletion". Recreate (delete+create on the same name) hit this on every deploy. DoDelete now polls GetIndex until it returns 404, so a follow-up DoCreate sees a clean slate. Timeout is 15 minutes, matching observed worst-case teardown. Also harden recovery from a partial recreate: if Create after Delete fails, apply.Recreate now drops the state entry instead of leaving it with an empty ID. Pre-fix state files that already carry an empty ID are tolerated by the planner — empty ID is treated the same as a missing entry, so the next plan re-creates the resource cleanly instead of failing with "invalid state: empty id". Testserver now models the async deletion window: DELETE moves the index into a pending buffer, CREATE during pending returns the backend's exact error, and the next GET (the wait loop's poll) clears the buffer. The recreate/index_type acceptance test exercises the full path. Co-authored-by: Isaac

Previously lookupEndpointUuid swallowed all non-404 errors and returned "", which would feed empty remoteUuid into OverrideChangeDesc and propose a destructive Recreate ("endpoint replaced out-of-band") on transient or permission errors. The Recreate is dangerous: Delta Sync re-runs the embedding pipeline, and Direct Access loses all upserted vectors. Now the helper returns (string, error): 404 maps to ("", nil) — the orphan signal — and any other error is propagated through DoRead/DoCreate so the plan fails loudly instead of misclassifying it as drift. Document the OverrideChangeDesc divergence from vector_search_endpoint (which requires remoteUuid != ""): for indexes, an empty remoteUuid is the orphan signal, and the lookup contract guarantees that case is unambiguous.

Add a Badness-marked test that deploys a bundle with both a vector_search_endpoint and a vector_search_index referencing it, then changes the endpoint_type to trigger an endpoint Recreate. The plan correctly recreates the endpoint but leaves the dependent index unchanged, so on a real workspace the endpoint delete would either fail (indexes still attached) or orphan the index. Root cause is in the planner (bundle/direct/bundle_plan.go): there is no logic to propagate Recreate from a dependency to its dependents. This is a framework-level concern that affects more than just VS, so it's deferred to a follow-up. The Badness entry documents the gap.

Add a Badness-marked validate test showing that the name_prefix preset does not rewrite a vector_search_indexes.*.endpoint_name literal that points at a bundle-managed (and therefore prefixed) endpoint. The output shows vs_endpoint -> prefix_vs_endpoint while vs_index_literal still targets the unprefixed name vs_endpoint. The DABs idiom is to use ${resources.vector_search_endpoints.X.name} (captured by vs_index_ref in the same fixture). That form resolves correctly to the prefixed name at plan/deploy time, so users have a working pattern. The literal form silently breaks though, and the preset has enough information to rewrite it; tracked as Badness for a follow-up fix in apply_presets.go.

Mirror the existing vector_search_endpoint bind test: pre-create both endpoint and index, bind the index into the bundle, deploy, unbind, and destroy. Verifies the index survives unbind+destroy as expected. Required by bundle/direct/dresources/README.md for new resource types.

The placeholder substitutes index_type, not primary_key — the original name was misleading.

Drop the Terraform-provider justification (already implied by "direct engine only") and the long list of internal mechanics. Keep the entry focused on what customers see.

CreateIndex returns immediately with metadata of an index whose embedding pipeline is still provisioning; queries against an index that isn't ready fail. Implement WaitAfterCreate so dependent resources (and the next plan) see a usable index. 75-minute timeout matches the terraform provider. Co-authored-by: Isaac

…mapping

The SaveState->DeleteState change in apply.Recreate and the empty-id tolerance in bundle_plan.go were extracted to a separate PR (#5173). Reverting them here so this branch and #5173 can land independently; once #5173 merges, a rebase on main brings the same fix back in. Co-authored-by: Isaac

janniklasrose · 2026-05-04T08:32:54Z

+      # columns_to_sync is request-only in the create spec and not returned by the read API.
+      - field: delta_sync_index_spec.columns_to_sync
+        reason: input_only


I made a PR (under review by VS team) to return this field

janniklasrose · 2026-05-04T08:51:38Z

+// createIndexTimeout caps the wait for an index to become ready after creation.
+// Delta sync indexes do an initial sync from the source table, which can stretch
+// out for large tables. Matches the terraform provider's defaultIndexProvisionTimeout.
+// https://github.com/databricks/terraform-provider-databricks/blob/c61a32300445f84efb2bb6827dee35e6e523f4ff/vectorsearch/resource_vector_search_index.go#L19


update reference after databricks/terraform-provider-databricks#5598 is merged

andrewnester · 2026-05-06T10:57:57Z

+
+trace $CLI bundle summary
+
+trace $CLI vector-search-endpoints create-endpoint "${endpoint_name}" STANDARD | jq '{id, name, endpoint_type}'


Could you create it as part of the same bundle and refer to it in the index?

no strong feeling either way.

pro: they are strongly coupled so should be declared together

con: precedent is to have resources/RESOURCE/*/databricks.yml.tmpl only contain the resource

precedent is to have resources/RESOURCE/*/databricks.yml.tmpl only contain the resource

I don't think we are strict about it, we have already resources combining both (see catalogs or database_catalogs for example).

Having them together helps us to test the actual use case users will have + cleaning up is a bit easier too

andrewnester · 2026-05-06T11:08:10Z

+// provisioning; queries against an index that isn't ready fail. Blocking here
+// lets dependent resources (and the next plan) see a usable index.
+func (r *ResourceVectorSearchIndex) WaitAfterCreate(ctx context.Context, config *VectorSearchIndexState) (*VectorSearchIndexRemote, error) {
+	index, err := retries.Poll(ctx, createIndexTimeout, func() (*vectorsearch.VectorIndex, *retries.Err) {


This can be a very long operation, right? Shall we maybe not wait for index to be ready and then later introduce support for lifecycle.started here which will wait?

This is following terraform's 75min timeout (see databricks/terraform-provider-databricks#5598 that I made by request from VS team). Don't we always wait for creation for every resource? We can also keep WaitAfterCreate but make it opt-in (maybe use an experimental flag)?

We don't for clusters and sql warehouses for example (in both direct and TF mode) and we now make it opt in with lifecycle.started option #5150

My concern with waiting is that if it's a very long operation users don't have a way out of it right now
But implementation wise I'm fine with both to be honest, we can iterate on it any time

andrewnester · 2026-05-06T11:10:50Z

+}
+
+func (r *ResourceVectorSearchIndex) DoUpdate(ctx context.Context, id string, config *VectorSearchIndexState, entry *PlanEntry) (*VectorSearchIndexRemote, error) {
+	// Vector search indexes have no update API; all field changes trigger recreation via resources.yml.


What will happen if the new field is added but it can be updatable, how do we not leave it unnoticed?

janniklasrose marked this pull request as draft April 29, 2026 13:12

janniklasrose force-pushed the janniklasrose/vs-index branch from 1943af9 to 87018ce Compare April 29, 2026 13:19

janniklasrose temporarily deployed to test-trigger-is April 29, 2026 13:20 — with GitHub Actions Inactive

janniklasrose temporarily deployed to test-trigger-is April 29, 2026 13:41 — with GitHub Actions Inactive

janniklasrose temporarily deployed to test-trigger-is April 29, 2026 14:03 — with GitHub Actions Inactive

janniklasrose temporarily deployed to test-trigger-is April 29, 2026 14:23 — with GitHub Actions Inactive

janniklasrose temporarily deployed to test-trigger-is April 29, 2026 14:27 — with GitHub Actions Inactive

janniklasrose mentioned this pull request Apr 29, 2026

Persist endpoint UUID for vector_search_endpoints drift detection #5127

Merged

janniklasrose temporarily deployed to test-trigger-is April 29, 2026 15:22 — with GitHub Actions Inactive

janniklasrose temporarily deployed to test-trigger-is April 29, 2026 15:58 — with GitHub Actions Inactive

janniklasrose added 11 commits April 30, 2026 14:30

One-shot with skill

959b0cb

Fixes

c07e291

janniklasrose force-pushed the janniklasrose/vs-index branch from 2b22f02 to 44ade3f Compare April 30, 2026 14:23

janniklasrose temporarily deployed to test-trigger-is April 30, 2026 14:24 — with GitHub Actions Inactive

janniklasrose added 6 commits April 30, 2026 23:04

acceptance: rename PRIMARY_KEY_PLACEHOLDER to INDEX_TYPE_PLACEHOLDER

a191d5e

The placeholder substitutes index_type, not primary_key — the original name was misleading.

changelog: shorten VS indexes entry

0c33818

Drop the Terraform-provider justification (already implied by "direct engine only") and the long list of internal mechanics. Keep the entry focused on what customers see.

janniklasrose mentioned this pull request May 1, 2026

Add Vector Search semantic product discovery example databricks/bundle-examples#153

Open

5 tasks

janniklasrose temporarily deployed to test-trigger-is May 1, 2026 09:31 — with GitHub Actions Inactive

janniklasrose added 2 commits May 4, 2026 10:56

Allow leaf to be prefixed if wihtout a reference

3d51621

Simplify dresources/vector_search_index.go PrepareState index spec re…

ec7351f

…mapping

janniklasrose temporarily deployed to test-trigger-is May 4, 2026 09:00 — with GitHub Actions Inactive

janniklasrose temporarily deployed to test-trigger-is May 4, 2026 13:48 — with GitHub Actions Inactive

janniklasrose added 2 commits May 4, 2026 18:38

Hack: recreate index if endpoint is being recreated

41fadce

Re-run 'task generate-validation'

8f24901

janniklasrose temporarily deployed to test-trigger-is May 6, 2026 10:50 — with GitHub Actions Inactive

janniklasrose marked this pull request as ready for review May 6, 2026 10:51

janniklasrose commented May 6, 2026

View reviewed changes

janniklasrose requested review from andrewnester and denik May 6, 2026 10:51

andrewnester reviewed May 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add vector_search_indexes resource (direct engine)#5123

Add vector_search_indexes resource (direct engine)#5123
janniklasrose wants to merge 45 commits intomainfrom
janniklasrose/vs-index

janniklasrose commented Apr 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 29, 2026 •

edited

Loading

Uh oh!

janniklasrose May 4, 2026

Uh oh!

janniklasrose May 4, 2026

Uh oh!

andrewnester May 6, 2026

Uh oh!

janniklasrose May 6, 2026

Uh oh!

andrewnester May 6, 2026

Uh oh!

andrewnester May 6, 2026

Uh oh!

janniklasrose May 6, 2026

Uh oh!

andrewnester May 6, 2026

Uh oh!

andrewnester May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		trace $CLI bundle summary

		trace $CLI vector-search-endpoints create-endpoint "${endpoint_name}" STANDARD \| jq '{id, name, endpoint_type}'

Conversation

janniklasrose commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Why

Follow-ups

Tests

Uh oh!

github-actions Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approval status: pending

/acceptance/bundle/ - needs approval

/bundle/ - needs approval

General files (require maintainer)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

janniklasrose commented Apr 29, 2026 •

edited

Loading

github-actions Bot commented Apr 29, 2026 •

edited

Loading

`/acceptance/bundle/` - needs approval

`/bundle/` - needs approval