Add AgentOps project type and vector search data preparation workflows #211

veenaramesh · 2025-11-19T20:16:55Z

Overview

This PR adds a new project type "AgentOps" to the existing MLOps Stacks template. Users can now select between two project types when initializing a stack:

- mlops: existing template; traditional ML pipeline for model training and batch inference
- agentops: NEW template; agent-specific workflows for data ingestion and, eventually, agent development and deployment

This PR also adds the vector search data ingestion pipeline for the AgentOps projects.

Features

AgentOps Template Updates

1. Project type selection

Added input_project_type parameter to databricks_template_schema.json
- Options: mlops (default) or agentops
- First-order parameter in template initialization
Updated the minimum Databricks CLI version to v0.266.0 to support serverless job environments
Default project name now reflects selected project type: my_{{ .input_project_type }}_project
Other changes:
- Reordered parameters with input_project_type as order 1
- Updated all subsequent parameter orders
- Conditional parameter display (e.g., input_include_models_in_unity_catalog skipped for agentops)
- Updated default values to be project-type aware
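For reviewers unfamiliar with the schema, the parameter behavior above can be modeled in a few lines of Python. This is a hypothetical illustration only: the real logic lives in databricks_template_schema.json and is evaluated by the Databricks CLI, and the prompt order of every parameter other than input_project_type is an assumption.

```python
# Hypothetical Python model of the template-schema behavior described above.
# The actual implementation is JSON + Go templates evaluated by the Databricks CLI.

PROJECT_TYPES = ("mlops", "agentops")  # "mlops" is the default

# Assumed prompt order; only "input_project_type is order 1" comes from the PR.
ORDERED_PARAMS = [
    "input_project_type",
    "input_project_name",
    "input_include_models_in_unity_catalog",
    "input_schema_name",
]

# Parameters hidden when the user picks agentops.
SKIPPED_FOR_AGENTOPS = {"input_include_models_in_unity_catalog"}


def default_project_name(project_type: str = "mlops") -> str:
    """Mirror the default value my_{{ .input_project_type }}_project."""
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type!r}")
    return f"my_{project_type}_project"


def params_to_prompt(project_type: str) -> list[str]:
    """Return the parameters that would be prompted for, in order."""
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type!r}")
    return [
        p
        for p in ORDERED_PARAMS
        if not (project_type == "agentops" and p in SKIPPED_FOR_AGENTOPS)
    ]
```

For example, default_project_name("agentops") returns "my_agentops_project", and params_to_prompt("agentops") omits input_include_models_in_unity_catalog.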

2. Updating project structure layout

Added conditional logic to update_layout.tmpl that generates the appropriate project structure based on input_project_type
- Ensures MLOps-specific files are only generated for MLOps projects
Added conditional logic to the following files, with separate MLOps and AgentOps sections that render based on input_project_type:
- requirements.txt.tmpl
  - Adds dependencies (e.g. vector search SDK)
- README.md.tmpl
  - Adds basic documentation for agentops project
- databricks.yml.tmpl
  - Extends bundle configuration to support agentops resources
  - Adds data preparation workflow targets
- All CI/CD pipelines (more on this later)

3. Updating CI/CD workflows

Extended CI/CD pipelines to handle AgentOps projects and test the correct workflows:
- GitHub Actions (.github/workflows/{{.input_project_name}}-run-tests.yml.tmpl)
- Azure DevOps (.azure/devops-pipelines/{{.input_project_name}}-tests-ci.yml.tmpl)
- GitLab CI (.gitlab/pipelines/{{.input_project_name}}-bundle-ci.yml.tmpl)

Data preparation with vector search for AgentOps

1. Data preparation code

Notebook: DataIngestion.py.tmpl
- Processes raw documentation from data source URLs and stores data in UC
- Uses the fetch_data.py.tmpl utility for retrieval
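As a rough sketch of the kind of text extraction a helper like fetch_data.py might perform, the following uses only the standard library to reduce HTML to its visible text. The function names and the record shape (url, content) are hypothetical; the actual helper may fetch and parse pages differently.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML page, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


def to_raw_record(url: str, html: str) -> dict:
    """Hypothetical row shape for the raw documentation table."""
    return {"url": url, "content": html_to_text(html)}
```

In a notebook, pages could be downloaded with urllib.request.urlopen (or requests) and the records written to the raw table with something like `spark.createDataFrame(rows).write.mode("overwrite").saveAsTable(...)`; both the download method and the write mode are assumptions here.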
Notebook: DataPreprocessing.py.tmpl
- Cleans and chunks documentation to prepare for vector search
- Uses the create_chunk.py.tmpl utility for chunking logic
- Defines chunking configuration in config.py.tmpl
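The PR doesn't spell out the chunking strategy, so here is a minimal sketch assuming fixed-size character windows with overlap. ChunkConfig and create_chunks are hypothetical names standing in for whatever config.py and create_chunk.py actually define; production chunkers often split on tokens or on document structure (headings, paragraphs) instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkConfig:
    # Hypothetical fields; the real config.py may define different ones.
    chunk_size: int = 500    # maximum characters per chunk
    chunk_overlap: int = 50  # characters shared by consecutive chunks

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")


def create_chunks(text: str, config: ChunkConfig = ChunkConfig()) -> list[str]:
    """Split text into overlapping fixed-size character windows."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if not text:
        return []
    step = config.chunk_size - config.chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + config.chunk_size])
        # Stop once a window reaches the end, avoiding a tail made only of overlap.
        if start + config.chunk_size >= len(text):
            break
    return chunks
```

With chunk_size=4 and chunk_overlap=1, "abcdefghij" yields ["abcd", "defg", "ghij"]: each window starts three characters after the previous one and shares one character with it.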
Notebook: VectorSearch.py.tmpl
- Creates the Vector Search endpoint and a Delta Sync index (TRIGGERED mode)
- Uses the vector_search_utils.py.tmpl utility for endpoint/index management and for waiting until the endpoint is ready
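Waiting for the endpoint is essentially a polling loop. A minimal sketch, assuming the endpoint description is a dict whose endpoint_status.state reads "ONLINE" when ready; the function name and timeout defaults are hypothetical, and the status lookup, sleep, and clock are injected so the loop can be exercised without a workspace.

```python
import time
from typing import Callable


def wait_for_endpoint_ready(
    get_endpoint: Callable[[], dict],
    timeout_s: float = 1200.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll get_endpoint() until it reports ONLINE; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        endpoint = get_endpoint()
        # Assumed shape: {"endpoint_status": {"state": "PROVISIONING" | "ONLINE" | ...}}
        state = str(endpoint.get("endpoint_status", {}).get("state", "")).upper()
        if state == "ONLINE":
            return endpoint
        if clock() >= deadline:
            raise TimeoutError(
                f"endpoint not ONLINE after {timeout_s}s (last state: {state or 'unknown'})"
            )
        sleep(poll_interval_s)
```

Against a real workspace, get_endpoint could wrap the vector search SDK: with `client = VectorSearchClient()` (imported from databricks.vector_search.client), pass `lambda: client.get_endpoint("ai_agent_endpoint")`. Check the returned status shape against the SDK version pinned in requirements.txt.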

2. Workflow resource configuration

Defined the data preparation workflow in data-preparation-resource.yml.tmpl, which runs each notebook as a separate task, executed sequentially
- Notebook parameters are defined here
- Scheduled to run daily at 5:00 AM
- The serverless environment and its dependencies are also defined here
Added the resource to the list of included resources in databricks.yml.tmpl
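The sequential task chain and daily schedule map onto Databricks Jobs settings roughly as follows. This Python sketch builds the equivalent settings as a dict rather than reproducing data-preparation-resource.yml.tmpl; the job name, the notebook directory, passing every parameter to every task, and the UTC timezone are assumptions, and the serverless environment block is omitted.

```python
NOTEBOOKS = ["DataIngestion", "DataPreprocessing", "VectorSearch"]


def build_data_prep_job(notebook_dir: str, params: dict[str, str]) -> dict:
    """Build job settings where each notebook task depends on the previous one."""
    tasks = []
    for i, name in enumerate(NOTEBOOKS):
        task = {
            "task_key": name,
            "notebook_task": {
                "notebook_path": f"{notebook_dir}/{name}",
                "base_parameters": dict(params),  # assumed: same params for every task
            },
        }
        if i > 0:
            # depends_on is what enforces sequential execution.
            task["depends_on"] = [{"task_key": NOTEBOOKS[i - 1]}]
        tasks.append(task)
    return {
        "name": "data_preparation_job",  # hypothetical name
        "schedule": {
            "quartz_cron_expression": "0 0 5 * * ?",  # 05:00:00 every day
            "timezone_id": "UTC",  # assumed; the PR does not state a timezone
        },
        "tasks": tasks,
    }
```

In Quartz syntax the fields are seconds, minutes, hours, day-of-month, month, and day-of-week, so "0 0 5 * * ?" fires once a day at 5 AM.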

3. Defined variables in `databricks.yml`

Added the following variables, which feed into the data preparation workflow parameters:
- catalog_name
  - Defined uniquely for each deployment target using template inputs (e.g. databricks_staging_workspace_host)
- schema
  - Defined the same for each deployment target using template input input_schema_name
- raw_data_table
  - Will automatically populate as "raw_documentation"
- preprocessed_data_table
  - Will automatically populate as "databricks_documentation"
- eval_table
  - Will automatically populate as "databricks_documentation_eval"
- vector_search_endpoint
  - Will automatically populate as "ai_agent_endpoint"
- vector_search_index
  - Will automatically populate as "databricks_documentation_vs_index"
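A hypothetical sketch of how these variables might combine into fully qualified Unity Catalog names inside the notebooks. The defaults come from the list above; the class name and the way names are assembled are illustrative rather than taken from the PR. Tables and the Vector Search index use three-level catalog.schema.name names, while the endpoint name is left unqualified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPrepVariables:
    # catalog_name and schema vary per deployment target; the rest use the defaults above.
    catalog_name: str
    schema: str
    raw_data_table: str = "raw_documentation"
    preprocessed_data_table: str = "databricks_documentation"
    eval_table: str = "databricks_documentation_eval"
    vector_search_endpoint: str = "ai_agent_endpoint"
    vector_search_index: str = "databricks_documentation_vs_index"

    def fq(self, name: str) -> str:
        """Three-level Unity Catalog name: catalog.schema.name."""
        return f"{self.catalog_name}.{self.schema}.{name}"

    def resolved_names(self) -> dict[str, str]:
        return {
            "raw_data_table": self.fq(self.raw_data_table),
            "preprocessed_data_table": self.fq(self.preprocessed_data_table),
            "eval_table": self.fq(self.eval_table),
            "vector_search_index": self.fq(self.vector_search_index),
            "vector_search_endpoint": self.vector_search_endpoint,  # not a UC object
        }
```

For a hypothetical staging target with catalog "staging" and schema "agent_schema", the raw table resolves to "staging.agent_schema.raw_documentation".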

What I have tested:

Validated project generation for both mlops and agentops project types
Tested the original mlops-stacks project and confirmed that the default behavior is unchanged
Validated that the data preparation pipeline works end-to-end
Validated that bundle variables are used properly by resources

arpitjasa-db

@veenaramesh can we update the tests to pass? https://github.com/databricks/mlops-stacks/actions/runs/19515122285/job/55864800288?pr=211

Let's also look to add a bit of coverage for this new flow as well


veenaramesh added 2 commits November 14, 2025 18:46

v1 data ingestion, agent development, and agent deployment code

018553f

separating agent code, using serverless dependency in job def

3c3b117

veenaramesh requested review from alexbaur, arpitjasa-db and sdonohoo-db November 19, 2025 20:16

arpitjasa-db reviewed Nov 19, 2025


veenaramesh and others added 8 commits November 20, 2025 10:04

fixing tests

b0d5d17

fixing tests, editing incorrectly formatted yml file

663f7ec

fixing tests, updating mlflow recipes links from latest to v2.5

091107d

removing https://docs.databricks.com/ from README.md

c44b8d6

Removing unnecessary doc strings. Adding DataIngestion to list of ntbks to exclude test

7a72bc4

adding docs link to template schema json

9a4eee5

Update min databricks cli version to support serverless job environments

94aaca4

removing pyspark from requirements

d52be58


