Add AgentOps project type and vector search data preparation workflows #211
Overview
This PR adds a new project type, "AgentOps", to the existing MLOps Stacks template. Users can now select between two project types when initializing a stack: `mlops` (default) or `agentops`.
This PR also adds the vector search data ingestion pipeline for AgentOps projects.
Features
AgentOps Template Updates
1. Project type selection
- Added `input_project_type` parameter to `databricks_template_schema.json`: `mlops` (default) or `agentops`
- Updated minimum Databricks CLI version to `v0.266.0` to support new features
- Default project name now reflects selected project type: `my_{{ .input_project_type }}_project`
- Other changes:
  - `input_project_type` as order 1
  - `input_include_models_in_unity_catalog` skipped for agentops
2. Updating project structure layout
- Added conditional logic to `update_layout.tmpl` to generate the appropriate project structure based on `input_project_type`
- Added conditional logic to certain files:
  - Separate code structure sections for MLOps vs AgentOps, which conditionally render based on `input_project_type`
  - `requirements.txt.tmpl`
  - `README.md.tmpl`
  - `databricks.yml.tmpl`
  - All CI/CD pipelines (more on this later)
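To illustrate the behavior described in the two sections above, here is a minimal Python sketch of the prompt logic. It is only a model for reviewers: the real logic lives in `databricks_template_schema.json` and the Go-template conditionals, and the function names `default_project_name` and `prompts_to_ask` are hypothetical.

```python
from typing import List

# Project types named in this PR description.
PROJECT_TYPES = ("mlops", "agentops")


def default_project_name(project_type: str) -> str:
    """Mirror the default `my_{{ .input_project_type }}_project`."""
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type!r}")
    return f"my_{project_type}_project"


def prompts_to_ask(project_type: str) -> List[str]:
    """Hypothetical model of which prompts are shown for a project type.

    `input_project_type` is asked first (order 1), and
    `input_include_models_in_unity_catalog` is skipped for agentops.
    """
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type!r}")
    prompts = ["input_project_type", "input_project_name"]
    if project_type == "mlops":
        prompts.append("input_include_models_in_unity_catalog")
    return prompts
```

The sketch covers only the prompts mentioned in this description; the actual schema contains more parameters.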
3. Updating CI/CD workflows
Extended CI/CD pipelines to handle AgentOps projects and test the correct workflows:
- GitHub Actions (`.github/workflows/{{.input_project_name}}-run-tests.yml.tmpl`)
- Azure DevOps (`.azure/devops-pipelines/{{.input_project_name}}-tests-ci.yml.tmpl`)
- GitLab (`.gitlab/pipelines/{{.input_project_name}}-bundle-ci.yml.tmpl`)
Data preparation with vector search for AgentOps
1. Data preparation code
- `DataIngestion.py.tmpl`
- `fetch_data.py.tmpl` for retrieval
- `DataPreprocessing.py.tmpl`
- `create_chunk.py.tmpl` for chunking logic
- `config.py.tmpl`
- `VectorSearch.py.tmpl`
- `vector_search_utils.py.tmpl` for management + waiting for endpoint to be ready
2. Workflow resource configuration
- `data-preparation-resource.yml.tmpl`, which includes each notebook as a separate task (sequential execution)
- `databricks.yml.tmpl`
3. Defined variables in `databricks.yml`
- `databricks_staging_workspace_host`
- `input_schema_name`
What I have tested:
- `mlops` and `agentops` project types
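For reviewers unfamiliar with the data preparation steps listed under "Data preparation code", the chunking and "wait for endpoint to be ready" ideas can be sketched roughly as follows. This is not the code in `create_chunk.py.tmpl` or `vector_search_utils.py.tmpl`: the function names, the character-based chunking strategy, the default sizes, and the `"ONLINE"` status string are all assumptions made for illustration.

```python
import time
from typing import Callable, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop before a start offset whose chunk would lie entirely inside
    # the previous chunk's overlap region.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `get_status` until it returns "ONLINE" (assumed ready state).

    Returns the number of polls made; raises TimeoutError if the deadline
    passes first. `get_status` is a hypothetical callable that would wrap
    whatever client call reports the endpoint state.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if get_status() == "ONLINE":
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("endpoint did not become ready in time")
        sleep(poll_s)
```

Passing `sleep` as a parameter keeps the polling loop easy to exercise in tests without real delays.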