Databricks AI Dev Kit

Build Databricks projects with AI coding assistants (Claude Code, Cursor, etc.) using MCP (Model Context Protocol).

Overview

The AI Dev Kit provides everything you need to build on Databricks using AI assistants:

  • High-level Python functions for Databricks operations
  • MCP server that exposes these functions as tools for AI assistants
  • Skills that teach AI assistants best practices and patterns

Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                              Your Project                                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                              β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚   databricks-skills/    β”‚        β”‚   .claude/mcp.json              β”‚   β”‚
β”‚   β”‚                         β”‚        β”‚                                 β”‚   β”‚
β”‚   β”‚   Knowledge & Patterns  β”‚        β”‚   MCP Server Config             β”‚   β”‚
β”‚   β”‚   β€’ dabs-writer         β”‚        β”‚   β†’ databricks-mcp-server       β”‚   β”‚
β”‚   β”‚   β€’ sdp-writer          β”‚        β”‚                                 β”‚   β”‚
β”‚   β”‚   β€’ synthetic-data-gen  β”‚        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚   β”‚   β€’ databricks-sdk      β”‚                        β”‚                      β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                        β”‚                      β”‚
β”‚               β”‚                                      β”‚                      β”‚
β”‚               β”‚    SKILLS teach                      β”‚    TOOLS execute     β”‚
β”‚               β”‚    HOW to do things                  β”‚    actions on        β”‚
β”‚               β”‚                                      β”‚    Databricks        β”‚
β”‚               β–Ό                                      β–Ό                      β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚                          Claude Code                                 β”‚  β”‚
β”‚   β”‚                                                                      β”‚  β”‚
β”‚   β”‚   "Create a DAB with a DLT pipeline and deploy to dev/prod"         β”‚  β”‚
β”‚   β”‚                                                                      β”‚  β”‚
β”‚   β”‚   β†’ Uses SKILLS to know the patterns and best practices             β”‚  β”‚
β”‚   β”‚   β†’ Uses MCP TOOLS to execute SQL, create pipelines, etc.           β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

                                    β”‚
                                    β”‚ MCP Protocol
                                    β–Ό

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        databricks-mcp-server                                 β”‚
β”‚                                                                              β”‚
β”‚   Exposes Python functions as MCP tools via stdio transport                 β”‚
β”‚   β€’ execute_sql, execute_sql_multi                                          β”‚
β”‚   β€’ get_table_details, list_warehouses                                      β”‚
β”‚   β€’ run_python_file_on_databricks                                           β”‚
β”‚   β€’ ka_create, mas_create, genie_create (Agent Bricks)                      β”‚
β”‚   β€’ create_pipeline, start_pipeline (SDP)                                   β”‚
β”‚   β€’ ... and more                                                            β”‚
β”‚                                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                    β”‚
                                    β”‚ Python imports
                                    β–Ό

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         databricks-mcp-core                                  β”‚
β”‚                                                                              β”‚
β”‚   Pure Python library with high-level Databricks functions                  β”‚
β”‚                                                                              β”‚
β”‚   β”œβ”€β”€ sql/                    SQL execution, warehouses, table stats        β”‚
β”‚   β”œβ”€β”€ unity_catalog/          Catalogs, schemas, tables                     β”‚
β”‚   β”œβ”€β”€ compute/                Execution contexts, run code on clusters      β”‚
β”‚   β”œβ”€β”€ spark_declarative_pipelines/   DLT/SDP pipeline management            β”‚
β”‚   └── agent_bricks/           Genie, Knowledge Assistants, MAS              β”‚
β”‚                                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                    β”‚
                                    β”‚ API calls
                                    β–Ό

                          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                          β”‚  Databricks         β”‚
                          β”‚  Workspace          β”‚
                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Quick Start

Step 1: Clone and install

# Clone the repository
git clone https://github.com/databricks-solutions/ai-dev-kit.git
cd ai-dev-kit

# Install the core library
cd databricks-mcp-core
uv pip install -e .

# Install the MCP server
cd ../databricks-mcp-server
uv pip install -e .
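
To confirm both editable installs worked, a quick import check (the module names match the imports and the server entry point used later in this README):

# check_install.py - sanity check that both packages are importable
import databricks_mcp_core
import databricks_mcp_server

print("databricks-mcp-core and databricks-mcp-server import cleanly")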

Step 2: Configure Databricks authentication

# Option 1: Environment variables
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"

# Option 2: Use a profile from ~/.databrickscfg
export DATABRICKS_CONFIG_PROFILE="your-profile"
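
Both options are resolved by the Databricks SDK's standard authentication, which the core library builds on (databricks-sdk is a listed dependency). A minimal sketch to verify your credentials before wiring up the MCP server:

from databricks.sdk import WorkspaceClient

# WorkspaceClient() picks up DATABRICKS_HOST/DATABRICKS_TOKEN or
# DATABRICKS_CONFIG_PROFILE from the environment automatically
w = WorkspaceClient()
print(f"Authenticated as: {w.current_user.me().user_name}")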

Step 3: Add MCP server to your project

In your project directory, create .claude/mcp.json:

{
  "mcpServers": {
    "databricks": {
      "command": "uv",
      "args": ["run", "python", "-m", "databricks_mcp_server.server"],
      "cwd": "/path/to/ai-dev-kit/databricks-mcp-server",
      "defer_loading": true
    }
  }
}

Replace /path/to/ai-dev-kit with the actual path where you cloned the repo.
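
To sanity-check the server outside Claude Code, you can run the same command the config uses (uv run python -m databricks_mcp_server.server from the databricks-mcp-server directory). It speaks MCP over stdio, so it should simply wait for a client; Ctrl-C to exit.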

Step 4: Install Databricks skills to your project (recommended)

Skills teach Claude best practices and patterns:

# In your project directory
curl -sSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/databricks-skills/install_skills.sh | bash

This installs to .claude/skills/:

  • dabs-writer: Databricks Asset Bundles patterns
  • sdp-writer: Spark Declarative Pipelines (DLT)
  • synthetic-data-generation: Realistic test data generation
  • databricks-python-sdk: SDK and API usage

Step 5: Start Claude Code

cd /path/to/your/project
claude

Claude now has both skills (knowledge) and MCP tools (actions) for Databricks!

Components

Component              Description
databricks-mcp-core    Pure Python library with Databricks functions
databricks-mcp-server  MCP server wrapping core functions as tools
databricks-skills      Skills for Claude Code with patterns & examples

Using the Core Library with Other Frameworks

The core library (databricks-mcp-core) is framework-agnostic. While databricks-mcp-server exposes it via MCP for Claude Code, you can use the same functions with any AI agent framework.

Direct Python usage

from databricks_mcp_core.sql import execute_sql, get_table_details, TableStatLevel

# Run a query and get the result rows back
results = execute_sql("SELECT * FROM my_catalog.my_schema.customers LIMIT 10")

# Fetch schema and statistics for several tables in one call
stats = get_table_details(
    catalog="my_catalog",
    schema="my_schema",
    table_names=["customers", "orders"],
    table_stat_level=TableStatLevel.DETAILED
)

With LangChain

from langchain_core.tools import tool
from databricks_mcp_core.sql import execute_sql, get_table_details
from databricks_mcp_core.file import upload_folder

@tool
def run_sql(query: str) -> list:
    """Execute a SQL query on Databricks and return results."""
    return execute_sql(query)

@tool
def get_table_info(catalog: str, schema: str, tables: list[str]) -> dict:
    """Get schema and statistics for Databricks tables."""
    return get_table_details(catalog, schema, tables).model_dump()

@tool
def upload_to_workspace(local_path: str, workspace_path: str) -> dict:
    """Upload a local folder to Databricks workspace."""
    result = upload_folder(local_path, workspace_path)
    return {"success": result.success, "files": result.total_files}

# Use with any LangChain agent
tools = [run_sql, get_table_info, upload_to_workspace]
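
From here, the tools bind to any tool-calling chat model. A hedged sketch with langchain-openai (the model choice is illustrative and not part of this kit):

from langchain_openai import ChatOpenAI

# Any LangChain chat model with tool-calling support works here
llm = ChatOpenAI(model="gpt-4o").bind_tools(tools)
response = llm.invoke("Show 5 rows from my_catalog.my_schema.customers")
print(response.tool_calls)  # the tool invocations the model requested

An agent executor (or a simple loop) would then dispatch those tool calls and feed the results back to the model.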

With OpenAI Agents SDK

from agents import Agent, function_tool
from databricks_mcp_core.sql import execute_sql
from databricks_mcp_core.spark_declarative_pipelines.pipelines import (
    create_pipeline, start_update, get_update
)

@function_tool
def run_sql(query: str) -> list:
    """Execute a SQL query on Databricks."""
    return execute_sql(query)

@function_tool
def create_sdp_pipeline(name: str, catalog: str, schema: str, notebook_paths: list[str]) -> dict:
    """Create a Spark Declarative Pipeline."""
    result = create_pipeline(name, f"/Workspace/{name}", catalog, schema, notebook_paths)
    return {"pipeline_id": result.pipeline_id}

agent = Agent(
    name="Databricks Agent",
    tools=[run_sql, create_sdp_pipeline],
)
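
Running the agent follows the SDK's usual pattern; a minimal sketch with an illustrative prompt:

from agents import Runner

# Runner drives the tool-calling loop until the agent returns a final answer
result = Runner.run_sync(agent, "Run SELECT COUNT(*) AS n FROM my_catalog.my_schema.customers")
print(result.final_output)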

This separation allows you to:

  • Use the same Databricks functions across different agent frameworks
  • Build custom integrations without MCP overhead
  • Test functions directly in Python scripts (see the sketch below)
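
For the last point, a minimal smoke-test sketch (the file name is hypothetical; it assumes a reachable workspace and SQL warehouse):

# test_smoke.py - run with: uv run pytest test_smoke.py
from databricks_mcp_core.sql import execute_sql

def test_execute_sql_returns_rows():
    # A trivial query that needs no tables, only a working warehouse
    rows = execute_sql("SELECT 1 AS one")
    assert rows, "expected at least one result row"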

Development

# Clone the repo
git clone https://github.com/databricks-solutions/ai-dev-kit.git
cd ai-dev-kit

# Install with uv
uv pip install -e databricks-mcp-core
uv pip install -e databricks-mcp-server

# Run tests
cd databricks-mcp-core
uv run pytest tests/integration/ -v

License

Β© 2025 Databricks, Inc. All rights reserved. The source in this project is provided subject to the Databricks License.

Third-Party Package Licenses

Package         License             Copyright
databricks-sdk  Apache License 2.0  Copyright (c) Databricks, Inc.
fastmcp         MIT License         Copyright (c) 2024 Jeremiah Lowin
pydantic        MIT License         Copyright (c) 2017 Samuel Colvin
sqlglot         MIT License         Copyright (c) 2022 Toby Mao
sqlfluff        MIT License         Copyright (c) 2019 Alan Cruickshank
