
feat(databricks-skills): add databricks-mlflow-ml skill for classic ML#474

Open
dgokeeffe wants to merge 5 commits into databricks-solutions:main from dgokeeffe:feat/databricks-mlflow-ml-skill

Conversation

@dgokeeffe

Why

The existing MLflow-related skills leave a gap for classic ML practitioners:

| Skill | Scope | Covers classic ML UC registration? |
|---|---|---|
| databricks-mlflow-evaluation | GenAI agent evaluation (mlflow.genai.evaluate, scorers, judges) | ❌ Different audience |
| databricks-model-serving | Real-time serving endpoints | ❌ Covers serving, not training/registration |
| databricks-unity-catalog | Tables, volumes, system tables | ❌ Data primitives, not model registry |
| databricks-mlflow-ml (this PR) | Classic ML training + UC registration + batch inference | ✅ |

A data scientist training a forecasting model, registering it to Unity Catalog, and scoring predictions in a notebook or Lakeflow pipeline has no skill to trigger on. This PR fills that gap.

What's in the skill

SKILL.md — workflow index (Train → Register → Score, Retrain + Promote A/B, Debugging), quick-start, runtime compatibility note, and trigger description.

7 reference files:

  • GOTCHAS.md — 14 common mistakes with symptoms + fixes
  • CRITICAL-interfaces.md — exact API signatures + the models:/catalog.schema.model@alias URI format
  • patterns-experiment-setup.md — UC volume artifact_location (required in UC-enforced workspaces)
  • patterns-training.md — logging with signature + input_example, sklearn.Pipeline wrapping, autologging
  • patterns-uc-registration.md — three-level names, @champion/@challenger aliases, verification via DESCRIBE MODEL, A/B promotion
  • patterns-batch-inference.md — notebook pyfunc.load_model (Tier 1), Lakeflow SDP pyfunc.spark_udf (Tier 2), champion-vs-challenger validation, explicit warning against ai_query on custom UC models (the Tier 2 shape is sketched after this list)
  • user-journeys.md — 7 end-to-end workflows including debugging scenarios
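
For context, here is a minimal sketch of the module-scope spark_udf pattern that patterns-batch-inference.md documents. The catalog/schema/model names (main.ml.*) are placeholders, the `dp` import follows the @dp.materialized_view convention the skill references, and `spark` is the ambient session Databricks provides inside a pipeline; this is not code lifted from the skill itself.

```python
# Sketch only: main.ml.forecast and main.ml.features are placeholder names.
import mlflow
from pyspark import pipelines as dp  # Lakeflow SDP declarative-pipeline API

mlflow.set_registry_uri("databricks-uc")

# Module scope: the model is deserialized once per pipeline update rather
# than on every evaluation of the materialized view (the Tier 2 pattern).
predict_udf = mlflow.pyfunc.spark_udf(
    spark, "models:/main.ml.forecast@champion", result_type="double"
)

@dp.materialized_view()
def scored_features():
    features = spark.read.table("main.ml.features")
    # Pass the model's feature columns; they must match the logged signature.
    return features.withColumn("prediction", predict_udf(*features.columns))
```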

Key gotchas this skill teaches that other guides miss

  1. UC volume artifact_location on experiment creation — DBFS root is rejected in UC-enforced workspaces. Every log_model call fails with opaque errors until artifact_location points at a UC volume (see the sketch after this list).
  2. mlflow.set_registry_uri('databricks-uc') — without this, register_model silently routes to the legacy workspace registry. The #1 "my model isn't showing up in Catalog Explorer" support question.
  3. ai_query on custom UC models — doesn't work. Requires a serving endpoint. Correct primitive is mlflow.pyfunc.load_model (notebook) or mlflow.pyfunc.spark_udf (Lakeflow).
  4. @champion / @challenger aliases — replace deprecated transition_model_version_stage() stages. The legacy API still exists but is a no-op on UC-registered models (no error, no effect).
  5. mlflow.pyfunc.spark_udf in Lakeflow SDP — must be constructed at module scope, not inside @dp.materialized_view. Otherwise deserialization repeats on every pipeline evaluation.
  6. pip install 'mlflow[databricks]' — required for UC registration outside Databricks clusters. Plain pip install mlflow omits the cloud-storage SDKs (azure-core / boto3 / google.cloud) MLflow needs to stage UC artifacts. Clusters ship the extras pre-installed.
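
To make gotchas 1, 2, and 4 concrete, a minimal end-to-end sketch follows. Catalog, schema, and volume names are placeholders (main.ml.*), not paths from this PR, and it assumes a UC-enforced workspace on MLflow 3.x (which takes name= rather than the deprecated artifact_path=).

```python
import mlflow
import pandas as pd
from mlflow.models import infer_signature
from mlflow.tracking import MlflowClient
from sklearn.ensemble import GradientBoostingRegressor

mlflow.set_registry_uri("databricks-uc")  # gotcha 2: omit this and you get the legacy registry

# Gotcha 1: the experiment's artifact_location must be a UC volume, not the DBFS root.
exp_id = mlflow.create_experiment(
    "/Shared/forecast-demo", artifact_location="dbfs:/Volumes/main/ml/artifacts"
)

X = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
model = GradientBoostingRegressor().fit(X, [2.0, 4.0, 6.0, 8.0])

with mlflow.start_run(experiment_id=exp_id) as run:
    mlflow.sklearn.log_model(
        model,
        name="model",  # MLflow 3.x param; artifact_path= is the deprecated 2.x spelling
        signature=infer_signature(X, model.predict(X)),
        input_example=X.head(2),
    )

# Three-level UC name. Gotcha 4: aliases replace stages, and register_model
# does not set an alias itself; that is a separate client call.
mv = mlflow.register_model(f"runs:/{run.info.run_id}/model", "main.ml.forecast")
MlflowClient().set_registered_model_alias("main.ml.forecast", "champion", mv.version)
```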

Testing

Field-tested end-to-end against a live Databricks workspace:

  • Feature table seeded, trained a GradientBoostingRegressor
  • Registered to UC with @champion alias — verified in Catalog Explorer UI
  • Loaded via mlflow.pyfunc.load_model — predictions within ~2% of actuals (sketched below)
  • Two additional gotchas surfaced during the test (mlflow[databricks] install + artifact_path deprecation) and added to GOTCHAS.md
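
The scoring step above amounts to a few lines. A hedged sketch with a placeholder model name; per gotcha 3, ai_query is not an option here without a serving endpoint:

```python
import mlflow
import pandas as pd

mlflow.set_registry_uri("databricks-uc")
# Load by alias URI, not a pinned version number.
model = mlflow.pyfunc.load_model("models:/main.ml.forecast@champion")
preds = model.predict(pd.DataFrame({"x": [5.0, 6.0]}))  # columns must match the logged signature
```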

Runtime verified: MLflow 3.11 on Lakeflow SDP serverless compute v5 (current default). Patterns compatible with MLflow 2.16+ — pairs on older classic DBRs still get correct behaviour. 2.x/3.x divergences called out in GOTCHAS.md (e.g., the artifact_path= → name= rename).

Structure parity

File layout matches databricks-mlflow-evaluation (same SKILL.md + references/ + GOTCHAS.md + CRITICAL-interfaces.md + patterns-*.md convention). Installable via the existing install_skills.sh:

./install_skills.sh databricks-mlflow-ml

Not in scope

  • Model Serving endpoints (databricks-model-serving covers that)
  • GenAI agent evaluation (databricks-mlflow-evaluation covers that)
  • Generic UC primitives like volumes and tables (databricks-unity-catalog covers those)

Deliberately narrow — classic ML + UC registration + batch inference only.

Origin

Built to fill a gap encountered during the Coles Vibe Workshop (airgapped Databricks field-engineer hackathon). DS pairs needed UC-scoped MLflow guidance that wasn't covered by any existing skill. Content battle-tested in the workshop before being contributed upstream.

@dustinvannoy-db
Collaborator

Do the mlflow official skills we install not cover this gap? cc: @jacksandom

@dgokeeffe
Author

@dustinvannoy-db I checked the mlflow/skills repo (what install_genie_code_skills.py pulls from). All 8 skills are GenAI/LLM-tracing scoped: agent-evaluation, mlflow-onboarding, instrumenting-with-mlflow-tracing, analyze-mlflow-trace, analyze-mlflow-chat-session, querying-mlflow-metrics, retrieving-mlflow-traces, searching-mlflow-docs. Not one touches Unity Catalog, set_registry_uri('databricks-uc'), @champion/@challenger aliases, or pyfunc.spark_udf.

The UC-specific stuff is what this PR covers: UC-enforced workspaces rejecting DBFS artifact roots, the legacy stage transition API silently no-oping on UC models, ai_query not working on custom UC models. That belongs here rather than upstream — it's Databricks config, not MLflow API.

David O'Keeffe added 3 commits May 8, 2026 12:13
Fills the gap between databricks-mlflow-evaluation (GenAI agent eval) and
databricks-model-serving (real-time endpoints). Covers:

- Classic ML model training with MLflow tracking
  (sklearn / XGBoost / PyTorch)
- Experiment creation with UC volume artifact_location
  (required in UC-enforced workspaces)
- Unity Catalog model registration with three-level names
- @champion / @challenger alias management
- Batch inference via mlflow.pyfunc.load_model (notebook, up to ~10k rows)
- Distributed batch via mlflow.pyfunc.spark_udf in Lakeflow SDP pipelines

Structure mirrors databricks-mlflow-evaluation:
- SKILL.md: workflows + trigger description + quick start
- references/GOTCHAS.md: 12 common mistakes with symptoms + fixes
- references/CRITICAL-interfaces.md: exact API signatures + models:/ URI format
- references/patterns-experiment-setup.md: UC volume artifact_location setup
- references/patterns-training.md: logging with signature + input_example
- references/patterns-uc-registration.md: register + alias + verify + A/B
- references/patterns-batch-inference.md: pyfunc.load_model + spark_udf + ai_query anti-pattern
- references/user-journeys.md: 7 end-to-end workflows including debugging

Key gotchas covered that other MLflow guides miss:
- Experiment creation now requires UC volume artifact_location in UC-enforced
  workspaces (DBFS root writes are rejected)
- mlflow.set_registry_uri('databricks-uc') is required; silent workspace
  registry fallback is the #1 support question
- ai_query does NOT work on custom UC-registered models unless they're
  deployed to a serving endpoint; use pyfunc.load_model or spark_udf instead
- UC aliases (@champion/@challenger) replace deprecated stage transitions
  (transition_model_version_stage is a no-op on UC models)
- mlflow.pyfunc.spark_udf must be constructed at module scope in Lakeflow
  SDP pipelines, not inside the function body

Tested against MLflow 2.16+ on Databricks Runtime 15.4 LTS. Content battle-
tested in the Coles Vibe Workshop (classic-ML track running in an airgapped
environment where online MLflow docs aren't reachable).
Field-tested the skill end-to-end from a local Python environment against
a live Databricks workspace. Surfaced two gotchas not in the original set:

#12 mlflow[databricks] extras missing when running outside Databricks:
plain `pip install mlflow` omits azure-core / boto3 / google.cloud SDKs
that UC registration needs to stage artifacts. Training + log_model work;
register_model fails with opaque "No module named 'azure'". Databricks
clusters ship the extras pre-installed, so this only bites laptops / CI.

#13 artifact_path= deprecated in favour of name= (MLflow 2.16+): emits
warning on every log_model call. Non-blocking, but worth flagging since
most online tutorials + training courses still use the old param.

Both verified against the workshop's test run — skill workflow 1 now
completes cleanly with these fixes documented.
Original SKILL.md didn't state a runtime target. Adds a "Runtime compatibility"
section anchored on what the skill was actually tested against — MLflow 3.11
on Lakeflow SDP serverless compute v5 — with a compat note for MLflow 2.16+
(classic DBR 15.4 LTS still ships 2.x). Points at GOTCHAS.md for the 3.x-vs-2.x
divergence (artifact_path deprecation, etc.).
@dgokeeffe force-pushed the feat/databricks-mlflow-ml-skill branch from bf84ee5 to cf21195 on May 8, 2026 02:30
@QuentinAmbard
Collaborator

here's what Claude suggests:

  Report: databricks-mlflow-ml skill audit

  Current state

  8 files, 1,666 lines. Structure: SKILL.md + 7 references (GOTCHAS, CRITICAL-interfaces, 4 pattern files, user-journeys).

  What's wrong (the honest assessment)

  1. Massive redundancy — the same 5–6 facts repeated 4–5 times each

  The six "load-bearing" facts of this skill are:
  - Set mlflow.set_registry_uri("databricks-uc")
  - Use three-level UC names
  - Pin artifact_location to a UC volume
  - Log with signature + input_example
  - Load via models:/cat.sch.name@alias (alias not version)
  - Use pyfunc.spark_udf (not ai_query) for batch

  Each one appears in: SKILL.md Quick Start + Common Issues table + GOTCHAS entry + CRITICAL-interfaces section + a pattern file +
  user-journeys + Workflow tables. Same fact, 4–6 places. That's massive duplication for a model that doesn't need it — I would memorize
   it after one pass.

  2. Treats me like a junior engineer who needs full code

  Patterns 1, 2, 3 of patterns-training.md are textbook sklearn — train_test_split, GradientBoostingRegressor.fit, mlflow.log_metrics. I
   know all of this. What I need is what's specific to UC + MLflow on Databricks. The actual UC-relevant lines per pattern: 3–5. The
  rest is filler.

  Same with patterns-batch-inference.md Pattern 2 (matplotlib chart), Pattern 6 (structured streaming boilerplate), Pattern 5 (basic A/B
   comparison with mean_absolute_error). I don't need these spelled out.

  3. user-journeys.md is almost pure pointer-shuffling

  Every step is "do X (see file Y, pattern Z)". It's a table of contents disguised as content. The workflows in SKILL.md already do
  this. Journey 7 ("everything is on fire") is fluff.

  4. CRITICAL-interfaces.md is a poorly-disguised cheatsheet of what I already know

  mlflow.search_runs(...), MlflowClient().set_registered_model_alias(...) — I'd write these correctly without prompting. The only
  entries that earn their place: the models:/<cat>.<sch>.<name>@<alias> URI format (UC-specific), set_registry_uri("databricks-uc")
  (non-obvious), the artifact_location="dbfs:/Volumes/..." shape (non-obvious), and SQL forms like DESCRIBE MODEL.

  5. The skill doesn't separate "you would never guess this" from "this is just MLflow"

  GOTCHAS #13 (artifact_path → name= rename) and #12 (mlflow[databricks] extras) are gold. #1, #2, #6, #7, #9 are gold. #14 (Pipeline
  preprocessing) is sklearn 101 — every ML engineer learns this. It dilutes the signal.

  6. SKILL.md has two workflow tables that overlap with user-journeys.md

  Workflow 1/2/3 in SKILL.md ≈ Journeys 1/2/4. Pick one location.

  ---
  Proposed restructure

  Target: 3 files, ~400–500 lines total (down from 8 files, 1,666 lines — ~70% reduction)

  SKILL.md                        (~120 lines — entry point + decision tree + quick start + Databricks-specific notes)
  references/gotchas.md           (~200 lines — only Databricks/UC-specific gotchas)
  references/recipes.md           (~150 lines — UC-specific code shapes for the 4 real workflows)

  File 1: SKILL.md — keep current structure but trim

  Keep:
  - The frontmatter (good description, gates correctly)
  - "Why this skill exists" 3-skill comparison table (essential for routing — I conflate these otherwise)
  - Quick Start (the 4-step copy-paste — actually useful)
  - A single decision table mapping situation → recipe section (not "Workflow 1, 2, 3" tables that re-list the same patterns)

  Cut:
  - All three Workflow tables (replaced by a 6-row decision table)
  - Common Issues table (lives in gotchas.md only)
  - Reference Files list (3 files — I can ls)
  - Runtime compatibility section (2 lines max, not a paragraph)

  File 2: gotchas.md — keep only the non-obvious, Databricks-specific ones

  Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required
  - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the killer)
  - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)
  - #7 CREATE MODEL ON SCHEMA is a separate grant (admins miss this)
  - #9 ai_query ≠ batch inference for custom models (naming-overlap trap)
  - #11 Construct spark_udf at module scope in Lakeflow SDP
  - #12 mlflow[databricks] extras for non-Databricks compute
  - #13 artifact_path= → name= deprecation

  Cut entirely (I'd handle these without help):
  - #3 "use alias not version" — covered as a single rule in SKILL.md
  - #5 "verify after register_model" — generic good practice, one-liner in SKILL.md                         File 2: gotchas.md — keep only the non-obvious, Databricks-specific ones
                                                                                                            Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required                                                           - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the
  killer)                                                                                                   - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)                                            - #7 CREATE MODEL ON SCHEMA   Cut:                                                                                                                                    - All three Workflow tables (replaced by a 6-row decision table)
  - Common Issues table (lives in gotchas.md only)                                                                                        - Reference Files list (3 files — I can ls)
  - Runtime compatibility section (2 lines max, not a paragraph)                                                                          File 2: gotchas.md — keep only the non-obvious, Databricks-specific ones

                                                                Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required
  - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the killer)
  - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)
  - #7 CREATE MODEL ON SCHEMA is a separate grant (admins miss this)
  - #9 ai_query ≠ batch inference for custom models (naming-overlap trap)
  - #11 Construct spark_udf at module scope in Lakeflow SDP
  - #12 mlflow[databricks] extras for non-Databricks compute
  - #13 artifact_path= → name= deprecation

  Cut entirely (I'd handle these without help):

  Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required
  - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the killer)
  - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)
  - #7 CREATE MODEL ON SCHEMA is a separate grant (admins miss this)
  - #9 ai_query ≠ batch inference for custom models (naming-overlap trap)
  - #11 Construct spark_udf at module scope in Lakeflow SDP                                                                               - #12 mlflow[databricks] extras for non-Databricks compute
  - #13 artifact_path= → name= deprecation

                                                                                                Cut entirely (I'd handle these without help):
  - #3 "use alias not version" — covered as a single rule in SKILL.md
  - #5 "verify after register_model" — generic good practice, one-liner in SKILL.md                                                       - #8 "log signature + input_example" — covered in the Quick Start; just one bullet in SKILL.md
  - #10 "can't reuse version numbers" — obvious                                                                                           - #14 "wrap preprocessing in Pipeline" — sklearn 101

                                                                                    File 3: recipes.md — UC-specific code shapes only, not full implementations

                                                             For each of the 4 real workflows, give me the 5–15 lines that are Databricks-specific, not the full training loop. Tell me what to do,
   not how to write sklearn.
                                                                                                              Sections:
  1. Experiment + UC volume setup (the two lines I might forget: set_registry_uri + artifact_location shape)
  2. Log → Register → Alias (the canonical 4-line pattern, plus tags syntax — I'd guess register_model argument order wrong otherwise)
  3. Lakeflow SDP spark_udf shape (module-scope construction + result_type for multi-output — the only spot I genuinely need a template)
  4. A/B promotion alias swap (the delete_registered_model_alias + set_registered_model_alias sequence — easy to mess up the order)
  5. Verification one-liners (DESCRIBE MODEL, model.aliases dict shape, get_model_info for signature debugging)

  For each, prefer "do X with this exact arg" guidance over fully-formed code blocks. Example:

  > Register + champion alias. After training: result = mlflow.register_model(f"runs:/{run_id}/model", "<cat>.<sch>.<name>") returns a ModelVersion with .version (str). Then MlflowClient().set_registered_model_alias(name, "champion", result.version). register_model does NOT set aliases — separate call required.

  That's enough. I don't need the full with mlflow.start_run(): block.

  Things to delete entirely

  - user-journeys.md — fold the genuinely useful "debugging which-registry" flow into gotchas #1, drop the rest. The rest is
  pointer-shuffling.
  - CRITICAL-interfaces.md — 90% of it is plain MLflow API surface. The 4–5 UC-specific lines move into recipes.md or SKILL.md Quick
  Start.
  - patterns-experiment-setup.md — Patterns 5–6 (workspace-vs-repo paths, notebook auto-association) are filler. The one essential bit
  (UC volume shape + CREATE VOLUME IF NOT EXISTS) is two lines in recipes.md.
  - patterns-training.md — sklearn boilerplate. Replace with a single "logging UC essentials" subsection in recipes.md: signature +
  input_example + name= (not artifact_path=) + log Pipeline (one sentence).
  - patterns-uc-registration.md — most of this is general MLflow API. The Pattern 5 alias-swap sequence and the permissions table earn
  their spot in recipes.md.
  - patterns-batch-inference.md — Pattern 1 (load_model + .predict) is trivial. Pattern 2 (matplotlib) is filler. Pattern 6 (streaming)
  is generic. Keep only the spark_udf module-scope pattern + result_type table for multi-output.

  What I'd add that isn't there now

  - A one-line statement at the top of SKILL.md: "Read this file fully; consult gotchas.md before writing UC code; consult recipes.md
  only for the alias-swap and spark_udf patterns." Right now nothing tells me to short-circuit reading the references for trivial tasks.
  - An explicit negative list: "If you're tempted to read X-pattern.md to figure out how to do basic sklearn training, stop — you don't
  need it." Saves my future-self tokens.
  - The runtime gotcha matrix in one place: MLflow 2.x vs 3.x diffs (artifact_path rename, deprecated stages). Currently scattered.

  ---
  Token impact estimate

  - Current load when this skill triggers: SKILL.md (~3.5K tokens) + likely 1–2 references (~6–10K tokens) = ~10–14K tokens for any
  non-trivial task.
  - After restructure: SKILL.md (~1.5K tokens) + at most one reference (~2K tokens) = ~3.5K tokens. ~70% reduction with no loss of
  UC-specific knowledge.

  TL;DR — what to change

  1. Delete 5 files: user-journeys.md, CRITICAL-interfaces.md, patterns-experiment-setup.md, patterns-training.md,
  patterns-batch-inference.md. Fold the ~5% that's UC-specific into the survivors.
  2. Slim GOTCHAS.md from 14 to 9 entries; cut sklearn-101 ones; trim each entry to ~10 lines.
  3. Replace 4 pattern files with one recipes.md containing only Databricks-specific code shapes (not full training loops).
  4. SKILL.md: keep the 3-skill scope table + Quick Start, replace 3 workflow tables with 1 decision table.
  5. Reframe the voice: "do X" / "watch out for Y" — not "here's a complete training script with imports."

  Net: 8 files → 3 files, ~1,666 lines → ~470 lines, ~70% token reduction, with every UC-specific gotcha and code shape preserved.


David O'Keeffe added 2 commits May 9, 2026 15:26
Quentin posted a Claude-generated audit on PR databricks-solutions#474 specifying the
restructure. Ran gpt-5.5 in logfood with the audit as the spec.

Changes: 8 files / 1,666 lines → 3 files / 485 lines (71% reduction).

Structure:
- SKILL.md (91 lines) — frontmatter, 3-skill comparison table, hard
  rules, Quick Start, decision table for situation→recipe routing,
  read-order instruction at top, negative list ("don't read X-pattern.md
  for sklearn 101").
- references/gotchas.md (161 lines) — only Databricks/UC-specific
  failures: silently-wrong workspace registry, three-level UC names,
  artifact_location UC volume in UC-enforced workspaces, alias-on-stage
  no-op, CREATE MODEL ON SCHEMA grant, ai_query vs custom-model batch,
  spark_udf module-scope in Lakeflow SDP, mlflow[databricks] extras,
  artifact_path→name deprecation. Each entry: symptom + silent/loud +
  fix + one-sentence why.
- references/recipes.md (233 lines) — UC-specific code shapes only:
  experiment + UC volume setup, log→register→alias canonical pattern,
  Lakeflow SDP spark_udf module-scope, A/B alias swap order, verification
  one-liners.

Deleted (per Quentin's audit):
- references/CRITICAL-interfaces.md (90% plain MLflow API)
- references/GOTCHAS.md (replaced by lowercase gotchas.md, dropping the
  generic entries: alias-not-version, verify-after-register, signature
  basics, version reuse, Pipeline preprocessing — all generic MLflow /
  sklearn knowledge)
- references/user-journeys.md (pure pointer-shuffling)
- references/patterns-experiment-setup.md
- references/patterns-training.md
- references/patterns-uc-registration.md
- references/patterns-batch-inference.md

Workflow tables in SKILL.md replaced by a 6-row decision table.
Common Issues table consolidated into gotchas.md.
Reference Files list dropped — Claude can ls.

Co-authored-by: Isaac
macOS case-insensitive filesystem hid this from the previous commit.
The content was already lowercased in references; this commit makes
the git index match.

Co-authored-by: Isaac