
feat(databricks-skills): add databricks-mlflow-ml skill for classic ML#474

Open
dgokeeffe wants to merge 5 commits into databricks-solutions:main from dgokeeffe:feat/databricks-mlflow-ml-skill

Conversation

@dgokeeffe

Why

The existing MLflow-related skills leave a gap for classic ML practitioners:

| Skill | Scope | Covers classic ML UC registration? |
|---|---|---|
| databricks-mlflow-evaluation | GenAI agent evaluation (mlflow.genai.evaluate, scorers, judges) | ❌ Different audience |
| databricks-model-serving | Real-time serving endpoints | ❌ Covers serving, not training/registration |
| databricks-unity-catalog | Tables, volumes, system tables | ❌ Data primitives, not model registry |
| databricks-mlflow-ml (this PR) | Classic ML training + UC registration + batch inference | ✅ |

A data scientist training a forecasting model, registering it to Unity Catalog, and scoring predictions in a notebook or Lakeflow pipeline has no skill to trigger on. This PR fills that gap.

What's in the skill

SKILL.md — workflow index (Train → Register → Score, Retrain + Promote A/B, Debugging), quick-start, runtime compatibility note, and trigger description.

7 reference files:

  • GOTCHAS.md — 14 common mistakes with symptoms + fixes
  • CRITICAL-interfaces.md — exact API signatures + the models:/catalog.schema.model@alias URI format
  • patterns-experiment-setup.md — UC volume artifact_location (required in UC-enforced workspaces)
  • patterns-training.md — logging with signature + input_example, sklearn.Pipeline wrapping, autologging
  • patterns-uc-registration.md — three-level names, @champion/@challenger aliases, verification via DESCRIBE MODEL, A/B promotion
  • patterns-batch-inference.md — notebook pyfunc.load_model (Tier 1), Lakeflow SDP pyfunc.spark_udf (Tier 2), champion-vs-challenger validation, explicit warning against ai_query on custom UC models (the Tier 2 shape is sketched after this list)
  • user-journeys.md — 7 end-to-end workflows including debugging scenarios
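
For context, here is a minimal sketch of the module-scope spark_udf pattern that patterns-batch-inference.md documents. The catalog/schema/model names (main.ml.*) are placeholders, the `dp` import follows the @dp.materialized_view convention the skill references, and `spark` is the ambient session Databricks provides inside a pipeline; this is not code lifted from the skill itself.

```python
# Sketch only: main.ml.forecast and main.ml.features are placeholder names.
import mlflow
from pyspark import pipelines as dp  # Lakeflow SDP declarative-pipeline API

mlflow.set_registry_uri("databricks-uc")

# Module scope: the model is deserialized once per pipeline update rather
# than on every evaluation of the materialized view (the Tier 2 pattern).
predict_udf = mlflow.pyfunc.spark_udf(
    spark, "models:/main.ml.forecast@champion", result_type="double"
)

@dp.materialized_view()
def scored_features():
    features = spark.read.table("main.ml.features")
    # Pass the model's feature columns; they must match the logged signature.
    return features.withColumn("prediction", predict_udf(*features.columns))
```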

Key gotchas this skill teaches that other guides miss

  1. UC volume artifact_location on experiment creation — DBFS root is rejected in UC-enforced workspaces. Every log_model call fails with opaque errors until artifact_location points at a UC volume (see the sketch after this list).
  2. mlflow.set_registry_uri('databricks-uc') — without this, register_model silently routes to the legacy workspace registry. The #1 "my model isn't showing up in Catalog Explorer" support question.
  3. ai_query on custom UC models — doesn't work. Requires a serving endpoint. Correct primitive is mlflow.pyfunc.load_model (notebook) or mlflow.pyfunc.spark_udf (Lakeflow).
  4. @champion / @challenger aliases — replace deprecated transition_model_version_stage() stages. The legacy API still exists but is a no-op on UC-registered models (no error, no effect).
  5. mlflow.pyfunc.spark_udf in Lakeflow SDP — must be constructed at module scope, not inside @dp.materialized_view. Otherwise deserialization repeats on every pipeline evaluation.
  6. pip install 'mlflow[databricks]' — required for UC registration outside Databricks clusters. Plain pip install mlflow omits the cloud-storage SDKs (azure-core / boto3 / google.cloud) MLflow needs to stage UC artifacts. Clusters ship the extras pre-installed.
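
To make gotchas 1, 2, and 4 concrete, a minimal end-to-end sketch follows. Catalog, schema, and volume names are placeholders (main.ml.*), not paths from this PR, and it assumes a UC-enforced workspace on MLflow 3.x (which takes name= rather than the deprecated artifact_path=).

```python
import mlflow
import pandas as pd
from mlflow.models import infer_signature
from mlflow.tracking import MlflowClient
from sklearn.ensemble import GradientBoostingRegressor

mlflow.set_registry_uri("databricks-uc")  # gotcha 2: omit this and you get the legacy registry

# Gotcha 1: the experiment's artifact_location must be a UC volume, not the DBFS root.
exp_id = mlflow.create_experiment(
    "/Shared/forecast-demo", artifact_location="dbfs:/Volumes/main/ml/artifacts"
)

X = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})
model = GradientBoostingRegressor().fit(X, [2.0, 4.0, 6.0, 8.0])

with mlflow.start_run(experiment_id=exp_id) as run:
    mlflow.sklearn.log_model(
        model,
        name="model",  # MLflow 3.x param; artifact_path= is the deprecated 2.x spelling
        signature=infer_signature(X, model.predict(X)),
        input_example=X.head(2),
    )

# Three-level UC name. Gotcha 4: aliases replace stages, and register_model
# does not set an alias itself; that is a separate client call.
mv = mlflow.register_model(f"runs:/{run.info.run_id}/model", "main.ml.forecast")
MlflowClient().set_registered_model_alias("main.ml.forecast", "champion", mv.version)
```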

Testing

Field-tested end-to-end against a live Databricks workspace:

  • Feature table seeded, trained a GradientBoostingRegressor
  • Registered to UC with @champion alias — verified in Catalog Explorer UI
  • Loaded via mlflow.pyfunc.load_model — predictions within ~2% of actuals (sketched below)
  • Two additional gotchas surfaced during the test (mlflow[databricks] install + artifact_path deprecation) and added to GOTCHAS.md
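
The scoring step above amounts to a few lines. A hedged sketch with a placeholder model name; per gotcha 3, ai_query is not an option here without a serving endpoint:

```python
import mlflow
import pandas as pd

mlflow.set_registry_uri("databricks-uc")
# Load by alias URI, not a pinned version number.
model = mlflow.pyfunc.load_model("models:/main.ml.forecast@champion")
preds = model.predict(pd.DataFrame({"x": [5.0, 6.0]}))  # columns must match the logged signature
```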

Runtime verified: MLflow 3.11 on Lakeflow SDP serverless compute v5 (current default). Patterns compatible with MLflow 2.16+ — pairs on older classic DBRs still get correct behaviour. 2.x/3.x divergences called out in GOTCHAS.md (e.g., the artifact_path= → name= rename).

Structure parity

File layout matches databricks-mlflow-evaluation (same SKILL.md + references/ + GOTCHAS.md + CRITICAL-interfaces.md + patterns-*.md convention). Installable via the existing install_skills.sh:

./install_skills.sh databricks-mlflow-ml

Not in scope

  • Model Serving endpoints (databricks-model-serving covers that)
  • GenAI agent evaluation (databricks-mlflow-evaluation covers that)
  • Generic UC primitives like volumes and tables (databricks-unity-catalog covers those)

Deliberately narrow — classic ML + UC registration + batch inference only.

Origin

Built to fill a gap encountered during the Coles Vibe Workshop (airgapped Databricks field-engineer hackathon). DS pairs needed UC-scoped MLflow guidance that wasn't covered by any existing skill. Content battle-tested in the workshop before being contributed upstream.

@dustinvannoy-db
Collaborator

Do the mlflow official skills we install not cover this gap? cc: @jacksandom

@dgokeeffe
Author

@dustinvannoy-db I checked the mlflow/skills repo (what install_genie_code_skills.py pulls from). All 8 skills are GenAI/LLM-tracing scoped: agent-evaluation, mlflow-onboarding, instrumenting-with-mlflow-tracing, analyze-mlflow-trace, analyze-mlflow-chat-session, querying-mlflow-metrics, retrieving-mlflow-traces, searching-mlflow-docs. Not one touches Unity Catalog, set_registry_uri('databricks-uc'), @champion/@challenger aliases, or pyfunc.spark_udf.

The UC-specific stuff is what this PR covers: UC-enforced workspaces rejecting DBFS artifact roots, the legacy stage transition API silently no-oping on UC models, ai_query not working on custom UC models. That belongs here rather than upstream — it's Databricks config, not MLflow API.

David O'Keeffe added 3 commits May 8, 2026 12:13
Fills the gap between databricks-mlflow-evaluation (GenAI agent eval) and
databricks-model-serving (real-time endpoints). Covers:

- Classic ML model training with MLflow tracking
  (sklearn / XGBoost / PyTorch)
- Experiment creation with UC volume artifact_location
  (required in UC-enforced workspaces)
- Unity Catalog model registration with three-level names
- @champion / @challenger alias management
- Batch inference via mlflow.pyfunc.load_model (notebook, up to ~10k rows)
- Distributed batch via mlflow.pyfunc.spark_udf in Lakeflow SDP pipelines

Structure mirrors databricks-mlflow-evaluation:
- SKILL.md: workflows + trigger description + quick start
- references/GOTCHAS.md: 12 common mistakes with symptoms + fixes
- references/CRITICAL-interfaces.md: exact API signatures + models:/ URI format
- references/patterns-experiment-setup.md: UC volume artifact_location setup
- references/patterns-training.md: logging with signature + input_example
- references/patterns-uc-registration.md: register + alias + verify + A/B
- references/patterns-batch-inference.md: pyfunc.load_model + spark_udf + ai_query anti-pattern
- references/user-journeys.md: 7 end-to-end workflows including debugging

Key gotchas covered that other MLflow guides miss:
- Experiment creation now requires UC volume artifact_location in UC-enforced
  workspaces (DBFS root writes are rejected)
- mlflow.set_registry_uri('databricks-uc') is required; silent workspace
  registry fallback is the #1 support question
- ai_query does NOT work on custom UC-registered models unless they're
  deployed to a serving endpoint; use pyfunc.load_model or spark_udf instead
- UC aliases (@champion/@challenger) replace deprecated stage transitions
  (transition_model_version_stage is a no-op on UC models)
- mlflow.pyfunc.spark_udf must be constructed at module scope in Lakeflow
  SDP pipelines, not inside the function body

Tested against MLflow 2.16+ on Databricks Runtime 15.4 LTS. Content battle-
tested in the Coles Vibe Workshop (classic-ML track running in an airgapped
environment where online MLflow docs aren't reachable).
Field-tested the skill end-to-end from a local Python environment against
a live Databricks workspace. Surfaced two gotchas not in the original set:

#12 mlflow[databricks] extras missing when running outside Databricks:
plain `pip install mlflow` omits azure-core / boto3 / google.cloud SDKs
that UC registration needs to stage artifacts. Training + log_model work;
register_model fails with opaque "No module named 'azure'". Databricks
clusters ship the extras pre-installed, so this only bites laptops / CI.

#13 artifact_path= deprecated in favour of name= (MLflow 2.16+): emits
warning on every log_model call. Non-blocking, but worth flagging since
most online tutorials + training courses still use the old param.

Both verified against the workshop's test run — skill workflow 1 now
completes cleanly with these fixes documented.
Original SKILL.md didn't state a runtime target. Adds a "Runtime compatibility"
section anchored on what the skill was actually tested against — MLflow 3.11
on Lakeflow SDP serverless compute v5 — with a compat note for MLflow 2.16+
(classic DBR 15.4 LTS still ships 2.x). Points at GOTCHAS.md for the 3.x-vs-2.x
divergence (artifact_path deprecation, etc.).
@dgokeeffe force-pushed the feat/databricks-mlflow-ml-skill branch from bf84ee5 to cf21195 on May 8, 2026 02:30
@QuentinAmbard
Collaborator

here's what Claude suggests:

  Report: databricks-mlflow-ml skill audit

  Current state

  8 files, 1,666 lines. Structure: SKILL.md + 7 references (GOTCHAS, CRITICAL-interfaces, 4 pattern files, user-journeys).

  What's wrong (the honest assessment)

  1. Massive redundancy — the same 5–6 facts repeated 4–5 times each

  The six "load-bearing" facts of this skill are:
  - Set mlflow.set_registry_uri("databricks-uc")
  - Use three-level UC names
  - Pin artifact_location to a UC volume
  - Log with signature + input_example
  - Load via models:/cat.sch.name@alias (alias not version)
  - Use pyfunc.spark_udf (not ai_query) for batch

  Each one appears in: SKILL.md Quick Start + Common Issues table + GOTCHAS entry + CRITICAL-interfaces section + a pattern file +
  user-journeys + Workflow tables. Same fact, 4–6 places. That's massive duplication for a model that doesn't need it — I would memorize
   it after one pass.

  2. Treats me like a junior engineer who needs full code

  Patterns 1, 2, 3 of patterns-training.md are textbook sklearn — train_test_split, GradientBoostingRegressor.fit, mlflow.log_metrics. I
   know all of this. What I need is what's specific to UC + MLflow on Databricks. The actual UC-relevant lines per pattern: 3–5. The
  rest is filler.

  Same with patterns-batch-inference.md Pattern 2 (matplotlib chart), Pattern 6 (structured streaming boilerplate), Pattern 5 (basic A/B
   comparison with mean_absolute_error). I don't need these spelled out.

  3. user-journeys.md is almost pure pointer-shuffling

  Every step is "do X (see file Y, pattern Z)". It's a table of contents disguised as content. The workflows in SKILL.md already do
  this. Journey 7 ("everything is on fire") is fluff.

  4. CRITICAL-interfaces.md is a poorly-disguised cheatsheet of what I already know

  mlflow.search_runs(...), MlflowClient().set_registered_model_alias(...) — I'd write these correctly without prompting. The only
  entries that earn their place: the models:/<cat>.<sch>.<name>@<alias> URI format (UC-specific), set_registry_uri("databricks-uc")
  (non-obvious), the artifact_location="dbfs:/Volumes/..." shape (non-obvious), and SQL forms like DESCRIBE MODEL.

  5. The skill doesn't separate "you would never guess this" from "this is just MLflow"

  GOTCHAS #13 (artifact_path → name= rename) and #12 (mlflow[databricks] extras) are gold. #1, #2, #6, #7, #9 are gold. #14 (Pipeline
  preprocessing) is sklearn 101 — every ML engineer learns this. It dilutes the signal.

  6. SKILL.md has two workflow tables that overlap with user-journeys.md

  Workflow 1/2/3 in SKILL.md ≈ Journeys 1/2/4. Pick one location.

  ---
  Proposed restructure

  Target: 3 files, ~400–500 lines total (down from 8 files, 1,666 lines — ~70% reduction)

  SKILL.md                        (~120 lines — entry point + decision tree + quick start + Databricks-specific notes)
  references/gotchas.md           (~200 lines — only Databricks/UC-specific gotchas)
  references/recipes.md           (~150 lines — UC-specific code shapes for the 4 real workflows)

  File 1: SKILL.md — keep current structure but trim

  Keep:
  - The frontmatter (good description, gates correctly)
  - "Why this skill exists" 3-skill comparison table (essential for routing — I conflate these otherwise)
  - Quick Start (the 4-step copy-paste — actually useful)
  - A single decision table mapping situation → recipe section (not "Workflow 1, 2, 3" tables that re-list the same patterns)

  Cut:
  - All three Workflow tables (replaced by a 6-row decision table)
  - Common Issues table (lives in gotchas.md only)
  - Reference Files list (3 files — I can ls)
  - Runtime compatibility section (2 lines max, not a paragraph)

  File 2: gotchas.md — keep only the non-obvious, Databricks-specific ones

  Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required
  - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the killer)
  - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)
  - #7 CREATE MODEL ON SCHEMA is a separate grant (admins miss this)
  - #9 ai_query ≠ batch inference for custom models (naming-overlap trap)
  - #11 Construct spark_udf at module scope in Lakeflow SDP
  - #12 mlflow[databricks] extras for non-Databricks compute
  - #13 artifact_path= → name= deprecation

  Cut entirely (I'd handle these without help):
  - #3 "use alias not version" — covered as a single rule in SKILL.md
  - #5 "verify after register_model" — generic good practice, one-liner in SKILL.md                         File 2: gotchas.md — keep only the non-obvious, Databricks-specific ones
                                                                                                            Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required                                                           - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the
  killer)                                                                                                   - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)                                            - #7 CREATE MODEL ON SCHEMA   Cut:                                                                                                                                    - All three Workflow tables (replaced by a 6-row decision table)
  - Common Issues table (lives in gotchas.md only)                                                                                        - Reference Files list (3 files — I can ls)
  - Runtime compatibility section (2 lines max, not a paragraph)                                                                          File 2: gotchas.md — keep only the non-obvious, Databricks-specific ones

                                                                Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required
  - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the killer)
  - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)
  - #7 CREATE MODEL ON SCHEMA is a separate grant (admins miss this)
  - #9 ai_query ≠ batch inference for custom models (naming-overlap trap)
  - #11 Construct spark_udf at module scope in Lakeflow SDP
  - #12 mlflow[databricks] extras for non-Databricks compute
  - #13 artifact_path= → name= deprecation

  Cut entirely (I'd handle these without help):

  Keep (and trim each to ~10 lines: symptom + fix + one-sentence why):
  - #1 set_registry_uri("databricks-uc") required
  - #2 Three-level UC names mandatory (with the silently-wrong workspace-registry case — that's the killer)
  - #4 artifact_location must be a UC volume in UC-enforced workspaces
  - #6 Production/Staging aliases are silently no-op (huge trap)
  - #7 CREATE MODEL ON SCHEMA is a separate grant (admins miss this)
  - #9 ai_query ≠ batch inference for custom models (naming-overlap trap)
  - #11 Construct spark_udf at module scope in Lakeflow SDP                                                                               - #12 mlflow[databricks] extras for non-Databricks compute
  - #13 artifact_path= → name= deprecation

                                                                                                Cut entirely (I'd handle these without help):
  - #3 "use alias not version" — covered as a single rule in SKILL.md
  - #5 "verify after register_model" — generic good practice, one-liner in SKILL.md                                                       - #8 "log signature + input_example" — covered in the Quick Start; just one bullet in SKILL.md
  - #10 "can't reuse version numbers" — obvious                                                                                           - #14 "wrap preprocessing in Pipeline" — sklearn 101

                                                                                    File 3: recipes.md — UC-specific code shapes only, not full implementations

                                                             For each of the 4 real workflows, give me the 5–15 lines that are Databricks-specific, not the full training loop. Tell me what to do,
   not how to write sklearn.
                                                                                                              Sections:
  1. Experiment + UC volume setup (the two lines I might forget: set_registry_uri + artifact_location shape)
  2. Log → Register → Alias (the canonical 4-line pattern, plus tags syntax — I'd guess register_model argument order wrong otherwise)
  3. Lakeflow SDP spark_udf shape (module-scope construction + result_type for multi-output — the only spot I genuinely need a template)
  4. A/B promotion alias swap (the delete_registered_model_alias + set_registered_model_alias sequence — easy to mess up the order)
  5. Verification one-liners (DESCRIBE MODEL, model.aliases dict shape, get_model_info for signature debugging)

  For each, prefer "do X with this exact arg" guidance over fully-formed code blocks. Example:

  > Register + champion alias. After training: result = mlflow.register_model(f"runs:/{run_id}/model", "<cat>.<sch>.<name>") returns a ModelVersion with .version (str). Then MlflowClient().set_registered_model_alias(name, "champion", result.version). register_model does NOT set aliases — separate call required.

  That's enough. I don't need the full with mlflow.start_run(): block.

  Things to delete entirely

  - user-journeys.md — fold the genuinely useful "debugging which-registry" flow into gotchas #1, drop the rest. The rest is
  pointer-shuffling.
  - CRITICAL-interfaces.md — 90% of it is plain MLflow API surface. The 4–5 UC-specific lines move into recipes.md or SKILL.md Quick
  Start.
  - patterns-experiment-setup.md — Patterns 5–6 (workspace-vs-repo paths, notebook auto-association) are filler. The one essential bit
  (UC volume shape + CREATE VOLUME IF NOT EXISTS) is two lines in recipes.md.
  - patterns-training.md — sklearn boilerplate. Replace with a single "logging UC essentials" subsection in recipes.md: signature +
  input_example + name= (not artifact_path=) + log Pipeline (one sentence).
  - patterns-uc-registration.md — most of this is general MLflow API. The Pattern 5 alias-swap sequence and the permissions table earn
  their spot in recipes.md.
  - patterns-batch-inference.md — Pattern 1 (load_model + .predict) is trivial. Pattern 2 (matplotlib) is filler. Pattern 6 (streaming)
  is generic. Keep only the spark_udf module-scope pattern + result_type table for multi-output.

  What I'd add that isn't there now

  - A one-line statement at the top of SKILL.md: "Read this file fully; consult gotchas.md before writing UC code; consult recipes.md
  only for the alias-swap and spark_udf patterns." Right now nothing tells me to short-circuit reading the references for trivial tasks.
  - An explicit negative list: "If you're tempted to read X-pattern.md to figure out how to do basic sklearn training, stop — you don't
  need it." Saves my future-self tokens.
  - The runtime gotcha matrix in one place: MLflow 2.x vs 3.x diffs (artifact_path rename, deprecated stages). Currently scattered.

  ---
  Token impact estimate

  - Current load when this skill triggers: SKILL.md (~3.5K tokens) + likely 1–2 references (~6–10K tokens) = ~10–14K tokens for any
  non-trivial task.
  - After restructure: SKILL.md (~1.5K tokens) + at most one reference (~2K tokens) = ~3.5K tokens. ~70% reduction with no loss of
  UC-specific knowledge.

  TL;DR — what to change

  1. Delete 5 files: user-journeys.md, CRITICAL-interfaces.md, patterns-experiment-setup.md, patterns-training.md,
  patterns-batch-inference.md. Fold the ~5% that's UC-specific into the survivors.
  2. Slim GOTCHAS.md from 14 to 9 entries; cut sklearn-101 ones; trim each entry to ~10 lines.
  3. Replace 4 pattern files with one recipes.md containing only Databricks-specific code shapes (not full training loops).
  4. SKILL.md: keep the 3-skill scope table + Quick Start, replace 3 workflow tables with 1 decision table.
  5. Reframe the voice: "do X" / "watch out for Y" — not "here's a complete training script with imports."

  Net: 8 files → 3 files, ~1,666 lines → ~470 lines, ~70% token reduction, with every UC-specific gotcha and code shape preserved.


David O'Keeffe added 2 commits May 9, 2026 15:26
Quentin posted a Claude-generated audit on PR databricks-solutions#474 specifying the
restructure. Ran gpt-5.5 in logfood with the audit as the spec.

Changes: 8 files / 1,666 lines → 3 files / 485 lines (71% reduction).

Structure:
- SKILL.md (91 lines) — frontmatter, 3-skill comparison table, hard
  rules, Quick Start, decision table for situation→recipe routing,
  read-order instruction at top, negative list ("don't read X-pattern.md
  for sklearn 101").
- references/gotchas.md (161 lines) — only Databricks/UC-specific
  failures: silently-wrong workspace registry, three-level UC names,
  artifact_location UC volume in UC-enforced workspaces, alias-on-stage
  no-op, CREATE MODEL ON SCHEMA grant, ai_query vs custom-model batch,
  spark_udf module-scope in Lakeflow SDP, mlflow[databricks] extras,
  artifact_path→name deprecation. Each entry: symptom + silent/loud +
  fix + one-sentence why.
- references/recipes.md (233 lines) — UC-specific code shapes only:
  experiment + UC volume setup, log→register→alias canonical pattern,
  Lakeflow SDP spark_udf module-scope, A/B alias swap order, verification
  one-liners.

Deleted (per Quentin's audit):
- references/CRITICAL-interfaces.md (90% plain MLflow API)
- references/GOTCHAS.md (replaced by lowercase gotchas.md, dropping the
  generic entries: alias-not-version, verify-after-register, signature
  basics, version reuse, Pipeline preprocessing — all generic MLflow /
  sklearn knowledge)
- references/user-journeys.md (pure pointer-shuffling)
- references/patterns-experiment-setup.md
- references/patterns-training.md
- references/patterns-uc-registration.md
- references/patterns-batch-inference.md

Workflow tables in SKILL.md replaced by a 6-row decision table.
Common Issues table consolidated into gotchas.md.
Reference Files list dropped — Claude can ls.

Co-authored-by: Isaac
macOS case-insensitive filesystem hid this from the previous commit.
The content was already lowercased in references; this commit makes
the git index match.

Co-authored-by: Isaac