Improve Claude Code skill descriptions and naming consistency by bosconi · Pull Request #35881 · MaterializeInc/materialize

bosconi · 2026-04-06T14:38:30Z

Summary

Audit and improve the Claude Code skills in .claude/skills/ so that they trigger
more reliably and avoid confusing overlaps.

Commit-by-commit breakdown

Rename skills to use consistent mz- prefix — adapter-guide, debug-ci,
limits-test, parallel-workload, platform-checks, and query-tracing are
renamed to match the convention already used by mz-benchmark, mz-commit,
mz-profile, mz-run, mz-test, and mz-pr-review. Updates the name frontmatter
fields and fixes the trace_tree.py path reference in mz-query-tracing.
mz-pr-review: add missing name field and improve description — This skill
was missing the name frontmatter field entirely, which could prevent Claude Code
from identifying it correctly. The description was too generic and missed casual triggers like
"review my code" or "does this look ok".
mz-debug-ci: expand description with casual trigger phrases — The old one-line
description only matched formal phrases. Users more often say "why is CI red" or
"checks failing".
mz-test: add cross-references to specialized framework skills — Clarifies
mz-test as the general testing guide and entry point for framework selection,
pointing to mz-platform-checks, mz-parallel-workload, and mz-limits-test for deep
framework-specific guidance.
mz-adapter-guide: add problem-oriented trigger phrases — The old description
only triggered on file paths and crate names. Adding triggers like "how does the
coordinator work" helps the skill activate when someone is asking questions, not
just editing files.
mz-query-tracing: add problem-oriented trigger phrases — Reframed from
mechanism-focused ("tracing, spans, Tempo") to problem-focused ("why is this query
slow", "where is the time going").
mz-benchmark: clarify distinction from mz-parallel-workload — Explicitly notes
this is about performance measurement frameworks, not the parallel-workload
stress-testing framework.
mz-parallel-workload: clarify distinction from mz-benchmark — Mirror
disambiguation: the description now leads with "stress-testing for panics/errors",
states that the framework is not for performance measurement, and points to
mz-benchmark for that.
Update skills README — Rewritten with clearer descriptions, new mz- names,
a dedicated "Specialized Test Frameworks" section, and a note that the README is
human documentation only (not used for skill triggering).
Add skills section to CLAUDE.md — A brief nudge reminding Claude that
project-specific mz-* skills exist and are worth consulting before starting a
task. This is lightweight by design: the skill descriptions are already in context
via SKILL.md frontmatter, so CLAUDE.md does not duplicate them.
mz-test: fix cross-references to use mz- prefixed skill names — The
cross-references said platform-checks, parallel-workload, and limits-test
without the mz- prefix, so they did not match the actual skill names.
mz-commit: remove "code review" trigger to avoid collision with mz-pr-review
— Both skills listed "code review" as a trigger phrase, but mz-commit is for
creating commits/PRs while mz-pr-review is for reviewing changes. Removed the
ambiguous trigger from mz-commit and added a pointer to mz-pr-review.
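The "code review" collision fixed by the last commit above is the kind of overlap a script could catch. Below is a minimal Python sketch, not part of this PR: it assumes trigger phrases appear as double-quoted strings inside each skill's description, and the function name shared_triggers and the sample descriptions are hypothetical rather than copied from the real SKILL.md files.

```python
import re
from collections import defaultdict

# Assumption: trigger phrases are written as double-quoted strings inside each
# skill's description, e.g. 'Use when the user says "why is CI red"'.
QUOTED_PHRASE = re.compile(r'"([^"]+)"')


def shared_triggers(descriptions: dict[str, str]) -> dict[str, list[str]]:
    """Map each quoted trigger phrase used by two or more skills to those skills."""
    owners: dict[str, set[str]] = defaultdict(set)
    for skill, description in descriptions.items():
        for phrase in QUOTED_PHRASE.findall(description):
            # Normalize case and whitespace so "Code  Review" matches "code review".
            owners[" ".join(phrase.lower().split())].add(skill)
    return {phrase: sorted(skills) for phrase, skills in owners.items() if len(skills) > 1}


# Hypothetical descriptions mimicking the collision before the fix.
before = {
    "mz-commit": 'Use for "commit this", "open a PR", or "code review".',
    "mz-pr-review": 'Use for "code review", "review my code", "does this look ok".',
}
print(shared_triggers(before))  # {'code review': ['mz-commit', 'mz-pr-review']}
```

A quoted-phrase match only catches exact collisions; near-synonyms such as "review my diff" and "check my diff" would still need a human read.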

What was deliberately not done

No exhaustive skill listing in CLAUDE.md. Duplicating all 12 skill descriptions
would waste context tokens and create a maintenance burden. The brief mention is
sufficient to nudge the model to check the skill list it already has.
No AGENTS.md file. AGENTS.md is a cross-tool convention (Gemini CLI, Codex,
etc.) that Claude Code also reads. Since the team is using Claude Code and CLAUDE.md
is already established, adding AGENTS.md now would be premature. If other AI tools
are adopted later, a symlink AGENTS.md -> CLAUDE.md is the right move — the
skills section would be harmless noise for tools that do not support skills.

Test plan

Verify all 12 skills appear in the skill list at session start, all with the mz- prefix
Spot-check triggering: "why is CI red" should suggest mz-debug-ci
Spot-check triggering: "review my changes" should suggest mz-pr-review
Spot-check triggering: "code review" should suggest mz-pr-review, not mz-commit
Verify /mz-query-tracing works and references the correct trace_tree.py path
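The first test-plan item could also be checked without starting a session. This is a hypothetical sketch (parse_frontmatter and check_skills are illustrative names, not part of this PR): it takes each SKILL.md's text keyed by its directory name, reads only flat key: value frontmatter lines instead of using a YAML parser, and assumes the convention that each skill's name field matches its directory.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Return flat `key: value` pairs from a leading '---' ... '---' block.

    Deliberately minimal: no YAML library, no nested or multi-line values.
    Returns {} if the block is missing or never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Only top-level keys: skip indented lines and lines without a colon.
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return {}  # opening '---' without a closing one


def check_skills(skills: dict[str, str], prefix: str = "mz-") -> list[str]:
    """Report skills whose frontmatter name is missing, mismatched, or unprefixed.

    `skills` maps each skill directory name to the text of its SKILL.md.
    Assumption: the name field is expected to equal the directory name.
    """
    problems = []
    for directory, text in sorted(skills.items()):
        name = parse_frontmatter(text).get("name")
        if not name:
            problems.append(f"{directory}: missing name field")
        elif name != directory:
            problems.append(f"{directory}: name {name!r} does not match directory")
        elif not name.startswith(prefix):
            problems.append(f"{directory}: name lacks the {prefix!r} prefix")
    return problems


# Hypothetical inputs: one valid skill, one missing `name`, one not yet renamed.
skills = {
    "mz-debug-ci": "---\nname: mz-debug-ci\ndescription: Use when CI is red\n---\n# Body\n",
    "mz-pr-review": "---\ndescription: Review local changes\n---\n",
    "query-tracing": "---\nname: query-tracing\ndescription: Tracing\n---\n",
}
for problem in check_skills(skills):
    print(problem)
```

The triggering spot checks in the test plan depend on the model, so they still need a live session; only the naming check is static.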

🤖 Generated with Claude Code

Renames adapter-guide, debug-ci, limits-test, parallel-workload, platform-checks, and query-tracing to use the mz- prefix, matching the convention already used by mz-benchmark, mz-commit, mz-profile, mz-run, mz-test, and mz-pr-review. Updates the name field in each SKILL.md frontmatter and fixes the trace_tree.py path reference in mz-query-tracing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This skill was missing the `name` frontmatter field entirely, which could prevent Claude Code from correctly identifying it. The description was also too generic ("when the user asks for a review") and missed common casual triggers like "review my code", "check my diff", or "does this look ok".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The old one-line description only matched formal phrases like "Buildkite failures". Users more often say things like "why is CI red", "build broken", or "checks failing", or they just paste a Buildkite URL. The expanded description covers these patterns so the skill triggers when it should.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mz-test is the general testing guide, but mz-platform-checks, mz-parallel-workload, and mz-limits-test provide deeper guidance for their specific frameworks. Without cross-references, both the general and the specific skills could trigger redundantly. The updated description clarifies mz-test as the starting point for framework selection and points to the dedicated skills for deep framework usage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The old description only triggered on file paths and crate names, missing users who ask questions like "how does the coordinator work" or "what are read holds". Adding problem-oriented triggers helps the skill activate when someone is trying to understand the subsystem, not just editing it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The old description was focused on the mechanism (tracing, spans, Tempo) rather than the problem users are trying to solve. Adding triggers like "why is this query slow" and "where is the time going" helps the skill activate when users describe symptoms, not just when they already know they want tracing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mz-benchmark includes a "Parallel Benchmark" framework, and mz-parallel-workload is a separate stress-testing framework. The similar names could confuse the model. The updated description explicitly notes that mz-benchmark is about performance measurement, not the parallel-workload stress-testing framework.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The description now leads with what the framework does (stress-testing for panics/errors) and explicitly notes it is not for performance measurement, pointing to mz-benchmark for that. This disambiguates the two skills, which have confusingly similar names.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rewrite the README to use the new mz- prefixed names, add a note that this file is human documentation only (not used for skill triggering), reorganize test framework skills into their own section with guidance to start from mz-test, and use clearer "When to use" descriptions throughout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CLAUDE.md is always loaded into context, so a brief mention of the mz-* skills helps Claude check for relevant skills before starting a task. This is a lightweight nudge rather than a full listing; the skill descriptions themselves are already in context via SKILL.md frontmatter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-06T14:38:41Z

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
Prefix with area if helpful: compute:, storage:, adapter:, sql:

Pre-merge checklist

The PR title is descriptive and will make sense in the git log.
This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

The description referenced platform-checks, parallel-workload, and limits-test without the mz- prefix. These need to match the actual skill names so the model can find them in its skill list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…-review Both mz-commit and mz-pr-review listed "code review" as a trigger phrase, but they serve different purposes: mz-commit is for creating commits and PRs, mz-pr-review is for reviewing changes. A user saying "code review" almost certainly wants the review skill, not the commit skill. Added a pointer to mz-pr-review to make the boundary explicit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Keep the skills nudge generic so it does not go stale when skills are added or renamed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bosconi and others added 10 commits April 6, 2026 10:25

bosconi requested review from antiguru, def-, ggevay, jasonhernandez, jubrad and mtabebe April 6, 2026 14:57

bosconi and others added 2 commits April 6, 2026 11:30

CLAUDE.md: remove specific skill names to avoid staleness

edbb8f4


antiguru approved these changes Apr 6, 2026

bosconi merged commit 95e620a into main Apr 6, 2026
6 checks passed

bosconi deleted the jc/skill-linting branch April 6, 2026 17:06

bosconi merged 13 commits into main from jc/skill-linting

2 participants