Add Claude Code automation layer for MAD (commands, agents, workflows) by coketaste · Pull Request #160 · ROCm/MAD

coketaste · 2026-06-01T02:43:22Z

Summary

Adds a Claude Code automation layer on top of `madengine` so common MAD
tasks — benchmarking, adding models, profiling, tuning, and analysis — can be
driven by slash commands and multi-agent workflows instead of hand-assembled
CLI invocations.

- **Slash commands + agents**: `/mad-benchmark`, `/mad-add-model`, `/mad-profile`,
  `/mad-report`, `/mad-tune`, `/mad-validate`, backed by dedicated subagents
  (benchmark-runner, model-author, perf-analyst, tuner).
- **Workflows**: `mad-benchmark-sweep` (fan out a benchmark matrix across tags,
  parallel, synthesize a comparison table) and `mad-tune-search` (profile-driven
  tuning search with adversarial verification).
- **Static validator** for `models.json` entries (JSON, paths, Dockerfile header,
  performance-line contract) — no GPU required.
- **How-to docs**: `mad-automation-howto.html`.

## Key design decisions

- **Workflows accept the same CLI-style flags as `/mad-benchmark`**
  (`--tags`, `--additional-context`, `--ngpus`, `--timeout`, `--plan`), and thread
  `--additional-context` (HF token, `docker_env_vars`) into every cell. They
  execute by default; `--plan`/`--no-execute` for a GPU-free dry run.
- **`mad-benchmark-sweep` runs cells in parallel** with per-cell `perf_<tag>.csv`
  isolation (no shared-file clobbering); precision is intentionally not a sweep
  axis (it's fixed per model in `models.json`).
- **`mad-tune-search` is profile-driven**: a Diagnose phase profiles the model
  once to classify the bottleneck (compute/memory/communication/launch) from
  kernel hotspots, proposals must cite that evidence, and candidates are then
  measured **cleanly and sequentially** (profiler stripped) so deltas aren't
  buried in tracing overhead or corrupted by GPU contention / parallel file edits.
- Both workflows handle `multiple_results` models (per-metric CSV instead of a
  stdout `performance:` line).

## Test plan

- [ ] `madengine discover --tags <tag>` resolves for the documented examples
- [ ] `/mad-benchmark-sweep --tags ... --plan` validates without a GPU
- [ ] `/mad-benchmark-sweep --tags <tag> --additional-context '{...}'` executes
      on a GPU host and writes `perf_<tag>.csv`
- [ ] `/mad-tune-search --tag <model> --plan` produces a profiling-informed
      candidate list
- [ ] `/mad-tune-search --tag <model> --additional-context '{...}'` diagnoses
      the bottleneck and measures candidates cleanly
- [ ] `/mad-validate` passes for all `models.json` entries
- [ ] `mad-automation-howto.html` renders and matches current workflow behavior

@main

Add a foundation pack so common MAD tasks (benchmarking, adding models, tuning, development) can be driven through Claude Code with the repo's conventions baked in: - CLAUDE.md: models.json schema, 4-step add-model flow, the "performance: <value> <unit>" stdout contract, madengine v2.1.0 CLI commands, deployment inference, and profiling. - .claude/agents/: mad-model-author, mad-perf-analyst, mad-benchmark-runner, mad-tuner. - .claude/commands/: mad-add-model, mad-benchmark, mad-profile, mad-report, mad-tune. - .claude/workflows/: mad-benchmark-sweep and mad-tune-search dynamic workflows (plan-only by default; execute:true on a GPU host). - .claude/settings.json: shared read-only/common command allowlist. Commands and docs verified against the installed madengine Typer CLI v2.1.0 (@main): top-level build/run/discover/report/database. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

- mad-benchmark-sweep: drop the non-functional precision axis (precision is fixed per-model via training_precision or baked into the inference image, not a runtime flag) and give each cell its own perf_<cell>.csv to avoid clobbering a shared perf.csv under parallel execution; flag unresolved/errored cells. - mad-model-author: document the multiple_results output contract alongside the performance: stdout line, and confirm new entries resolve via discover. - Add /mad-validate: GPU-free static checker (JSON, paths, Dockerfile CONTEXT header, output contract) with errors vs convention-warning severities. - Point profiling agent/command at scripts/common/tools.json as the source of truth and document the deploy-key convention. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a Claude Code automation layer for MAD so common benchmarking, profiling, reporting, tuning, validation, and model-authoring tasks can be driven through slash commands, specialized agents, and two workflow scripts.

Changes:

Adds Claude Code command prompts, agent definitions, workflow scripts, and permission settings under .claude/.
Adds repository guidance in CLAUDE.md for MAD conventions and madengine usage.
Adds a standalone HTML how-to guide documenting the automation layer.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`.claude/agents/mad-benchmark-runner.md`	Defines benchmark/profile command assembly and execution behavior.
`.claude/agents/mad-model-author.md`	Defines model scaffolding guidance for new MAD entries.
`.claude/agents/mad-perf-analyst.md`	Defines read-only benchmark result analysis behavior.
`.claude/agents/mad-tuner.md`	Defines iterative tuning behavior.
`.claude/commands/mad-add-model.md`	Adds slash command prompt for adding models.
`.claude/commands/mad-benchmark.md`	Adds slash command prompt for benchmark runs.
`.claude/commands/mad-profile.md`	Adds slash command prompt for profiled benchmark runs.
`.claude/commands/mad-report.md`	Adds slash command prompt for result analysis.
`.claude/commands/mad-tune.md`	Adds slash command prompt for tuning.
`.claude/commands/mad-validate.md`	Adds static validation command for MAD model entries.
`.claude/settings.json`	Adds Claude Code tool/command permission allow-list.
`.claude/workflows/mad-benchmark-sweep.js`	Adds parallel benchmark sweep workflow.
`.claude/workflows/mad-tune-search.js`	Adds candidate-based tuning search workflow.
`CLAUDE.md`	Adds Claude Code repository guidance for MAD.
`mad-automation-howto.html`	Adds user-facing HTML guide for commands, agents, workflows, and tips.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+  const safe = label.replace(/[^A-Za-z0-9._-]/g, '_')
+  const outFile = `perf_${safe}.csv`
+  const flags = [
+    '--tags ' + cell.tag,


+  const cmd = `madengine run ${flags}${ctx}`
+
+  const action = execute
+    ? `If AMD GPUs are present (check rocm-smi/amd-smi), RUN this command and parse the "performance: <value> <unit>" line from output. Results land in ${outFile}. If no GPUs, set status "skipped".`


+  const ctxParts = []
+  if (cell.nGpus) ctxParts.push(`"n_gpus": "${cell.nGpus}"`)
+  const ctx = ctxParts.length ? ` --additional-context '{${ctxParts.join(', ')}}'` : ''


+const nCandidates = Math.max(1, Math.min(cfg.candidates || 4, 8))
+const execute = cfg.execute === true
+
+phase('Baseline')


+  candidates,
+  (cand) => {
+    const action = execute
+      ? `If AMD GPUs are present, apply the change, run "madengine run --tags ${tag} --live-output", parse the performance line, then REVERT the change. If no GPUs, status "skipped".`


+    if sp:
+        sh_files = [sp] if sp.endswith(".sh") else glob.glob(os.path.join(sp,"**","*.sh"), recursive=True)
+        for sh in sh_files:
+            if os.path.isfile(sh) and "performance:" in open(sh, errors="ignore").read():


+## Future (skeleton, not yet wired)
+
+`reference_db/mad_agent.db` (tables: `model_baselines`, `optimization_history`,
+`best_configurations`, `learned_patterns`) and `knowledge_base/` exist as an
+empty scaffold for a future persistent optimization-memory layer. Not populated
+yet — do not assume data is present.


+- Smoke-test wiring with a single small tag before large sweeps. (There is no
+  `dummy` model in this repo's `models.json` — confirm a real tag with
+  `madengine discover`.)


+      "Bash(madengine discover *)",
+      "Bash(python3 -m json.tool *)",
+      "Bash(git status)",
+      "Bash(git diff *)",
+      "Bash(git log *)",
+      "Bash(rocm-smi *)",
+      "Bash(amd-smi *)",


…g & tuning Brings Claude Code's multi-agent automation to MAD: plain slash commands now drive the full benchmark/tune lifecycle, with dynamic workflows that fan out specialized subagents and synthesize their results - turning manual, flag-heavy madengine runs into one-line, self-orchestrating operations. Headline capabilities: - mad-benchmark-sweep: a dynamic workflow that benchmarks many models in parallel and auto-builds a comparison table, with isolated per-cell output so parallel runs never clobber each other. - mad-tune-search: an agentic, profiling-driven tuning loop. It profiles once to DIAGNOSE the real bottleneck (compute/memory/communication/launch), proposes evidence-backed candidates, measures each on clean runs, and has an independent agent adversarially verify every claimed gain before recommending a config - decisions grounded in data, not guesswork. Engineering changes that make this work: - Both workflows now accept the CLI-style flags the slash commands actually pass (--tags / --additional-context / --plan); the prior object-only parsing silently dropped tags and context and defaulted to plan-only. They thread --additional-context into every run and execute by default. - mad-tune-search splits one context into a profiled variant (Diagnose) and a clean variant (measurement), evaluates candidates sequentially to avoid GPU contention and config-edit races, and isolates output to perf_tune_<id>.csv. - mad-automation-howto.html updated to match: CLI-flag tables, execute-by-default callouts, the 5-phase tune-search, and a bottleneck-to-lever reference table. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Add real results and operational lessons from the Qwen3-8B profiling and tuning session on MI350X: - TunableOp warmup warning: first cold run collapses throughput to ~0.2-1 tok/s during online GEMM benchmarking; measurements are only valid on warm subsequent runs. Added to env var card and Tips section. - New red callout in tune-search: Docker containers outlive the Claude session, so a slow candidate can stall the whole sequential search and leave a config edit (e.g. extended.yaml) unapplied. Always pass --timeout to bound each candidate run. - New Tips entry "Qwen3-8B live tuning findings": rocm_trace_lite kernel breakdown (Cijk GEMM 27% + wvSplitK 21% = compute-bound), and the headline result -- max_concurrency=32 delivered 7993 tok/s throughput vs 422 tok/s baseline (18.9x), confirming concurrency as the primary vLLM serving lever. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Correct the architecture SVG so all 6 slash commands, both workflows, and the inline /mad-validate path are represented accurately. Give /mad-profile its own box, route /mad-validate to an inline-script node, split the two workflows with distinct fan-out targets, and color-code arrows with a legend. Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

coketaste and others added 3 commits May 31, 2026 12:51

Create mad automation howto html as docs

b78525b

coketaste self-assigned this Jun 1, 2026

Copilot AI review requested due to automatic review settings June 1, 2026 02:43

coketaste requested review from Rohan138, amathews-amd, gargrahul and ppalaniappan-amd as code owners June 1, 2026 02:43

Copilot started reviewing on behalf of coketaste June 1, 2026 02:43 View session

coketaste marked this pull request as draft June 1, 2026 02:43

Copilot AI reviewed Jun 1, 2026

View reviewed changes

coketaste and others added 3 commits June 1, 2026 03:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Claude Code automation layer for MAD (commands, agents, workflows)#160

Add Claude Code automation layer for MAD (commands, agents, workflows)#160
coketaste wants to merge 6 commits into
ROCm:developfrom
coketaste:coketaste/claude-workflow

coketaste commented Jun 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

coketaste commented Jun 1, 2026

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants