chore(distill): Stage D dispatch wrapper with PMAT-701 lessons baked in #1883

Open: noahgift wants to merge 2 commits into `main` from `chore/stage-d-dispatch-wrapper`
@noahgift
Contributor

Summary

`scripts/dispatch-distill-stage-d.sh` is the operator entrypoint for Phase 4 Stage D production training. It captures the PMAT-701 cascade post-mortem lessons in a single dispatchable wrapper.

What it bakes in

| Lesson | Default behavior |
| --- | --- |
| PMAT-704 cuBLAS default (#1879) | `APR_DISTILL_TEACHER_BACKEND=auto`; operator can opt in to `realizar-q4k` for memory-constrained dGPUs |
| PMAT-705 per-step monitoring (#1881) | `APR_DISTILL_LOG_EVERY=50`; visible loss progress without spam |
| PMAT-699 P0 checkpointing | every 5000 steps (survives kill / crash) |
| PMAT-703 vocab alignment | auto-applies inside cuda backend (no operator config) |
| Disk preflight | requires ≥ 15 GB free; gx10 was 98% full when the PMAT-704 incident surfaced |
| Teacher metadata validation | requires stamped APR (apr-leaderboard cache); fails fast on the GGUF-import path that broke during PMAT-704 |
| 10 s alive check | catches early validation errors before the operator walks away |
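
The disk preflight listed above could look roughly like the following. This is a minimal sketch, not the wrapper's actual code: the function names `free_gb` and `disk_preflight` are hypothetical, and the only things taken from the PR are the 15 GB threshold and the `DISK_FREE_REQUIRED_GB` variable.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a disk preflight; names are illustrative only.
set -euo pipefail

# Print the free space, in whole GB, on the filesystem that holds path $1.
free_gb() {
    # -P forces POSIX output (one line per filesystem); -k reports 1024-byte blocks.
    # Column 4 of the data line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if at least $2 GB are free at path $1; otherwise report and fail.
disk_preflight() {
    local path="$1" required_gb="$2" free
    free="$(free_gb "$path")"
    if [ "$free" -lt "$required_gb" ]; then
        echo "preflight: ${free} GB free at ${path}, need >= ${required_gb} GB" >&2
        return 1
    fi
    echo "preflight: ${free} GB free at ${path} (>= ${required_gb} GB required)"
}
```

A caller would presumably run something like `disk_preflight "$HOME" "${DISK_FREE_REQUIRED_GB:-15}"` before launching training, so a nearly full disk stops the dispatch instead of corrupting a checkpoint hours later.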

Override env vars

`STEPS`, `BATCH_SIZE`, `LR`, `T`, `ALPHA`, `DATASET_DIR`, `APR_DISTILL_LOG_EVERY`, `APR_DISTILL_CHECKPOINT_EVERY`, `APR_DISTILL_TEACHER_BACKEND`, `DISK_FREE_REQUIRED_GB`, `DRY_RUN`.

This wrapper is intentionally separate from `dispatch-distill-phase-3-gx10.sh`, which remains the Phase 3 smoke entrypoint. SPEC-DISTILL-001 §86 and `feedback_smoke_defaults_leak_into_production.md` codify why the two should NOT share defaults.

QA

Cascade context

Companion to the PMAT-701 family of fixes. Ready to dispatch once #1879 (PMAT-704 cuBLAS default) and #1881 (PMAT-705 ProgressCallback) land — without those, this wrapper would default to the slow / silent path.

🤖 Generated with Claude Code

…s baked in

`scripts/dispatch-distill-stage-d.sh` is the operator entrypoint for
Phase 4 Stage D production training. Captures the PMAT-701 cascade
post-mortem lessons in a single dispatchable wrapper:

* **cuBLAS default** (PMAT-704 / #1879). `APR_DISTILL_TEACHER_BACKEND=auto`
  by default; operators can opt into the slower memory-constrained
  Realizar path via `APR_DISTILL_TEACHER_BACKEND=realizar-q4k`.
* **Per-step monitoring** (PMAT-705 / #1881). `APR_DISTILL_LOG_EVERY=50`
  default — visible loss progress without log spam. Operators can set
  =1 for verbose mode or =0 to silence.
* **PMAT-699 P0 checkpointing** every 5000 steps (durability — survives
  kill / crash).
* **PMAT-703 vocab alignment** auto-applies inside the cuda backend
  when teacher.vocab > student.vocab (no operator config needed).
* **Disk preflight**: requires ≥ 15 GB free on /home/noah (Stage D 50K
  writes ~12 GB of checkpoints; PMAT-704 cascade post-mortem caught
  gx10 at 98 % full). Fails fast with cleanup candidates listed.
* **Teacher / student validation**: requires stamped APR metadata
  (apr-leaderboard checkpoint by default — the dispatch-script's
  `apr import --preserve-q4k` path fails the cuda backend's
  metadata-required check, surfaced by PMAT-704 incident).
* **Process-alive check**: 10 s post-dispatch verification catches
  early validation errors so the operator doesn't walk away from a
  failed dispatch.
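
The process-alive check in the last bullet can be sketched as below. It assumes the wrapper launches training in the background and holds its PID; the helper name `alive_after` is hypothetical, and only the 10 s delay comes from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a post-dispatch alive check; not the wrapper's actual code.

# Return 0 if PID $1 is still running after waiting $2 seconds, non-zero otherwise.
alive_after() {
    local pid="$1" wait_s="$2"
    sleep "$wait_s"
    # kill -0 sends no signal; it only tests whether the process still exists.
    kill -0 "$pid" 2>/dev/null
}
```

In use, the dispatcher would start the training command with `&`, capture `$!`, and call `alive_after "$pid" 10`. A non-zero result means the job died during early validation, so the operator can be told to read the log before walking away.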

The wrapper is intentionally separate from `dispatch-distill-phase-3-gx10.sh`
which remains the Phase 3 smoke entrypoint. Stage D is production scope
and shouldn't inherit smoke defaults (see SPEC-DISTILL-001 §86 +
memory `feedback_smoke_defaults_leak_into_production.md`).

## Override env vars

* `STEPS` (default 50000)
* `BATCH_SIZE` (default 32)
* `LR` (default 1.5e-5)
* `T` (default 4.0)
* `ALPHA` (default 0.3)
* `DATASET_DIR` (unset → synthetic; set to a `.bin` shard dir for real corpus)
* `APR_DISTILL_LOG_EVERY` (default 50)
* `APR_DISTILL_CHECKPOINT_EVERY` (default 5000)
* `APR_DISTILL_TEACHER_BACKEND` (default `auto`)
* `DISK_FREE_REQUIRED_GB` (default 15)
* `DRY_RUN=1` to plan only
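
One common way to implement overridable defaults like these in bash is the `${VAR:=default}` expansion. The sketch below uses the default values listed above; the function names `apply_defaults` and `describe_dataset` are hypothetical, and `DRY_RUN` defaulting to `0` is an assumption, since the list only documents `DRY_RUN=1`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of default handling; not the wrapper's actual code.

# Assign each documented default only when the variable is unset or empty.
apply_defaults() {
    : "${STEPS:=50000}"
    : "${BATCH_SIZE:=32}"
    : "${LR:=1.5e-5}"
    : "${T:=4.0}"
    : "${ALPHA:=0.3}"
    : "${APR_DISTILL_LOG_EVERY:=50}"
    : "${APR_DISTILL_CHECKPOINT_EVERY:=5000}"
    : "${APR_DISTILL_TEACHER_BACKEND:=auto}"
    : "${DISK_FREE_REQUIRED_GB:=15}"
    : "${DRY_RUN:=0}"   # assumption: the list only documents DRY_RUN=1
}

# DATASET_DIR has no default: unset means synthetic data, set means a .bin shard dir.
describe_dataset() {
    if [ -n "${DATASET_DIR:-}" ]; then
        echo "shards: ${DATASET_DIR}"
    else
        echo "synthetic"
    fi
}
```

With this pattern an operator overrides a value inline, for example `STEPS=1000 DRY_RUN=1 bash scripts/dispatch-distill-stage-d.sh`, and every variable left alone keeps its production default.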

## QA

* `bash -n scripts/dispatch-distill-stage-d.sh` — syntax-ok
* `bashrs lint scripts/dispatch-distill-stage-d.sh` — 0 errors
  (warnings are df-non-determinism + path-traversal-ln, both expected
  for an operator-supplied path dispatcher)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 22, 2026 13:44