chore(distill): default to MODEL-1 7B teacher + SPEC-DISTILL-001 §86 (PMAT-701 follow-up) by noahgift · Pull Request #1871 · paiml/aprender

noahgift · 2026-05-22T07:14:57Z

Summary

Now that PMAT-701 Bug A (#1863) and Bug B (#1869) have landed, the MODEL-1 7B teacher (`paiml/qwen2.5-coder-7b-apache-q4k-v1`) is feasible on Grace Blackwell GB10. This PR is the small `chore` change that flips the dispatch script default and records the rationale in SPEC-DISTILL-001 §86.

Changes

`scripts/dispatch-distill-phase-3-gx10.sh`: `TEACHER_REPO` default changes from `Qwen/Qwen2.5-Coder-0.5B-Instruct` (smoke fallback) → `paiml/qwen2.5-coder-7b-apache-q4k-v1` (the spec's intended teacher). Smoke-only callers override with `TEACHER_REPO=...`. Old comment about the 1.5B Block-0 OOM is replaced with the PMAT-701 fix references.
`docs/specifications/aprender-train/distillation-epic-spec.md`: new §86 amendment documenting the 5-whys, the two fixed bugs, the new falsifier `F-DISTILL-V2-001-TEACHER-DIVERGENCE` (preflight reject when STEPS>=5000 and teacher==student without an explicit override), and the discharge of the prior Stage D 50K + 10K runs as no-KD. Spec version bumped 1.1.0 → 1.2.0.
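The default flip and the preflight check described above could look roughly like the following. This is a minimal sketch, not the actual script contents: `TEACHER_REPO`, `STUDENT_INIT`, and `STEPS` appear in this PR, but the `STUDENT_INIT` and `STEPS` default values and the `ALLOW_SAME_TEACHER` override name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: defaults for STUDENT_INIT and STEPS are assumed, not taken from the script.

# Default to the 7B teacher; smoke-only callers can still override via the environment.
TEACHER_REPO="${TEACHER_REPO:-paiml/qwen2.5-coder-7b-apache-q4k-v1}"
STUDENT_INIT="${STUDENT_INIT:-Qwen/Qwen2.5-Coder-0.5B-Instruct}"
STEPS="${STEPS:-100}"
# Hypothetical opt-out flag; the real override name is not given in this PR.
ALLOW_SAME_TEACHER="${ALLOW_SAME_TEACHER:-0}"

preflight_teacher_divergence() {
  # Reject long runs where teacher and student are the same model,
  # unless the caller explicitly opted in.
  if [ "$STEPS" -ge 5000 ] && [ "$TEACHER_REPO" = "$STUDENT_INIT" ] \
     && [ "$ALLOW_SAME_TEACHER" != "1" ]; then
    echo "F-DISTILL-V2-001-TEACHER-DIVERGENCE: teacher == student at STEPS=$STEPS" >&2
    return 1
  fi
  return 0
}

# Abort before any compute is spent if the check fails.
preflight_teacher_divergence || exit 1
```

Under these assumptions, a smoke run with `TEACHER_REPO=Qwen/Qwen2.5-Coder-0.5B-Instruct` and a small `STEPS` still passes, while the same setting with `STEPS=50000` is rejected unless the override is set.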

Why this matters

The Phase 4 Stage D 50K (25 h) and 10K (5 h) runs on 2026-05-20/21 silently inherited the Phase 3 smoke workaround of TEACHER_REPO == STUDENT_INIT == 0.5B. The KD signal was ~zero (the KL divergence between identical distributions is zero), so 30 hours of compute fine-tuned the base model toward gibberish on a synthetic-ish corpus. The §86 amendment makes that mistake hard to repeat.
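To see why an identical teacher gives no KD signal, here is a short worked derivation. It assumes a forward-KL distillation term over softmax outputs at temperature 1, with $p_T$ the teacher distribution, $p_S$ the student distribution, and $z_j$ the student logits; the exact loss used by the pipeline is not stated in this PR, so treat this as an illustration.

```latex
D_{\mathrm{KL}}(p_T \,\|\, p_S) = \sum_i p_T(i) \log \frac{p_T(i)}{p_S(i)}

% With p_S = p_T (student initialized from the teacher's own weights):
D_{\mathrm{KL}}(p_T \,\|\, p_T) = \sum_i p_T(i) \log 1 = 0

% Gradient with respect to the student logits:
\frac{\partial D_{\mathrm{KL}}}{\partial z_j} = p_S(j) - p_T(j) = 0 \quad \text{when } p_S = p_T
```

At initialization both the loss term and its gradient vanish, so the teacher contributes nothing the student does not already encode.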

Test plan

`bash -n scripts/dispatch-distill-phase-3-gx10.sh` — syntax-ok
Spec markdown renders cleanly
CI: `ci / gate` + `workspace-test` green
Operator: when ready, re-dispatch Stage D with the new defaults — `STEPS=50000 ./scripts/dispatch-distill-phase-3-gx10.sh`. Compute estimate ~50 h on GB10 (slower than the previous 0.5B-teacher run because realizar's 7B forward is heavier; this is acceptable given the falsifier-quality gain).

🤖 Generated with Claude Code

…(PMAT-701 follow-up)

The Phase 4 Stage D 50K + 10K runs (2026-05-20/21) silently inherited the Phase 3 smoke workaround of TEACHER_REPO == STUDENT_INIT == 0.5B. Result: no KD signal, 30 h of compute that fine-tuned the base model toward gibberish on a small corpus. Documented in `evidence/distill-7b-teacher-loadtest-gx10/findings.json` + this spec amendment.

Now that PMAT-701 Bug A (PR #1863) and Bug B (PR #1869) have landed, the 7B Q4K teacher is feasible on Grace Blackwell GB10:

* PR #1863: trueno-gpu allocator autodetects unified-memory devices (Grace, Tegra) and routes to cuMemAllocManaged so the full 128 GB pool is reachable.
* PR #1869: new RealizarQ4KTeacher keeps Q4K teacher weights quantized on the GPU (no F32 dequant at upload), eliminating the OOM-kill that was killing the first training step.

This PR flips the dispatch script's default and codifies the why in spec §86:

* `scripts/dispatch-distill-phase-3-gx10.sh`: TEACHER_REPO default changes from `Qwen/Qwen2.5-Coder-0.5B-Instruct` (smoke fallback) to `paiml/qwen2.5-coder-7b-apache-q4k-v1` (the MODEL-1 teacher the spec was designed around). Smoke-only callers override with the env var.
* `docs/specifications/aprender-train/distillation-epic-spec.md`: adds §86 documenting the 5-whys, the fix references, and a new falsifier F-DISTILL-V2-001-TEACHER-DIVERGENCE that rejects future Phase-4-class dispatches where teacher == student unless an explicit override is set.
* Spec version bumped to 1.2.0 with changelog entry.

The §86 amendment also notes that the existing 50K + 10K Stage D runs do NOT count toward AC-DISTILL-003; they're discharged as no-KD baselines, and a re-dispatched 50K run with the 7B teacher is required for a real Phase 4 verdict.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

noahgift enabled auto-merge (squash) May 22, 2026 07:15

noahgift mentioned this pull request May 22, 2026

fix(eval): apr eval no longer reports fake pass@1=1.0 on broken models (PMAT-702) #1874

Open

6 tasks

Merge branch 'main' into chore/distill-phase4-7b-teacher-default

a693a6d

This was referenced May 22, 2026

docs(spec): SPEC-DISTILL-001 §87 — PMAT-704 post-mortem on Bug B wrong turn #1880

Open

feat(distill): wire ProgressCallback into Pipeline — close training-monitoring gap (PMAT-705) #1881

Open

noahgift wants to merge 2 commits into main from chore/distill-phase4-7b-teacher-default
