feat(distill): APR_DISTILL_MAX_STEPS smoke-validation mode (PMAT-706)#1888

Open
noahgift wants to merge 2 commits into `main` from `feat/apr-distill-smoke-only-pmat-706`

@noahgift (Contributor)

## Summary

`APR_DISTILL_MAX_STEPS=N` runs at most N training steps, prints a loss-trajectory and projected-wall-time summary, and exits without writing output. Operators can validate a 30-50 h Stage D cascade in ~60 s.

Closes the diagnostic loop on the PMAT-704 cascade. PMAT-705 (#1881) surfaced per-step loss; PMAT-706 adds the early-break so operators don't wait through the full epoch budget.

## Changes

`crates/aprender-train-distill/src/pipeline.rs`:

  • env vars: `APR_DISTILL_MAX_STEPS=N` (early-break) + `APR_DISTILL_PROJECT_TO_STEPS` (default 50000; sets the projection target)
  • N=0 or invalid → early `Err` with a clear message
  • train loop breaks at `step >= N` and prints two `[SMOKE]` summary lines
  • `execute()` short-circuits export in smoke mode, so no `model.safetensors` / `output.apr` is written
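The env-var rules above (unset or empty means the old behavior, N > 0 enables the early-break, 0 or a non-integer is an error) fit in one small pure function. This is a minimal sketch under that reading; `parse_max_steps` and its error strings are hypothetical, not the code in `pipeline.rs`.

```rust
/// Hypothetical helper: interprets the raw value of `APR_DISTILL_MAX_STEPS`.
/// `Ok(None)`    -> smoke mode off (old behavior, no regression)
/// `Ok(Some(n))` -> break after at most n training steps
/// `Err(_)`      -> 0 or a non-integer value, rejected before training starts
fn parse_max_steps(raw: Option<&str>) -> Result<Option<usize>, String> {
    match raw {
        // Unset or empty: behave exactly as before.
        None | Some("") => Ok(None),
        Some(s) => match s.parse::<usize>() {
            Ok(0) => Err("APR_DISTILL_MAX_STEPS=0 is invalid: expected a positive integer".to_string()),
            Ok(n) => Ok(Some(n)),
            // Negative numbers and non-numeric text both fail usize parsing.
            Err(_) => Err(format!("APR_DISTILL_MAX_STEPS={s:?} is not a positive integer")),
        },
    }
}
```

Keeping the function pure (it takes the raw value instead of reading the environment) means it can be tested without touching global state; at the call site it would be fed from `std::env::var("APR_DISTILL_MAX_STEPS").ok().as_deref()`.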

## Contract

`contracts/apr-distill-smoke-validation-v1.yaml`:

  • 3 equations + 4 falsifiers + 2 Kani harnesses + qa_gate F-SMOKE-001
  • Validates clean (`pv validate` 0/0)

## Tests

4 unit tests in `pmat_706_smoke_validation`, all PASS (serialized via a Mutex to avoid env-var races):

  • `falsify_smoke_001_exact_step_count`
  • `falsify_smoke_002_no_regression_when_unset`
  • `falsify_smoke_004_no_output_in_smoke`
  • `smoke_zero_steps_returns_err`

## Output

```
[PMAT-706] smoke mode: APR_DISTILL_MAX_STEPS=10 (early-break after 10 steps; no final output.apr written)
...
[SMOKE] 10 steps in 1.20s: initial_loss=3.4567, final_loss=3.1234, throughput=8.33 step/s
[SMOKE] projected full-run wall time (50000 steps): 1.67h / 100.0 min / 6000s
[PMAT-706] smoke mode: skipping export — no model.safetensors / output.apr written
```
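The sample numbers are internally consistent: 10 steps in 1.20 s is 8.33 step/s, and 50000 steps at that rate take 6000 s = 100.0 min ≈ 1.67 h. Below is a minimal sketch of that arithmetic and formatting. It assumes the projection is simply target steps × elapsed time / completed steps, and `smoke_summary` and its parameters are hypothetical names rather than the pipeline's API.

```rust
/// Hypothetical helper: builds the two `[SMOKE]` lines from `steps` completed
/// in `elapsed_secs`, the first/last observed loss, and the projection target
/// (`APR_DISTILL_PROJECT_TO_STEPS`, default 50000).
/// Assumes `steps > 0`; N = 0 is rejected before training starts.
fn smoke_summary(
    steps: usize,
    elapsed_secs: f64,
    initial_loss: f64,
    final_loss: f64,
    project_to_steps: usize,
) -> [String; 2] {
    // Observed throughput in steps per second.
    let throughput = steps as f64 / elapsed_secs;
    // Projected wall time at that throughput: target * elapsed / completed.
    let projected_secs = project_to_steps as f64 * elapsed_secs / steps as f64;
    [
        format!(
            "[SMOKE] {steps} steps in {elapsed_secs:.2}s: initial_loss={initial_loss:.4}, final_loss={final_loss:.4}, throughput={throughput:.2} step/s"
        ),
        format!(
            "[SMOKE] projected full-run wall time ({project_to_steps} steps): {:.2}h / {:.1} min / {:.0}s",
            projected_secs / 3600.0,
            projected_secs / 60.0,
            projected_secs
        ),
    ]
}
```

With the sample inputs, `smoke_summary(10, 1.20, 3.4567, 3.1234, 50000)` reproduces the two `[SMOKE]` lines in the output block above.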

## Methodology

Per memory `feedback_a_priori_theoretical_falsification.md`: 30 min of math saves 8 h of GPU. PMAT-706 is the runtime analog — 60 s of smoke saves 8 h of staring at a silent process.

🤖 Generated with Claude Code

When the operator sets `APR_DISTILL_MAX_STEPS=N` (default unset), the
distill training loop runs at most N steps, prints a per-run summary,
and exits without writing a final output model. This lets operators
validate the cascade end-to-end in ~60 s before committing to a 30-50 h
Stage D production run.

The PMAT-704 cascade post-mortem found that the 7B vocab-aligned
500-step validation hung at step 0 for 1.5 h with no per-step output.
PMAT-705 (#1881) added ProgressCallback to surface per-step loss
during normal runs. PMAT-706 adds the complementary EARLY-BREAK so
operators don't have to wait through the full epoch budget to see if
something's wrong.

## Changes

`crates/aprender-train-distill/src/pipeline.rs`:

* Reads `APR_DISTILL_MAX_STEPS` env var. Empty/unset = old behavior
  (no regression). N > 0 = run at most N steps then break. N = 0 or
  non-integer = early Err with clear message.
* Optional `APR_DISTILL_PROJECT_TO_STEPS` env var (default 50000)
  controls the projected-wall-time target in the summary.
* `train()` early-breaks the inner loop when step >= max_steps,
  prints two `[SMOKE]` summary lines (loss trajectory + projected
  wall time at the observed throughput), and returns empty weights /
  shapes via the normal Result path.
* `execute()` detects smoke mode (env var set) and short-circuits
  the export step: no `model.safetensors` / `output.apr` is written,
  so downstream tools (`apr eval`, `apr run`) can't accidentally
  consume a smoke result.
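A minimal sketch of the early-break and the export gate, assuming a nested epoch/batch loop. `train_loop`, `SmokeMetrics`, `should_export` and the `train_step` closure are hypothetical stand-ins for the real `train()` / `execute()` code. The limit is checked *before* each step runs, which is what makes the count tight: exactly `min(N, total)` steps execute.

```rust
/// Hypothetical per-run metrics; the real pipeline's metrics type may differ.
#[derive(Debug, Default, PartialEq)]
struct SmokeMetrics {
    steps_completed: usize,
    initial_loss: Option<f64>,
    final_loss: Option<f64>,
}

/// Runs up to `epochs * batches_per_epoch` steps, leaving both loops as soon
/// as `step >= max_steps`. `train_step` stands in for the forward/backward
/// pass and returns that step's loss.
fn train_loop<F: FnMut(usize) -> f64>(
    epochs: usize,
    batches_per_epoch: usize,
    max_steps: Option<usize>,
    mut train_step: F,
) -> SmokeMetrics {
    let mut metrics = SmokeMetrics::default();
    let mut step = 0usize;
    'epochs: for _epoch in 0..epochs {
        for _batch in 0..batches_per_epoch {
            // Check before running the step, so N = 10 runs steps 0..=9 only.
            if let Some(max) = max_steps {
                if step >= max {
                    break 'epochs;
                }
            }
            let loss = train_step(step);
            if metrics.initial_loss.is_none() {
                metrics.initial_loss = Some(loss);
            }
            metrics.final_loss = Some(loss);
            step += 1;
        }
    }
    metrics.steps_completed = step;
    metrics
}

/// Export is skipped whenever smoke mode is active, so no model file
/// from a truncated run can reach `apr eval` / `apr run`.
fn should_export(max_steps: Option<usize>) -> bool {
    max_steps.is_none()
}
```

The labeled `break 'epochs` exits the outer loop directly, so a limit that falls mid-epoch does not leak into the next epoch's first batch.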

## Summary format

```
[PMAT-706] smoke mode: APR_DISTILL_MAX_STEPS=N (early-break after N steps; no final output.apr written)
...
[SMOKE] N steps in T.TTs: initial_loss=X.XXXX, final_loss=Y.YYYY, throughput=Z.ZZ step/s
[SMOKE] projected full-run wall time (50000 steps): H.HHh / W.W min / S.Ss
[PMAT-706] smoke mode: skipping export — no model.safetensors / output.apr written
```
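FT-SMOKE-003 covers the summary line format. The following sketch shows what a check on the first `[SMOKE]` line could look like, assuming the line follows the template exactly. `parse_smoke_line` and `SmokeLine` are hypothetical names, and plain `std` string splitting stands in for whatever matching the real falsifier uses.

```rust
/// Hypothetical parsed view of the first `[SMOKE]` summary line.
#[derive(Debug)]
struct SmokeLine {
    steps: usize,
    seconds: f64,
    initial_loss: f64,
    final_loss: f64,
    throughput: f64,
}

/// Returns `None` if any literal piece of the template is missing or any
/// number fails to parse; otherwise returns the extracted fields.
fn parse_smoke_line(line: &str) -> Option<SmokeLine> {
    let rest = line.strip_prefix("[SMOKE] ")?;
    let (steps, rest) = rest.split_once(" steps in ")?;
    let (seconds, rest) = rest.split_once("s: initial_loss=")?;
    let (initial, rest) = rest.split_once(", final_loss=")?;
    let (last, rest) = rest.split_once(", throughput=")?;
    let throughput = rest.strip_suffix(" step/s")?;
    Some(SmokeLine {
        steps: steps.parse().ok()?,
        seconds: seconds.parse().ok()?,
        initial_loss: initial.parse().ok()?,
        final_loss: last.parse().ok()?,
        throughput: throughput.parse().ok()?,
    })
}
```

Parsing the fields back out, rather than only matching a prefix, catches regressions such as a missing unit suffix or a non-numeric step count.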

## Contract

`contracts/apr-distill-smoke-validation-v1.yaml`:

* 3 equations: `early_break_condition` (off-by-one tight), `smoke_summary_format`,
  `no_side_effects`.
* 4 falsifiers (FT-SMOKE-001..004) covering exact step count, no regression
  when unset, summary line format, and no `output.apr` written.
* 2 Kani harnesses (the step count is tight; 0 steps is degenerate, not a panic).
* qa_gate F-SMOKE-001.
* Validates clean: `pv validate` 0 errors, 0 warnings.
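The YAML itself is not shown here. The following is only one reading of the early-break and projection relations, not the contract text. Let $S$ be the number of steps the full epoch budget would run, $N$ the value of `APR_DISTILL_MAX_STEPS`, $n$ the steps completed in $t$ seconds, and $P$ the value of `APR_DISTILL_PROJECT_TO_STEPS`:

```latex
\[
n =
\begin{cases}
S, & \text{APR\_DISTILL\_MAX\_STEPS unset} \\
\min(N, S), & N \ge 1
\end{cases}
\qquad
r = \frac{n}{t}
\qquad
T_{\text{proj}} = \frac{P}{r} = \frac{P\,t}{n}
\]
```

For the sample run, $n = 10$ and $t = 1.20$ give $r \approx 8.33$ step/s and $T_{\text{proj}} = 50000 \cdot 1.20 / 10 = 6000$ s, i.e. 100.0 min or about 1.67 h, matching the printed summary.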

## Tests

`pipeline::tests::pmat_706_smoke_validation`:

  * `falsify_smoke_001_exact_step_count` — N=10 returns metrics.steps_completed == 10
  * `falsify_smoke_002_no_regression_when_unset` — unset → full epochs run
  * `falsify_smoke_004_no_output_in_smoke` — output_path empty + no model.* files
  * `smoke_zero_steps_returns_err` — N=0 returns Err

Tests share global env state; serialized via a Mutex (ENV_LOCK) so
they don't race in parallel threads. All 4 PASS.
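A minimal sketch of the `ENV_LOCK` pattern, assuming a `static Mutex<()>` plus a hypothetical `with_env_var` helper. This is not necessarily how `pipeline::tests` is written. `env::set_var` / `env::remove_var` are wrapped in `unsafe` because they are `unsafe fn` in the Rust 2024 edition; on edition 2021 the blocks compile with only an `unused_unsafe` warning.

```rust
use std::env;
use std::sync::Mutex;

/// Process-wide lock: env vars are global, so tests that mutate them must
/// not run concurrently on the default multi-threaded test runner.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical helper: sets (`Some`) or removes (`None`) `key` while `f`
/// runs, then restores the previous value. If `f` panics the old value is
/// not restored; a drop guard would be needed for that.
fn with_env_var<T>(key: &str, value: Option<&str>, f: impl FnOnce() -> T) -> T {
    // A panicking test poisons the mutex; recover the guard so later tests still run.
    let _guard = ENV_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    let previous = env::var(key).ok();
    // SAFETY: all env mutation in these tests happens while ENV_LOCK is held.
    unsafe {
        match value {
            Some(v) => env::set_var(key, v),
            None => env::remove_var(key),
        }
    }
    let result = f();
    // SAFETY: still holding ENV_LOCK.
    unsafe {
        match previous {
            Some(v) => env::set_var(key, v),
            None => env::remove_var(key),
        }
    }
    result
}
```

A test then wraps its body, e.g. `with_env_var("APR_DISTILL_MAX_STEPS", Some("10"), || { /* run the pipeline */ })`, so the smoke tests and the unset-regression test never observe each other's settings.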

## Methodology

This closes the diagnostic loop on the PMAT-704 cascade post-mortem
lesson. Per memory `feedback_a_priori_theoretical_falsification.md`:
30 min of math saves 8 h of GPU. PMAT-706 is the runtime analog:
60 s of smoke saves 8 h of staring at a silent process.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift enabled auto-merge (squash) May 22, 2026 14:08