release: v0.35.0 (subsumes #1873, #1884, #1887, #1890) #1894

Merged: noahgift merged 6 commits into main from release/v0.35.0 on May 22, 2026

@noahgift
Contributor

Summary

Cuts v0.35.0 off updated origin/main (81 commits since v0.34.0). Squash-merges the four release-blocker PRs into one commit-per-feature on the release branch — saves three CI runs and keeps the merge history clean.

Subsumes (closes)

Headline win (methodology)

#1864 cuBLAS FP8 7B Q4K "gibberish" was not a numerical bug. The Golden Output gate's gen_config used ..Default::default() without overriding stop_tokens. Empty stop_tokens → generation ran the full 512-token budget → after the correct answer "4" the model continued from in-distribution chat-template noise → <|im_start|> repeats. Fix: 5 lines (PR #1890). SPEC-CUBLAS-FP8-7B-FIX-001 was authored as a 6-stage cascade but is now SUPERSEDED — Stages A/B kept as general FP8 diagnostic tooling.
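
The failure mode described above can be reproduced in miniature. The following is a minimal sketch, not the project's actual code: `GenConfig`, its `max_tokens` and `temperature` fields, the token IDs, and the toy `next_token` model are all hypothetical stand-ins. Only the `stop_tokens` field name and the `..Default::default()` pattern come from the description.

```rust
// Hypothetical stand-in for the gate's generation config. Only `stop_tokens`
// is named in the PR; the other fields are assumptions for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, Default)]
struct GenConfig {
    max_tokens: usize,
    temperature: f32,
    stop_tokens: Vec<u32>,
}

// Made-up token IDs for the toy model below.
const ANSWER: u32 = 19; // stands for "4"
const EOS: u32 = 2; // end-of-sequence
const IM_START: u32 = 7; // stands for <|im_start|>

// Toy model: emits the answer, then EOS, then chat-template noise forever.
fn next_token(step: usize) -> u32 {
    match step {
        0 => ANSWER,
        1 => EOS,
        _ => IM_START,
    }
}

// Generation loop: stops early only if the token is listed in `stop_tokens`.
fn generate(cfg: &GenConfig) -> Vec<u32> {
    let mut out = Vec::new();
    for step in 0..cfg.max_tokens {
        let tok = next_token(step);
        if cfg.stop_tokens.contains(&tok) {
            break;
        }
        out.push(tok);
    }
    out
}

fn main() {
    // Buggy shape: `stop_tokens` silently defaults to an empty Vec.
    let buggy = GenConfig { max_tokens: 512, ..Default::default() };
    // Fixed shape: EOS is listed explicitly as a stop token.
    let fixed = GenConfig { max_tokens: 512, stop_tokens: vec![EOS], ..Default::default() };

    println!("buggy emitted {} tokens", generate(&buggy).len());
    println!("fixed emitted {:?}", generate(&fixed));
}
```

In this sketch the buggy config runs through the whole budget and fills the output with the noise token after the correct answer, while the fixed config stops right after the answer. The numerical path is identical in both cases, which is why the gate failure looked like a kernel bug without being one.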

Methodology lesson saved: memory/feedback_falsify_simple_before_deep.md. When a test gate FAIL looks like a deep numerical/kernel bug, first check whether the user-visible code path that the test purports to verify ALSO fails. About 3 hours of phantom investigation ended only when the user asked, "couldn't this just be a chat template or something simple?"

Commits (6 — squash-merge this PR)

  1. chore(fmt): cargo fmt --all baseline (fixes the fmt drift on main from #1871, "chore(distill): default to MODEL-1 7B teacher + SPEC-DISTILL-001 §86 (PMAT-701 follow-up)")
  2. chore: README drift + apr serve syntax (was #1873, "chore(readme): fix drift (1134→1148, 82→103) + apr serve example syntax")
  3. feat(cublas-fp8): Stage A deterministic reproducer (was #1884, "feat(cublas-fp8-7b-stage-a): deterministic reproducer for #1864 (SPEC Stage A)")
  4. feat(cublas-fp8): Stage B per-layer parity instrumentation (was #1887, "feat(cublas-fp8-7b-stage-b): per-layer parity dumps + drift signature (SPEC Stage B)")
  5. fix(1864): Golden Output gate stop_tokens (was #1890, "fix(qa): add EOS stop_tokens to Golden Output gate — closes phantom #1864 cuBLAS")
  6. chore(release): bump to v0.35.0 + CHANGELOG + README contract count 1151 → 1153

Pre-push gates (local pass)

CHANGELOG

See CHANGELOG.md v0.35.0 entry — full 81-commit scope across distill (GB10 Blackwell), MoE (qwen3 KV cache + streaming + sampling), and the 2026-05-22 dogfood pass.

Verification (post-merge)

  • cargo install --path crates/apr-cli --force succeeds with v0.35.0
  • apr qa <qwen2.5-coder-7b-instruct-q4_k_m.gguf> ALL GATES PASSED (was: ✗ Golden Output)
  • apr serve run <7B> + curl /v1/chat/completions returns "2+2 equals 4."
  • Tag v0.35.0 + GitHub Release after merge
  • Crates.io publish cascade — ASK USER FIRST per CLAUDE.md

🤖 Generated with Claude Code

noahgift added 6 commits May 22, 2026 16:49
Workspace 0.34.0 → 0.35.0 across root Cargo.toml + all path-dep callsites
+ regenerate Cargo.lock. CHANGELOG v0.35.0 entry captures the 81-commit
release scope:

1. Distill Phase 1-3 working end-to-end on NVIDIA GB10 Blackwell sm_121
2. MoE (Qwen3) KV cache + streaming SSE + sampling
3. 2026-05-22 dogfood pass: 8 bugs surfaced, 7 fixed. #1864 was a 5-line
   stop_tokens config gap, not a deep cuBLAS FP8 numerical bug — see
   feedback_falsify_simple_before_deep.md

README contract count 1151 → 1153 (post Stage A + Stage B contracts).
noahgift enabled auto-merge (squash) May 22, 2026 14:58
noahgift merged commit aaafa15 into main May 22, 2026
19 of 21 checks passed
noahgift deleted the release/v0.35.0 branch May 22, 2026 15:35
noahgift added a commit that referenced this pull request May 22, 2026
…pdates

Closes a packaging gap surfaced by post-publish dogfood: `cargo install
aprender --features cuda` failed because root facade exposed only `cli` +
`default`. Per memory/feedback_cuda_feature_footgun, --features cuda is the
documented install for GPU users (20 vs 400 tok/s).

This patch adds passthroughs that forward to apr-cli:
  cuda, cuda-batch, wgpu, inference, training, training-gpu,
  visualization, zram, xet, whisper, full

Also bundles README housekeeping for the v0.35.x release pair:
  - Hiatus banner (3-month freeze through 2026-08-22)
  - v0.35.0 + v0.35.1 release callouts
  - Contract count drift fix (1151 → 1153, missed by #1894)
  - Updated Quick Start with --features cuda/full examples

Only root facade bumps (0.35.0 → 0.35.1). Sub-crates stay at 0.35.0;
aprender@0.35.1 depends on apr-cli@0.35.0 (no transitive churn).

End-user impact: `cargo install aprender --features cuda` now works.
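
The passthrough pattern described in this commit message can be sketched as a Cargo manifest fragment. This is a minimal sketch under stated assumptions, not the actual manifest: it assumes apr-cli is an optional dependency of the root crate that is enabled by the `cli` feature, and it shows only three of the listed features.

```toml
# Root Cargo.toml of the facade crate (sketch; dependency layout is an assumption)
[dependencies]
apr-cli = { version = "0.35.0", optional = true }

[features]
cli = ["dep:apr-cli"]
# Each passthrough turns on `cli` and forwards to the same-named apr-cli feature.
cuda = ["cli", "apr-cli?/cuda"]
wgpu = ["cli", "apr-cli?/wgpu"]
full = ["cli", "apr-cli?/full"]
```

Without entries like these, Cargo rejects `--features cuda` on the facade crate because the feature name does not exist there, even though apr-cli defines it.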
noahgift added a commit that referenced this pull request May 22, 2026
…pdates (#1895)
