release: v0.35.0 (subsumes #1873, #1884, #1887, #1890)#1894
Merged
Conversation
Workspace 0.34.0 → 0.35.0 across root Cargo.toml + all path-dep callsites + regenerate Cargo.lock. CHANGELOG v0.35.0 entry captures the 81-commit release scope: 1. Distill Phase 1-3 working end-to-end on NVIDIA GB10 Blackwell sm_121 2. MoE (Qwen3) KV cache + streaming SSE + sampling 3. 2026-05-22 dogfood pass: 8 bugs surfaced, 7 fixed. #1864 was a 5-line stop_tokens config gap, not a deep cuBLAS FP8 numerical bug — see feedback_falsify_simple_before_deep.md README contract count 1151 → 1153 (post Stage A + Stage B contracts).
This was referenced May 22, 2026
noahgift
added a commit
that referenced
this pull request
May 22, 2026
…pdates Closes a packaging gap surfaced by post-publish dogfood: `cargo install aprender --features cuda` failed because root facade exposed only `cli` + `default`. Per memory/feedback_cuda_feature_footgun, --features cuda is the documented install for GPU users (20 vs 400 tok/s). This patch adds passthroughs that forward to apr-cli: cuda, cuda-batch, wgpu, inference, training, training-gpu, visualization, zram, xet, whisper, full Also bundles README housekeeping for the v0.35.x release pair: - Hiatus banner (3-month freeze through 2026-08-22) - v0.35.0 + v0.35.1 release callouts - Contract count drift fix (1151 → 1153, missed by #1894) - Updated Quick Start with --features cuda/full examples Only root facade bumps (0.35.0 → 0.35.1). Sub-crates stay at 0.35.0; aprender@0.35.1 depends on apr-cli@0.35.0 (no transitive churn). End-user impact: `cargo install aprender --features cuda` now works.
noahgift
added a commit
that referenced
this pull request
May 22, 2026
…pdates (#1895) Closes a packaging gap surfaced by post-publish dogfood: `cargo install aprender --features cuda` failed because root facade exposed only `cli` + `default`. Per memory/feedback_cuda_feature_footgun, --features cuda is the documented install for GPU users (20 vs 400 tok/s). This patch adds passthroughs that forward to apr-cli: cuda, cuda-batch, wgpu, inference, training, training-gpu, visualization, zram, xet, whisper, full Also bundles README housekeeping for the v0.35.x release pair: - Hiatus banner (3-month freeze through 2026-08-22) - v0.35.0 + v0.35.1 release callouts - Contract count drift fix (1151 → 1153, missed by #1894) - Updated Quick Start with --features cuda/full examples Only root facade bumps (0.35.0 → 0.35.1). Sub-crates stay at 0.35.0; aprender@0.35.1 depends on apr-cli@0.35.0 (no transitive churn). End-user impact: `cargo install aprender --features cuda` now works.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Cuts v0.35.0 off updated
origin/main(81 commits since v0.34.0). Squash-merges the four release-blocker PRs into one commit-per-feature on the release branch — saves three CI runs and keeps the merge history clean.Subsumes (closes)
Headline win (methodology)
#1864 cuBLAS FP8 7B Q4K "gibberish" was not a numerical bug. The Golden Output gate's
gen_configused..Default::default()without overridingstop_tokens. Emptystop_tokens→ generation ran the full 512-token budget → after the correct answer "4" the model continued from in-distribution chat-template noise →<|im_start|>repeats. Fix: 5 lines (PR #1890). SPEC-CUBLAS-FP8-7B-FIX-001 was authored as a 6-stage cascade but is now SUPERSEDED — Stages A/B kept as general FP8 diagnostic tooling.Methodology lesson saved:
memory/feedback_falsify_simple_before_deep.md— when a test gate FAIL looks like a deep numerical/kernel bug, first check whether the user-visible code path that the test purports to verify ALSO fails. ~3 hours of phantom investigation was rescued by the user asking "couldn't this just be a chat template or something simple?"Commits (6 — squash-merge this PR)
chore(fmt): cargo fmt --all baseline (fixes chore(distill): default to MODEL-1 7B teacher + SPEC-DISTILL-001 §86 (PMAT-701 follow-up) #1871's fmt drift on main)chore: README drift + apr serve syntax (was chore(readme): fix drift (1134→1148, 82→103) + apr serve example syntax #1873)feat(cublas-fp8): Stage A deterministic reproducer (was feat(cublas-fp8-7b-stage-a): deterministic reproducer for #1864 (SPEC Stage A) #1884)feat(cublas-fp8): Stage B per-layer parity instrumentation (was feat(cublas-fp8-7b-stage-b): per-layer parity dumps + drift signature (SPEC Stage B) #1887)fix(1864): Golden Output gate stop_tokens (was fix(qa): add EOS stop_tokens to Golden Output gate — closes phantom #1864 cuBLAS #1890)chore(release): bump to v0.35.0 + CHANGELOG + README contract count 1151 → 1153Pre-push gates (local pass)
cargo fmt --all -- --check— cleancargo test -p aprender-contracts --lib lint::— 191 passed, 0 failedcargo deny check advisories— ok (RUSTSEC-2026-0105 ignored per PR chore(deny): ignore RUSTSEC-2026-0105 (core2 yanked, transitive via bitstream-io) #1878)CHANGELOG
See
CHANGELOG.mdv0.35.0 entry — full 81-commit scope across distill (GB10 Blackwell), MoE (qwen3 KV cache + streaming + sampling), and the 2026-05-22 dogfood pass.Verification (post-merge)
cargo install --path crates/apr-cli --forcesucceeds with v0.35.0apr qa <qwen2.5-coder-7b-instruct-q4_k_m.gguf>ALL GATES PASSED (was: ✗ Golden Output)apr serve run <7B>+ curl/v1/chat/completionsreturns "2+2 equals 4."🤖 Generated with Claude Code