Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) on 11L Production Stack #487
…) on 11L production stack Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Community Review — Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) on 11L Production Stack

Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache
PR: #487 — "Non-record: Value Residual (-0.015 BPB) + Gated Attention (-0.003 BPB) on 11L Production Stack"

Check 1: N-gram Family Bug (CLOSE trigger)
The BigramHashEmbedding at lines 1018–1023 hashes position
This combines
Note:

Check 2: Pre-Quant TTT (CLOSE trigger)

Check 3: Legal TTT (CLEAN)

Check 4: Scored-Region SLOT (HOLD)
No scored-region manipulation detected. Sliding-window eval (

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing the full train_gpt.py source. If this review misread your code, please call it out so I can re-audit manually.
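For readers unfamiliar with the construct flagged in Check 1: the review text above is truncated, so here is a generic sketch of what a bigram hash embedding usually does — the class name is reused for readability only, and the hash function is an assumption, not the PR's actual code.

```python
import numpy as np

class BigramHashEmbedding:
    """Sketch of a bigram hash embedding (hypothetical, not the PR's code):
    each adjacent token pair is hashed into a fixed-size table, and the
    looked-up vector augments the usual unigram token embedding."""

    def __init__(self, table_size: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.table_size = table_size
        # Learnable in a real model; random init here for illustration.
        self.table = rng.normal(0.0, 0.02, size=(table_size, dim))

    def hash_pair(self, prev_tok: int, tok: int) -> int:
        # Fixed multiplicative mix of the ordered pair; the exact mixing
        # function is an assumption — any deterministic hash works.
        return (prev_tok * 1_000_003 + tok) % self.table_size

    def __call__(self, tokens):
        dim = self.table.shape[1]
        out = np.zeros((len(tokens), dim))
        # Position 0 has no left neighbour, so its bigram vector stays zero.
        for i in range(1, len(tokens)):
            out[i] = self.table[self.hash_pair(tokens[i - 1], tokens[i])]
        return out
```

The compliance concern with this family is that the table acts as a lookup cache over token statistics rather than a purely neural computation, which is why reviews single it out.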
val_bpb: 1.1720 | 19.4 MB (unlimited compute) | 1xA6000, 9500 steps, 14.5hr
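For reference, val_bpb is bits per byte: cross-entropy loss in nats converted to base 2 and normalised by the byte length of the eval text rather than the token count, which makes scores comparable across tokenizers. A minimal conversion helper (hypothetical, not taken from train_gpt.py):

```python
import math

def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy loss (nats over the eval set)
    into bits per byte of the underlying text."""
    return total_loss_nats / (math.log(2) * total_bytes)
```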
Summary
Ablation (9L v1024, 1000 steps, 131K batch, 1x3090)
Production Results
Files
README.md — full writeup with ablations and reproducibility commands
submission.json — metadata
train_gpt.py — training script
train.log — complete training log