Commit f74cb67

brett and claude committed
docs: add HIP/AMD NaN warning for q8_0/turbo3 on large K-norm models

Adds a prominent WARNING block to turboquant-recommendations.md documenting the observed NaN divergence when using q8_0 or turbo3 compression on models with large K-vector norms (e.g. Qwen2.5-7B) on AMD/ROCm (HIP) backends. The root cause is the int8 overflow path that differs between HIP and CUDA. Recommended mitigations: switch to turbo2/turbo4 or add pre-quantization K-norm clipping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1 parent 46efe26 commit f74cb67

1 file changed

Lines changed: 4 additions & 0 deletions

docs/turboquant-recommendations.md

@@ -46,6 +46,10 @@ These configurations showed promising results but have less validation depth:
 | Q8_0 weights | `-ctk q8_0 -ctv turbo2` | phi-4 +3.1% |
 | Q4_K_M, Qwen2.5-7B (AMD) | `-ctk q8_0 -ctv turbo3` | NaN on HIP (Metal gets +2.0%). HIP-specific, under investigation |
 
+> ⚠️ **WARNING: q8_0/turbo3 produces NaN on HIP/AMD with models that have large K norms**
+> (e.g. Qwen2.5-7B where K norms can reach 274). This is under active investigation.
+> **Safe AMD alternative: q8_0/turbo4.**
+
 ### Boundary V (auto-enabled for turbo2-V)
 
 A layer-aware V compression strategy that protects the first 2 + last 2 layers with q8_0-V while compressing all remaining layers with turbo2-V. **Auto-enabled when `-ctv turbo2` is set** on recent builds. Opt-out: `TURBO_LAYER_ADAPTIVE=0`. On older builds, activate with `TURBO_LAYER_ADAPTIVE=7`.
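
The Boundary V paragraph in the diff context describes a per-layer choice of V-cache type: the first 2 and last 2 layers keep q8_0, and every other layer uses turbo2. A minimal Python sketch of that mapping follows. It is an illustration only: the function name `assign_v_cache_types` and the `protect` parameter are hypothetical, the type names are plain strings, and the project's actual implementation (including how it reads `TURBO_LAYER_ADAPTIVE`) may differ.

```python
def assign_v_cache_types(n_layers: int, protect: int = 2) -> list[str]:
    """Hypothetical helper: choose a V-cache type for each layer.

    The first `protect` and last `protect` layers keep q8_0;
    all remaining (middle) layers are compressed with turbo2.
    """
    if n_layers < 0 or protect < 0:
        raise ValueError("n_layers and protect must be non-negative")

    types = []
    for layer in range(n_layers):
        # A layer is a "boundary" layer if it falls in the first or last `protect` layers.
        is_boundary = layer < protect or layer >= n_layers - protect
        types.append("q8_0" if is_boundary else "turbo2")
    return types


if __name__ == "__main__":
    # Print the assignment for a small 8-layer example.
    for layer, cache_type in enumerate(assign_v_cache_types(8)):
        print(f"layer {layer}: {cache_type}")
```

In this sketch, a model with 4 or fewer layers ends up entirely on q8_0, because every layer falls inside one of the two protected boundary ranges.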
