32B batch inference crashes on Blackwell sm_121 — FP8 poisons CUDA context #542

@noahgift

Summary

32B Q4K_M batch inference (--batch-jsonl) crashes on NVIDIA Blackwell GB10 (sm_121, CUDA 13.0). The FP8 cache warmup poisons the CUDA context, causing all subsequent GPU operations to fail with CUDA_ERROR_ILLEGAL_ADDRESS.

Five-Whys

  1. Why does 32B GPU batch fail? The run aborts with: generate_gpu_resident FAILED: Prefill workspace init failed: CUDA_ERROR_ILLEGAL_ADDRESS
  2. Why ILLEGAL_ADDRESS? The FP8 cache warmup writes to invalid memory on sm_121, and the error persists in the CUDA context
  3. Why does FP8 fail on sm_121? The FP8 E4M3 kernels are not compatible with sm_121 (Blackwell GB10)
  4. Why is FP8 tried? FP8_PREFILL/FP8_DECODE are on by default and are not disabled for sm_121
  5. Root cause: Missing architecture detection. sm_121 should auto-disable FP8
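
The root cause above comes down to a missing architecture gate. A minimal sketch of such a gate follows, assuming the compute capability is available as an `sm_XYZ` string. `ComputeCapability`, `FP8_DENYLIST`, and `fp8_allowed` are hypothetical names for illustration, not existing trueno-gpu or apr APIs.

```rust
/// CUDA compute capability, e.g. sm_121 -> major 12, minor 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ComputeCapability {
    pub major: u32,
    pub minor: u32,
}

impl ComputeCapability {
    /// Parse strings such as "sm_121" or "sm_89".
    /// The last digit is the minor version, the rest is the major version.
    /// Suffixed forms such as "sm_90a" are not handled in this sketch.
    pub fn parse_sm(s: &str) -> Option<Self> {
        let digits = s.strip_prefix("sm_")?;
        if digits.len() < 2 || !digits.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        let (major, minor) = digits.split_at(digits.len() - 1);
        Some(Self {
            major: major.parse().ok()?,
            minor: minor.parse().ok()?,
        })
    }
}

/// Hypothetical denylist: architectures where the FP8 warmup is known to fault.
/// Only sm_121 is listed because it is the only one reported in this issue.
const FP8_DENYLIST: &[ComputeCapability] = &[ComputeCapability { major: 12, minor: 1 }];

/// Returns false when FP8 should be auto-disabled for this architecture.
pub fn fp8_allowed(cc: ComputeCapability) -> bool {
    !FP8_DENYLIST.contains(&cc)
}
```

A denylist keeps the change narrow: only the architecture with a confirmed failure loses FP8, and other devices keep the current default.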

Reproduction

# On NVIDIA GB10 (sm_121):
export SKIP_PARITY_GATE=1
apr run checkpoints/qwen2.5-coder-32b-instruct-q4km.apr --prompt "hello" --max-tokens 5 --json

# Output:
# [PMAT-053] FP8 cache warmup failed (non-fatal): CUDA_ERROR_ILLEGAL_ADDRESS (code: 700)
# [GH-480] generate_gpu_resident FAILED: Prefill workspace init failed: CUDA_ERROR_ILLEGAL_ADDRESS
# [CUDA-FAILFAST] Context poisoned during executor lifetime

Workaround

export SKIP_PARITY_GATE=1 FP8_PREFILL=0 FP8_DECODE=0

Even with FP8 disabled, the 32B run still does not complete: PTX JIT compilation takes 120s+ (64 layers × multiple kernel types), and the process is terminated by process managers before it finishes.
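
For the FP8 half of the workaround, one possible way to combine the `FP8_PREFILL`/`FP8_DECODE` variables with an architecture default is sketched below. The accepted values and the function names (`parse_flag`, `resolve_fp8`, `fp8_from_env`) are assumptions for illustration. How apr actually parses these variables is not shown in this issue.

```rust
/// Interpret an override value such as the `0` in `FP8_PREFILL=0`.
/// Assumption: "0"/"false"/"off" disable, "1"/"true"/"on" enable,
/// and anything else is treated as unset.
pub fn parse_flag(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "0" | "false" | "off" => Some(false),
        "1" | "true" | "on" => Some(true),
        _ => None,
    }
}

/// An explicit, valid override wins; otherwise the architecture default applies.
pub fn resolve_fp8(override_value: Option<&str>, arch_default: bool) -> bool {
    override_value.and_then(parse_flag).unwrap_or(arch_default)
}

/// Convenience wrapper that reads the override from the process environment.
pub fn fp8_from_env(var: &str, arch_default: bool) -> bool {
    resolve_fp8(std::env::var(var).ok().as_deref(), arch_default)
}
```

With this ordering, a device whose architecture default is `false` needs no environment variable, while an explicit `FP8_PREFILL=1` would still allow testing the FP8 path on purpose.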

Expected Fix

  1. Auto-detect sm_121 and disable FP8 in CudaExecutor::new() (no env var needed)
  2. Add kernel pre-warming phase that survives long JIT compilation times
  3. Add provable contract: gpu_context_health — verify CUDA context is not poisoned after FP8 warmup
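
The `gpu_context_health` contract in item 3 could look roughly like the sketch below. It separates "FP8 warmup failed but the context is usable" from "the context is poisoned", which the current log reports as non-fatal. `GpuContext`, `WarmupOutcome`, `fp8_warmup_checked`, and `MockContext` are hypothetical names. In a real implementation `synchronize` would presumably wrap a driver call such as `cuCtxSynchronize`, which is an assumption here.

```rust
use std::cell::Cell;

/// Driver error code for CUDA_ERROR_ILLEGAL_ADDRESS (code 700 in the log above).
pub const CUDA_ERROR_ILLEGAL_ADDRESS: u32 = 700;

/// Hypothetical abstraction over a CUDA context.
pub trait GpuContext {
    /// Ok(()) if the context is healthy, Err(code) if an error persists in it.
    fn synchronize(&self) -> Result<(), u32>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum WarmupOutcome {
    /// Warmup succeeded and the context is healthy.
    Fp8Ready,
    /// Warmup failed, but the context is still usable: continue without FP8.
    Fp8Disabled { code: u32 },
    /// The context reports an error after warmup: it must not be reused.
    ContextPoisoned { code: u32 },
}

/// Run the FP8 warmup, then always probe the context before continuing.
pub fn fp8_warmup_checked<C, W>(ctx: &C, warmup: W) -> WarmupOutcome
where
    C: GpuContext,
    W: FnOnce(&C) -> Result<(), u32>,
{
    let warmup_result = warmup(ctx);
    match (warmup_result, ctx.synchronize()) {
        (Ok(()), Ok(())) => WarmupOutcome::Fp8Ready,
        (Err(code), Ok(())) => WarmupOutcome::Fp8Disabled { code },
        (_, Err(code)) => WarmupOutcome::ContextPoisoned { code },
    }
}

/// Test double that remembers a sticky error, mimicking a poisoned context.
pub struct MockContext {
    sticky: Cell<Option<u32>>,
}

impl MockContext {
    pub fn new() -> Self {
        Self { sticky: Cell::new(None) }
    }

    pub fn poison(&self, code: u32) {
        self.sticky.set(Some(code));
    }
}

impl GpuContext for MockContext {
    fn synchronize(&self) -> Result<(), u32> {
        match self.sticky.get() {
            Some(code) => Err(code),
            None => Ok(()),
        }
    }
}
```

On `ContextPoisoned`, the caller would need to stop and recreate the context or fall back to CPU rather than proceed to prefill workspace init, since an illegal-address error leaves later CUDA calls failing with the same code.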

Impact

  • 7B GPU batch works (fewer kernels, faster JIT)
  • 32B GPU batch fails (64 layers, too many kernels, FP8 poisoning)
  • Blocks 32B MBPP eval on GPU (the current score of 74.40% includes 18 GPU errors)

Hardware

  • NVIDIA GB10, sm_121, CUDA 13.0, 119 GB unified memory
  • Driver 580.126.09
  • trueno-gpu 0.4.35
