Getting Numerical Instability in LLaVA inference #78

@ss8319

Description

Hi, I am running inference with code based on the 'Quickstart with HuggingFace' example.

Setup
torch dtype: auto (bfloat16)
GPU: NVIDIA RTX 6000
CUDA: 12.8

Expected Behavior
Model should generate text normally without CUDA asserts.

Actual Behavior
Generation crashes with a device-side assert shortly after it starts. The message indicates the probability tensor contains invalid values (inf, NaN, or elements < 0), pointing to sampling instability in generate().
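For context, the failure mode named in that assert can be reproduced outside CUDA: once any logit overflows to inf, the softmax normalization divides inf by inf and produces NaN, which is exactly the invalid probability the sampler rejects. A minimal pure-Python sketch (illustration only, not the actual model code):

```python
import math

# Illustration: one overflowed logit poisons the softmax with NaN,
# which is the condition the CUDA assert checks before sampling.
logits = [float("inf"), 0.0, 1.0]     # one logit has overflowed
exps = [math.exp(x) for x in logits]  # math.exp(inf) -> inf
total = sum(exps)                     # inf
probs = [e / total for e in exps]     # inf/inf -> nan
print(probs)  # [nan, 0.0, 0.0]
```

This is why a single overflow (e.g., from a dtype mismatch) is enough to crash sampling even though most of the distribution is still finite.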

Could you please advise on recommended generation settings (e.g., forcing torch_dtype=torch.bfloat16, disabling sampling with do_sample=False, using torch.inference_mode(), or any required preprocessing)?
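As a sketch of what I mean (these are the standard transformers generate() keyword arguments; I have not confirmed that they resolve the instability):

```python
# Settings in question, expressed as standard transformers generate() kwargs.
# Whether they actually fix this issue is unconfirmed.
gen_kwargs = {
    "max_new_tokens": 1024,
    "do_sample": False,  # greedy decoding: no multinomial sampling step,
                         # so no probability tensor to trip the assert
}

# Intended usage (model/inputs as in ds/inference_hf.py):
#   with torch.inference_mode():
#       generated_ids = model.generate(**inputs, **gen_kwargs)
print(gen_kwargs["do_sample"])  # False
```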

Error Trace
(LLaVA-OneVision-1.5) sxxx@yyyyy:~/LLaVA-OneVision-1.5$ python ds/inference_hf.py
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████████████████████| 2/2 [00:01<00:00, 1.42it/s]
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either inf, nan or element < 0` failed.
Traceback (most recent call last):
  File "/mnt/hdd/sda/samus/LLaVA-OneVision-1.5/ds/inference_hf.py", line 41, in <module>
    generated_ids = model.generate(**inputs, max_new_tokens=1024)
  File "/mnt/hdd/sda/samus/LLaVA-OneVision-1.5/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/hdd/sda/samus/LLaVA-OneVision-1.5/.venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2564, in generate
    result = decoding_method(
  File "/mnt/hdd/sda/samus/LLaVA-OneVision-1.5/.venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2779, in _sample
    while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
  File "/mnt/hdd/sda/samus/LLaVA-OneVision-1.5/.venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2597, in _has_unfinished_sequences
    elif this_peer_finished:
torch.AcceleratorError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
