PrismAudio local inference OOM: VideoPrism JAX model attempts to allocate 95GB compute buffer on single 16GB GPU (RTX 5060 Ti) #58

@iad96

Description

@iad96

System: RTX 5060 Ti (16GB VRAM), 48GB RAM, WSL2 Ubuntu, CUDA 13.2
Issue: Running demo.sh locally fails during feature extraction with JAX OOM error:
RESOURCE_EXHAUSTED: Out of memory while trying to allocate 95651102720 bytes.
The byte size of input/output arguments (95950995456) exceeds the base limit (13682291507).
The VideoPrism JAX model attempts to allocate a single ~95 GB buffer on a 16 GB GPU. XLA rejects the computation up front, before any allocation is attempted, because the input/output argument size exceeds its base limit. This makes local single-GPU inference impossible on consumer hardware.
The HuggingFace Space works fine, so the issue appears specific to the local inference pipeline; most likely the model is compiled or batched for multi-GPU server setups with no single-GPU fallback.
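For completeness: JAX's allocator settings won't shrink a 95 GB buffer, but they help rule out preallocation as a confounder while debugging. These are standard JAX environment variables, not PrismAudio-specific options:

```shell
# Disable JAX's default preallocation of ~75% of GPU memory,
# so reported usage reflects actual allocations.
export XLA_PYTHON_CLIENT_PREALLOCATE=false

# Alternatively, cap the fraction of GPU memory JAX may claim.
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.80

# Then run demo.sh as usual.
```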
Would appreciate any guidance on running this locally on a single consumer GPU.
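Given the error, the pipeline presumably pushes all sampled frames through the model in one call, which is what produces the single ~95 GB input/output buffer. One workaround I'd expect to help (untested; the feature-extractor and shapes below are hypothetical stand-ins, not taken from the PrismAudio code) is to split the frame batch into chunks and concatenate the per-chunk features:

```python
import numpy as np

def run_in_chunks(fn, inputs, chunk_size=8):
    """Apply fn to axis-0 chunks of inputs and concatenate the results.

    Keeps each forward pass's input/output buffers small enough for a
    16 GB GPU, at the cost of re-running the compiled function per chunk.
    """
    outputs = []
    for start in range(0, inputs.shape[0], chunk_size):
        outputs.append(fn(inputs[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)

# Hypothetical stand-in for the VideoPrism forward pass; in the real
# pipeline this would be the jitted model call.
def fake_extract_features(frames):
    return frames.mean(axis=(1, 2, 3))[:, None]

frames = np.zeros((64, 288, 288, 3), dtype=np.float32)  # 64 sampled frames
feats = run_in_chunks(fake_extract_features, frames, chunk_size=8)
print(feats.shape)  # (64, 1)
```

With a jitted JAX model, a fixed `chunk_size` also avoids recompilation on every chunk, since each call sees the same input shape (only a possibly padded final chunk would differ).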
