PrismAudio local inference OOM: VideoPrism JAX model attempts to allocate 95GB compute buffer on single 16GB GPU (RTX 5060 Ti) #58

@iad96

Description

@iad96

System: RTX 5060 Ti (16GB VRAM), 48GB RAM, WSL2 Ubuntu, CUDA 13.2
Issue: Running demo.sh locally fails during feature extraction with JAX OOM error:
RESOURCE_EXHAUSTED: Out of memory while trying to allocate 95651102720 bytes.
The byte size of input/output arguments (95950995456) exceeds the base limit (13682291507).
The VideoPrism JAX model attempts to allocate a single ~95 GB buffer on a 16 GB GPU. XLA rejects the computation up front, before any allocation is attempted, because the input/output argument size exceeds its base limit. This makes local single-GPU inference impossible on consumer hardware.
The HuggingFace Space works fine, so the issue appears specific to the local inference pipeline; most likely the model is compiled or batched for multi-GPU server setups with no single-GPU fallback.
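For completeness: JAX's allocator settings won't shrink a 95 GB buffer, but they help rule out preallocation as a confounder while debugging. These are standard JAX environment variables, not PrismAudio-specific options:

```shell
# Disable JAX's default preallocation of ~75% of GPU memory,
# so reported usage reflects actual allocations.
export XLA_PYTHON_CLIENT_PREALLOCATE=false

# Alternatively, cap the fraction of GPU memory JAX may claim.
export XLA_PYTHON_CLIENT_MEM_FRACTION=0.80

# Then run demo.sh as usual.
```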
Would appreciate any guidance on running this locally on a single consumer GPU.
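Given the error, the pipeline presumably pushes all sampled frames through the model in one call, which is what produces the single ~95 GB input/output buffer. One workaround I'd expect to help (untested; the feature-extractor and shapes below are hypothetical stand-ins, not taken from the PrismAudio code) is to split the frame batch into chunks and concatenate the per-chunk features:

```python
import numpy as np

def run_in_chunks(fn, inputs, chunk_size=8):
    """Apply fn to axis-0 chunks of inputs and concatenate the results.

    Keeps each forward pass's input/output buffers small enough for a
    16 GB GPU, at the cost of re-running the compiled function per chunk.
    """
    outputs = []
    for start in range(0, inputs.shape[0], chunk_size):
        outputs.append(fn(inputs[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)

# Hypothetical stand-in for the VideoPrism forward pass; in the real
# pipeline this would be the jitted model call.
def fake_extract_features(frames):
    return frames.mean(axis=(1, 2, 3))[:, None]

frames = np.zeros((64, 288, 288, 3), dtype=np.float32)  # 64 sampled frames
feats = run_in_chunks(fake_extract_features, frames, chunk_size=8)
print(feats.shape)  # (64, 1)
```

With a jitted JAX model, a fixed `chunk_size` also avoids recompilation on every chunk, since each call sees the same input shape (only a possibly padded final chunk would differ).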
