Hello,
I am attempting to run TeleStyle on RunPod, but I consistently hit CUDA out-of-memory (OOM) errors, even on GPUs with large VRAM.
I would like clarification on the recommended minimum GPU memory for stable inference.
## Environment
- Container: `runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04`
- Python: 3.11
- CUDA: 12.4
- GPUs tested:
  - 24GB VRAM → OOM
  - 48GB VRAM → still OOM
## What I Tried
- Installed dependencies via `requirements.txt`
- Ran the default pipeline (no major modifications)
- Tested on a larger GPU, assuming it would be sufficient
Despite this, memory is exhausted during execution.
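
In case it helps triage, below is the quick diagnostic I can run inside the pod before and after the failing run. It is plain PyTorch, nothing TeleStyle-specific, and just confirms what the process actually sees:

```python
import torch

# Generic PyTorch check (not TeleStyle-specific): what does the process see?
props = torch.cuda.get_device_properties(0)
print(f"device: {props.name}, total VRAM: {props.total_memory / 1024**3:.1f} GiB")

# Free vs. total memory as reported by the CUDA driver.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1024**3:.1f} GiB of {total / 1024**3:.1f} GiB")

# After the failing run, this reports the peak the process actually reached.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```

I can attach this output for each GPU size if that is useful.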
## Questions
- What is the minimum VRAM requirement to run TeleStyle reliably?
- Is the project designed for:
  - ≥80GB GPUs (A100/H100 class)?
  - Multi-GPU setups?
- Are there recommended flags for reduced memory usage (see the sketch after this list for what I have in mind), such as:
  - fp16 / bf16
  - model offloading
  - gradient checkpointing
  - lower resolution
  - smaller batch sizes
- Is there an example configuration for running on ≤48GB VRAM?
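
To make the third question concrete: if TeleStyle wraps a diffusers-compatible pipeline (an assumption on my part; I have not confirmed this from the code), the sketch below shows the kind of memory-saving configuration I am asking about. The model id is a placeholder, not a real checkpoint.

```python
import torch
from diffusers import DiffusionPipeline

# Hypothetical setup, assuming a diffusers-compatible pipeline.
# "TeleStyle/checkpoint" is a placeholder id, not a real model.
pipe = DiffusionPipeline.from_pretrained(
    "TeleStyle/checkpoint",
    torch_dtype=torch.bfloat16,  # bf16 halves weight memory vs. fp32
)

# Keep submodules on the CPU and move each to the GPU only while it runs.
pipe.enable_model_cpu_offload()

# Compute attention in slices to trade speed for lower peak memory.
pipe.enable_attention_slicing()
```

Even pointers to the equivalent flags or config keys in TeleStyle's own entry point would fully answer this.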