Hi, thanks for sharing this interesting work and making the code available. I'm trying to reproduce the results reported in the paper, but I've encountered several issues that make it difficult to achieve the claimed performance. I'd appreciate your clarification on the following points:
The paper states:
"All the experiments are implemented with the PyTorch platform and trained/tested on 4 NVIDIA A100 GPUs."*
However, the current codebase does not appear to fully support multi-GPU training:
- The TODO list includes an unchecked item, "Fix bugs in Multi-GPU parallel", suggesting known issues in distributed training.
- The training script (`train.py`) relies on `CUDA_VISIBLE_DEVICES` and single-process execution; it does not use `torch.distributed` or `DistributedDataParallel` (DDP). This limits training to a single GPU or the inefficient `DataParallel` mode.
- There is no use of `local_rank`, `DistributedSampler`, or proper process group initialization.
Could you clarify:
- Were the reported results indeed obtained using 4 A100 GPUs in a distributed setting?
- If so, was a different (internal) version of the code used? If yes, could you release that fixed version, or provide guidance on how to enable multi-GPU training properly?
Without a working multi-GPU setup, it's challenging to train at the scale described in the paper, especially for 3D medical data.
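
For reference, below is the kind of minimal DDP skeleton I would have expected `train.py` to contain for the 4-GPU setup described in the paper. This is only a sketch against a dummy model and dataset (nothing here is taken from this repo), launched with something like `torchrun --nproc_per_node=4 train_ddp.py`:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each process it spawns
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Dummy model and dataset; stand-ins for the repo's network and 3D data pipeline
    model = nn.Linear(16, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
    sampler = DistributedSampler(dataset)  # shards the dataset across processes
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)  # re-shuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP all-reduces gradients across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

If the internal version already does something along these lines, even a short note in the README about the intended launch command would make reproduction much easier.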