This PR updates the TorchElastic Lab configuration to leverage multi-GPU nodes more effectively by allocating 4 GPUs per worker pod instead of 1. This change enables more efficient distributed training on modern GPU clusters. #1
Enhance your code review process with GitHub Actions
GitHub Actions make it easy to automate all your software workflows, now with world-class CI/CD.
Build, test, and deploy your code right from GitHub. Learn more about GitHub Actions.