
@sayakpaul (Member) commented Jan 28, 2026

What does this PR do?

Unfortunately, the previous PR didn't fully fix the issue. I built the images locally with the updated Dockerfiles on an A100 and ran some tests, but the problem didn't show up. I even tried to recreate the run as closely as possible, and it still didn't reproduce. I suspect it's one of those issues that only surfaces in the full run.

The issue is that even when we specify CUDA 12.9, the install can't find a matching PyTorch wheel, because only 12.8 and 13.0 wheels are available. It falls back to the 12.8 wheel, which is what currently causes the CUDA runtime mismatch.

Edit: Switched to a different base image and pinned the cu129 wheel, since it turned out to be available.
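A minimal sketch of what pinning the cu129 wheel can look like, assuming a plain pip-based install on the 22.04 base image from the diff; the package list, Python setup, and final base-image tag used in this PR may differ. The last step fails the build if the installed wheel was not built against CUDA 12.9, which would catch a silent fallback to the 12.8 wheel like the one described above.

```dockerfile
# Hypothetical sketch: not the exact Dockerfile from this PR.
FROM nvidia/cuda:12.9.1-runtime-ubuntu22.04

# Avoid interactive prompts from apt during the build.
ENV DEBIAN_FRONTEND=noninteractive

# Install Python and pip from the Ubuntu repositories.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Point pip at the CUDA 12.9 wheel index so it cannot fall back to a cu128 build.
RUN python3 -m pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cu129

# Fail the build if the installed wheel was not built against CUDA 12.9.
# torch.version.cuda is a build-time string, so no GPU is needed here.
RUN python3 -c "import torch; assert torch.version.cuda == '12.9', torch.version.cuda"
```

Using `--index-url` rather than `--extra-index-url` keeps pip from resolving `torch` against PyPI, whose default Linux wheel may target a different CUDA version.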

@sayakpaul requested a review from DN6 on January 28, 2026 12:57
```diff
@@ -1,4 +1,4 @@
-FROM nvidia/cuda:12.9.0-runtime-ubuntu20.04
+FROM nvidia/cuda:12.9.1-runtime-ubuntu22.04
```
Collaborator:

Is there no wheel for ubuntu24.04?

Member Author:

Done

@sayakpaul requested a review from DN6 on January 29, 2026 08:00