Directly check the location of libcuda.so where the linker expects to find it #185
ocaisa wants to merge 1 commit into EESSI:main
Tested this on Vega, where the drivers for 2023.06 are available but not for
[eualano@gn06 ~]$ source /cvmfs/software.eessi.io/versions/2025.06/init/lmod/bash
Modules purged before initialising EESSI
Module for EESSI/2025.06 loaded successfully
EESSI has selected x86_64/amd/zen2 as the compatible CPU target for EESSI/2025.06
EESSI has selected accel/nvidia/cc80 as the compatible accelerator target for EESSI/2025.06
(for debug information when loading the EESSI module, set the environment variable EESSI_MODULE_DEBUG_INIT)
# Without the PR it (incorrectly) loads the module
{EESSI/2025.06} [eualano@gn06 ~]$ module load OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0
# Enable the PR
{EESSI/2025.06} [eualano@gn06 ~]$ export LMOD_PACKAGE_PATH=$PWD/software-layer-scripts/generate/.lmod
{EESSI/2025.06} [eualano@gn06 ~]$ module load OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0
Lmod has detected the following error:
You requested to load UCX-CUDA which relies on the CUDA runtime environment and driver libraries. In order to be able to use the module, you will need to make sure EESSI can find
the GPU driver libraries on your host system. The file being checked for on your system is
/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so
You can override this check by setting the environment variable EESSI_OVERRIDE_GPU_CHECK but the loaded application will not be able to execute on your system.
For more information on how to do this, see https://www.eessi.io/docs/site_specific_config/gpu/.
While processing the following module(s):
...
{EESSI/2025.06} [eualano@gn06 ~]$ module purge
# This resets LMOD_PACKAGE_PATH
[eualano@gn06 ~]$ module load EESSI/2023.06
Module for EESSI/2023.06 loaded successfully
{EESSI/2023.06} [eualano@gn06 ~]$ module load OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
# Still works with the PR enabled
{EESSI/2023.06} [eualano@gn06 ~]$ export LMOD_PACKAGE_PATH=$PWD/software-layer-scripts/generate/.lmod
{EESSI/2023.06} [eualano@gn06 ~]$ module load OSU-Micro-Benchmarks/7.5-gompi-2023b-CUDA-12.4.0
{EESSI/2023.06} [eualano@gn06 ~]$
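
The check reported in the error message above can be approximated from the shell, which may help when debugging a site's driver setup. This is only a rough sketch: the function name `eessi_gpu_driver_check` is hypothetical, the default path is the one printed in the transcript, and treating any non-empty value of `EESSI_OVERRIDE_GPU_CHECK` as "override enabled" is an assumption. The actual check in this PR lives in the Lmod site package, not in a shell function.

```shell
#!/usr/bin/env bash
# Hypothetical helper that imitates the check described in the Lmod error message.
# It is NOT the code from this PR; names and behaviour are illustrative assumptions.

eessi_gpu_driver_check() {
    # Default: the file the transcript reports as being checked for EESSI/2025.06.
    local libcuda_path="${1:-/cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/nvidia/libcuda.so}"

    # Assumption: any non-empty value of the override variable skips the check.
    if [ -n "${EESSI_OVERRIDE_GPU_CHECK:-}" ]; then
        echo "override"
        return 0
    fi

    # -e follows symlinks, so a dangling link is reported as missing.
    if [ -e "$libcuda_path" ]; then
        echo "found"
        return 0
    fi

    echo "missing"
    return 1
}
```

After sourcing the snippet, calling `eessi_gpu_driver_check` with no arguments tests the path shown in the error message; passing a different path as the first argument tests that file instead.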

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx
New job on instance

The site package is specific per arch, so we need to do a lot of building here...
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2
New job on instance

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen2
New job on instance

bot:status last_build
This is the status of all the
Fixes #184