{2025.06}[foss-2024a] Siesta 5.4.2 CUDA 12.6.0 #1506

Open
AnthoniAlcaraz wants to merge 2 commits into EESSI:main from AnthoniAlcaraz:Siesta-5.4.2-foss-2024a-CUDA-12.6.0

Conversation

@AnthoniAlcaraz

Adding the latest SIESTA version with CUDA

@boegel
Contributor

boegel commented May 20, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
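
The trigger comment above packs several `key:value` arguments into one line, and the `for:` argument is itself a comma-separated list of `key=value` pairs. As a rough illustration of that structure only (`parse_bot_build_command` is a hypothetical helper, not the EESSI bot's actual parser, which may handle these comments differently):

```python
def parse_bot_build_command(comment: str) -> dict:
    """Split a 'bot: build ...' comment into its key:value arguments.

    Hypothetical helper for illustration; not taken from the EESSI bot.
    """
    prefix = "bot: build"
    if not comment.startswith(prefix):
        raise ValueError("not a 'bot: build' command")
    args = {}
    for token in comment[len(prefix):].split():
        # Split on the first ':' only, so values may contain further colons.
        key, _, value = token.partition(":")
        args[key] = value
    # The 'for' argument is a comma-separated list of key=value pairs.
    if "for" in args:
        args["for"] = dict(item.split("=", 1) for item in args["for"].split(","))
    return args


command = (
    "bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf "
    "for:arch=x86_64/amd/zen4,accel=nvidia/cc90"
)
parsed = parse_bot_build_command(command)
print(parsed["repo"], parsed["instance"], parsed["for"])
```

Read this way, the command asks the `eessi-bot-surf` instance to build for the `x86_64/amd/zen4` CPU target with the `nvidia/cc90` accelerator target.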

@boegel added the 2025.06-software.eessi.io label (2025.06 version of software.eessi.io) on May 20, 2026
@eessi-bot-surf

eessi-bot-surf Bot commented May 20, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.05/pr_1506/22960545

| date | job status | comment |
| --- | --- | --- |
| May 20 09:49:46 UTC 2026 | submitted | job id 22960545 will be eligible to start in about 20 seconds |
| May 20 09:49:57 UTC 2026 | received | job awaits launch by Slurm scheduler |
| May 20 09:52:31 UTC 2026 | running | job 22960545 is running |
| May 20 11:27:30 UTC 2026 | finished | 😢 FAILURE (click triangle for details) |
Details
✅ job output file slurm-22960545.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17792763000.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
| May 20 11:27:30 UTC 2026 | test result | 😁 SUCCESS (click triangle for details) |
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /15d6e239 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /5471f15a @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /1dc400ef @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /9715dde6 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /ed938ed4 @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /8d24cea9 @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /946648aa @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /9eb3f1e9 @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-22960545.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
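
The Details checklist above reports whether certain patterns occur in the job output file. A minimal sketch of that kind of check, assuming a plain regex search over the text of the Slurm output: the patterns are copied from the checklist, while `check_job_output` and the sample text are hypothetical and not taken from the bot's implementation.

```python
import re

# Patterns from the checklist above. Treating the first group as "must not
# appear" and the second as "must appear" is an assumption based on which
# outcomes the bot marks with a green check.
MUST_NOT_MATCH = [r"FATAL:", r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_MATCH = [r"No missing installations", r"\.tar\.\S* created!"]


def check_job_output(text: str) -> dict:
    """Return {pattern: passed} for each check (hypothetical helper)."""
    results = {}
    for pattern in MUST_NOT_MATCH:
        results[pattern] = re.search(pattern, text) is None
    for pattern in MUST_MATCH:
        results[pattern] = re.search(pattern, text) is not None
    return results


# Made-up job output, only to exercise the checks.
sample = "ERROR: test step failed\n/tmp/example.tar.zst created!\n"
results = check_job_output(sample)
for pattern, passed in results.items():
    print("PASS" if passed else "FAIL", pattern)
```

Note that a check like `.tar.* created!` can pass even when the build failed, as in the report above, where the tarball was created but contains no entries.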

@boegel
Contributor

boegel commented May 20, 2026

test step is failing:

== 2026-05-20 12:02:04,633 run.py:526 INFO Running 'OMP_NUM_THREADS=4 ...' shell command in /tmp/eessibot/easybuild/build/Siesta/5.4.2/foss-2024a-CUDA-12.6.0/easybuild_obj:
	OMP_NUM_THREADS=4 ctest --no-tests=error --output-on-failure -E 'pi3_Runs_Singleton|SpinPolarization-fe_spin_mpi4_omp|SpinPolarization-fe_spin_directphi_mpi4_omp1|SpinPolarization-fe_noncol_gga_mpi4_omp1|SpinPolarization-fe_noncol_kp_mpi4_omp1|SpinPolarization-fe_noncol_sp_mpi4_omp1|SpinOrbit-FePt-X-X_mpi4_omp1|SCFMixing-chargemix_mpi4_omp1|SCFMixing-pulay_mpi4_omp1'
== 2026-05-20 13:24:49,997 run.py:646 WARNING 'OMP_NUM_THREADS=4 ...' shell command FAILED (exit code -9)
== 2026-05-20 13:24:49,997 run.py:647 INFO Output of 'OMP_NUM_THREADS=4 ...' shell command (stdout + stderr):
Test project /tmp/eessibot/easybuild/build/Siesta/5.4.2/foss-2024a-CUDA-12.6.0/easybuild_obj
        Start   1: w90-driver_mpi

== 2026-05-20 13:24:50,097 easyblock.py:3127 WARNING shell command 'OMP_NUM_THREADS=4 ctest --no-tests=error --output-on-failure -E 'pi3_Runs_Singleton|SpinPolarization-fe_spin_mpi4_omp|SpinPolarization-fe_spin_directphi_mpi4_omp1|SpinPolarization-fe_noncol_gga_mpi4_omp1|SpinPolarization-fe_noncol_kp_mpi4_omp1|SpinPolarization-fe_noncol_sp_mpi4_omp1|SpinOrbit-FePt-X-X_mpi4_omp1|SCFMixing-chargemix_mpi4_omp1|SCFMixing-pulay_mpi4_omp1'' failed in test step for Siesta-5.4.2-foss-2024a-CUDA-12.6.0.eb

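It's not fully clear to me why, though. The log shows w90-driver_mpi starting at 12:02 and the command being reported as failed at 13:24 without any further test output. The -9 exit code suggests the command was killed by signal 9 (SIGKILL), for example by a time limit or the out-of-memory killer, rather than a segfault (which would be signal 11)...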

@casparvl Any ideas?
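
For reference, Python's `subprocess` module reports a child process that was terminated by a signal as a negative return code (`-N` for signal `N`). Assuming EasyBuild's `run.py` passes that value through unchanged, a negative exit code in a log like the one above can be decoded as follows. This is a POSIX-only sketch, and `describe_returncode` is a hypothetical helper:

```python
import signal
import subprocess
import sys


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a short description."""
    if returncode < 0:
        # Negative values mean the child was terminated by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Start a child process that just sleeps, then kill it with SIGKILL.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.kill()
child.wait()

print(child.returncode, describe_returncode(child.returncode))
print(-11, describe_returncode(-11))  # what a segfault would look like
print(8, describe_returncode(8))      # a regular non-zero exit status
```

Under that assumption, an exit code of 8 is an ordinary non-zero exit status reported by the command itself, not a signal.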

@boegel
Contributor

boegel commented May 21, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-jsc for:arch=aarch64/nvidia/grace,accel=nvidia/cc90

@eessi-bot-jsc

eessi-bot-jsc Bot commented May 21, 2026

New job on instance eessi-bot-jsc for repository eessi.io-2025.06-software
Building on: nvidia-grace and accelerator nvidia/cc90
Building for: aarch64/nvidia/grace and accelerator nvidia/cc90
Job dir: /p/project1/ceasybuilders/eessibot/jobs/2026.05/pr_1506/14778740

| date | job status | comment |
| --- | --- | --- |
| May 21 10:57:49 UTC 2026 | submitted | job id 14778740 awaits release by job manager |
| May 21 10:57:58 UTC 2026 | released | job awaits launch by Slurm scheduler |
| May 21 14:33:38 UTC 2026 | running | job 14778740 is running |
| May 21 16:01:10 UTC 2026 | finished | 😁 SUCCESS (click triangle for details) |
Details
✅ job output file slurm-14778740.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-nvidia-grace-accel-nvidia-cc90-17793787960.tar.gz
size: 63 MiB (66131995 bytes)
entries: 311
modules under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/modules/all
Siesta/5.4.2-foss-2024a-CUDA-12.6.0.lua
software under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/software
Siesta/5.4.2-foss-2024a-CUDA-12.6.0
reprod directories under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90/reprod
Siesta/5.4.2-foss-2024a-CUDA-12.6.0/20260521_155126UTC
other under 2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90
no other files in tarball
| May 21 16:01:10 UTC 2026 | test result | 😢 FAILURE (click triangle for details) |
Reason
EESSI test suite produced failures.
ReFrame Summary
[ FAILED ] Ran 17/29 test case(s) from 29 check(s) (4 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-14778740.out
❌ found message matching ERROR:
❌ found message matching [\s*FAILED\s*].*Ran .* test case
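
The Artefacts section above groups the tarball contents into modules, software, and reprod directories under the target prefix. A rough sketch of how such a summary could be produced with Python's `tarfile` module: `summarise_tarball`, the per-category path depths, and the in-memory test tarball (including the `bin/siesta` path) are illustrative assumptions, not the bot's code or the real tarball contents.

```python
import io
import tarfile
from collections import defaultdict

# Category directory -> number of path components that identify one entry
# (e.g. 'Siesta/<version>' for software). These depths are an assumption
# chosen to mirror the listing above.
CATEGORY_DEPTH = {"modules/all": 2, "software": 2, "reprod": 3}


def summarise_tarball(tar: tarfile.TarFile, prefix: str) -> dict:
    """Group tarball members under <prefix>/<category>/ (hypothetical helper)."""
    found = defaultdict(set)
    for name in tar.getnames():
        if not name.startswith(prefix + "/"):
            continue
        rest = name[len(prefix) + 1:]
        for category, depth in CATEGORY_DEPTH.items():
            if rest.startswith(category + "/"):
                parts = rest[len(category) + 1:].split("/")
                if len(parts) >= depth:
                    found[category].add("/".join(parts[:depth]))
                break
    return {category: sorted(found[category]) for category in CATEGORY_DEPTH}


prefix = "2025.06/software/linux/aarch64/nvidia/grace/accel/nvidia/cc90"

# Build a tiny in-memory tarball with empty files, purely for illustration.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    for path in (
        f"{prefix}/modules/all/Siesta/5.4.2-foss-2024a-CUDA-12.6.0.lua",
        f"{prefix}/software/Siesta/5.4.2-foss-2024a-CUDA-12.6.0/bin/siesta",
    ):
        tar.addfile(tarfile.TarInfo(name=path))  # size defaults to 0

buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    summary = summarise_tarball(tar, prefix)

for category, entries in summary.items():
    print(category, entries or "none in tarball")
```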

@TopRichard
Collaborator

TopRichard commented May 21, 2026

I can see the same error in a local build for aarch64/nvidia/grace/cc90:

== 2026-05-20 17:58:26,902 run.py:526 INFO Running 'OMP_NUM_THREADS=4 ...' shell command in /tmp/EESSI/easybuild/build/Siesta/5.4.2/foss-2024a-CUDA-12.6.0/easybuild_obj:
        OMP_NUM_THREADS=4 ctest --no-tests=error --output-on-failure -E 'pi3_Runs_Singleton|SpinPolarization-fe_spin_mpi4_omp|SpinPolarization-fe_spin_directphi_mpi4_omp1|SpinPolarization-fe_noncol_gga_mpi4_omp1|SpinPolarization-fe_noncol_kp_mpi4_omp1|SpinPolarization-fe_noncol_sp_mpi4_omp1|SpinOrbit-FePt-X-X_mpi4_omp1|SCFMixing-chargemix_mpi4_omp1|SCFMixing-pulay_mpi4_omp1'
== 2026-05-20 17:58:44,775 run.py:646 WARNING 'OMP_NUM_THREADS=4 ...' shell command FAILED (exit code 8)
== 2026-05-20 17:58:44,775 run.py:647 INFO Output of 'OMP_NUM_THREADS=4 ...' shell command (stdout + stderr):
Test project /tmp/EESSI/easybuild/build/Siesta/5.4.2/foss-2024a-CUDA-12.6.0/easybuild_obj
        Start   1: w90-driver_mpi
  1/699 Test   #1: w90-driver_mpi ................................................................***Failed    0.11 sec
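
The `-E` option in the `ctest` command above excludes tests whose names match the given regular expression. A small sketch of which names that pattern filters out, assuming the match is an unanchored search and using Python's `re` as a stand-in for CMake's regex engine (the two behave the same for a plain alternation of literals like this one); `is_excluded` is a hypothetical helper:

```python
import re

# Alternatives copied from the -E argument in the log above.
EXCLUDED_PATTERNS = [
    "pi3_Runs_Singleton",
    "SpinPolarization-fe_spin_mpi4_omp",
    "SpinPolarization-fe_spin_directphi_mpi4_omp1",
    "SpinPolarization-fe_noncol_gga_mpi4_omp1",
    "SpinPolarization-fe_noncol_kp_mpi4_omp1",
    "SpinPolarization-fe_noncol_sp_mpi4_omp1",
    "SpinOrbit-FePt-X-X_mpi4_omp1",
    "SCFMixing-chargemix_mpi4_omp1",
    "SCFMixing-pulay_mpi4_omp1",
]
exclude = re.compile("|".join(EXCLUDED_PATTERNS))


def is_excluded(test_name: str) -> bool:
    """True if the test name matches the exclude regex anywhere in the name."""
    return exclude.search(test_name) is not None


# Test names that appear in this thread.
for name in (
    "w90-driver_mpi",
    "siesta-12.Solvers-si-qdot-elsi-elpa-gpu_mpi4_omp1",
    "SCFMixing-pulay_mpi4_omp1",
):
    print(name, "->", "excluded" if is_excluded(name) else "runs")
```

Neither `w90-driver_mpi` nor the GPU solver test is covered by the exclude list, so both still run as part of the test step.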

@boegel
Contributor

boegel commented May 21, 2026

> I can see the same error on local build for aarch64/nvidia/grace/cc90

@TopRichard Was that a manual build, with (just) EESSI-extend?

I also tried on a Grace Hopper node manually with EESSI-extend (on JURECA), and I saw other problems, like:

/local/scratch/hoste1/easybuild/Siesta/5.4.2/foss-2024a-CUDA-12.6.0/siesta-5.4.2/External/ELSI-project/elsi_interface/external/ELPA/ELPA-2020.05.001/src/cudaFunctions.cu:69 Error in cudaGetDeviceCount: CUDA driver version is insufficient for CUDA runtime version

The w90-driver_mpi test was working fine, but it seems like none of the GPU tests (like siesta-12.Solvers-si-qdot-elsi-elpa-gpu_mpi4_omp1) are working.
That could be an issue with how the GPU drivers are exposed to EESSI, though. That's why I triggered the build bot at JURECA to also do a test install...
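
As background on the `cudaGetDeviceCount` error: CUDA reports runtime and driver versions as integers encoded as `1000 * major + 10 * minor`, and this error indicates that the driver visible to the process is older than the runtime requires. The sketch below decodes such values and applies a deliberately simplified rule (the driver's CUDA major version must be at least the runtime's). That rule, the helper names, and the example driver values are assumptions for illustration. The sketch ignores compatibility packages and cannot tell whether the host driver is exposed correctly inside EESSI.

```python
def decode_cuda_version(version: int) -> tuple[int, int]:
    """Decode CUDA's integer version encoding (1000 * major + 10 * minor)."""
    return version // 1000, (version % 1000) // 10


def driver_is_sufficient(driver_version: int, runtime_version: int) -> bool:
    """Simplified check (assumption): compare CUDA major versions only."""
    driver_major, _ = decode_cuda_version(driver_version)
    runtime_major, _ = decode_cuda_version(runtime_version)
    return driver_major >= runtime_major


runtime = 12060     # CUDA 12.6, matching the CUDA 12.6.0 used in this PR
old_driver = 11040  # hypothetical driver that supports up to CUDA 11.4
new_driver = 12080  # hypothetical driver that supports up to CUDA 12.8

for driver in (old_driver, new_driver):
    print(
        "driver", decode_cuda_version(driver),
        "runtime", decode_cuda_version(runtime),
        "sufficient:", driver_is_sufficient(driver, runtime),
    )
```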

@TopRichard
Collaborator

> @TopRichard Is that manually, with (just) EESSI-extend ?

That was a manual build using eessi_container.sh, but without prefix.
