Add CUDA configuration to firedrake-configure and github actions #4988
dsroberts wants to merge 7 commits into firedrakeproject:main
4c9aefa to f3a4bb6
connorjward
left a comment
Looks pretty good. Thanks!
I am sad at how firedrake-configure is getting more confusing, but it's very hard to avoid and certainly not your job to fix.
Can there be something similar in push.yml? When we push we want to run all of the tests.
firedrake/utils.py
Outdated
dev_type = dev.getDeviceType()
dev.destroy()
if dev_type not in _device_mat_type_map:
    raise RuntimeError(f"Unknown device type: {dev_type} initialised by PETSc")
Can this be a custom UnrecognisedDeviceError (https://github.com/firedrakeproject/firedrake/blob/main/firedrake/exceptions.py)?
And maybe the error message should state what the valid options are.
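A minimal sketch of how the two suggestions above might fit together. The exception name is taken from the review comment; its base class, the contents of the map, and the helper `mat_type_for_device` are assumptions made for illustration and are not the code in this PR.

```python
class UnrecognisedDeviceError(RuntimeError):
    """Raised when PETSc reports a device type with no known matrix type."""


# Hypothetical contents; the real map lives in firedrake/utils.py.
_device_mat_type_map = {"host": None, "cuda": "aijcusparse"}


def mat_type_for_device(dev_type):
    """Return the matrix type for dev_type, or fail listing the valid options."""
    if dev_type not in _device_mat_type_map:
        valid = ", ".join(sorted(_device_mat_type_map))
        raise UnrecognisedDeviceError(
            f"Unknown device type {dev_type!r} initialised by PETSc; "
            f"valid options are: {valid}"
        )
    return _device_mat_type_map[dev_type]
```

Subclassing `RuntimeError` in this sketch means any caller that already catches `RuntimeError` would keep working.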
scripts/firedrake-configure
Outdated
SUPPORTED_PETSC_VERSION = "v3.24.5"
# SuperLU_DIST built via PETSc does not support CUDA 13
Is there an issue to reference here, so we would know if it's safe to update at a later point?
The issue is here: https://gitlab.com/petsc/petsc/-/work_items/1878. It has been resolved in main; however, we have SUPPORTED_PETSC_VERSION = "v3.24.5", which does not contain this fix. If this were my policy to set, I'd be inclined to leave it until the supported PETSc version is bumped to >=3.25.0.
Ah, then this is safe to remove. SUPPORTED_PETSC_VERSION only matters for Firedrake releases. These changes are going into main, which tests against PETSc main, and we won't make a major release until 3.25.0 comes out.
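For readers following the version discussion, here is a rough sketch of how such a workaround could be gated on the PETSc version instead of being kept or removed by hand. The helper names and the `FIXED_IN` constant are hypothetical; the thread only states that the fix is in PETSc main and is expected from 3.25.0 onwards.

```python
import re


def parse_petsc_version(tag):
    """Turn a tag such as 'v3.24.5' into the tuple (3, 24, 5)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"Unrecognised PETSc version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


# Assumption drawn from the thread: the SuperLU_DIST/CUDA 13 fix ships in 3.25.0.
FIXED_IN = (3, 25, 0)


def needs_cuda13_workaround(tag):
    """True when the given PETSc release predates the assumed fix."""
    return parse_petsc_version(tag) < FIXED_IN
```

Comparing integer tuples avoids the string-ordering trap where "3.9.0" would sort after "3.24.5".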
scripts/firedrake-configure
Outdated
CUDA_ARCH_MAP = {
    "aarch64": "sbsa"
}
# Structure is ( deb_repo_filename, file_contents, GPG_key_URL )
Is this still correct?
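As an illustration of what a map like `CUDA_ARCH_MAP` is for, the sketch below translates the machine name reported by Python into an architecture label for a CUDA package repository. The function `cuda_repo_arch` and the fallback rule (use the machine name unchanged when it is not in the map) are assumptions; the PR may handle this differently.

```python
import platform

# Same shape as the map in the diff: only names that differ need an entry.
CUDA_ARCH_MAP = {"aarch64": "sbsa"}


def cuda_repo_arch(machine=None):
    """Map a machine name (default: this host's) to a CUDA repo architecture."""
    if machine is None:
        machine = platform.machine()
    # Assumed fallback: architectures not listed use their own name.
    return CUDA_ARCH_MAP.get(machine, machine)
```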
scripts/firedrake-configure
Outdated
    f"libcublas-dev-{cuda_ver_str}",
)

LINUX_APT_PACKAGES = BASE_LINUX_APT_PACKAGES + PETSC_EXTRAS_LINUX_APT_PACKAGES
Suggested change
- LINUX_APT_PACKAGES = BASE_LINUX_APT_PACKAGES + PETSC_EXTRAS_LINUX_APT_PACKAGES
+ LINUX_APT_NOGPU_PACKAGES = BASE_LINUX_APT_PACKAGES + PETSC_EXTRAS_LINUX_APT_PACKAGES
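One way to read the rename: with an explicit `NOGPU` suffix, the CPU-only package list becomes a named base that a CUDA variant can extend. The sketch below shows that composition. The package names in the first two tuples are placeholders, and `linux_apt_cuda_packages` and the `"12-9"` version string format are hypothetical; only the `libcublas-dev-{cuda_ver_str}` pattern comes from the diff.

```python
# Placeholder contents; the real tuples are defined in scripts/firedrake-configure.
BASE_LINUX_APT_PACKAGES = ("build-essential", "cmake")
PETSC_EXTRAS_LINUX_APT_PACKAGES = ("libhdf5-mpi-dev",)

# Name as proposed in the review suggestion.
LINUX_APT_NOGPU_PACKAGES = BASE_LINUX_APT_PACKAGES + PETSC_EXTRAS_LINUX_APT_PACKAGES


def linux_apt_cuda_packages(cuda_ver_str):
    """Hypothetical helper: CPU-only packages plus a versioned CUDA library."""
    return LINUX_APT_NOGPU_PACKAGES + (f"libcublas-dev-{cuda_ver_str}",)
```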
.github/workflows/core.yml
Outdated
PETSC_OPTIONS: -use_gpu_aware_mpi 0
EXTRA_OPTIONS: -use_gpu_aware_mpi 0
Moved EXTRA_OPTIONS to the appropriate step (PETSc make check). Scratch that, it turns out PETSC_OPTIONS is there just in case: because the system MPI has no GPU-awareness, PETSc will crash if any future test uses GPU offloading in parallel without that option. firedrake-check needs that to pass. I'll put it back.
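PETSc reads default command-line options from the `PETSC_OPTIONS` environment variable, which is why setting it at the workflow level reaches every process the tests start. A small sketch of setting the same option from Python without discarding options that are already present; the helper `with_petsc_option` is hypothetical and not part of Firedrake or PETSc.

```python
def with_petsc_option(env, option):
    """Return a copy of env with option appended to PETSC_OPTIONS."""
    new_env = dict(env)
    existing = new_env.get("PETSC_OPTIONS", "").strip()
    # Join with a single space and avoid a leading space when nothing was set.
    new_env["PETSC_OPTIONS"] = f"{existing} {option}".strip()
    return new_env
```

For example, `subprocess.run(["firedrake-check"], env=with_petsc_option(os.environ, "-use_gpu_aware_mpi 0"))` would run the check with GPU-aware MPI disabled, assuming `firedrake-check` is on the path.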
connorjward
left a comment
I think this is very very close now.
# 'make test_durations' inside a 'firedrake:latest' Docker image.
EXTRA_PYTEST_ARGS: --splitting-algorithm least_duration --timeout=600 --timeout-method=thread -o faulthandler_timeout=660 --durations-path=./firedrake-repo/tests/test_durations.json --durations=50
PYTEST_MPI_MAX_NPROCS: 8
PETSC_OPTIONS: -use_gpu_aware_mpi 0
Needs a comment explaining what this does
firedrake/utils.py
Outdated
@cache
def device_matrix_type(warn: bool = False) -> str | None:
I feel that this should probably default to True. We want users to see the warning by default and only disable if they are confident.
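A sketch of the suggested default, using a stand-in for the PETSc device query. The function `_detect_device_mat_type`, the condition that triggers the warning, and the warning text are all assumptions; only the signature with `warn` and the `@cache` decorator come from the diff.

```python
import warnings
from functools import cache
from typing import Optional


def _detect_device_mat_type() -> Optional[str]:
    """Hypothetical stand-in for the PETSc device query; pretends no GPU exists."""
    return None


@cache
def device_matrix_type(warn: bool = True) -> Optional[str]:
    """Return the device matrix type, warning by default when none is found."""
    mat_type = _detect_device_mat_type()
    if mat_type is None and warn:
        warnings.warn("No GPU device detected; using host matrix types")
    return mat_type
```

Because the function is cached, the body runs once per distinct argument pattern, so in this sketch a repeated call with the same arguments does not repeat the warning.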
scripts/firedrake-configure
Outdated
    },
}

PETSC_EXTERNAL_PACKAGE_SPECS: PetscSpecsDictType = (
Suggested change
- PETSC_EXTERNAL_PACKAGE_SPECS: PetscSpecsDictType = (
+ PETSC_EXTERNAL_PACKAGE_SPECS_NOGPU: PetscSpecsDictType = (
Description
This PR contains the build and CI component of #4953. It also contains the skipnogpu pytest marker. Tagged with macOS to ensure that changes to firedrake-configure have not broken the macOS build.