Skip to content

Add CUDA configuration to firedrake-configure and github actions#4988

Open
dsroberts wants to merge 7 commits intofiredrakeproject:mainfrom
dsroberts:dsroberts/firedrake-cuda-build
Open

Add CUDA configuration to firedrake-configure and github actions#4988
dsroberts wants to merge 7 commits intofiredrakeproject:mainfrom
dsroberts:dsroberts/firedrake-cuda-build

Conversation

@dsroberts
Copy link
Copy Markdown
Contributor

Description

This PR contains the build and CI component of #4953. It also contains the skipnogpu pytest marker. Tagged with macOS to ensure that changes to firedrake-configure have not broken the macOS build.

@dsroberts dsroberts added macOS gpu For special runner labels Mar 24, 2026
@dsroberts dsroberts force-pushed the dsroberts/firedrake-cuda-build branch from 4c9aefa to f3a4bb6 Compare March 25, 2026 00:54
@dsroberts dsroberts marked this pull request as ready for review March 25, 2026 03:01
Copy link
Copy Markdown
Contributor

@connorjward connorjward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. Thanks!

I am sad at how firedrake-configure is getting more confusing, but it's very hard to avoid and certainly not your job to fix.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be something similar in push.yml? When we push we want to run all of the tests.

dev_type = dev.getDeviceType()
dev.destroy()
if dev_type not in _device_mat_type_map:
raise RuntimeError(f"Unknown device type: {dev_type} initialised by PETSc")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe the error message should state what the valid options are.



SUPPORTED_PETSC_VERSION = "v3.24.5"
# SuperLU_DIST built via PETSc does not support CUDA 13
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue to reference here? So we would know if it's safe to update at a later point

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is here: https://gitlab.com/petsc/petsc/-/work_items/1878, and has been resolved in main, however, we have SUPPORTED_PETSC_VERSION = "v3.24.5", which does not contain this fix. If this were my policy to set, I'd be inclined to leave it until the supported PETSc version is bumped to >=3.25.0.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah then this is safe to remove. The SUPPORTED_PETSC_VERSION is only a thing for Firedrake release. These changes are going into main which tests against PETSc main and we won't make a major release until 3.25.0 comes out.

CUDA_ARCH_MAP = {
"aarch64": "sbsa"
}
# Structure is ( deb_repo_filename, file_contents, GPG_key_URL )
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, fixed.

f"libcublas-dev-{cuda_ver_str}",
)

LINUX_APT_PACKAGES = BASE_LINUX_APT_PACKAGES + PETSC_EXTRAS_LINUX_APT_PACKAGES
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LINUX_APT_PACKAGES = BASE_LINUX_APT_PACKAGES + PETSC_EXTRAS_LINUX_APT_PACKAGES
LINUX_APT_NOGPU_PACKAGES = BASE_LINUX_APT_PACKAGES + PETSC_EXTRAS_LINUX_APT_PACKAGES

Comment on lines +496 to +497
PETSC_OPTIONS: -use_gpu_aware_mpi 0
EXTRA_OPTIONS: -use_gpu_aware_mpi 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are these used?

Copy link
Copy Markdown
Contributor Author

@dsroberts dsroberts Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved EXTRA_OPTIONS to the appropriate step (PETSc make check). PETSC_OPTIONS is a just in case. Because system MPI has no GPU-awareness, PETSc will crash out if there are any future tests that use GPU offloading in parallel without that option. Scratch that, turns out firedrake-check needs that to pass. I'll put it back.

Copy link
Copy Markdown
Contributor

@connorjward connorjward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very very close now.

# 'make test_durations' inside a 'firedrake:latest' Docker image.
EXTRA_PYTEST_ARGS: --splitting-algorithm least_duration --timeout=600 --timeout-method=thread -o faulthandler_timeout=660 --durations-path=./firedrake-repo/tests/test_durations.json --durations=50
PYTEST_MPI_MAX_NPROCS: 8
PETSC_OPTIONS: -use_gpu_aware_mpi 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment explaining what this does



@cache
def device_matrix_type(warn: bool = False) -> str | None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that this should probably default to True. We want users to see the warning by default and only disable if they are confident.

},
}

PETSC_EXTERNAL_PACKAGE_SPECS: PetscSpecsDictType = (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PETSC_EXTERNAL_PACKAGE_SPECS: PetscSpecsDictType = (
PETSC_EXTERNAL_PACKAGE_SPECS_NOGPU: PetscSpecsDictType = (

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gpu For special runner macOS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants