Port stress/response calculations to the GPU by abussy · Pull Request #1187 · JuliaMolSim/DFTK.jl

abussy · 2025-11-07T12:43:16Z

This PR enables ForwardDiff calculations (stress and response) on the GPU. Main changes are:

Data transfer from/to the device where necessary
Various small changes to avoid GPU compiler confusion (e.g. see changes in src/workarounds/forwarddiff_rules.jl)
CPU fall-backs for all XC operations taking place in the DftFunctionals.jl package
Refactoring of the ForwardDiff tests, such that all tests can be run on various architectures (CPU, CUDA, AMDGPU)
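
For readers unfamiliar with the CPU fall-back pattern mentioned in the list above, here is a minimal sketch, assuming a generic array type. The names `with_cpu_fallback` and `toy_energy_density` are hypothetical and are not taken from DFTK or DftFunctionals.jl; the actual PR may organize the transfers differently.

```julia
# Hypothetical helper: run `f` on a host copy of `x`, then move the result
# back into an array of the same kind as `x`.
function with_cpu_fallback(f, x::AbstractArray)
    x_cpu = Array(x)                            # device -> host (plain copy on the CPU)
    y_cpu = f(x_cpu)                            # operation lacking a GPU implementation
    y = similar(x, eltype(y_cpu), size(y_cpu))  # allocate where `x` lives
    copyto!(y, y_cpu)                           # host -> device, returns `y`
end

# Stand-in for an XC-like pointwise operation (not the DftFunctionals.jl API).
toy_energy_density(ρ) = -0.75 .* cbrt.(ρ) .* ρ

ρ = [0.1, 0.2, 0.3]
ε = with_cpu_fallback(toy_energy_density, ρ)
```

With a plain `Array` the round trip is only a copy; with a device array, the same code would perform the two transfers around the host-side computation.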

With this PR, all ForwardDiff workflows currently tested on the CPU successfully run on both NVIDIA and AMD GPUs.

Future improvements will come with:

PR #1185 (Increase GPU robustness of LOBPCG), so that the tests finish consistently on GPUs (right now, they regularly fail due to Cholesky instability on the GPU)
DftFunctionals.jl#23 (Make DftFunctionals.jl types GPU compatible), to pave the way for XC operations on the GPU
PR #1163 (Port AtomicLocal integration to GPU), for more efficient PlaneWaveBasis instantiation

abussy · 2025-11-12T16:36:56Z

Merged master. Adapted tests to the refactoring brought by #1182.

Additionally, removed this problematic bit of code in ext/DFTKAMDGPUExt.jl:

# Enable comparisons of Duals on AMD GPUs
_val(x) = x
_val(x::Dual) = _val(ForwardDiff.value(x))
function Base.:<(x::Dual{T,V,N},
                 y::Dual{T,V,N}) where {T,V,N}
    _val(x) < _val(y)
end
function Base.:>(x::Dual{T,V,N},
                 y::Dual{T,V,N}) where {T,V,N}
    _val(x) > _val(y)
end

It turns out that comparison of Duals does not take place on the GPU, as long as all XC operations are done on the CPU. This might become a concern again in the future, once DftFunctionals.jl is refactored.

mfherbst · 2025-11-12T19:59:46Z

+    copyto!(y, _mul(p, x))
 end
-function Base.:*(p::AbstractFFTs.Plan, x::AbstractArray{<:Complex{<:Dual{Tg}}}) where {Tg}
+function _mul(p::AbstractFFTs.Plan, x::AbstractArray{<:Complex{<:Dual{Tg}}}) where {Tg}


Again this feels strange and is surprising to me. Why did you need this ?

Without this workaround, the GPU compiler throws an invalid LLVM IR error during stress calculations. I think there is confusion around which method of Base.:* to use, but I don't understand why.

Ok, this we need to understand.

@niklasschmitz I recall we only needed this anyway because it was not properly supported on the AbstractFFTs side. Could it be that it now is, and we can drop our type piracy workaround altogether?

I made progress there by properly reading the error message, and it turns out that a more specific definition of Base.:* in cufft takes priority: https://github.com/JuliaGPU/CUDA.jl/blob/44cde93bf03812012da5c883b6532d80a5226268/lib/cufft/fft.jl#L359-L377. While I understand the problem now, I don't see a better way to deal with it than the current solution. Any help/suggestion is welcome.

CUDA.jl overloads Base.:* for more specific types than AbstractFFTs.Plan and AbstractArray, but it breaks with (complex) Duals.

So it's a bug in CUDA.jl, effectively ? Their typing is too broad as it covers Duals, which they don't support ?

That's my understanding, yes.

Once there is an issue opened and referenced here (please @mention me) this is fine.
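
The dispatch issue discussed in this thread can be reproduced in miniature. The sketch below uses only hypothetical stand-in types: `AbstractPlan` and `DevicePlan` play the roles of `AbstractFFTs.Plan` and a GPU plan type, `DeviceVector` that of a GPU array, and `Tagged` that of a Dual. It does not use the real CUDA.jl or ForwardDiff signatures.

```julia
abstract type AbstractPlan end
struct DevicePlan <: AbstractPlan end

# Minimal array wrapper standing in for a GPU array type.
struct DeviceVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::DeviceVector) = size(v.data)
Base.getindex(v::DeviceVector, i::Int) = v.data[i]

# Stand-in for a Dual number.
struct Tagged <: Real
    val::Float64
end

# "Our" overload: any plan, but only arrays of Tagged elements.
Base.:*(::AbstractPlan, x::AbstractVector{Tagged}) = :generic_overload

# "Vendor" method: device plan and device array, any element type.
# Dispatch prefers it for device inputs, Tagged elements included.
Base.:*(::DevicePlan, x::DeviceVector{T}) where {T} = :vendor_method

# Workaround: a private helper that the vendor package does not extend.
_mul(::AbstractPlan, x::AbstractVector{Tagged}) = :generic_overload

x = DeviceVector([Tagged(1.0), Tagged(2.0)])
via_operator = DevicePlan() * x       # selects the vendor method
via_helper   = _mul(DevicePlan(), x)  # selects our helper
```

Routing the call through a privately named function means the vendor's methods are never candidates, which is the effect the `_mul` change in the diff above aims for.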

abussy · 2026-01-07T15:35:45Z

I reorganized the ForwardDiff tests:

To avoid having a single gigantic file, I split the tests into 3 separate files by category: forwarddiff_geometry.jl contains tests based on perturbations of geometry/symmetry, forwarddiff_parameters.jl contains tests on variations of model parameters, and forwarddiff_generic.jl contains small generic FD tests. Thanks to @niklasschmitz for helping with the categories.
The tests now follow the logic of the various silicon_*.jl files: a @testmodule defines a test function, which is then called with various parameters in different @testitem blocks. This allows running the same test on CPU and GPU with minimal code duplication. The test definition and calls are now in the same location too.
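
As a rough illustration of the "define once, call per architecture" idea, here is a sketch that uses only the standard `Test` library rather than the @testmodule/@testitem macros used in the PR. The function `run_derivative_test` and its `to_device` argument are hypothetical names chosen for this example.

```julia
using Test

# Hypothetical shared test body: `to_device` converts a host array to the
# architecture under test (`Array` keeps it on the CPU).
function run_derivative_test(to_device; tol=1e-6)
    x = to_device([0.5, 1.0, 1.5])
    f(v) = sum(abs2, v)   # toy objective; its gradient is 2v
    h = 1e-6
    # Central finite difference along the all-ones direction
    fd = (f(x .+ h) - f(x .- h)) / (2h)
    @test fd ≈ 2 * sum(x) atol=tol
    return fd
end

@testset "derivative (CPU)" begin
    run_derivative_test(Array)
end
# A GPU variant would pass a device array constructor instead (for example
# `CuArray` from CUDA.jl), in its own test set or test item.
```

Only the one-line call differs between architectures, so the test logic itself is not duplicated.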

I additionally addressed the various concerns of @Technici4n on comments in src/terms/xc.jl and src/terms/local_nonlinearity.jl. Finally, I also added a couple of GPU tests for stress calculations.

I believe the last remaining issue is the overload of the Base.:* operators, i.e.:

function Base.:*(p::AbstractFFTs.Plan, x::AbstractArray{<:Complex{<:Dual{Tg}}}) where {Tg}

which fails to compile with CUDA.

mfherbst

Very nice refactoring !

abussy · 2026-01-09T14:44:31Z

Addressed review comments on the FD tests, and moved them to their own subfolder.
Addressed src/workarounds/forwarddiff_rules.jl comments.
Rebased on top of current master for merge compatibility

mfherbst

Final nits before we merge.

abussy mentioned this pull request Nov 11, 2025

Ensure contiguous occupations #1189

Merged

abussy force-pushed the stress_gpu branch from 537eacd to bc593e0 on November 12, 2025 16:31

mfherbst reviewed Nov 12, 2025

Technici4n reviewed Dec 2, 2025

Comment thread src/terms/local_nonlinearity.jl Outdated

Comment thread test/forwarddiff_gpu.jl Outdated

Technici4n reviewed Dec 3, 2025

Comment thread src/terms/xc.jl Outdated

mfherbst reviewed Jan 8, 2026

abussy and others added 6 commits January 9, 2026 15:18

Port stress/response calculations to the GPU

5168fdb

Test refactoring due to master merge

0e1597f

Clean-up gpu_arrays.jl + use clearer name for parametreized Dual tags

25d26f7

Clean up DFTKAMDGPUExt.jl

f881486

Reorganize ForwardDiff tests

10b206b

Move FD tests to subfolder

033c3f3

abussy force-pushed the stress_gpu branch from 78d21b3 to 033c3f3 on January 9, 2026 14:38

mfherbst reviewed Jan 13, 2026

Cleaner overload of Base.:*

5bf18c3

mfherbst reviewed Jan 14, 2026

Comment thread src/workarounds/forwarddiff_rules.jl Outdated

abussy and others added 2 commits January 14, 2026 12:18

dual_fft --> dual_fft_mul

c753c5f

Merge branch 'master' into stress_gpu

8300e55

mfherbst enabled auto-merge (squash) January 14, 2026 11:37

mfherbst disabled auto-merge January 14, 2026 14:42

mfherbst merged commit d8d425a into JuliaMolSim:master Jan 14, 2026
7 of 10 checks passed

3 participants