
feat(gint): enable mixed-precision (fp32/fp64) support for GPU path #7207

Merged
mohanchen merged 1 commit into deepmodeling:develop from dzzz2001:gpu-mix-precision on Apr 5, 2026

Conversation

Collaborator

@dzzz2001 dzzz2001 commented Apr 3, 2026

Reminder

  • Have you linked an issue with this pull request?
  • Have you added adequate unit tests and/or case tests for your pull request?
  • Have you noticed possible changes of behavior below or in the linked issue?
  • Have you explained the changes of codes in core modules of ESolver, HSolver, ElecState, Hamilt, Operator or Psi? (ignore if not applicable)

Linked Issue

Follow-up to #7149 (CPU mixed-precision gint support)

Unit Tests and/or Case Tests for my changes

  • Tested with 10 LCAO benchmark cases comparing gint_precision = double vs gint_precision = mix on GPU.

What's changed?

This PR extends the mixed-precision grid integration (gint_precision = mix/single) support from CPU-only to GPU. The key changes include:

GPU Kernel Templating

  • phi_operator_gpu and phi_operator_kernel: Templated the PhiOperatorGpu class and associated CUDA kernels to support both float and double precision types.
  • dgemm_vbatch and gemm_nn/tn_vbatch: Templated the batch GEMM operations to dispatch between float and double at compile time.

GPU Gint Functions

  • gint_vl_gpu / gint_rho_gpu: Updated to use the templated GPU operators, dispatching precision based on the gint_precision parameter.
  • gint_fvl_gpu, gint_tau_gpu, gint_vl_metagga_gpu, gint_vl_nspin4_gpu, gint_fvl_meta_gpu, gint_vl_metagga_nspin4_gpu: Propagated the precision template parameter through all GPU gint entry points.

Input Validation

  • read_input_item_system.cpp: Removed the restriction that forced gint_precision back to double when running on GPU, allowing single and mix modes to work with GPU acceleration.

How Mixed-Precision Works on GPU

When gint_precision = mix:

  1. Early SCF iterations use fp32 for gint_vl and gint_rho computations → faster kernel execution and reduced memory bandwidth.
  2. Once the charge-density residual (drho) approaches the SCF convergence threshold (scf_thr), the GintPrecisionController switches to fp64 for final convergence accuracy.
  3. Force and stress calculations always use fp64 regardless of precision setting.

Any changes of core modules? (ignore if not applicable)

  • No changes to core modules (ESolver, HSolver, ElecState, Hamilt, Operator, Psi). All changes are confined to module_gint and input parameter validation.

Template the GPU grid integration kernels, batch GEMM operations, and
PhiOperatorGpu class to support both single and double precision.

- Template phi_operator_gpu and phi_operator_kernel for fp32/fp64
- Template dgemm_vbatch and gemm kernels for precision dispatch
- Update gint_vl_gpu, gint_rho_gpu to use templated GPU operators
- Propagate precision template through fvl, tau, metagga GPU paths
- Remove GPU restriction for gint_precision=single/mix in input validation
@dzzz2001 dzzz2001 force-pushed the gpu-mix-precision branch from 6197ddc to 1364327 on April 3, 2026 07:23
@mohanchen mohanchen added the GPU & DCU & HPC and Refactor labels Apr 5, 2026
@mohanchen mohanchen merged commit 3a996b6 into deepmodeling:develop Apr 5, 2026
15 checks passed
@mohanchen mohanchen added the Performance label Apr 5, 2026