
feat(gint): enable mixed-precision (fp32/fp64) support for GPU path #7207

Merged
mohanchen merged 1 commit into deepmodeling:develop from dzzz2001:gpu-mix-precision on Apr 5, 2026

Conversation

Collaborator

@dzzz2001 dzzz2001 commented Apr 3, 2026

Reminder

  • Have you linked an issue with this pull request?
  • Have you added adequate unit tests and/or case tests for your pull request?
  • Have you noticed possible changes of behavior below or in the linked issue?
  • Have you explained the changes of codes in core modules of ESolver, HSolver, ElecState, Hamilt, Operator or Psi? (ignore if not applicable)

Linked Issue

Follow-up to #7149 (CPU mixed-precision gint support)

Unit Tests and/or Case Tests for my changes

  • Tested with 10 LCAO benchmark cases comparing gint_precision = double vs gint_precision = mix on GPU.

What's changed?

This PR extends the mixed-precision grid integration (gint_precision = mix/single) support from CPU-only to GPU. The key changes include:

GPU Kernel Templating

  • phi_operator_gpu and phi_operator_kernel: Templated the PhiOperatorGpu class and associated CUDA kernels to support both float and double precision types.
  • dgemm_vbatch and gemm_nn/tn_vbatch: Templated the batch GEMM operations to dispatch between float and double at compile time.

GPU Gint Functions

  • gint_vl_gpu / gint_rho_gpu: Updated to use the templated GPU operators, dispatching precision based on the gint_precision parameter.
  • gint_fvl_gpu, gint_tau_gpu, gint_vl_metagga_gpu, gint_vl_nspin4_gpu, gint_fvl_meta_gpu, gint_vl_metagga_nspin4_gpu: Propagated the precision template parameter through all GPU gint entry points.

Input Validation

  • read_input_item_system.cpp: Removed the restriction that forced gint_precision back to double when running on GPU, allowing single and mix modes to work with GPU acceleration.

How Mixed-Precision Works on GPU

When gint_precision = mix:

  1. Early SCF iterations use fp32 for gint_vl and gint_rho computations → faster kernel execution and reduced memory bandwidth.
  2. Once the charge-density residual (drho) approaches the SCF convergence threshold (scf_thr), the GintPrecisionController switches to fp64 for final convergence accuracy.
  3. Force and stress calculations always use fp64 regardless of precision setting.

Any changes of core modules? (ignore if not applicable)

  • No changes to core modules (ESolver, HSolver, ElecState, Hamilt, Operator, Psi). All changes are confined to module_gint and input parameter validation.

Template the GPU grid integration kernels, batch GEMM operations, and
PhiOperatorGpu class to support both single and double precision.

- Template phi_operator_gpu and phi_operator_kernel for fp32/fp64
- Template dgemm_vbatch and gemm kernels for precision dispatch
- Update gint_vl_gpu, gint_rho_gpu to use templated GPU operators
- Propagate precision template through fvl, tau, metagga GPU paths
- Remove GPU restriction for gint_precision=single/mix in input validation
@dzzz2001 dzzz2001 force-pushed the gpu-mix-precision branch from 6197ddc to 1364327 on April 3, 2026 07:23
@mohanchen mohanchen added the GPU & DCU & HPC and Refactor labels Apr 5, 2026
@mohanchen mohanchen merged commit 3a996b6 into deepmodeling:develop Apr 5, 2026
15 checks passed
@mohanchen mohanchen added the Performance label Apr 5, 2026