
ggml-cuda : avoid ambiguous operator+ for half/bfloat16 in CUDA 11.8#23006

Closed
danbev wants to merge 1 commit into ggml-org:master from danbev:ggml-cuda-half

Conversation

@danbev (Member) commented May 13, 2026

Overview

This commit adds a function named `ar_add()` (all-reduce add) to avoid the ambiguous `operator+` for half/bfloat16 in CUDA 11.8.

Additional information

The motivation for this change is that whisper.cpp uses CUDA 11.8 in CI, and the build is currently failing with the following error:

```console
FAILED: [code=1] ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/Release/allreduce.cu.obj
sccache C:\PROGRA~1\NVIDIA~1\CUDA\v\bin\nvcc.exe -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -DCMAKE_INTDIR=\"Release\" -ID:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\.. -ID:\a\whisper.cpp\whisper.cpp\ggml\src\..\include -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include" -allow-unsupported-compiler -D_ALLOW_COMPILER_AND_STL_VERSION_MISMATCH -D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR -Xcompiler="-MD -O2 -Ob2" -DNDEBUG -std=c++17 -arch=native -use_fast_math -extended-lambda -Xcompiler /Zc:preprocessor -MD -MT ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -MF ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj.d -x cu -c D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\allreduce.cu -o ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -Xcompiler=-Fdggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\,-FS
D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\allreduce.cu(213): error: more than one conversion function from "half" to a built-in type applies:
            function "__half::operator float() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(204): here
            function "__half::operator short() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(225): here
            function "__half::operator int() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(241): here
          detected during:
            instantiation of "void ggml_cuda_ar_add_kernel(T_dst *, const T_src *, int) [with T_dst=half, T_src=half]"
(691): here
            instantiation of "__nv_bool ggml_cuda_ar_allreduce_copy_impl(ggml_cuda_ar_pipeline *, ggml_backend_t *, T_src *const *, T_dst *const *, const __nv_bool *, int64_t, size_t) [with T_src=half, T_dst=half]"
(735): here
            instantiation of "__nv_bool ggml_cuda_ar_allreduce_copy_outer(ggml_cuda_ar_pipeline *, ggml_backend_t *, T_src *const *, T_dst *const *, const __nv_bool *, int64_t) [with T_src=half, T_dst=half]"
(872): here
```

This commit is an attempt to resolve the issue, and it seems to work, but I am not sure it is the best solution; perhaps this should be added to ggml-cuda/convert.cuh instead.

Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/25713948217/job/75500081939?pr=3803#step:11:206

@danbev danbev requested a review from a team as a code owner May 13, 2026 08:12
@github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) May 13, 2026
@JohannesGaessler (Contributor) commented:

As of right now I would prefer #22994 as a solution since I think it is simpler and less prone to more annoying compiler problems. Generally speaking, for compute-bound problems you would want to do neither of the two proposed solutions as you would be operating on the packed 32 bit data types for 2x FP16/BF16.

@danbev (Member, Author) commented May 13, 2026

@JohannesGaessler Sounds good, thanks!

@danbev danbev closed this May 13, 2026
