
ggml-cuda : avoid ambiguous operator+ for half/bfloat16 in CUDA 11.8#23006

Closed
danbev wants to merge 1 commit into ggml-org:master from danbev:ggml-cuda-half

Conversation

@danbev (Member) commented May 13, 2026

Overview

This commit adds a function named `ar_add()` (all-reduce add) to avoid the ambiguous `operator+` for half/bfloat16 in CUDA 11.8.

Additional information

The motivation for this change is that whisper.cpp uses CUDA 11.8 in CI, and the build is currently failing with the following error:

```console
FAILED: [code=1] ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/Release/allreduce.cu.obj
sccache C:\PROGRA~1\NVIDIA~1\CUDA\v\bin\nvcc.exe -forward-unknown-to-host-compiler -DGGML_BACKEND_BUILD -DGGML_BACKEND_SHARED -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_SCHED_MAX_COPIES=4 -DGGML_SHARED -D_CRT_SECURE_NO_WARNINGS -D_XOPEN_SOURCE=600 -Dggml_cuda_EXPORTS -DCMAKE_INTDIR=\"Release\" -ID:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\.. -ID:\a\whisper.cpp\whisper.cpp\ggml\src\..\include -isystem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include" -allow-unsupported-compiler -D_ALLOW_COMPILER_AND_STL_VERSION_MISMATCH -D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR -Xcompiler="-MD -O2 -Ob2" -DNDEBUG -std=c++17 -arch=native -use_fast_math -extended-lambda -Xcompiler /Zc:preprocessor -MD -MT ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -MF ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj.d -x cu -c D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\allreduce.cu -o ggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\allreduce.cu.obj -Xcompiler=-Fdggml\src\ggml-cuda\CMakeFiles\ggml-cuda.dir\Release\,-FS
D:\a\whisper.cpp\whisper.cpp\ggml\src\ggml-cuda\allreduce.cu(213): error: more than one conversion function from "half" to a built-in type applies:
            function "__half::operator float() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(204): here
            function "__half::operator short() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(225): here
            function "__half::operator int() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v\include\cuda_fp16.hpp(241): here
          detected during:
            instantiation of "void ggml_cuda_ar_add_kernel(T_dst *, const T_src *, int) [with T_dst=half, T_src=half]"
(691): here
            instantiation of "__nv_bool ggml_cuda_ar_allreduce_copy_impl(ggml_cuda_ar_pipeline *, ggml_backend_t *, T_src *const *, T_dst *const *, const __nv_bool *, int64_t, size_t) [with T_src=half, T_dst=half]"
(735): here
            instantiation of "__nv_bool ggml_cuda_ar_allreduce_copy_outer(ggml_cuda_ar_pipeline *, ggml_backend_t *, T_src *const *, T_dst *const *, const __nv_bool *, int64_t) [with T_src=half, T_dst=half]"
(872): here
```

This commit is an attempt to resolve the issue, and it seems to work, but I am not sure it is the best solution; perhaps this should be added to ggml-cuda/convert.cuh instead.

Refs: https://github.com/ggml-org/whisper.cpp/actions/runs/25713948217/job/75500081939?pr=3803#step:11:206

@danbev danbev requested a review from a team as a code owner May 13, 2026 08:12
@github-actions bot added labels Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) May 13, 2026
@JohannesGaessler (Contributor) commented:

As of right now I would prefer #22994 as a solution since I think it is simpler and less prone to more annoying compiler problems. Generally speaking, for compute-bound problems you would want to do neither of the two proposed solutions as you would be operating on the packed 32 bit data types for 2x FP16/BF16.

@danbev (Member, Author) commented May 13, 2026

@JohannesGaessler Sounds good, thanks!

@danbev danbev closed this May 13, 2026
