Description
Currently, cppauto resolves the backend within this block in cudacpp.mk:
```makefile
ifeq ($(BACKEND),cppauto)
  ifeq ($(UNAME_P),ppc64le)
    override BACKEND = cppsse4
  else ifeq ($(UNAME_P),arm)
    override BACKEND = cppsse4
  else ifeq ($(wildcard /proc/cpuinfo),)
    override BACKEND = cppnone
    ###$(warning Using BACKEND='$(BACKEND)' because host SIMD features cannot be read from /proc/cpuinfo)
  else ifeq ($(shell grep -m1 -c avx512vl /proc/cpuinfo)$(shell $(CXX) --version | grep ^clang),1)
    override BACKEND = cpp512y
  else
    override BACKEND = cppavx2
    # ...
  endif
endif
```

So it will look for AVX512 and, if that is not available, fall back to AVX2.
I think this may not be the right behaviour in some particular cases:
1. The CPU supports only SSE and not AVX/AVX2.
   - I must say I don't know if this is possible, or if hardware of this kind exists.
   - In this case, it will fall back to AVX2 and crash during execution.
2. The CPU supports only AVX and not AVX2.
   - This is the case of the Intel(R) Xeon(R) CPU E5-2695 v2.
   - In this case it will still fall back to AVX2, so it will be compiled with -march=haswell.
   - Is this ok in this scenario?
   - Additionally, at runtime, we check explicitly (MatrixElementKernelHost::hostSupportsSIMD, using __builtin_cpu_supports("avx2")), and I'm quite sure that would fail in this particular scenario.
What do you think? Should we add multiple checks for both SSE and AVX?
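
To make the question concrete, here is a minimal sketch of what a multi-level check could look like, written against the same `__builtin_cpu_supports` builtin mentioned above. The helper names `backendFor` and `bestHostBackend` are hypothetical and do not exist in the repository. The mapping of `cppsse4` to SSE4.2 is my assumption, and the sketch ignores the clang special case in the makefile block.

```cpp
#include <string>

// Hypothetical helper (not in the repository): pick the most capable backend
// name for a given set of host SIMD features, checking from widest to narrowest.
// Assumption: cppsse4 needs SSE4.2; cppavx2 needs AVX2; cpp512y needs AVX512VL.
inline std::string backendFor( bool hasAvx512vl, bool hasAvx2, bool hasSse42 )
{
  if( hasAvx512vl ) return "cpp512y";
  if( hasAvx2 ) return "cppavx2";
  if( hasSse42 ) return "cppsse4"; // an AVX-only CPU (no AVX2) ends up here
  return "cppnone";                // no usable SIMD level detected
}

// Hypothetical helper: probe the actual host with the compiler builtins.
inline std::string bestHostBackend()
{
#if defined( __x86_64__ ) || defined( __i386__ )
  __builtin_cpu_init(); // make sure the CPU feature data is initialised
  return backendFor( __builtin_cpu_supports( "avx512vl" ),
                     __builtin_cpu_supports( "avx2" ),
                     __builtin_cpu_supports( "sse4.2" ) );
#else
  // Non-x86 hosts (ppc64le, arm) are handled separately in cudacpp.mk;
  // this sketch does not try to probe them.
  return "cppnone";
#endif
}
```

With this ordering, a CPU like the E5-2695 v2 (AVX but no AVX2) would resolve to `cppsse4` instead of `cppavx2`, assuming it reports SSE4.2. The same cascade could be mirrored in the cppauto block by adding a grep for `avx2` and one for `sse4_2` in /proc/cpuinfo before the final fallback.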