
Conversation

@sssshhhhhh
Contributor

Closes #1072 (Feature request: AMD GPU support with oneDNN AMD support)

Thanks to the work of everyone at arlo-phoenix/CTranslate2-rocm and the linked issue.
Windows can be compiled with this script: https://github.com/sssshhhhhh/CTranslate2/blob/745f0b46aea94acef514185ed5facbb3fecd6dcd/python/tools/prepare_build_environment_windows.ps1
Linux can follow instructions at: https://github.com/arlo-phoenix/CTranslate2-rocm/blob/rocm/README_ROCM.md

Currently targeting ROCm 7.1.1. It passes all tests and produces correct output for Whisper and Gemma 3. For now the changes are just enough to build for AMD; specific optimisations like flash attention are left for the future.
Some questions:

  1. Should prebuilt wheels be a goal, or would letting people build them themselves be fine?
  2. How should packaging be handled? My Windows wheels currently need a separate install of rocm_sdk_libraries_custom and bundle amdhip64_7.dll/amd_comgr0701.dll. The wheels are 58 MB each; removing the two DLLs drops them to 12 MB (a sketch for inspecting this follows the list).
  3. What architectures should be targeted? Currently I'm building for the RDNA generations that ROCm 7 supports. CDNA should also work, but the wave size isn't optimal there (NVIDIA and RDNA use a wave size of 32, CDNA uses 64). I'm also unsure about RDNA2: this PR should work on it, but its ROCm support seems poor and I don't have a card to test with.
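To see where the wheel size goes, here is a hedged sketch that lists the DLLs bundled in a built wheel; the wheel filename below is illustrative, not an actual artifact name:

import zipfile

# List every DLL vendored into the wheel, with its uncompressed size.
# The path is a hypothetical example; substitute a real downloaded artifact.
with zipfile.ZipFile("ctranslate2-4.6.0-cp312-cp312-win_amd64.whl") as whl:
    for info in whl.infolist():
        if info.filename.lower().endswith(".dll"):
            print(f"{info.filename}: {info.file_size / 1e6:.1f} MB")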

@jordimas
Collaborator

> Should having prebuilt whls be a goal or would letting people build themselves be fine? How should packaging be handled? What should be targeted?

The distribution part is tricky for ROCm. My recommendations, from minimum to best:

  1. Be able to compile it (currently not possible upstream; this PR is good progress)
  2. Provide a Dockerfile
  3. Provide a whl package

I will start with 1 and 2.

@sssshhhhhh
Contributor Author

Added Docker and Windows wheels (GitHub Actions artifacts). I give up on fixing the Linux wheel; it's dependency hell between cibuildwheel and ROCm.

The broken cibuildwheel Linux script:
#!/bin/bash

set -e
set -x

# Free disk space on the CI runner (cibuildwheel mounts the host filesystem at /host)
rm -rf /host/usr/share/dotnet
rm -rf /host/usr/lib/jvm
rm -rf /host/usr/local/lib/android
df -h

# Debug: inspect the available toolchain libraries
ls -la /opt/rh/gcc-toolset-14/root/usr/lib/gcc/x86_64-redhat-linux/14
ls -la /usr/lib64/

# Make the gcc-toolset-14 libraries visible to the linker
export LIBRARY_PATH="/opt/rh/gcc-toolset-14/root/usr/lib/gcc/x86_64-redhat-linux/14:${LIBRARY_PATH:-}"

# Add AMD's ROCm and graphics package repositories (EL8)
tee /etc/yum.repos.d/rocm.repo <<EOF
[rocm]
name=ROCm 7.2.0 repository
baseurl=https://repo.radeon.com/rocm/el8/7.2/main
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key

[amdgraphics]
name=AMD Graphics 7.2.0 repository
baseurl=https://repo.radeon.com/graphics/7.2/el/8/main/x86_64/
enabled=1
priority=50
gpgcheck=1
gpgkey=https://repo.radeon.com/rocm/rocm.gpg.key
EOF
dnf clean all
# Install the ROCm HIP SDK (hipcc, headers, runtime libraries)
dnf install -y rocm-hip-sdk

pip install "cmake==3.22.*"

# Sanity-check hipcc; various problems surface here that would otherwise only error in the CTranslate2 cmake step below
hipcc -v

# Intel oneAPI MKL for the CPU backend
ONEAPI_VERSION=2025.3.0
dnf config-manager --add-repo https://yum.repos.intel.com/oneapi
rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
dnf install -y intel-oneapi-mkl-devel-$ONEAPI_VERSION

# Build oneDNN statically, with only the inference primitives CTranslate2 needs
ONEDNN_VERSION=3.10.2
curl -L -O https://github.com/uxlfoundation/oneDNN/archive/refs/tags/v${ONEDNN_VERSION}.tar.gz
tar xf *.tar.gz && rm *.tar.gz
cd oneDNN-*
cmake -DCMAKE_BUILD_TYPE=Release -DONEDNN_LIBRARY_TYPE=STATIC \
  -DONEDNN_BUILD_EXAMPLES=OFF -DONEDNN_BUILD_TESTS=OFF \
  -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_ENABLE_PRIMITIVE="CONVOLUTION;REORDER" \
  -DONEDNN_BUILD_GRAPH=OFF .
make -j$(nproc) install
cd ..
rm -r oneDNN-*

# Build and install Open MPI from source
OPENMPI_VERSION=4.1.6
curl -L -O https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-${OPENMPI_VERSION}.tar.bz2
tar xf *.tar.bz2 && rm *.tar.bz2
cd openmpi-*
./configure
make -j$(nproc) install
cd ..
rm -r openmpi-*
# Make the freshly installed libraries in /usr/local/lib visible at runtime
export LD_LIBRARY_PATH="/usr/local/lib/:$LD_LIBRARY_PATH"

# Configure and build CTranslate2 with HIP for the supported RDNA architectures
mkdir build-release && cd build-release

cmake \
  -DCMAKE_C_COMPILER=amdclang -DCMAKE_CXX_COMPILER=amdclang++ \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_FLAGS="-msse4.1 -O3 -Wno-deprecated-literal-operator" \
  -DCMAKE_HIP_FLAGS="-O3 -Wno-deprecated-literal-operator" \
  -DBUILD_CLI=OFF -DWITH_DNNL=ON -DOPENMP_RUNTIME=COMP -DWITH_HIP=ON \
  -DCMAKE_HIP_ARCHITECTURES="gfx1030;gfx1100;gfx1101;gfx1102;gfx1150;gfx1151;gfx1200;gfx1201" \
  ..

VERBOSE=1 make -j$(nproc) install
cd ..
rm -r build-release

# Copy the README into the Python package directory for the wheel build
cp README.md python/

Linux:
Either use the Docker image, or install ROCm following https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html and then follow docker/Dockerfile_rocm to install the dependencies and build.
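Once either route is done, a minimal smoke test; a sketch with one assumption, that the hipified build reports ROCm devices through CTranslate2's existing CUDA-named API (the HIP port is a hipified CUDA build):

import ctranslate2

# Confirm the package imports and that the ROCm GPU is visible.
# Assumption: the HIP build keeps the CUDA-named device functions.
print(ctranslate2.__version__)
print("visible GPUs:", ctranslate2.get_cuda_device_count())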

Windows:
Follow the prerequisites at https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/install/installrad/windows/install-pytorch.html, install rocm_sdk_core/rocm_sdk_libraries_custom from step 1, then install the ctranslate2 wheel from the GitHub Actions artifacts.
If you use torch at the same time you might get "OMP: Error #15: Initializing ..." because two copies of the OpenMP runtime are loaded. Either symlink the OpenMP DLL in site-packages/torch/lib/ to the one in site-packages/ctranslate2, or set KMP_DUPLICATE_LIB_OK.
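For the environment-variable workaround, a minimal sketch; it must run before either library loads its OpenMP runtime, and Intel documents KMP_DUPLICATE_LIB_OK as a workaround rather than a fix, so the symlink route is preferable:

import os

# Allow two OpenMP runtimes to coexist; must be set before the imports below.
# Workaround only: it can mask real problems, so prefer the symlink fix above.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import torch
import ctranslate2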

I think I'm done with the code, pending any issues. @jordimas feel free to make changes if you want; I don't really understand the distribution side past this point.

@sssshhhhhh
Contributor Author

I lied: I think I figured out the Linux wheels, will add them soon.
