[build] Add CMake build system alongside xmake (Phase 1: CPU + 6 backends) #1156

@voltjia

Goal

Replace xmake with a modern, idiomatic CMake build system for InfiniCore — without breaking existing users — by shipping CMake side-by-side with the existing `xmake.lua` in Phase 1, validated end-to-end on the available test hardware.

The strategic goal beyond Phase 1 is full xmake removal (Phase 3); Phase 1 establishes the topology and proves the per-backend toolchain story on the most-used backends.

Phase 1 scope

  • New top-level `CMakeLists.txt` covering targets equivalent to xmake's `infini-utils`, `infinirt`, `infiniop`, `infiniccl`, `infinicore_cpp_api`, `_infinicore`, plus the test binaries (`infinirt-test`, `infiniop-test`, `infiniccl-test`, `infinicore-test`, `infiniutils-test`).
  • Backends ported and verified end-to-end: CPU, NVIDIA, MetaX, Iluvatar, Moore Threads, Cambricon, Ascend (7 of 11).
  • New parallel CMake CI workflow alongside the existing xmake one. Both must pass to merge.
  • `scripts/cmake_install.py` peer to `scripts/install.py`, preserving the `--=y` UX.
  • `setup.py` becomes build-system-aware via env var `INFINICORE_BUILD_SYSTEM` (default `cmake`, fallback `xmake` for users on Phase-2-deferred backends).
  • README documents both build systems.
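As a rough illustration of the `setup.py` switch described above, the selection logic could look like the sketch below. Only the variable name `INFINICORE_BUILD_SYSTEM` and its two values come from this issue; the helper name `resolve_build_system`, the case-insensitive matching, and the error behavior are assumptions made for illustration, not the actual implementation.

```python
import os

# Values named in the issue; the tuple itself is a hypothetical constant.
SUPPORTED_BUILD_SYSTEMS = ("cmake", "xmake")


def resolve_build_system(environ=None):
    """Return the build system selected via INFINICORE_BUILD_SYSTEM.

    Hypothetical helper: defaults to "cmake" when the variable is unset
    or empty, and rejects anything other than "cmake" or "xmake".
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("INFINICORE_BUILD_SYSTEM", "")
    # Assumption: matching is case-insensitive and ignores surrounding spaces.
    value = raw.strip().lower() or "cmake"
    if value not in SUPPORTED_BUILD_SYSTEMS:
        raise ValueError(
            f"INFINICORE_BUILD_SYSTEM={raw!r} is not supported; "
            f"expected one of {SUPPORTED_BUILD_SYSTEMS}"
        )
    return value
```

A user on a Phase-2-deferred backend would then export `INFINICORE_BUILD_SYSTEM=xmake` before running `pip install .`, while everyone else gets CMake without setting anything.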

Non-goals (Phase 1)

  • Removing `xmake.lua`. It stays untouched. (Phase 3.)
  • CMake support for Hygon DCU, Kunlun XPU, Ali PPU, Qy GPU. Their `ENABLE_*` options are stubs that fail-fast with a redirect to xmake. (Phase 2.)
  • `flash-attn`, `aten`/`torch`, `ninetoothed` integrations. Preserved as no-op options. (Phase 2.)
  • Verified Windows MSVC builds. The MSVC code paths are written but unverified — no Windows test runner in scope. README flags this explicitly.
  • InfiniLM CMake migration. (Phase 3.)
  • Restructuring source code. Only build-system files change.

Design highlights

Modern, idiomatic CMake. One modular `CMakeLists.txt` per backend subdirectory; `find_package` for spdlog/json/pybind11/OpenMP/CUDA/Boost; target-scoped `target_link_libraries` / `target_include_directories` / `target_compile_features`; generator expressions for per-backend flags.

Per-backend integration (Approach A). CUDA-flavored backends (NVIDIA, Iluvatar) use `-DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/.cmake` with `enable_language(CUDA)` and a swapped `CMAKE_CUDA_COMPILER`. Custom-compiler backends (Cambricon `cncc`, Moore `mcc`, MetaX `htcc`/`mxcc`) get helper functions `infinicore_add_bang_library` / `infinicore_add_musa_library` / `infinicore_add_maca_library` that wrap `add_custom_command` per device source and bundle the resulting `.o` files into a normal STATIC library alongside host `.cc` sources.

Ascend special case. Today's xmake invokes `make` inside `src/infiniop/devices/ascend/` which itself runs CMake. The migration subsumes the nested CMake project directly via `add_subdirectory`, dropping the Makefile shim.

Install layout & target naming preserved. Installed shared library filenames are byte-identical (`libinfiniop.so` etc.); pybind module `_infinicore.cpython-3xx-*.so` keeps its soabi suffix; `$INFINI_ROOT` install layout matches xmake exactly. Target names switch from hyphenated (`infiniop-nvidia`) to underscored (`infiniop_nvidia`) internally, but this is invisible externally.
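One way to check the "install layout matches xmake" claim is to compare the relative file paths of two install trees, one produced by each build system. The sketch below is an assumption about how such a check could be written; the function names and the idea of installing into two separate roots are hypothetical and not part of this issue.

```python
from pathlib import Path


def relative_files(root):
    """Return every file under root as a POSIX path relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def layout_diff(xmake_root, cmake_root):
    """Compare two install trees by file name and location only.

    Returns (missing_in_cmake, extra_in_cmake) as sorted lists of
    relative paths. File contents are not compared.
    """
    xmake_files = relative_files(xmake_root)
    cmake_files = relative_files(cmake_root)
    return sorted(xmake_files - cmake_files), sorted(cmake_files - xmake_files)
```

Two empty lists would indicate that both trees contain the same files in the same places, for example `lib/libinfiniop.so` in each.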

CI. New `.github/workflows/build-cmake.yml` matrix [ubuntu-latest, windows-latest] × [Debug, Release], CPU only (no GPU runners on GHA). Existing `build.yml` (xmake) untouched.

Validation plan

Validation runs on six remote test servers, one per accelerator backend (NVIDIA, MetaX, Iluvatar, Moore, Cambricon, Ascend), with the following flow on each server:

  1. Baseline: `scripts/install.py` (xmake) + `scripts/python_test.py --` — capture baseline pass list.
  2. CMake: `scripts/cmake_install.py` + `pip install .` + `scripts/python_test.py --` — capture CMake pass list.
  3. Diff. Any pass→fail regression blocks the PR; any test failing in both is filed as a pre-existing issue.
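The diff in step 3 can be sketched as a comparison of two result mappings. This is a minimal illustration, assuming each run is summarized as a `{test_name: passed}` dictionary; that format, the function name `diff_results`, and the choice to treat a test missing from the CMake run as a failure are all assumptions rather than the behavior of `scripts/python_test.py`.

```python
def diff_results(baseline, cmake):
    """Classify tests by comparing the xmake baseline with the CMake run.

    Both arguments map test name -> True (passed) / False (failed).
    Assumption: a test absent from the CMake run counts as failed.
    """
    # Passed under xmake, did not pass under CMake: blocks the PR.
    regressions = sorted(
        name for name, passed in baseline.items()
        if passed and not cmake.get(name, False)
    )
    # Failed under both build systems: filed as a pre-existing issue.
    preexisting = sorted(
        name for name, passed in baseline.items()
        if not passed and not cmake.get(name, False)
    )
    return {"regressions": regressions, "preexisting": preexisting}
```

Under this sketch, a server's result blocks the PR exactly when `diff_results(...)["regressions"]` is non-empty.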

PR description will include the per-server pass/fail diff table and link to build logs.

Phase 2 / 3 (not in this issue)

  • Phase 2: CMake support for Hygon, Kunlun, Ali, Qy. Port `flash-attn`, `aten`, `ninetoothed`. Each requires its own toolchain file or helper plus a test box.
  • Phase 3: Delete `xmake.lua`, the xmake CI workflow, the `INFINICORE_BUILD_SYSTEM` switch in `setup.py`. Migrate InfiniLM the same way.

Branch / PR

Branch `issue/`, pushed to `InfiniTensor/InfiniCore`. PR opened against `main`. Commits structured one-per-backend so the change is bisectable.
