Support Windows and Linux x86_64 binary builds and release artifacts #25

@inureyes

Summary

Extend mlxcel's release build matrix to cover Linux x86_64 and Windows in addition to the current macOS aarch64 + Linux aarch64 (CUDA) targets. This aligns the distribution surface with the Backend.AI:GO desktop application and makes mlxcel deployable to customer sites that standardize on Linux x86_64 (the dominant CUDA host architecture in production).

Background

Today's releases ship two artifact families:

  • mlxcel-macos-aarch64.zip — Apple Silicon
  • mlxcel-linux-aarch64-cuda13-{gb10,gh200} — Linux aarch64 + CUDA (Grace Blackwell / Grace Hopper)

The README prerequisites already claim "Linux (aarch64 or x86_64)" support, but no published binary exists for x86_64 today. Windows is not built at all, although the runtime distribution layout (MLX upstream's mlx/backend/cuda/delayload.cpp delay-load mechanism + NVIDIA DLL bundling, ~1.0–1.3 GB total) has been worked out.

Two motivations:

  1. Customer fit — Most enterprise CUDA hosts (RTX, A100, L40, H100/H200 outside of GH200 Grace systems) are Linux x86_64. Without a published x86_64 binary, those sites cannot adopt mlxcel without building from source.
  2. Backend.AI:GO parity — The Backend.AI:GO desktop application supports macOS + Windows + Linux x86_64. mlxcel should match so the same runtime can serve any Backend.AI:GO target.

Proposed Solution

Split the work into three deliverables.

A. Linux x86_64 + CUDA build

  • Add a release artifact mlxcel-linux-x86_64-cuda13.{tar.gz|zip} produced from an x86_64 + NVIDIA host (build-only is acceptable for the first iteration; smoke test on a real GPU is preferred).
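  • Decide on the target CUDA architectures. SM 80/86/89/90 covers Ampere through Hopper but not Blackwell, so the RTX 50 series listed under deliverable C would need an additional SM target (or a PTX fallback). The existing CUDA arch matrix in README.md already documents non-Hopper quantization limitations.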
  • Verify src/lib/mlxcel-core/build.rs and the bundled MLX C++ build produce equivalent output on x86_64 — expected to be mechanical because Linux paths already differentiate from macOS via #[cfg(target_os = "linux")].
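
As a rough illustration of the architecture decision in deliverable A, the sketch below turns the proposed SM list into the semicolon-separated form that CMake's CMAKE_CUDA_ARCHITECTURES variable accepts. It is a minimal sketch under assumptions: the MLXCEL_CUDA_ARCHS override variable and both function names are hypothetical, and whether the bundled MLX build reads CMAKE_CUDA_ARCHITECTURES or a project-specific option is not confirmed by this issue.

```rust
/// SM targets proposed in this issue for the x86_64 build:
/// 80 = A100, 86 = RTX 30-class Ampere, 89 = Ada (RTX 40, L40), 90 = Hopper.
const DEFAULT_SM_TARGETS: [u32; 4] = [80, 86, 89, 90];

/// Hypothetical override variable, named here for illustration only.
const ARCH_OVERRIDE_VAR: &str = "MLXCEL_CUDA_ARCHS";

/// Build a semicolon-separated architecture list from an optional override
/// such as "90, 80". Unparseable entries are ignored; an empty result falls
/// back to the defaults above.
fn cuda_arch_list(override_value: Option<&str>) -> String {
    let mut archs: Vec<u32> = match override_value {
        Some(raw) => raw
            .split(|c: char| c == ',' || c == ';')
            .filter_map(|s| s.trim().parse::<u32>().ok())
            .collect(),
        None => Vec::new(),
    };
    if archs.is_empty() {
        archs = DEFAULT_SM_TARGETS.to_vec();
    }
    archs.sort_unstable();
    archs.dedup();
    archs
        .iter()
        .map(|a| a.to_string())
        .collect::<Vec<_>>()
        .join(";")
}

/// Read the (hypothetical) override from the environment of the build script.
fn cuda_arch_list_from_env() -> String {
    let value = std::env::var(ARCH_OVERRIDE_VAR).ok();
    cuda_arch_list(value.as_deref())
}
```

A build script could pass the result of cuda_arch_list_from_env() to the CMake invocation for the bundled MLX build. Sorting and de-duplicating keeps the value stable across invocations.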

B. Windows + CUDA build

  • Extend src/lib/mlxcel-core/build.rs with #[cfg(target_os = "windows")] branches:
    • MSVC toolchain detection (vs. GCC/Clang on Linux)
    • CUDA_PATH env var (Windows convention) in addition to CUDA_HOME
    • Link against cudart.lib and friends; static-vs-dynamic linkage decision mirrors MLX's Windows build
  • Add Windows handling in the bundled MLX C++ build for OpenBLAS FetchContent and the delay-load DLL compile definitions (MLX_CUDA_BIN_DIR, MLX_CUDNN_BIN_DIR).
  • Package the Windows artifact (mlxcel.exe + nvidia/cublas/bin/... etc.) per the runtime distribution layout. Bundle size is expected to be ~1.0–1.3 GB because of cuBLAS + NVRTC + cuDNN — fits within the 2 GB-per-asset GitHub Release limit.
  • Code signing — investigate signtool.exe / Azure Code Signing, or ship unsigned for the first release with a documented SmartScreen warning.
  • Confirm all crates compile on x86_64-pc-windows-msvc: tokenizers, safetensors, axum, cxx, sentencepiece-sys, and the multimodal stack (ffmpeg / video frame extraction may need a Windows-friendly variant).
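
The CUDA_PATH / CUDA_HOME lookup and the cudart link step from deliverable B could look roughly like the sketch below. All type and function names are hypothetical and not taken from the existing build.rs. It assumes the usual toolkit layouts (lib\x64 on Windows, lib64 on Linux) and dynamic linkage, which the issue leaves open.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetOs {
    Linux,
    Windows,
}

/// Map the value Cargo exposes to build scripts as CARGO_CFG_TARGET_OS.
fn parse_target_os(value: &str) -> Option<TargetOs> {
    match value {
        "linux" => Some(TargetOs::Linux),
        "windows" => Some(TargetOs::Windows),
        _ => None,
    }
}

/// Resolve the CUDA toolkit root. `lookup` stands in for an environment
/// read so the preference order can be exercised without touching the
/// real environment. Empty values are treated as unset.
fn cuda_root(os: TargetOs, lookup: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    let order: [&str; 2] = match os {
        TargetOs::Windows => ["CUDA_PATH", "CUDA_HOME"],
        TargetOs::Linux => ["CUDA_HOME", "CUDA_PATH"],
    };
    order
        .iter()
        .copied()
        .find_map(|name| lookup(name).filter(|v| !v.is_empty()))
        .map(PathBuf::from)
}

/// Lines a build script would print so Cargo links against the CUDA runtime.
/// Assumes dynamic linkage; on MSVC the name resolves to cudart.lib.
fn cuda_link_directives(os: TargetOs, root: &Path) -> Vec<String> {
    let lib_dir = match os {
        TargetOs::Windows => root.join("lib").join("x64"),
        TargetOs::Linux => root.join("lib64"),
    };
    vec![
        format!("cargo:rustc-link-search=native={}", lib_dir.display()),
        "cargo:rustc-link-lib=dylib=cudart".to_string(),
    ]
}
```

In a real build script, lookup would be |name| std::env::var(name).ok(). The target OS is better read from CARGO_CFG_TARGET_OS than from a #[cfg] attribute, because build scripts are compiled for the host rather than the target.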

C. Documentation + README updates

  • Update README.md prerequisites to enumerate Linux aarch64, Linux x86_64, and Windows distinctly rather than the current "Linux (aarch64 or x86_64)" shorthand.
  • Add a docs/windows-build-guide.md for developers building from source on Windows.
  • Update the CUDA architecture compatibility table to reflect the broader x86_64 GPU coverage (RTX 30/40/50 series, A100, L40, H100).

Implementation Notes

  • Code surface that already touches Windows
    • src/distributed/rdma_capabilities.rs already has cfg!(target_os = "windows") branches. The rest of src/distributed/ should be audited for Windows-specific socket / transport quirks before claiming pipeline parallelism works on Windows.
    • No other target_os = "windows" gates exist today.
  • MLX upstream Windows status: MLX has the delay-load CUDA mechanism (mlx/backend/cuda/delayload.cpp), but full correctness on Windows may require additional upstream patches.
  • Self-hosted runner needs: an x86_64 + NVIDIA Linux runner (an RTX-class GPU is enough for the build, but actual smoke testing benefits from Hopper or Blackwell). The Windows + NVIDIA runner should ideally have the same GPU class.
  • Build-only vs. test-on-platform — For the first iteration, building plus a smoke test (model load + 10-token generate) on each platform is sufficient. Full benchmark + parity runs can land in a follow-up.
  • Rust target triples
    • Linux x86_64: x86_64-unknown-linux-gnu
    • Windows: x86_64-pc-windows-msvc
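
For reference, the triples above can be tied to the artifact names used elsewhere in this issue with a small lookup like the one below. This is a sketch only: artifact_name is a hypothetical helper, the Linux x86_64 extension is assumed to be tar.gz (the issue leaves {tar.gz|zip} undecided), and aarch64-apple-darwin is assumed to be the triple behind the existing macOS artifact.

```rust
/// Map a Rust target triple to the release artifact named in this issue.
/// Returns None for triples the issue does not cover.
fn artifact_name(triple: &str) -> Option<&'static str> {
    match triple {
        // Extension is an assumption; the issue allows tar.gz or zip.
        "x86_64-unknown-linux-gnu" => Some("mlxcel-linux-x86_64-cuda13.tar.gz"),
        "x86_64-pc-windows-msvc" => Some("mlxcel-windows-x86_64-cuda13.zip"),
        // Existing artifact; the triple is assumed, not stated in the issue.
        "aarch64-apple-darwin" => Some("mlxcel-macos-aarch64.zip"),
        _ => None,
    }
}
```

The Linux aarch64 artifacts are left out because they vary by GPU family (gb10, gh200) rather than by triple alone.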

The release pipeline lives in the development repository and not this mirror; this issue tracks the user-visible deliverable. Implementation discussion is welcome here.

Acceptance Criteria

  • Release publishes mlxcel-linux-x86_64-cuda13.{tar.gz|zip}
  • Release publishes mlxcel-windows-x86_64-cuda13.zip with bundled CUDA runtime DLLs
  • Smoke test (mlxcel generate with a 1B 4-bit model, 10 tokens) succeeds on Linux x86_64 + NVIDIA GPU
  • Smoke test succeeds on Windows + NVIDIA GPU
  • README.md prerequisites section enumerates Linux aarch64, Linux x86_64, and Windows distinctly
  • docs/windows-build-guide.md walks through a developer-side Windows build
  • CUDA architecture compatibility table reflects x86_64 GPU coverage (Ampere, Ada, Hopper, Blackwell)
  • Code-signing strategy decided for the Windows binary (signed / unsigned with documented warning / deferred)

Original Suggestion

Let's match the compatibility matrix with the Backend.AI:GO desktop application, and also make mlxcel available to our customer sites, which often use Linux x86_64 environments.

Metadata

Labels

  • priority:medium (Medium priority)
  • status:ready (Ready to be worked on)
  • type:enhancement (New features, capabilities, or significant additions)
