Pr/cross platform cuda gradio Windows / Linux / CUDA cross-platform support + Gradio Explorer UI by cronos3k · Pull Request #1 · cronos3k/larql

cronos3k · 2026-04-16T11:22:26Z

Following your video yesterday: I watched it, spent the day on this, and here's what came out of it.

What's in this PR

10 commits, each self-contained:

fix(compute/build) — CARGO_CFG_* env vars for correct cross-compilation platform detection
fix(compute) — q4_dot.c: MSVC-safe memcpy + scalar x86 fallback alongside ARM NEON path
feat(compute) — per-platform BLAS: system OpenBLAS on Linux, matrixmultiply on Windows (no install required)
feat(compute) — CUDA/cuBLAS backend, --features cuda, priority: Metal > CUDA > CPU
refactor — replace every hardcoded CpuBackend with default_backend() so GPU features actually fire
feat(models) — Windows HF cache path fix, HF_HOME/HUGGINGFACE_HUB_CACHE support, auto-download via hf-hub
ci — GitHub Actions Linux x86_64 build on ubuntu-22.04 (glibc 2.35, Bookworm-compatible)
chore — .gitignore: exclude vindex dirs, model weights, demo logs
feat(demo) — Gradio 6 Explorer UI: 6 tabs, HF Space Dockerfile, no Rust reimplementation
docs — README: HF Space link

What is not changed

No LQL query language changes
No vindex format changes
No changes to extract-index or walk logic
macOS / Metal path is untouched

Tested on

Windows 11, MSVC, CUDA 12.x, RTX 3080
Ubuntu 22.04 (GitHub Actions CI, CPU only)
HuggingFace Spaces Docker (cpu-basic, glibc 2.35)

Live demo

https://huggingface.co/spaces/cronos3k/LARQL-Explorer

Happy to split into separate PRs or adjust anything you'd like to change.

…fe platform detection

build.rs previously used #[cfg(target_arch = "...")] attributes directly inside the build script. Inside a build script these reflect the HOST architecture (the machine that compiles and runs build.rs), not the TARGET, which causes incorrect behaviour during cross-compilation. Changed to read the CARGO_CFG_TARGET_ARCH and CARGO_CFG_TARGET_ENV environment variables, which Cargo sets to the actual target platform.

Also added the missing cargo:rerun-if-changed directive so the build script re-runs whenever q4_dot.c is modified.

In lib.rs, guarded `extern crate blas_src` with #[cfg(unix)] so Windows builds do not attempt to link a BLAS library that is not present on that platform.

No algorithm changes. No behaviour changes on macOS or existing Linux builds.
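A minimal sketch of the host-vs-target distinction in a build script, assuming it compiles q4_dot.c. The `Target` struct, the `c_flags` and `rerun_directive` helpers, the `src/q4_dot.c` path and the specific compiler flags are hypothetical illustrations, not the PR's actual build.rs; only the `CARGO_CFG_TARGET_ARCH` / `CARGO_CFG_TARGET_ENV` variables and the `cargo:rerun-if-changed` directive are real Cargo features.

```rust
use std::env;

/// The platform being compiled FOR. Inside build.rs, `#[cfg(target_arch = ...)]`
/// and `cfg!(...)` describe the host running the build script, so the target
/// has to be read from the env vars Cargo sets for build scripts.
#[derive(Debug, PartialEq)]
struct Target {
    arch: String, // e.g. "x86_64", "aarch64"
    env: String,  // e.g. "gnu", "msvc", "" on macOS
}

#[allow(dead_code)]
fn target_from_env() -> Target {
    Target {
        arch: env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default(),
        env: env::var("CARGO_CFG_TARGET_ENV").unwrap_or_default(),
    }
}

/// Hypothetical flag selection for compiling q4_dot.c for the given target.
fn c_flags(t: &Target) -> Vec<&'static str> {
    let mut flags = Vec::new();
    // MSVC uses a different flag syntax than GCC/Clang.
    flags.push(if t.env == "msvc" { "/O2" } else { "-O2" });
    // Only request the dot-product extension when targeting AArch64 with a
    // GCC/Clang-style toolchain; other targets use the scalar path.
    if t.arch == "aarch64" && t.env != "msvc" {
        flags.push("-march=armv8.2-a+dotprod");
    }
    flags
}

/// Directive that makes Cargo re-run build.rs when the C source changes.
fn rerun_directive(c_source: &str) -> String {
    format!("cargo:rerun-if-changed={c_source}")
}
```

In a real build.rs, `main` would print `rerun_directive(...)` and pass `c_flags(&target_from_env())` to the C compiler wrapper (for example the `cc` crate's `Build::flag`). Cross-compiling from an x86_64 host to aarch64 then still selects the AArch64 flags, which `#[cfg]` inside build.rs would not.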

…allback

Two portability issues prevented the file from compiling on Windows (MSVC) and on non-ARM platforms:

1. __builtin_memcpy is a GCC/Clang extension that MSVC does not recognise. Replaced every use with the standard <string.h> memcpy, which all major compilers inline equally well at -O2 or higher.
2. The ARM NEON dot-product path (vdotq_s32) was the only implementation. Added a portable scalar path, selected via #ifdef __aarch64__, that activates on any non-AArch64 platform (x86_64 Windows, x86_64 Linux, RISC-V, etc.).

The decode_f16 helper is now defined unconditionally and shared by both paths.

No changes to the algorithm or numerical results on ARM. The scalar path produces identical results to the ARM path, just without SIMD acceleration.
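A sketch of why a scalar fallback can match the NEON path exactly, assuming (as the use of vdotq_s32 implies) that the kernel's core is a signed 8-bit integer dot product accumulated into 32-bit integers. The Q4 unpacking and f16 scaling from q4_dot.c are not reproduced here, and `dot_i8_scalar` / `dot_i8_lanes` are hypothetical names; the second function only emulates, in plain Rust, the four-lane accumulation order of a vdotq_s32 loop.

```rust
/// Portable scalar path: widen each i8 to i32, multiply, accumulate.
fn dot_i8_scalar(a: &[i8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

/// Emulates the accumulation order of a vdotq_s32 loop: 16 bytes per step,
/// four i32 lanes, lane `l` summing bytes 4l..4l+4 of each chunk, followed by
/// a final horizontal add. Assumes the length is a multiple of 16.
fn dot_i8_lanes(a: &[i8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len() % 16, 0);
    let mut lanes = [0i32; 4];
    for (ca, cb) in a.chunks_exact(16).zip(b.chunks_exact(16)) {
        for l in 0..4 {
            for j in 4 * l..4 * l + 4 {
                lanes[l] += ca[j] as i32 * cb[j] as i32;
            }
        }
    }
    // Horizontal add across the four lanes.
    lanes.iter().sum()
}
```

Integer addition is exact as long as the i32 accumulators do not overflow, so the summation order does not matter: both functions return the same integer, which is what lets a scalar path reproduce the SIMD path's integer results.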

…ixmultiply on Windows

Previously the codebase assumed macOS Accelerate everywhere and would fail to compile on Linux or Windows due to unconditional blas-src/accelerate dependencies. Changes per platform, using Cargo's [target.'cfg(...)'.dependencies] tables:

- macOS: unchanged. Accelerate framework, which ships with every Mac.
- Linux: blas-src + openblas-src with the "system" feature flag. Links the installed system library (apt install libopenblas-dev). Does NOT build OpenBLAS from source (avoids 10+ minute CI builds).
- Windows: pure ndarray/matrixmultiply backend, no external library required. Performant enough for CPU-only extraction; BLAS can be added later via openblas-src with the "static" feature if needed.

cpu::device_info() updated to report the actual backend in use per OS.

Feature flags (metal, cuda) are also threaded through larql-cli/Cargo.toml so they can be activated from the workspace root with --features metal/cuda.

No changes to any algorithm or numerical behaviour.
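A minimal sketch of per-platform selection at compile time. In ordinary crate code (unlike inside build.rs), `#[cfg(target_os = ...)]` describes the target, so gating items this way follows the same split as the [target.'cfg(...)'.dependencies] tables. `CPU_BLAS` and `describe_cpu_backend` are hypothetical stand-ins for cpu::device_info(), and the strings are illustrative, not the crate's actual output.

```rust
// Exactly one of these constants exists in any given build.
#[cfg(target_os = "macos")]
const CPU_BLAS: &str = "Accelerate";

#[cfg(target_os = "linux")]
const CPU_BLAS: &str = "OpenBLAS, system library";

// Windows and any other OS: pure-Rust matrixmultiply, nothing to install.
#[cfg(not(any(target_os = "macos", target_os = "linux")))]
const CPU_BLAS: &str = "matrixmultiply, no external BLAS";

/// Hypothetical device_info-style report of the CPU backend compiled in.
fn describe_cpu_backend() -> String {
    format!("CPU ({})", CPU_BLAS)
}
```

Because the choice is made by `cfg` rather than at runtime, a Windows build never references the BLAS symbols it could not link.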

Adds a new compute backend backed by cudarc + cuBLAS for f32 GEMM. The CUDA path accelerates the two hottest operations during vindex extraction: the down_meta projection and the embedding similarity pass.

Implementation notes:

- Row-major (ndarray) → column-major (cuBLAS) conversion: C[m×n] = A[m×k] · B[k×n] is equivalent to C^T[n×m] = B_colmaj[n×k] · A_colmaj[k×m], where X_colmaj is X's row-major buffer read as column-major, which is exactly X^T. So A and B are swapped in the cuBLAS call, with M and N also swapped, keeping OP_N for both operands; the result lands in C's buffer already in row-major order. No explicit transpose is performed.
- Q4 operations are not yet implemented on GPU; they fall back to the existing CPU scalar kernel automatically.
- CudaBackend::new() returns None if no CUDA device is found, allowing default_backend() to fall back to CPU transparently.

Feature flag: cargo build --release --features cuda
Requires: CUDA toolkit ≥ 12.0, cuBLAS, a CUDA-capable GPU. Not available on macOS; use --features metal there instead.

default_backend() priority is now: Metal > CUDA > CPU.
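A CPU-only sketch of the operand-swap trick, with no cudarc or cuBLAS involved. `sgemm_col_major` is a hypothetical reference routine with the same semantics as cublasSgemm for OP_N/OP_N, alpha = 1 and beta = 0, and its argument order (m, n, k, A, lda, B, ldb, C, ldc) mirrors the cuBLAS one. `matmul_row_major` then produces a row-major result purely by swapping operands and dimensions.

```rust
/// Reference column-major SGEMM: C (m×n) = A (m×k) · B (k×n), all column-major.
/// Element (r, c) of a column-major matrix with leading dimension ld is at r + c*ld.
#[allow(clippy::too_many_arguments)]
fn sgemm_col_major(
    m: usize, n: usize, k: usize,
    a: &[f32], lda: usize,
    b: &[f32], ldb: usize,
    c: &mut [f32], ldc: usize,
) {
    for col in 0..n {
        for row in 0..m {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[row + p * lda] * b[p + col * ldb];
            }
            c[row + col * ldc] = acc;
        }
    }
}

/// Row-major C[m×n] = A[m×k] · B[k×n] using only the column-major routine.
/// A row-major buffer read as column-major is its transpose, so we compute
/// C^T (n×m) = B^T (n×k) · A^T (k×m): pass B first, swap M and N, no transpose.
fn matmul_row_major(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    // M' = n, N' = m, K' = k; lda' = n (B's row length), ldb' = k (A's row length).
    sgemm_col_major(n, m, k, b, n, a, k, &mut c, n);
    c
}
```

For example, with A = [[1, 2, 3], [4, 5, 6]] and B = [[7, 8], [9, 10], [11, 12]], `matmul_row_major` returns the row-major buffer [58, 64, 139, 154], the same as an ordinary row-major matmul.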

All internal call sites that created larql_compute::CpuBackend directly have been replaced with larql_compute::default_backend(). Without this change, building with --features cuda or --features metal compiles the GPU backend but never uses it: every matmul inside larql-vindex and larql-inference still dispatches to the CPU. This refactor closes that gap.

No algorithmic changes. On a CPU-only build, default_backend() returns CpuBackend as before, so behaviour is identical for existing users.

Also adds #[cfg(unix)] guards to the blas_src extern crate declaration in example and bench files that previously assumed a Unix host.
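A minimal sketch of the fallback chain and the call-site change, assuming a trait-object design. `ComputeBackend`, `try_metal`, `try_cuda` and `project` are hypothetical names, not larql_compute's actual API, and the GPU constructors here are stubs that always report no device.

```rust
/// Hypothetical stand-in for the compute backend interface.
trait ComputeBackend {
    fn name(&self) -> &'static str;
    /// Row-major f32 matmul: C[m×n] = A[m×k] · B[k×n].
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32>;
}

struct CpuBackend;

impl ComputeBackend for CpuBackend {
    fn name(&self) -> &'static str {
        "cpu"
    }
    fn matmul(&self, a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
        let mut c = vec![0.0; m * n];
        for i in 0..m {
            for p in 0..k {
                let aip = a[i * k + p];
                for j in 0..n {
                    c[i * n + j] += aip * b[p * n + j];
                }
            }
        }
        c
    }
}

// Each GPU constructor returns None when no usable device exists, like
// CudaBackend::new(). Stubs here; a real crate would gate them on features.
fn try_metal() -> Option<Box<dyn ComputeBackend>> { None }
fn try_cuda() -> Option<Box<dyn ComputeBackend>> { None }

/// Priority Metal > CUDA > CPU; CPU is the guaranteed fallback.
fn default_backend() -> Box<dyn ComputeBackend> {
    if let Some(b) = try_metal() {
        return b;
    }
    if let Some(b) = try_cuda() {
        return b;
    }
    Box::new(CpuBackend)
}

/// Call sites take the trait object instead of constructing CpuBackend,
/// so whichever backend default_backend() picked is the one that runs.
fn project(backend: &dyn ComputeBackend, x: &[f32], w: &[f32], k: usize, n: usize) -> Vec<f32> {
    backend.matmul(x, w, 1, k, n)
}
```

The design choice is that only `default_backend()` knows which backends exist; every other call site stays identical whether the build has GPU features or not.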

… auto-download

Three related improvements to model resolution in safetensors.rs:

1. Windows HF cache path. The previous code used $HOME/.cache/huggingface/hub, which does not exist on Windows (the env var is USERPROFILE, not HOME). Resolution order is now: HUGGINGFACE_HUB_CACHE → HF_HOME/hub → $HOME/.cache/huggingface/hub (Unix) / %USERPROFILE%\.cache\huggingface\hub (Windows). This matches the behaviour of the official huggingface-hub Python and Rust libraries.
2. HF_HOME / HUGGINGFACE_HUB_CACHE support. Both env vars are now respected per the HuggingFace caching spec, so users with non-default cache locations don't need to copy files.
3. Auto-download via hf-hub. When a model string looks like a HuggingFace repo ID (contains '/') and is not found in the local cache, the model is now downloaded automatically using the hf-hub crate. HF_TOKEN is forwarded if set. This removes the need to manually download models before running larql extract-index.

larql-models/Cargo.toml: added hf-hub = "0.5" dependency (was already present in larql-vindex; aligning both crates).
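A sketch of that resolution order as a pure function. The env lookup is passed in as a closure so the logic is testable without touching the real environment; `hf_hub_cache`, `repo_cache_dir`, `looks_like_repo_id` and the `windows` flag are hypothetical, not the code in safetensors.rs. `repo_cache_dir` assumes the standard huggingface_hub cache layout, where repo `org/name` is stored under `models--org--name`.

```rust
use std::path::{Path, PathBuf};

/// Resolve the hub cache: HUGGINGFACE_HUB_CACHE, then HF_HOME/hub,
/// then <home>/.cache/huggingface/hub. Returns None if nothing is set.
fn hf_hub_cache(var: impl Fn(&str) -> Option<String>, windows: bool) -> Option<PathBuf> {
    // 1. An explicit hub cache location wins.
    if let Some(dir) = var("HUGGINGFACE_HUB_CACHE") {
        return Some(PathBuf::from(dir));
    }
    // 2. HF_HOME is the HF root; the hub cache lives in its "hub" subdirectory.
    if let Some(hf_home) = var("HF_HOME") {
        return Some(PathBuf::from(hf_home).join("hub"));
    }
    // 3. Per-user default: HOME on Unix, USERPROFILE on Windows.
    let home_var = if windows { "USERPROFILE" } else { "HOME" };
    let home = var(home_var)?;
    Some(PathBuf::from(home).join(".cache").join("huggingface").join("hub"))
}

/// Cache folder for a model repo: "org/name" -> "<cache>/models--org--name".
fn repo_cache_dir(cache: &Path, repo_id: &str) -> PathBuf {
    cache.join(format!("models--{}", repo_id.replace('/', "--")))
}

/// The repo-ID heuristic described above: the model string contains '/'.
fn looks_like_repo_id(model: &str) -> bool {
    model.contains('/')
}
```

In real code the call would be `hf_hub_cache(|k| std::env::var(k).ok(), cfg!(windows))`.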

Adds a workflow that builds the larql CLI binary for Linux x86_64 on every push to main and on manual dispatch.

Runner: ubuntu-22.04. Pinned to 22.04 rather than latest to produce a binary linked against glibc 2.35, which is compatible with Debian Bookworm, Ubuntu 22.04+, and other currently-supported distributions. A glibc 2.39 binary (from ubuntu-24.04) would not run on Bookworm.

Dependencies installed: libopenblas-dev, pkg-config, libssl-dev.

Artefacts:
- Binary uploaded as a workflow artefact (90-day retention).
- A rolling GitHub prerelease tagged latest-linux is created/updated with the binary, so it can be fetched from a stable URL by external tools (e.g. a HuggingFace Space Dockerfile).

Requires the GITHUB_TOKEN secret, which is provided automatically by GitHub Actions; no additional setup needed.

…logs

Added entries for artefacts that are generated locally and should not be tracked:
- models/: downloaded HuggingFace model weights (can be multi-GB)
- *.vindex/: extracted vindex directories (binary data)
- demo/*.log: Gradio and subprocess logs written by the demo app

Also fixes a missing newline at end of file from the original .gitignore.

… config

Adds a self-contained web interface for exploring vindexes interactively, located in demo/. No changes to any Rust crate.

Six tabs:
- Walk Explorer: per-layer FFN feature activation for a prompt
- Knowledge Probe: compare how three prompts encode at the same layer
- LQL Console: run raw LQL queries against the vindex
- Vindex Info: metadata, layer count, model family from index.json
- Extract: trigger larql extract-index from the UI
- Setup & About: build instructions, binary check, environment info

Key implementation details:
- demo/app.py: Gradio 6 Blocks app, ~640 lines. Calls the larql binary as a subprocess; no Python reimplementation of any Rust logic. Results are parsed and displayed via gr.HTML (avoids Gradio DataFrame JS issues).
- demo/utils.py: output parsers for larql walk, verify and lql output. Also provides vindex discovery and index.json loading.
- demo/Dockerfile: Docker image for HuggingFace Spaces (Docker SDK). Downloads the pre-built Linux binary at image build time from the latest-linux GitHub release; no Rust toolchain required in the image.
- demo/setup.sh: local setup helper that builds the binary and installs Python deps.
- demo/hf_space/: minimal HuggingFace Space configuration template. Maintainers wanting to deploy their own Space can copy this directory and adjust the repo URLs.

The demo auto-downloads a small demo vindex from HuggingFace Hub on first start if no local vindex is found (requires the huggingface_hub Python package).

Live reference deployment: https://huggingface.co/spaces/cronos3k/LARQL-Explorer

Adds a one-line reference to the live LARQL Explorer Space so users reading the README can try the tool without building from source.

workturnedplay · 2026-05-19T20:06:04Z

I'm a bit confused: was this PR meant to be opened against the original repo instead of your own fork? I might be missing something here.

ghmk added 10 commits April 16, 2026 13:16

docs: add HuggingFace Space link to README

f91fdde


cronos3k wants to merge 10 commits into main from pr/cross-platform-cuda-gradio
