The definitive Strix Halo LLM guide — 65 t/s on a $2,999 mini PC. Live benchmarks, tested optimizations, and everything that doesn't work.
Updated Apr 26, 2026 (Shell)
A CUDA implementation of the transpose-free Quasi-Minimal Residual method
Operator-grade GPU monitor for NVIDIA GPUs with native GB10 / DGX Spark coherent UMA support — PSI pressure, clock detection, ConnectX-7 network layer
Unified Memory Abstraction Layer for AI Inference on AMD APUs and Intel iGPUs
Fundamentals of Accelerated Computing C/C++ is a course provided by NVIDIA.
Performance comparison of two different forms of memory management in CUDA
NVML unified memory shim for NVIDIA DGX Spark Grace Blackwell GB10 - enables MAX Engine, PyTorch, and GPU monitoring
Talos-O (Omni): A sovereign, embodied agentic organism forged on AMD Strix Halo. Integrating the Chimera Kernel (Linux 7.0), Zero-Copy Introspection, and the Phronesis Engine. Built from First Principles.
GPU thrashing: NVIDIA GPU Unified Memory diagnostic tool (architecture-aware, measurement-based, PCIe/coherent transport detection)
3D U-Net with tf.keras for Large-Model-Support or Unified Memory
Apple Silicon Unified Memory for GPU-Accelerated Analytics — TPC-H benchmarks across DuckDB, NumPy, and MLX
Reproducible Pascal GPU Unified Memory benchmark with Nsight and nvprof profiling
Local inference server for Apple Silicon — hot-swaps MLX models (LLM, vision, embeddings, TTS, STT) via OpenAI API
NVIDIA GPU validation: PCIe transport, Unified Memory prefetch, SGEMM compute, drift detection.
An extension of the UVM Benchmark for testing huge data workloads (16 GiB and more). Makes the benchmark overflow-safe and adds dataset creation logic for some applications.
GB10 unified memory diagnostic suite — bandwidth, contention, atomic coherence, CUPTI activity, power and thermal correlation
Run LLMs larger than your RAM — native GGUF inference engine with SSD streaming, no GPU required
Cycle-accurate UMA fault latency and bandwidth measurement for NVIDIA GPUs. C and PTX. No Python. Pascal (SM 6.0) through Blackwell GB10 (SM 12.1).
3-bit Lloyd-Max KV Cache Compression for LLM Inference on NVIDIA DGX Spark GB10 — 5.12x compression, 0.983 cosine similarity, pure numpy on ARM unified memory
CUPTI UVM activity API diagnostic for GB10 / DGX Spark — determines whether CUPTI_ACTIVITY_KIND_UNIFIED_MEMORY_COUNTER is supported at the API level