RAZZULLIX

RAZ RAZZULLIX

Pinned

cuda_replace (Public)

GPU-accelerated byte-pattern replacement. Python bytes.replace() semantics with streaming support for multi-GB files. CUDA C++ with Python wrapper.

Language: Cuda. Stars: 3.
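To make "bytes.replace() semantics with streaming support" concrete, here is a minimal CPU-only reference sketch in Python. It is not the cuda_replace API: the function name `stream_replace` and its parameters are hypothetical, and the GPU kernel is not modeled. It only illustrates one way a streaming replacer can stay consistent with `bytes.replace()` when a pattern straddles a chunk boundary, by holding back the last `len(old) - 1` unmatched bytes until the next chunk arrives.

```python
from typing import Iterable, Iterator


def stream_replace(chunks: Iterable[bytes], old: bytes, new: bytes) -> Iterator[bytes]:
    """Hypothetical reference: replace `old` with `new` across a stream of chunks.

    Joining the yielded pieces gives the same bytes as
    b"".join(chunks).replace(old, new) for a non-empty `old`.
    """
    if not old:
        # bytes.replace() accepts an empty pattern; this sketch does not.
        raise ValueError("empty pattern is not supported in this sketch")

    keep = len(old) - 1  # bytes that might begin a match completed by the next chunk
    buf = b""
    for chunk in chunks:
        buf += chunk
        out = []
        pos = 0
        # Greedy, left-to-right, non-overlapping matches, as bytes.replace() does.
        while True:
            i = buf.find(old, pos)
            if i == -1:
                break
            out.append(buf[pos:i])
            out.append(new)
            pos = i + len(old)
        # Everything before safe_end can no longer take part in a future match.
        safe_end = max(pos, len(buf) - keep)
        out.append(buf[pos:safe_end])
        buf = buf[safe_end:]
        yield b"".join(out)
    # The held-back tail is shorter than `old`, so it cannot contain a match.
    yield buf


if __name__ == "__main__":
    data = b"abcabcabxabc"
    pieces = [data[i:i + 2] for i in range(0, len(data), 2)]
    print(b"".join(stream_replace(pieces, b"abc", b"Z")))
```

Holding back a short tail keeps memory use proportional to the chunk size plus the pattern length, which is the property a streaming design needs for multi-GB inputs.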
fast_topk_batched (Public)

High-performance batched Top-K selection for CPU inference. Up to 80x faster than PyTorch, optimized for LLM sampling with AVX2 SIMD.

Language: C++. Stars: 15. Forks: 1.
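For readers unfamiliar with batched Top-K selection, the following pure-Python sketch shows what the operation computes: for each row of scores (for example, one row of logits per sequence in a batch), the k largest values and their indices. It is not the fast_topk_batched API and says nothing about its AVX2 implementation; the name `topk_batched` and its signature are hypothetical, and the standard-library `heapq.nlargest` stands in for the optimized selection.

```python
import heapq
from typing import List, Sequence, Tuple


def topk_batched(
    batch: Sequence[Sequence[float]], k: int
) -> List[Tuple[List[float], List[int]]]:
    """Hypothetical reference: per-row top-k values and indices, largest first."""
    results = []
    for row in batch:
        # Pair each score with its index, then select the k largest by score.
        top = heapq.nlargest(k, enumerate(row), key=lambda pair: pair[1])
        values = [value for _, value in top]
        indices = [index for index, _ in top]
        results.append((values, indices))
    return results


if __name__ == "__main__":
    logits = [
        [0.1, 2.0, -1.0, 0.5],
        [3.0, 1.0, 2.0, 0.0],
    ]
    for values, indices in topk_batched(logits, k=2):
        print(values, indices)
```

In LLM sampling, the returned indices are candidate token ids and the values are their scores, so a sampler only needs to normalize and draw from k entries per row instead of the full vocabulary.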