Pinned
- cuda_replace (Public): GPU-accelerated byte-pattern replacement. Follows Python bytes.replace() semantics, with streaming support for multi-GB files. Written in CUDA C++ with a Python wrapper.
Cuda 3
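To illustrate what "bytes.replace() semantics with streaming support" can mean in practice, here is a minimal CPU-only sketch in plain Python. It is not the cuda_replace API; the function name `stream_replace` and its signature are hypothetical and chosen only for illustration. It shows one way to process input chunk by chunk and still produce the same result as calling bytes.replace() on the whole input, including matches that straddle a chunk boundary.

```python
def stream_replace(chunks, old, new):
    """Yield output pieces equal, when joined, to b"".join(chunks).replace(old, new).

    Hypothetical helper for illustration only; not part of cuda_replace.
    """
    if not old:
        raise ValueError("this sketch does not handle an empty pattern")
    keep = len(old) - 1  # longest prefix of a match that can dangle at a chunk end
    buf = b""
    for chunk in chunks:
        buf += chunk
        out = []
        pos = 0
        # Replace every complete, non-overlapping match, scanning left to right.
        while True:
            i = buf.find(old, pos)
            if i < 0:
                break
            out.append(buf[pos:i])
            out.append(new)
            pos = i + len(old)
        # Bytes before safe_end cannot start a match that continues into the
        # next chunk, so they can be emitted now. The rest is carried over.
        safe_end = max(pos, len(buf) - keep)
        out.append(buf[pos:safe_end])
        buf = buf[safe_end:]
        yield b"".join(out)
    # Whatever is left can no longer grow into a match.
    yield buf


# The pattern b"abc" is split across chunk boundaries here.
pieces = [b"xxab", b"cyya", b"bc"]
result = b"".join(stream_replace(pieces, b"abc", b"-"))
```

Only the last len(old) - 1 unmatched bytes are held back between chunks, so memory use stays proportional to the chunk size rather than the file size.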
- fast_topk_batched (Public): High-performance batched Top-K selection for CPU inference. Up to 80x faster than PyTorch, optimized for LLM sampling with AVX2 SIMD.
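For readers unfamiliar with the operation, batched Top-K selection takes a batch of score rows (for example, one row of logits per sequence) and returns the k largest values and their indices for each row. The following is a minimal reference sketch in pure Python using the standard library's heapq module. It is not the fast_topk_batched API and makes no attempt at SIMD optimization; the name `topk_batched` and its return shape are assumptions made for illustration.

```python
import heapq


def topk_batched(rows, k):
    """Return, for each row, (values, indices) of its k largest entries.

    Hypothetical reference implementation; not part of fast_topk_batched.
    Entries are ordered from largest to smallest value.
    """
    results = []
    for row in rows:
        # nlargest returns (index, value) pairs sorted by value, descending.
        top = heapq.nlargest(k, enumerate(row), key=lambda pair: pair[1])
        values = [value for _, value in top]
        indices = [index for index, _ in top]
        results.append((values, indices))
    return results


logits = [
    [0.1, 2.0, -1.0, 0.5],
    [3.0, 1.0, 2.0, 4.0],
]
top2 = topk_batched(logits, 2)
```

In an LLM sampling loop, the returned indices would be the candidate token ids and the values their logits, which are then typically renormalized before sampling.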