FlashInfer: Kernel Library for LLM Serving
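FlashInfer exposes its attention kernels directly from Python. A minimal sketch of the single-request decode path, assuming the library's documented default NHD cache layout (sequence, heads, head dim); the shapes and sizes below are illustrative, not prescribed:

```python
import torch
import flashinfer

# One decode step: a single query token attends over an existing KV cache.
num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Attention output for the query token: [num_qo_heads, head_dim].
o = flashinfer.single_decode_with_kv_cache(q, k, v)
```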
GPU cluster manager for optimized AI model deployment
A distributed LLM inference program based on llama.cpp that lets multiple computers on a local network collaborate on distributed inference of large language models, with a cross-platform desktop UI built with Electron.
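This repository presumably splits a single model across machines (llama.cpp ships an RPC backend for exactly that). As a simpler illustration of coordinating LAN nodes, here is a sketch of request-level round-robin against llama.cpp's `llama-server` OpenAI-compatible HTTP API — a different, coarser technique than layer-splitting, and the hostnames and ports are hypothetical:

```python
import itertools
import requests

# Hypothetical LAN machines, each running llama.cpp's `llama-server`
# (OpenAI-compatible HTTP API). Replace with your own hosts.
NODES = ["http://192.168.1.10:8080", "http://192.168.1.11:8080"]
_ring = itertools.cycle(NODES)

def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the prompt to the next node in round-robin order."""
    node = next(_ring)
    resp = requests.post(
        f"{node}/v1/completions",
        json={"prompt": prompt, "max_tokens": max_tokens},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```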
Analyze and generate unstructured data using LLMs, from quick experiments to billion-token jobs.
Source code of the paper "Private Collaborative Edge Inference via Over-the-Air Computation".
Super Ollama Load Balancer - Performance-aware routing for distributed Ollama deployments with Ray, Dask, and adaptive metrics
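Performance-aware routing generally means keeping a running latency estimate per backend and steering new requests accordingly. A minimal sketch of one common approach (EWMA latency plus power-of-two-choices selection); every name here is hypothetical, not this repository's actual API:

```python
import random
from dataclasses import dataclass

@dataclass
class Backend:
    """One Ollama endpoint plus a smoothed latency estimate (hypothetical)."""
    url: str
    ewma_latency: float = 0.1  # seconds; optimistic prior for new backends
    alpha: float = 0.3         # EWMA smoothing factor

    def record(self, observed_latency: float) -> None:
        # Blend the newest observation into the running estimate.
        self.ewma_latency = (
            self.alpha * observed_latency + (1 - self.alpha) * self.ewma_latency
        )

def pick_backend(backends: list[Backend]) -> Backend:
    # "Power of two choices": sample two candidates, take the faster one.
    if len(backends) == 1:
        return backends[0]
    a, b = random.sample(backends, 2)
    return a if a.ewma_latency <= b.ewma_latency else b
```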
Official implementation of the ACM MM paper "Identity-Aware Attribute Recognition via Real-Time Distributed Inference in Mobile Edge Clouds": a distributed inference model for pedestrian attribute recognition with re-ID in an MEC-enabled camera monitoring system, jointly training pedestrian attribute recognition and re-ID.
A comprehensive framework for multi-node, multi-GPU scalable LLM inference on HPC systems using vLLM and Ollama. Includes distributed deployment templates, benchmarking workflows, and chatbot/RAG pipelines for high-throughput, production-grade AI services.
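In vLLM's offline API, the multi-GPU part is a single argument: `tensor_parallel_size` shards the model's weights across the GPUs of a node (multi-node deployments layer a Ray cluster on top). A minimal sketch; the model name and GPU count are illustrative:

```python
from vllm import LLM, SamplingParams

# Shard the model across 4 GPUs on one node via tensor parallelism.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=4)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain distributed inference in one sentence."], params)
print(outputs[0].outputs[0].text)
```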
Encrypted Decentralized Inference and Learning (E.D.I.L.)
neurogrid-inference delivers decentralized AI inference for NeuroGrid, enabling secure, on-chain-verified model outputs. Supports biomedical AI agents, permissioned data retrieval, encrypted model calls, and verifiable inference proofs. Built for BNB Chain and fully open-source.