Intelligent load balancer for distributed vLLM server clusters
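As a rough illustration of what such routing can look like, here is a minimal least-busy dispatcher over vLLM's OpenAI-compatible API; the backend URLs and the in-flight counter are illustrative assumptions, not this project's actual strategy.

```python
# Hypothetical least-busy routing across vLLM servers (illustrative only).
import threading
import requests

BACKENDS = ["http://vllm-0:8000", "http://vllm-1:8000"]  # assumed endpoints
inflight = {url: 0 for url in BACKENDS}
lock = threading.Lock()

def route_completion(payload: dict) -> dict:
    # Choose the backend with the fewest requests currently in flight.
    with lock:
        url = min(BACKENDS, key=lambda u: inflight[u])
        inflight[url] += 1
    try:
        resp = requests.post(f"{url}/v1/completions", json=payload, timeout=300)
        resp.raise_for_status()
        return resp.json()
    finally:
        with lock:
            inflight[url] -= 1
```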
agentsculptor is an experimental AI-powered development agent designed to analyze, refactor, and extend Python projects automatically. It uses an OpenAI-like planner–executor loop on top of a vLLM backend, combining project context analysis, structured tool calls, and iterative refinement. It has only been tested with gpt-oss-120b via vLLM.
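A bare-bones sketch of the planner–executor pattern against a vLLM OpenAI-compatible endpoint is shown below; the endpoint, model id, stop condition, and tool handling are placeholders rather than agentsculptor's actual interface.

```python
# Illustrative planner–executor loop against a vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed endpoint
MODEL = "openai/gpt-oss-120b"  # placeholder model id

def run_task(task: str, max_steps: int = 5) -> str:
    history = [{"role": "system", "content": "Plan one step, then act."},
               {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = client.chat.completions.create(model=MODEL, messages=history)
        step = reply.choices[0].message.content
        history.append({"role": "assistant", "content": step})
        if "DONE" in step:  # naive stop condition for the sketch
            return step
        # A real agent would parse a structured tool call here, execute it,
        # and feed the tool result back into the conversation.
        history.append({"role": "user", "content": "Tool result: (omitted in sketch)"})
    return history[-1]["content"]
```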
A curated list of plugins built on top of vLLM
Deploy the Magistral-Small-2506 model using vLLM and Modal
[KAIST CS632] Road damage detection using YOLOv8 on a Xilinx FPGA, repair estimation with Phi-3.5 served via vLLM and FAISS-based RAG, and data management via GS1 EPCISv2 and a React dashboard
This repository contains Terraform configuration for the vLLM production-stack on cloud-managed Kubernetes
Performant LLM inference on Kubernetes via vLLM
This project offers a production-ready RAG (Retrieval-Augmented Generation) API running on FastAPI, utilizing the high-performance vLLM engine.
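The general shape of such a service, as a hedged sketch: retrieve context (here from a toy in-memory corpus instead of a real vector index) and forward the question to a vLLM server over its OpenAI-compatible API. The endpoint, model id, and document set are assumptions.

```python
# Minimal RAG endpoint sketch: retrieve context, then ask a vLLM-served model.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
VLLM_URL = "http://localhost:8000/v1/chat/completions"  # assumed vLLM server
DOCS = ["vLLM uses PagedAttention.", "FastAPI is an ASGI framework."]  # toy corpus

class Query(BaseModel):
    question: str

def retrieve(question: str, k: int = 2) -> list[str]:
    # Toy keyword-overlap scoring; a real service would use a vector index.
    words = set(question.lower().split())
    return sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))[:k]

@app.post("/ask")
async def ask(q: Query):
    context = "\n".join(retrieve(q.question))
    payload = {
        "model": "my-model",  # placeholder model id
        "messages": [
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": q.question},
        ],
    }
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(VLLM_URL, json=payload)
        r.raise_for_status()
    return {"answer": r.json()["choices"][0]["message"]["content"]}
```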
Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets
Load testing openai/gpt-oss-20b with vLLM and Docker
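A minimal way to produce that kind of concurrent load from Python; the server URL, model id, request count, and concurrency level below are assumptions, not the repository's benchmark setup.

```python
# Illustrative concurrent load generator for a vLLM OpenAI-compatible endpoint.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed server address
PAYLOAD = {
    "model": "openai/gpt-oss-20b",  # placeholder model id
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}

def one_request(_: int) -> float:
    # Time a single chat-completion round trip.
    start = time.perf_counter()
    requests.post(URL, json=PAYLOAD, timeout=120).raise_for_status()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=16) as pool:  # assumed concurrency
    latencies = list(pool.map(one_request, range(100)))
print(f"mean latency: {sum(latencies) / len(latencies):.2f}s")
```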
Production-grade vLLM serving with an OpenAI-compatible API, per-request LoRA routing, KEDA autoscaling on Prometheus metrics, Grafana/OTel observability, and a benchmark comparing AWQ vs GPTQ vs GGUF.
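When vLLM's OpenAI-compatible server is launched with LoRA enabled and adapters registered (e.g. via --enable-lora and --lora-modules), a request can pick an adapter through the standard model field, which is the essence of per-request LoRA routing. The base-model and adapter names below are placeholders.

```python
# Selecting a LoRA adapter per request via the OpenAI-compatible "model" field.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed endpoint

# Request served by the base model (placeholder id).
base = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize this ticket."}],
)

# Same request routed to a registered LoRA adapter (placeholder name).
lora = client.chat.completions.create(
    model="support-lora",
    messages=[{"role": "user", "content": "Summarize this ticket."}],
)
print(base.choices[0].message.content, lora.choices[0].message.content, sep="\n---\n")
```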
A simple app to generate captions for your Instagram posts using the `JoyCaption` model hosted on RunPod.io
Fine-tuned LLM for a domain use case, with inference via vLLM and serving on Ollama
[2024 Elice AI Hellothon Excellence Award (2nd Place)] "Saem, Sam": a cognitive-activity lesson guide creator for caregivers and an interactive AI drawing-diary service for the elderly
A comprehensive framework for multi-node, multi-GPU scalable LLM inference on HPC systems using vLLM and Ollama. Includes distributed deployment templates, benchmarking workflows, and chatbot/RAG pipelines for high-throughput, production-grade AI services
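For the single-node building block of such deployments, vLLM's offline Python API can shard a model across GPUs with tensor parallelism; the model name and GPU count below are placeholders, and true multi-node runs layer a launcher such as Ray on top of this.

```python
# Single-node tensor-parallel inference with vLLM's offline Python API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder model
    tensor_parallel_size=4,                      # assumed GPUs per node
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```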
An OCR model fine-tuned from Vintern1B (InternVL 1B) with 1 billion parameters. The model can recognize text in a wide range of contexts, such as handwriting, printed text, and text on real-world objects.
Project to set up a UI for users to interact with an LLM served via vLLM
This repository provides a Docker image for vLLM with transformers>=5.0.0rc0 pre-installed to support newer models.