Pull requests: vllm-project/vllm
Pull requests list
[Spec Decode] Don't compile update_num_computed_tokens_for_batch_change
speculative-decoding
v1
#41484
opened May 1, 2026 by
MatthewBonanni
Collaborator
•
Draft
4 tasks
[bug] Fix lora reloading from disk every step when using AsyncLLM
bug
Something isn't working
frontend
#41482
opened May 1, 2026 by
hao-aaron
Contributor
4 tasks
[Spec Decode][V1][V2] Warm spec-decode helper kernels at startup
speculative-decoding
v1
#41481
opened May 1, 2026 by
li-xinwei
[Bugfix] Fix B200 batch determinism in fused_moe logic
bug
Something isn't working
#41480
opened May 1, 2026 by
Lucaskabela
Contributor
•
Draft
3 of 4 tasks
[Bug] Fix /pooling 404 after resolve_lora in _maybe_get_adapters
bug
Something isn't working
frontend
#41479
opened May 1, 2026 by
MattThomas-fastino
Limit concurrency on test_transcription_api_correctness.py
#41478
opened May 1, 2026 by
ekagra-ranjan
Contributor
[Perf] Batch Weight Prefetching via cuMemcpyBatchAsync to Reduce Latency
#41474
opened May 1, 2026 by
xiaobao520123
3 of 4 tasks
[Refactor] Remove dead code in tests and parallel_state
kv-connector
ready
ONLY add when PR is ready to merge/full CI is needed
v1
#41471
opened May 1, 2026 by
yewentao256
Member
[Bugfix] Validate 'text' field in Responses API multi-modal content
bug
Something isn't working
frontend
#41470
opened May 1, 2026 by
ankrovv
4 tasks done
[Bugfix] Detect MTP truncation at reasoning-to-tool-call boundary
bug
Something isn't working
frontend
#41467
opened May 1, 2026 by
ToastyTheBot
•
Draft
4 tasks
[Bugfix] Fix Qwen3Coder prev_tool_call_arr double-emission on parse failure
bug
Something isn't working
qwen
Related to Qwen models
tool-calling
#41466
opened May 1, 2026 by
ToastyTheBot
•
Draft
3 tasks
[BugFix][MyPy]: Module has no attribute "sched_getaffinity" [attr-defined]
bug
Something isn't working
cpu
Related to CPU backends
#41465
opened May 1, 2026 by
hickeyma
Contributor
Fix DeepSeek-OCR for Transformers v4
deepseek
Related to DeepSeek models
#41460
opened May 1, 2026 by
hmellor
Member
fix(frontend): Add multimodal placeholders to Gemma4 tool message template
documentation
Improvements or additions to documentation
#41459
opened May 1, 2026 by
harshaljanjani
4 of 5 tasks
[Perf] Fuse Qwen3.5 GDN in_proj_ba into 6-way in_proj MergedColumnParallelLinear
qwen
Related to Qwen models
#41457
opened May 1, 2026 by
jhsmith409
Contributor
5 tasks done
[Bugfix] Reject unsupported FlexAttention head sizes
bug
Something isn't working
v1
#41454
opened May 1, 2026 by
bugkeep
[ROCm][Deepseekv4] DeepseekV4 Mi300 support
ci/build
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
rocm
Related to AMD ROCm
v1
#41451
opened May 1, 2026 by
ganyi1996ppo
Contributor
•
Draft
4 tasks
fix: handle missing parent modules in _has_module
#41449
opened May 1, 2026 by
ShuhaoZhangTony
[Kernel][AMD] Optimize GatedDeltaNet FLA prefill kernels on MI300X
rocm
Related to AMD ROCm
#41446
opened May 1, 2026 by
zobinHuang
Revert "[MoE] Make MoERunnerInterface a PluggableLayer for OOT support" (#35178)
documentation
Improvements or additions to documentation
#41440
opened May 1, 2026 by
vllm-agent
•
Draft
Revert "Fix Cohere ASR after HF upgrade" (#40582)
multi-modality
Related to multi-modality (#4194)
#41439
opened May 1, 2026 by
vllm-agent
•
Draft