Pull requests: vllm-project/vllm
#31008: [Quantization] enable compressed-tensors marlin support for turing (2)
Opened Dec 19, 2025 by jinzhen-lin
#31007: [Qwen3-Omni] fixed _get_feat_extract_output_lengths function
Labels: qwen (Related to Qwen models)
Opened Dec 19, 2025 by wangxiongts
#31006: [Bugfix] Fix Ray GPU availability warning message
Labels: v1
Opened Dec 19, 2025 by jarieshan
#31002: feat(kernel): patch fused_gdn_gating
Labels: qwen (Related to Qwen models)
Opened Dec 19, 2025 by OsirisDuan
#30998: [Perf][ROCm][AWQ] Improve performance of fused MoE GPTQ-AWQ and AWQ dequant kernels
Labels: rocm (Related to AMD ROCm)
Opened Dec 19, 2025 by yuttian1
#30997: Add Molmo2 multimodal model support
Labels: documentation (Improvements or additions to documentation), multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models)
Opened Dec 19, 2025 by sangho-vision
#30994: [WIP] Improve CPU Benchmark Suite tests for 0.12.0
Labels: ci/build, cpu (Related to CPU backends), performance (Performance-related issues)
Opened Dec 19, 2025 by louie-tsai
#30993: Bump Flashinfer to v0.6.0rc1
Labels: ci/build, nvidia
Opened Dec 18, 2025 by elvischenv
#30992: [Misc] Remove deprecated metric vllm:time_per_output_token_seconds for v0.13 release
Labels: v1
Opened Dec 18, 2025 by jliu9515
#30991: [ROCm][CI/Build] Update ROCm dockerfiles
Labels: ci/build, rocm (Related to AMD ROCm)
Opened Dec 18, 2025 by gshtras
#30990: [MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200)
Labels: nvidia, ready (ONLY add when PR is ready to merge/full CI is needed)
Opened Dec 18, 2025 by robertgshaw2-redhat
#30987: Eagle 3: fix "Model architectures ['EagleLlamaModel'] are not supported for now" error when architectures is not set
Labels: llama (Related to Llama models), new-model (Requests to new models)
Opened Dec 18, 2025 by aidando73
#30984: Grid construction based on num_active_lora and support CUDA graph capture across various num_active_lora
Labels: nvidia, v1
Opened Dec 18, 2025 by yugong333
#30982: Update PyTorch version update docs
Labels: documentation (Improvements or additions to documentation)
Opened Dec 18, 2025 by atalman
#30980: [Do not merge][Async] Asynchronous DP coordination
Labels: v1
Opened Dec 18, 2025 by MatthewBonanni (Draft)
#30979: [MoE Refactor] Add mk for cutlass fp8 block
Labels: nvidia, v1
Opened Dec 18, 2025 by robertgshaw2-redhat (Draft)
#30978: Add positional embedding and kv_cache fusion for llama and gpt-oss
Labels: gpt-oss (Related to GPT-OSS models), llama (Related to Llama models), v1
Opened Dec 18, 2025 by dllehr-amd (Draft)
#30977: Docs: add OpenAI SDK example for Qwen2.5-VL classification
Labels: documentation (Improvements or additions to documentation), qwen (Related to Qwen models)
Opened Dec 18, 2025 by Dhruv-80
#30975: [Misc] Disable default --ready-check-timeout-sec extra call in vllm bench
Labels: performance (Performance-related issues)
Opened Dec 18, 2025 by NickLucche
#30974: [Bugfix] Fix incorrect tiles creation for mm prefix triton attention
Labels: ready (ONLY add when PR is ready to merge/full CI is needed)
Opened Dec 18, 2025 by Isotr0py