Pull requests: vllm-project/vllm

605 open, 8,003 closed

[Misc] add get kv cache token capacity
Labels: frontend, v1

#17538 opened May 1, 2025 by lengrongfu • Changes requested

[Model] Refactor Ovis2 to support original tokenizer
Labels: documentation, ready

#17537 opened May 1, 2025 by Isotr0py • Review required

[CI/Build] Remove awscli dependency
Labels: ci/build, ready

#17532 opened May 1, 2025 by DarkLight1337 • Review required

Move the last arguments in arg_utils.py to be in their final groups
Labels: ready

#17531 opened May 1, 2025 by hmellor

[FEAT][ROCm]: Support AITER MLA on V1 Engine
Labels: ci/build, rocm

#17523 opened May 1, 2025 by vllmellm

[prototype] prioritized block soft pinning/evictions
Labels: documentation, frontend, v1

#17520 opened May 1, 2025 by simon-mo • Draft

[V1] Add num_cached_tokens stats for request output
Labels: ready

#17519 opened May 1, 2025 by simon-mo

[Bugfix][Model] vllm-v0 engine run eagle algo with qwen2.5 model, KeyError: 'norm.weight'
Labels: bugfix

#17518 opened May 1, 2025 by Greatpanc

[Bugfix][V1][Spec Dec] Add generator to request even when no seed is provided.
Labels: speculative-decoding, v1

#17509 opened May 1, 2025 by luyuzhe111

[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name

#17508 opened May 1, 2025 by Chenyaaang

[BugFix] Qwen3 tool calling failed using qwen3 reasoning parser.
Labels: documentation, frontend, tool-calling

#17506 opened Apr 30, 2025 by Xu-Wenqing

[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3
Labels: ready

#17504 opened Apr 30, 2025 by zixi-qi

[RFC][core][V1] generalize structured output manager and backends
Labels: structured-output, tpu

#17503 opened Apr 30, 2025 by william-baker-inflection

[Chore] Ignore Ruff warning on E501

#17502 opened Apr 30, 2025 by aarnphm

[Model] Add GraniteMoeHybrid 4.0 model

#17497 opened Apr 30, 2025 by s3woz

[TPU] Add kernel test for moe_pallas
Labels: ci/build, ready, tpu

#17496 opened Apr 30, 2025 by mgoin

[Bugfix] Adding maxnreg to lora expand/shrink kernel definition

#17492 opened Apr 30, 2025 by ozziemoreno

Fix arg checking for GGUF/Quark/GPTQMarlin quantized MoE methods
Labels: bug, ready

#17491 opened Apr 30, 2025 by mgoin

[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var
Labels: v1

#17490 opened Apr 30, 2025 by russellb

Add full API docs and improve the UX of navigating them
Labels: ci/build, documentation, frontend, multi-modality

#17485 opened Apr 30, 2025 by hmellor

[Attention] MLA move o_proj q_proj into cuda-graph region
Labels: v1

#17484 opened Apr 30, 2025 by LucasWilkinson

[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders
Labels: tpu

#17483 opened Apr 30, 2025 by heheda12345

[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager
Labels: v1

#17479 opened Apr 30, 2025 by heheda12345

[v1] Move block management logic from KVCacheManager to SpecializedManager
Labels: v1

#17474 opened Apr 30, 2025 by heheda12345

fix: restore http metrics for V0 engine
Labels: v1

#17471 opened Apr 30, 2025 by davidxia • Draft