Pull requests: vllm-project/vllm
Pull requests list
[Misc] add get kv cache token capacity
frontend
v1
#17538
opened May 1, 2025 by
lengrongfu
•
Changes requested
[Model] Refactor Ovis2 to support original tokenizer
documentation
Improvements or additions to documentation
ready
ONLY add when PR is ready to merge/full CI is needed
#17537
opened May 1, 2025 by
Isotr0py
•
Review required
[CI/Build] Remove awscli dependency
ci/build
ready
#17532
opened May 1, 2025 by
DarkLight1337
•
Review required
Move the last arguments in arg_utils.py to be in their final groups
ready
#17531
opened May 1, 2025 by
hmellor
[prototype] prioritized block soft pinning/evictions
documentation
frontend
v1
[V1] Add num_cached_tokens stats for request output
ready
v1
#17519
opened May 1, 2025 by
simon-mo
[Bugfix][Model] vllm-v0 engine run eagle algo with qwen2.5 model, KeyError: 'norm.weight' bugfix
#17518
opened May 1, 2025 by
Greatpanc
[Bugfix][V1][Spec Dec] Add generator to request even when no seed is provided.
speculative-decoding
v1
#17509
opened May 1, 2025 by
luyuzhe111
[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name
#17508
opened May 1, 2025 by
Chenyaaang
[BugFix] Qwen3 tool calling failed using qwen3 reasoning parser.
documentation
frontend
tool-calling
#17506
opened Apr 30, 2025 by
Xu-Wenqing
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3
ready
v1
#17504
opened Apr 30, 2025 by
zixi-qi
[RFC][core][V1] generalize structured output manager and backends
structured-output
tpu
Related to Google TPUs
v1
#17503
opened Apr 30, 2025 by
william-baker-inflection
[Bugfix] Adding maxnreg to lora expand/shrink kernel definition
#17492
opened Apr 30, 2025 by
ozziemoreno
Fix arg checking for GGUF/Quark/GPTQMarlin quantized MoE methods
bug
Something isn't working
ready
#17491
opened Apr 30, 2025 by
mgoin
[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var
v1
#17490
opened Apr 30, 2025 by
russellb
Add full API docs and improve the UX of navigating them
ci/build
documentation
frontend
multi-modality
Related to multi-modality (#4194)
v1
#17485
opened Apr 30, 2025 by
hmellor
[Attention] MLA move o_proj q_proj into cuda-graph region
v1
#17484
opened Apr 30, 2025 by
LucasWilkinson
[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders
tpu
v1
#17483
opened Apr 30, 2025 by
heheda12345
[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager
v1
#17479
opened Apr 30, 2025 by
heheda12345
[v1] Move block management logic from KVCacheManager to SpecializedManager
v1
#17474
opened Apr 30, 2025 by
heheda12345