Description
🚀 The feature, motivation and pitch
Motivation:
Reduce the accuracy drop when using FP4/FP8 KV cache quantization. The target is to retain 99% of FP16 performance.
Evaluation timeline: End of 8 June 2025
Deadline: Mid July 2025
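A minimal sketch of how the 99% target could be checked during evaluation, assuming a single higher-is-better benchmark score (for example, task accuracy) measured once with the FP16 KV cache and once with the quantized one. The function names and the numbers are hypothetical and only for illustration; they are not measured results or part of vLLM.

```python
def retention(fp16_score: float, quantized_score: float) -> float:
    """Fraction of the FP16 score kept by the quantized run.

    Assumes a higher-is-better metric (hypothetical helper, not a vLLM API).
    """
    if fp16_score <= 0:
        raise ValueError("fp16_score must be positive")
    return quantized_score / fp16_score


def meets_target(fp16_score: float, quantized_score: float,
                 target: float = 0.99) -> bool:
    """True if the quantized run retains at least `target` of the FP16 score."""
    return retention(fp16_score, quantized_score) >= target


if __name__ == "__main__":
    # Illustrative numbers only, not benchmark results.
    fp16, fp8 = 70.0, 69.5
    print(f"retention = {retention(fp16, fp8):.4f}, "
          f"meets 99% target: {meets_target(fp16, fp8)}")
```

For a lower-is-better metric such as perplexity, the ratio would need to be inverted, so the metric should be agreed on before the evaluation starts.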
Tasks:
- Find out how much of vLLM needs to be modified to support this KV cache.
- Find out whether supporting this new KV cache quantization requires any custom ops from the original repo.
References: