turboquant-vllm
TurboQuant KV cache compression for vLLM, built on fused Triton kernels. Reported figures: 3.76x compression and 3.7x faster decode on an RTX 4090.
Installation
In a virtualenv (create one first if you do not already have one):
pip3 install turboquant-vllm
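If no virtualenv exists yet, one way to create and activate it is sketched below using Python's standard `venv` module. The directory name `.venv` is an arbitrary choice for illustration, not something this package requires, and on some Linux distributions the `venv` module has to be installed separately first.

```shell
# Create a virtualenv in ./.venv (the directory name is an assumption; any path works)
python3 -m venv .venv

# Activate it for the current shell session (POSIX shells)
. .venv/bin/activate

# With the environment active, run the install command shown above:
#   pip3 install turboquant-vllm
```

Once activated, `pip3` resolves to the copy inside `.venv`, so the package and its dependencies stay isolated from the system Python.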