turboquant-vllm

TurboQuant KV cache compression for vLLM — fused Triton kernels, 3.76x compression, 3.7x faster decode on RTX 4090

Installation

In a virtualenv (create one first if you do not already have one):

pip3 install turboquant-vllm
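
If no virtualenv exists yet, the following is a minimal sketch of one way to create and activate one before running the install command above. It assumes python3 is on the PATH with the standard venv module available; the directory name .venv is an arbitrary choice, not something this package requires.

```shell
# Create an isolated environment in ./.venv (the name is only a convention)
python3 -m venv .venv

# Activate it for the current shell session
. .venv/bin/activate

# Show which interpreter prefix is now active; it should point inside .venv
python3 -c 'import sys; print(sys.prefix)'
```

Once the environment is active, pip3 resolves to the copy inside .venv, so the package is installed there rather than system-wide.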

Dependencies

Releases

Version  Released    Bullseye (Python 3.9)  Bookworm (Python 3.11)  Trixie (Python 3.13)  Files
1.5.0    2026-04-08
1.4.1    2026-04-04
1.4.0    2026-04-01
1.3.0    2026-03-31
1.2.2    2026-03-30
1.2.1    2026-03-30
1.2.0    2026-03-29
1.1.1    2026-03-28
1.1.0    2026-03-27
1.0.0    2026-03-27
0.1.0    2026-03-27


Page last updated 2026-04-11 04:39:30 UTC