vllm-tpu

A high-throughput and memory-efficient inference and serving engine for LLMs

Installation

In a virtualenv (see these instructions if you need to create one):

pip3 install vllm-tpu
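
After installing, it can be useful to confirm which release ended up in the virtualenv. The following is a minimal sketch using only the Python standard library (`importlib.metadata`). The helper name `installed_version` is hypothetical and not part of vllm-tpu; the sketch assumes the distribution is registered under the same name used in the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "vllm-tpu") -> Optional[str]:
    """Return the installed version string of dist_name, or None if absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("vllm-tpu is not installed in this environment")
    else:
        print(f"vllm-tpu {found} is installed")
```

Run it with the virtualenv's interpreter so that it inspects the same environment that `pip3` installed into.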

Releases

Version | Released | Bullseye (Python 3.9) | Bookworm (Python 3.11) | Files
0.9.4.dev0 (pre-release) | 2025-05-28 | | |
0.9.3 | 2025-05-27 | | |

Page last updated 2025-05-28 16:45:50 UTC