llm-benchmark-toolkit

Benchmark LLMs across 10 benchmarks and 132K+ questions. Supports 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, and HuggingFace. Includes a unified CLI and a web dashboard.

Installation

In a virtualenv (create one first if you do not already have one):

pip3 install llm-benchmark-toolkit
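
For readers who do not yet have a virtualenv, the two steps (create the environment, then install into it) can be sketched with the Python standard library alone. The directory name `.venv` and the helper names `install_commands` and `run` are illustrative assumptions, not part of llm-benchmark-toolkit. By default the sketch only prints the commands; pass `dry_run=False` to actually execute them.

```python
import subprocess
import sys
from pathlib import Path


def install_commands(env_dir=".venv", package="llm-benchmark-toolkit"):
    """Build the commands that create a virtualenv and install a package into it.

    env_dir is a hypothetical location for the environment; any path works.
    """
    env = Path(env_dir)
    # Virtualenvs keep their executables in "Scripts" on Windows, "bin" elsewhere.
    bin_dir = env / ("Scripts" if sys.platform == "win32" else "bin")
    return [
        # Step 1: create the environment with the stdlib venv module.
        [sys.executable, "-m", "venv", str(env)],
        # Step 2: run pip through the environment's own interpreter.
        [str(bin_dir / "python"), "-m", "pip", "install", package],
    ]


def run(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(install_commands())
```

Calling pip through the environment's interpreter (`python -m pip`) avoids having to activate the virtualenv first, which keeps the sketch independent of the shell in use.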

Dependencies

Releases

Version  Released    Bullseye (Python 3.9)  Bookworm (Python 3.11)  Trixie (Python 3.13)  Files
2.4.2    2025-12-05
2.4.1    2025-12-05
2.4.0    2025-12-05
2.3.2    2025-12-04
2.3.1    2025-12-04
2.3.0    2025-12-03
2.2.1    2025-12-02
2.2.0    2025-12-02
2.1.0    2025-12-02
2.0.0    2025-12-01
0.4.1    2025-12-01
0.4.0    2025-12-01
0.3.2    2025-12-01
0.3.1    2025-11-30
0.3.0    2025-11-30

Page last updated 2026-05-13 02:30:51 UTC