llm-benchmark-toolkit
Benchmark LLMs with 10 benchmarks & 132K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Unified CLI + Web dashboard.
Installation
In a virtualenv (create one first if you do not already have one):
pip3 install llm-benchmark-toolkit
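If you would rather script the virtualenv step, the following is a minimal sketch using Python's standard-library venv module. The helper names create_env and install_command are hypothetical and are not part of llm-benchmark-toolkit; the sketch only builds the environment and the pip command line, and leaves running it to you.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter.

    Hypothetical helper for illustration; not part of llm-benchmark-toolkit.
    """
    env_dir = Path(env_dir)
    # venv.create builds the environment directory; with_pip=True also
    # bootstraps pip inside it via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_command(python_path, package="llm-benchmark-toolkit"):
    """Return the argument list that installs `package` with the given interpreter."""
    # Invoking pip as a module of the environment's interpreter ensures the
    # package is installed into that environment, not the system one.
    return [str(python_path), "-m", "pip", "install", package]
```

To use it, call create_env(".venv") and pass the result of install_command(...) to subprocess.run(..., check=True). The install step needs network access to the package index.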