piwheels - llm-benchmark-toolkit

llm-benchmark-toolkit

Benchmark LLMs with 10 benchmarks & 132K+ questions. 8 providers: OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Ollama, HuggingFace. Unified CLI + Web dashboard.

Installation

In a virtualenv (see these instructions if you need to create one):

pip3 install llm-benchmark-toolkit
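
After installing, it can help to confirm which version pip actually resolved inside the virtualenv. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent.

    Hypothetical helper for illustration; it only queries the metadata of
    the current Python environment and does not import the package itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the resolved version, or None when the package is not installed.
    print(installed_version("llm-benchmark-toolkit"))
```

Running this with the virtualenv's interpreter checks that environment rather than the system Python.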

Dependencies

PyPI page: pypi.org/project/llm-benchmark-toolkit

Project JSON: piwheels.org/project/llm-benchmark-toolkit/json

Versions: 15

Files: 13

Releases

Version	Released	Bullseye Python 3.9	Bookworm Python 3.11	Trixie Python 3.13	Files
2.4.2	2025-12-05				llm_benchmark_toolkit-2.4.2-py3-none-any.whl (357 KB)
2.4.1	2025-12-05				llm_benchmark_toolkit-2.4.1-py3-none-any.whl (357 KB)
2.4.0	2025-12-05				llm_benchmark_toolkit-2.4.0-py3-none-any.whl (340 KB)
2.3.2	2025-12-04				llm_benchmark_toolkit-2.3.2-py3-none-any.whl (320 KB)
2.3.1	2025-12-04				llm_benchmark_toolkit-2.3.1-py3-none-any.whl (319 KB)
2.3.0	2025-12-03				llm_benchmark_toolkit-2.3.0-py3-none-any.whl (319 KB)
2.2.1	2025-12-02				llm_benchmark_toolkit-2.2.1-py3-none-any.whl (287 KB)
2.2.0	2025-12-02				llm_benchmark_toolkit-2.2.0-py3-none-any.whl (287 KB)
2.1.0	2025-12-02				llm_benchmark_toolkit-2.1.0-py3-none-any.whl (72 KB)
2.0.0	2025-12-01				llm_benchmark_toolkit-2.0.0-py3-none-any.whl (64 KB)
0.4.1	2025-12-01				llm_benchmark_toolkit-0.4.1-py3-none-any.whl (44 KB)
0.4.0	2025-12-01				
0.3.2	2025-12-01				
0.3.1	2025-11-30				llm_benchmark_toolkit-0.3.1-py3-none-any.whl (44 KB)
0.3.0	2025-11-30				llm_benchmark_toolkit-0.3.0-py3-none-any.whl (44 KB)
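
Every file listed above ends in py3-none-any.whl. Under the standard wheel naming convention that means a pure-Python wheel with no ABI or platform restriction, which is why a single file per release can serve all three Python columns. As a rough sketch, the filename can be split into its tags as follows; parse_wheel_filename is a hypothetical helper written for this page, not something the package or piwheels provides.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its naming-convention components.

    Expected layout:
    {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
    Hypothetical helper for illustration only.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        # The build tag is optional and absent from the files listed here.
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


if __name__ == "__main__":
    info = parse_wheel_filename("llm_benchmark_toolkit-2.4.2-py3-none-any.whl")
    # A wheel is pure Python and platform-independent when the ABI tag is
    # "none" and the platform tag is "any".
    is_universal = info["abi"] == "none" and info["platform"] == "any"
    print(info["version"], info["python"], is_universal)
```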

Issues with this package?

Search issues for this package
Package or version missing? Open a new issue
Something else? Open a new issue

Page last updated 2026-05-13 02:30:51 UTC

