llama-optimus
llama-optimus is a lightweight Python tool that automatically tunes llama.cpp performance flags for maximum text generation (tg) and prompt processing (pp) throughput, measured in tokens/s. It is powered by Bayesian optimization with Optuna.
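To illustrate the general idea of tuning flags with Optuna, here is a minimal sketch. It is not llama-optimus's actual implementation. The scoring function is synthetic (it does not run llama.cpp or measure anything), and the parameter names `threads`, `batch_size`, and `gpu_layers`, their ranges, and the peak values are assumptions chosen for illustration. The sketch uses Optuna's TPE sampler; which sampler llama-optimus configures is not stated here.

```python
def synthetic_score(threads: int, batch_size: int, gpu_layers: int) -> float:
    """Stand-in for a real benchmark run; returns a made-up score.

    The score peaks at threads=8, batch_size=512, gpu_layers=32. These are
    arbitrary illustrative values, not measurements of llama.cpp.
    """
    score = 100.0
    score -= 2.0 * (threads - 8) ** 2
    score -= 0.0001 * (batch_size - 512) ** 2
    score -= 0.05 * (gpu_layers - 32) ** 2
    return score


def tune(n_trials: int = 30, seed: int = 0) -> dict:
    """Search the synthetic score with Optuna and return the best parameters."""
    # Third-party dependency (pip install optuna); imported here so that
    # synthetic_score stays usable without Optuna installed.
    import optuna

    def objective(trial):
        # Hypothetical search space; a real tool would map these values
        # to command-line flags and time an actual benchmark run.
        threads = trial.suggest_int("threads", 1, 16)
        batch_size = trial.suggest_int("batch_size", 64, 2048, step=64)
        gpu_layers = trial.suggest_int("gpu_layers", 0, 64)
        return synthetic_score(threads, batch_size, gpu_layers)

    study = optuna.create_study(
        direction="maximize",  # higher score is better
        sampler=optuna.samplers.TPESampler(seed=seed),
    )
    study.optimize(objective, n_trials=n_trials)
    return study.best_params
```

Calling `tune()` returns a dictionary of the best parameter values found within the trial budget. In a real setting the objective would launch a benchmark and return the measured tokens/s instead of a synthetic score.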
Installation
In a virtualenv (create one with python3 -m venv if you do not already have one):
pip3 install llama-optimus