py-data-juicer
Data Processing for and with Foundation Models.
Installation
In a virtualenv (create one first if you do not already have one):
pip3 install py-data-juicer
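If you would rather script the environment setup than run the commands by hand, the following is a minimal sketch using Python's standard-library venv module. The helper names create_env and install_command are hypothetical and introduced only for illustration; they are not part of py-data-juicer.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    # venv.create is the stdlib equivalent of `python3 -m venv env_dir`.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def install_command(python_path: Path, package: str = "py-data-juicer") -> list[str]:
    """Build the pip command that installs `package` into that environment."""
    return [str(python_path), "-m", "pip", "install", package]
```

A typical use would be py = create_env(".venv") followed by subprocess.run(install_command(py), check=True), which runs pip inside the new environment rather than the system one.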
Dependencies
- gitpython
- resampy
- uv
- matplotlib
- emoji
- librosa
- pylance
- mwparserfromhell
- plotly
- pandas
- seaborn
- tomli-w
- python-docx
- jsonlines
- samplerate
- zstandard
- loguru
- multiprocess
- regex
- tomli
- bs4
- dill
- psutil
- fastapi
- tqdm
- pdfplumber
- streamlit
- fsspec
- pillow
- lz4
- requests
- datasets
- mcp
- spacy
- httpx
- jsonargparse
- pydantic
- tabulate
- wordcloud
- numpy
- av
- wget
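To see which of the dependencies listed above are already present in the active environment, something like the sketch below may help. It relies only on the standard-library importlib.metadata module; the function name missing_distributions is hypothetical, and the sample list is just a subset of the names above.

```python
from importlib.metadata import PackageNotFoundError, version


def missing_distributions(names):
    """Return the distribution names that are not installed, in input order."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


# A few names taken from the dependency list above (illustrative subset).
SAMPLE_DEPENDENCIES = ["numpy", "pandas", "loguru", "tqdm", "jsonlines"]

if __name__ == "__main__":
    print("Not installed:", missing_distributions(SAMPLE_DEPENDENCIES))
```

The check uses distribution names as published on the package index, which can differ from import names (for example, bs4 versus beautifulsoup4), so a name reported as missing is worth double-checking with pip3 show.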