py-data-juicer
Data Processing for and with Foundation Models.
Installation
Install with pip, preferably inside a virtualenv:
pip3 install py-data-juicer
Dependencies
- uv
- av
- numpy
- tqdm
- bs4
- streamlit
- seaborn
- pdfplumber
- matplotlib
- datasets
- gitpython
- pydantic
- requests
- samplerate
- pandas
- librosa
- psutil
- spacy
- tomli-w
- zstandard
- jsonargparse
- dill
- fastapi
- resampy
- regex
- httpx
- emoji
- fsspec
- pylance
- wget
- tomli
- jsonlines
- mcp
- loguru
- plotly
- lz4
- tabulate
- wordcloud
- multiprocess
- mwparserfromhell
- python-docx
- pillow
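To see which of these dependencies are already present in an environment, a small check like the following may help. It is a minimal sketch, not part of py-data-juicer: the helper name `installed_versions` is hypothetical, and the short `DEPENDENCIES` sample is just a handful of names copied from the list above. It relies only on the standard-library `importlib.metadata` module, which looks packages up by distribution name (the names pip uses), not by import name.

```python
from importlib.metadata import PackageNotFoundError, version

# A few distribution names copied from the dependency list above (sample only).
DEPENDENCIES = ["numpy", "pandas", "datasets", "loguru", "pydantic"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not an API of py-data-juicer.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(DEPENDENCIES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this before and after `pip3 install py-data-juicer` should show which entries pip pulled in; extend `DEPENDENCIES` with further names from the list as needed.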