py-data-juicer
Data Processing for and with Foundation Models.
Installation
In a virtualenv (create one first if you do not already have one):
pip3 install py-data-juicer
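The install step above can be sketched with Python's standard-library `venv` module. This is a minimal sketch rather than an official procedure: the directory name `.venv` and the helper names `pip_install_command` and `create_env_and_install` are assumptions made for illustration.

```python
import os
import subprocess
import venv
from pathlib import Path


def pip_install_command(env_dir, package):
    """Build the pip command that installs `package` into the virtualenv at `env_dir`."""
    # Virtualenvs keep their interpreter in "Scripts" on Windows and "bin" elsewhere.
    on_windows = os.name == "nt"
    bin_dir = "Scripts" if on_windows else "bin"
    python = Path(env_dir) / bin_dir / ("python.exe" if on_windows else "python")
    return [str(python), "-m", "pip", "install", package]


def create_env_and_install(env_dir=".venv", package="py-data-juicer"):
    """Create a virtualenv (hypothetical default path: .venv) and install the package into it."""
    venv.create(env_dir, with_pip=True)  # with_pip=True bootstraps pip inside the new environment
    subprocess.run(pip_install_command(env_dir, package), check=True)
```

Calling `create_env_and_install()` would create `.venv` in the current directory and run pip inside it, which needs network access. Invoking pip through the environment's own interpreter avoids having to activate the environment first.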
Dependencies
- lz4
- pandas
- regex
- datasets
- fastapi
- resampy
- seaborn
- spacy
- jsonlines
- samplerate
- pdfplumber
- tomli
- pylance
- tqdm
- mwparserfromhell
- emoji
- wget
- dill
- matplotlib
- httpx
- zstandard
- psutil
- mcp
- tomli-w
- gitpython
- plotly
- pydantic
- pillow
- numpy
- tabulate
- fsspec
- av
- jsonargparse
- multiprocess
- requests
- librosa
- uv
- loguru
- bs4
- wordcloud
- python-docx
- streamlit
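To see which of the dependencies listed above are already present in the current environment, a short check with the standard-library `importlib.metadata` module might look like the following. This is only a sketch: the helper name `installed_versions` is hypothetical, and the names are taken as distribution names exactly as they appear in the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the dependency list above.
DEPENDENCIES = [
    "lz4", "pandas", "regex", "datasets", "fastapi", "resampy", "seaborn",
    "spacy", "jsonlines", "samplerate", "pdfplumber", "tomli", "pylance",
    "tqdm", "mwparserfromhell", "emoji", "wget", "dill", "matplotlib",
    "httpx", "zstandard", "psutil", "mcp", "tomli-w", "gitpython", "plotly",
    "pydantic", "pillow", "numpy", "tabulate", "fsspec", "av",
    "jsonargparse", "multiprocess", "requests", "librosa", "uv", "loguru",
    "bs4", "wordcloud", "python-docx", "streamlit",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(names=DEPENDENCIES):
    """Print one line per dependency with its installed version or 'missing'."""
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver if ver is not None else 'missing'}")
```

Running `report()` inside the virtualenv after installation would list each dependency with the version that pip resolved, or `missing` if it is not installed.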