py-data-juicer
Data Processing for and with Foundation Models.
Installation
In a virtualenv (create one first if you do not already have one):
pip3 install py-data-juicer
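One way to confirm the install succeeded is to ask the standard library's `importlib.metadata` for the installed version of the `py-data-juicer` distribution. The following is a minimal sketch. The helper name `installed_version` is hypothetical and is not part of py-data-juicer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "py-data-juicer") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper for illustration only.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # a version string if installed, otherwise None
```

Run it inside the same virtualenv you installed into. Otherwise it reports the other environment's packages.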
Dependencies
- requests
- wordcloud
- jsonlines
- zstandard
- matplotlib
- fastapi
- bs4
- emoji
- loguru
- plotly
- mwparserfromhell
- resampy
- regex
- datasets
- psutil
- gitpython
- uv
- wget
- pydantic
- tqdm
- tabulate
- lz4
- fsspec
- jsonargparse
- librosa
- samplerate
- httpx
- dill
- multiprocess
- pandas
- tomli-w
- tomli
- spacy
- pillow
- seaborn
- pylance
- numpy
- python-docx
- av
- mcp
- pdfplumber
- streamlit
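The dependency list above is long, so it can help to see which of those distributions are actually present in the current environment. The following sketch again relies only on `importlib.metadata`. It assumes the names above are the PyPI distribution names. `DEPENDENCIES` and `dependency_report` are hypothetical names introduced here and are not part of py-data-juicer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names copied from the Dependencies list (assumed to be PyPI names).
DEPENDENCIES: List[str] = [
    "requests", "wordcloud", "jsonlines", "zstandard", "matplotlib", "fastapi",
    "bs4", "emoji", "loguru", "plotly", "mwparserfromhell", "resampy", "regex",
    "datasets", "psutil", "gitpython", "uv", "wget", "pydantic", "tqdm",
    "tabulate", "lz4", "fsspec", "jsonargparse", "librosa", "samplerate",
    "httpx", "dill", "multiprocess", "pandas", "tomli-w", "tomli", "spacy",
    "pillow", "seaborn", "pylance", "numpy", "python-docx", "av", "mcp",
    "pdfplumber", "streamlit",
]


def dependency_report(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing.

    `dependency_report` is a hypothetical helper for illustration only.
    """
    report: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    report = dependency_report(DEPENDENCIES)
    missing = [name for name, ver in report.items() if ver is None]
    print(f"{len(DEPENDENCIES) - len(missing)} installed, {len(missing)} missing")
    for name in missing:
        print("missing:", name)
```

On older Python versions, distribution-name lookup may not normalize case or separators. This can produce false "missing" entries for names such as `pillow` or `tomli-w`.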