py-data-juicer
Data Processing for and with Foundation Models.
Installation
In a virtualenv (create one first if you do not already have one):
pip3 install py-data-juicer
Dependencies
- jsonargparse
- wget
- pdfplumber
- resampy
- pylance
- uv
- httpx
- pydantic
- mwparserfromhell
- dill
- matplotlib
- spacy
- pandas
- jsonlines
- samplerate
- pillow
- gitpython
- fsspec
- mcp
- regex
- tomli
- loguru
- bs4
- psutil
- python-docx
- requests
- seaborn
- numpy
- datasets
- tqdm
- emoji
- tabulate
- fastapi
- plotly
- streamlit
- lz4
- av
- multiprocess
- librosa
- wordcloud
- tomli-w
- zstandard
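To see which of the dependencies listed above are already present in the active environment, the following is a minimal sketch using only the standard library's `importlib.metadata`. The helper name `installed_versions` and the shortened `DEPENDENCIES` list are illustrative assumptions, not part of py-data-juicer; extend the list with the other names from this page as needed.

```python
from importlib import metadata

# A few distribution names copied from the dependency list above (not exhaustive).
DEPENDENCIES = ["jsonargparse", "pandas", "numpy", "loguru", "tqdm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["py-data-juicer", *DEPENDENCIES])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Run inside the virtualenv after installing, any entry reported as "not installed" points to a dependency that pip did not resolve.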