llm-data-converter

Open-source document-to-Markdown converter for LLM training data. Converts PDF, Word, PowerPoint, Excel, images, and URLs to clean Markdown, JSON, or HTML locally. Alternative to Unstructured, Docling, …

Installation

In a virtualenv:

pip3 install llm-data-converter
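
If no virtualenv exists yet, one way to create one is Python's built-in venv module. This is a minimal sketch, assuming a POSIX shell and a python3 that ships with venv; the directory name .venv is an arbitrary choice, not something this page prescribes.

```shell
# Create a virtual environment in ./.venv (directory name is an assumption)
python3 -m venv .venv

# Activate it for the current shell session (POSIX shells)
. .venv/bin/activate

# Show which pip3 is now first on PATH; it should live under .venv/bin
command -v pip3
```

With the environment active, the pip3 command above installs the package into .venv rather than into the system Python.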

Releases

Version  Released  Bullseye (Python 3.9)  Bookworm (Python 3.11)  Files
2.2.0 2025-07-25
2.1.7 2025-07-23
2.1.6 2025-07-21
2.1.5 2025-07-21
2.1.3 2025-07-17
2.1.2 2025-07-16
2.1.1 2025-07-16
2.1.0 2025-07-16
2.0.7 2025-07-15
2.0.6 2025-07-15
2.0.5 2025-07-15
2.0.4 2025-07-15
2.0.3 2025-07-15
2.0.2 2025-07-15
2.0.1 2025-07-15
2.0.0 2025-07-15
0.4.1 2025-07-14
0.4.0 2025-07-14
0.2.3 2025-07-14
0.2.2 2025-07-09
0.2.1 2025-07-09
0.2.0 2025-07-09
0.1.0 2025-07-09


Page last updated 2025-07-25 13:38:11 UTC