pdfmux

Self-healing PDF extraction for RAG. Per-page confidence scoring, auto re-extracts bad pages, MCP server, LangChain/LlamaIndex loaders. LlamaParse alternative, #2 on opendataloader-bench.
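The "self-healing" idea in the description (score each page, then re-extract only the pages that score poorly) can be illustrated generically. The following is a minimal sketch under stated assumptions, not pdfmux's actual algorithm or API. The extractor callables, the `score_text` heuristic, the `extract_with_healing` helper and the 0.6 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical extractor type: maps a page index to the text extracted from it.
Extractor = Callable[[int], str]


@dataclass
class PageResult:
    index: int
    text: str
    confidence: float
    source: str  # which extractor produced the text that was kept


def score_text(text: str) -> float:
    """Toy per-page confidence heuristic (an assumption, not pdfmux's scoring).

    Share of characters that are printable or whitespace and are not the
    Unicode replacement character, scaled down for nearly empty pages.
    """
    stripped = text.strip()
    if not stripped:
        return 0.0
    good = sum(
        1 for ch in stripped
        if (ch.isprintable() or ch.isspace()) and ch != "\ufffd"
    )
    ratio = good / len(stripped)
    length_factor = min(1.0, len(stripped) / 200)
    return ratio * length_factor


def extract_with_healing(
    num_pages: int,
    primary: Extractor,
    fallback: Extractor,
    threshold: float = 0.6,
) -> List[PageResult]:
    """Run the primary extractor on every page; retry low-confidence pages."""
    results: List[PageResult] = []
    for i in range(num_pages):
        text = primary(i)
        conf = score_text(text)
        source = "primary"
        if conf < threshold:
            # Re-extract only this page, and keep the retry only if it scores higher.
            retry = fallback(i)
            retry_conf = score_text(retry)
            if retry_conf > conf:
                text, conf, source = retry, retry_conf, "fallback"
        results.append(PageResult(i, text, conf, source))
    return results
```

The design keeps the expensive fallback off the happy path: pages that already score well are never re-extracted, and a retry replaces the original text only when it actually scores better.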

Installation

In a virtualenv (create one first if you don't already have one):

pip3 install pdfmux
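After installing, the standard-library `importlib.metadata` module offers one way to confirm which release is active in the virtualenv. This is a minimal sketch. The `installed_version` helper name is illustrative, and the helper only reads installed package metadata without importing pdfmux itself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pdfmux") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "pdfmux is not installed in this environment")
```

`typing.Optional` is used instead of the `str | None` syntax so the sketch also runs on Python 3.9, the oldest interpreter listed in the release table.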

Dependencies

Releases

Version  Released    Bullseye (Python 3.9)  Bookworm (Python 3.11)  Trixie (Python 3.13)  Files
1.7.0    2026-05-22
1.6.4    2026-05-05
1.6.3    2026-05-02
1.6.2    2026-05-01
1.6.1    2026-05-01
1.6.0    2026-04-30
1.5.2    2026-04-25
1.5.1    2026-04-16
1.5.0    2026-04-06
1.4.0    2026-04-04
1.3.0    2026-03-21
1.2.0    2026-03-18
1.1.0    2026-03-12
1.0.1    2026-03-05
1.0.0    2026-03-05
0.4.0    2026-03-04
0.2.2    2026-03-04
0.2.1    2026-03-04
0.2.0    2026-03-03


Page last updated 2026-05-22 09:40:00 UTC