Metadata-Version: 2.4
Name: zettabrain-rag
Version: 0.5.26
Summary: Private AI document assistant — local RAG pipeline with web GUI. Zero cloud. Supports local, NFS, SMB and object storage.
Author-email: Olajide <olajide@zettabrain.io>
License: MIT
Project-URL: Homepage, https://github.com/zettabrain/zettabrain-rag
Project-URL: Repository, https://github.com/zettabrain/zettabrain-rag
Project-URL: Issues, https://github.com/zettabrain/zettabrain-rag/issues
Keywords: rag,llm,langchain,ollama,chromadb,local-ai,private-ai,nfs,smb,object-storage,gui,private,self-hosted
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: langchain>=0.2.0
Requires-Dist: langchain-community>=0.2.0
Requires-Dist: langchain-ollama>=0.1.0
Requires-Dist: langchain-chroma>=0.1.0
Requires-Dist: langchain-core>=0.2.0
Requires-Dist: langchain-text-splitters>=0.2.0
Requires-Dist: chromadb>=0.5.0
Requires-Dist: pypdf>=4.0.0
Requires-Dist: pymupdf>=1.23.0
Requires-Dist: python-docx>=1.1.0
Requires-Dist: docx2txt>=0.8
Requires-Dist: requests>=2.31.0
Requires-Dist: fastapi>=0.111.0
Requires-Dist: uvicorn[standard]>=0.29.0
Requires-Dist: websockets>=12.0
Requires-Dist: rank-bm25>=0.2.0
Requires-Dist: flashrank>=0.2.7
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"

# ZettaBrain RAG

**Private AI document assistant — your documents, your hardware, zero cloud.**

<p align="center">
  <img src="demo/hero.gif" alt="ZettaBrain demo — install, setup, ingest, chat" width="800">
</p>

Chat with your documents using a fully local AI. No API keys. No data leaving your machine. Runs on your own server or laptop with a secure HTTPS web GUI. Supports local disk, NFS, SMB and object storage.

---

## Quick Install

```bash
curl -fsSL https://zettabrain.app/install.sh | sudo bash
```

Alternative mirror:

```bash
curl -fsSL https://install.zettabrain.io | sudo bash
```

What the installer does:
- Detects your OS (Ubuntu, Debian, Amazon Linux, RHEL, Fedora)
- Installs Python 3.9+ and system dependencies
- Installs `zettabrain-rag` via **pipx** (isolated, no virtualenv management needed)
- Installs and starts Ollama
- Pulls the `nomic-embed-text` embedding model (~275 MB)

---

## Install via pipx (developers)

```bash
# Install pipx if you don't have it (run the line for your platform)
sudo apt install -y pipx     # Ubuntu / Debian
brew install pipx            # macOS

# Install ZettaBrain
pipx install zettabrain-rag

# Verify
zettabrain --version
```

---

## First-time setup

### 1. Run setup wizard

```bash
sudo zettabrain-setup
```

Configures storage (Local / NFS / SMB), selects an LLM model based on your hardware, and enables HTTPS.

### 2. Launch the web GUI

```bash
zettabrain-server
```

Open **https://local.zettabrain.app:7860** in your browser — trusted HTTPS, fully private.

### 3. Or use the CLI chat

```bash
zettabrain-chat
```

---

## Commands

| Command | Description |
|---|---|
| `sudo zettabrain-setup` | Storage wizard + model selection + TLS cert |
| `zettabrain-server` | Launch secure HTTPS web GUI (port 7860) |
| `zettabrain-chat` | Interactive RAG chat in the terminal |
| `zettabrain-chat --rebuild` | Rebuild vector store then start chat |
| `zettabrain-chat --debug` | Show retrieved chunks on every query |
| `zettabrain-ingest` | Ingest documents into the vector store |
| `zettabrain-ingest --folder /path` | Ingest a specific folder |
| `zettabrain-ingest --file /path/doc.pdf` | Ingest a single file |
| `zettabrain-ingest --stats` | Show what is in the vector store |
| `zettabrain-ingest --clear` | Wipe the vector store |
| `zettabrain-status` | Show install paths, cert info, and store statistics |
| `sudo zettabrain-storage add` | Add a new storage source after initial setup |
| `zettabrain-storage list` | List configured storage sources |

### CLI chat commands

While inside `zettabrain-chat`:

| Type | Action |
|---|---|
| Any question | Query your documents |
| `sources` | Show which document chunks were used |
| `timing` | Show retrieve / generate time for all queries this session |
| `debug on` | Show retrieved chunks on every query |
| `debug off` | Hide debug output |
| `quit` | Exit |

---

## System requirements

| | Minimum | Recommended |
|---|---|---|
| **RAM** | 4 GB | 8 GB (CPU) · 16 GB+ (GPU) |
| **CPU** | 4 cores / 2.5 GHz | 8 cores / 3.0 GHz |
| **Disk** | 10 GB free | 40 GB free |
| **OS** | See below | See below |
| **Python** | 3.9 | 3.11+ |

**Supported operating systems**

| Platform | Versions |
|---|---|
| **Ubuntu** | 20.04, 22.04, 24.04 |
| **Debian** | 11, 12 |
| **Amazon Linux** | 2, 2023 |
| **RHEL / CentOS Stream / Rocky / AlmaLinux** | 8, 9 |
| **Fedora** | 38+ |
| **Linux Mint / Pop!\_OS** | Current releases |
| **macOS** | 12 Monterey+ (via `pipx install`) |
| **Windows** | 10 / 11 via WSL2, or `pipx install` for Python components |

> **RAM depends on model:** `qwen3:0.6b` runs on 2 GB; `phi4:3.8b` (CPU default) needs ~6 GB; GPU models from `mistral:7b` upward need 8–24 GB VRAM. See the performance reference table below for per-model requirements.
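
The RAM figures quoted in this README can be turned into a quick lookup. The sketch below is illustrative only: `MIN_RAM_GB` and `models_that_fit` are hypothetical names, the numbers are copied from the note above and from the CPU menu in the GPU & model selection section, and none of it reflects how `zettabrain-setup` actually chooses a model.

```python
# Minimum system RAM in GB, as stated in this README (not read from the setup wizard).
MIN_RAM_GB = {
    "qwen3:0.6b": 2,
    "phi4:3.8b": 6,
    "mistral:7b": 12,
    "llama3.1:8b": 16,
}


def models_that_fit(ram_gb: float) -> list[str]:
    """Return the CPU models whose stated minimum RAM is within the given budget."""
    return [model for model, need in MIN_RAM_GB.items() if need <= ram_gb]


if __name__ == "__main__":
    print(models_that_fit(8))
```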

---

## GPU & model selection

Ollama **auto-detects your GPU** on install — NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal). No configuration needed beyond having the correct drivers installed.

`sudo zettabrain-setup` detects your hardware and shows the right menu for your machine.

**CPU-only (no GPU detected):**

```
Hardware detected: CPU only
Recommended model: phi4:3.8b  (CPU-only: best reasoning for RAG without GPU)

  Available models (optimised for CPU):
    1) qwen3:0.6b      — instant  (~500MB)   quick lookups and routing
    2) gemma3:1b       — very fast (~815MB)  structured explanations
    3) tinyllama:1.1b  — very fast (~638MB)  basic Q&A, coherent chat
    4) phi4:3.8b       — moderate (~2.5GB)   best reasoning for RAG    ← recommended
    5) llama3.2:3b     — moderate (~2GB)     general purpose
    6) mistral:7b      — slow     (~4GB)     strong instruction (needs 12GB+ RAM)
    7) llama3.1:8b     — slow     (~5GB)     balanced quality (needs 16GB+ RAM)
    8) openhermes:7b   — slow     (~4GB)     best formatted RAG (needs 12GB+ RAM)
    9) Custom
```

**GPU detected:**

```
Hardware detected: NVIDIA GeForce RTX 3080 (10GB VRAM)
Recommended model: llama3.1:8b  (10GB VRAM: balanced quality/speed)

  Available models:
    1) phi4:3.8b         — fast on GPU    (~2.5GB)  best reasoning per GB
    2) mistral:7b        — fast on GPU    (~4GB)    strong instruction following
    3) openhermes:7b     — fast on GPU    (~4GB)    best formatted RAG responses
    4) llama3.1:8b       — fast on GPU    (~5GB)    balanced quality for most
    5) mistral-nemo:12b  — moderate       (~7GB)    better reasoning  (needs 8GB+ VRAM)
    6) qwen2.5:14b       — moderate       (~9GB)    excellent quality (needs 10GB+ VRAM)
    7) qwen2.5:32b       — slower         (~20GB)   best quality      (needs 24GB+ VRAM)
    8) Custom
```

You can switch models at any time by editing `/opt/zettabrain/src/zettabrain.env`:

```bash
ZETTABRAIN_LLM_MODEL=qwen2.5:14b
```

Then restart the server: `zettabrain-server`
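
If you would rather script the change than edit the file by hand, something like the following could work. It is a minimal sketch that assumes the env file holds plain `KEY=VALUE` lines, as in the example above; `set_env_value` and `switch_model` are hypothetical helpers, not part of the ZettaBrain CLI.

```python
from pathlib import Path


def set_env_value(text: str, key: str, value: str) -> str:
    """Return env-file text with KEY=value set, replacing an existing line or appending one."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only touch real assignments, not comments that merely mention the key.
        if line.strip().startswith(f"{key}="):
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def switch_model(env_path: str, model: str) -> None:
    """Rewrite the env file in place; the caller needs write access to env_path."""
    path = Path(env_path)
    path.write_text(set_env_value(path.read_text(), "ZETTABRAIN_LLM_MODEL", model))
```

The server still needs a restart afterwards for the new model to take effect.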

### Performance reference

Timings for a real compliance query against a 10-document financial services corpus:

> **"What is the pre-clearance process for personal securities trades and how long does approval last?"**

| Model | Min RAM | Retrieve | Generate | Total |
|---|---|---|---|---|
| qwen3:0.6b | 2 GB | ~1 s | 15–40 s | ~1 min |
| phi4-mini | 6 GB | ~1 s | 120–300 s | ~2–5 min |
| llama3.2:3b | 6 GB | ~1 s | 90–180 s | ~2–3 min |
| llama3.1:8b (CPU) | 16 GB | ~1 s | 200–400 s | ~4–7 min |
| mistral:7b (GPU 5–8 GB VRAM) | 8 GB | ~1 s | 5–12 s | ~6–13 s |
| llama3.1:8b (GPU 8–10 GB VRAM) | 10 GB | ~1 s | 3–7 s | ~4–8 s |
| qwen2.5:14b (GPU 16 GB VRAM) | 20 GB | ~1 s | 4–10 s | ~5–11 s |
| Apple M2 / M3 (16 GB unified) | 16 GB | ~1 s | 10–20 s | ~11–21 s |

**Retrieve** covers: query embedding + ChromaDB MMR search + BM25 keyword search + FlashRank re-ranking.  
**Generate** depends on model size and hardware. Moving generation from CPU to GPU cuts generate time by roughly 30–60×.

The web UI shows per-query timing after every response: `⚡ 938ms retrieve · 🤖 6.3s generate`.
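
To collect comparable numbers in your own scripts, you can time the two phases separately and print them in the same style. This is a sketch, not ZettaBrain's code: `timed`, `format_duration`, and `format_timing` are hypothetical helpers, and the rule "milliseconds below one second, seconds otherwise" is inferred from the single example above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def format_duration(seconds: float) -> str:
    """Render sub-second durations in ms, longer ones in seconds with one decimal."""
    if seconds < 1.0:
        return f"{round(seconds * 1000)}ms"
    return f"{seconds:.1f}s"


def format_timing(retrieve_s: float, generate_s: float) -> str:
    """Build a one-line summary in the style of the web UI's timing line."""
    return (
        f"⚡ {format_duration(retrieve_s)} retrieve · "
        f"🤖 {format_duration(generate_s)} generate"
    )
```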

---

## Retrieval pipeline

ZettaBrain uses a hybrid retrieval approach for accuracy:

1. **Adaptive chunking** — chunk size tuned per document type (PDF / DOCX / TXT) and text density
2. **MMR semantic search** — Maximal Marginal Relevance via ChromaDB (diversity + relevance)
3. **BM25 keyword search** — exact term matching on the same corpus
4. **Merge & deduplicate** — semantic results ranked first, duplicates removed by content hash
5. **Cross-encoder re-ranking** — FlashRank (`ms-marco-MiniLM-L-12-v2`) picks the best chunks before sending to the LLM
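
Steps 2 and 4 can be sketched in plain Python. This is an illustration of the two techniques, not ZettaBrain's implementation: the real pipeline runs MMR through ChromaDB, and the function names, the `lambda_mult` trade-off parameter, and the choice of SHA-256 as the content hash are all assumptions made for the example.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Pick k document indices by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance; lower values penalise
    candidates that resemble documents already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def merge_dedupe(semantic_chunks, keyword_chunks):
    """Keep semantic hits first, then keyword hits, dropping exact-text repeats."""
    seen, merged = set(), []
    for chunk in list(semantic_chunks) + list(keyword_chunks):
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            merged.append(chunk)
    return merged
```

Hashing the raw text only catches exact duplicates; chunks that differ by whitespace would both survive in this sketch.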

---

## Supported document formats

`.pdf`  `.txt`  `.md`  `.docx`

---

## Sample Test Data

Not ready to use your own documents yet? Download ready-made test datasets to evaluate ZettaBrain against realistic enterprise content.

### Available datasets

| Industry | Documents | Organisation (fictional) |
|---|---|---|
| **Financial Services** | 10 DOCX files | Apex Financial Group — trading policy, AML/KYC procedures, insider trading, risk framework, employee handbook |
| **Healthcare** | 10 DOCX files | Riverside Medical Center — HIPAA privacy & security, medication protocols, emergency response codes, clinical documentation |

### Download

| File | Size | Link |
|---|---|---|
| Financial Services documents | ~90 KB | [zettabrain-financial-test-docs.zip](https://zettabrain.io/sample-data/zettabrain-financial-test-docs.zip) |
| Healthcare documents | ~91 KB | [zettabrain-healthcare-test-docs.zip](https://zettabrain.io/sample-data/zettabrain-healthcare-test-docs.zip) |
| Test prompts guide (40 prompts) | ~7 KB | [RAG_Test_Prompts_Guide.md](https://zettabrain.io/sample-data/RAG_Test_Prompts_Guide.md) |

The prompts guide includes 20 industry-specific prompts per dataset, cross-document summary prompts, and adversarial prompts that check whether ZettaBrain correctly declines to answer when the documents do not contain the answer.

### Quick start with sample data

```bash
# Download and unzip the financial services dataset
curl -LO https://zettabrain.io/sample-data/zettabrain-financial-test-docs.zip
unzip zettabrain-financial-test-docs.zip -d ~/zettabrain-test

# Point ZettaBrain at the folder and ingest
zettabrain-ingest --folder ~/zettabrain-test/financial

# Start chatting
zettabrain-chat
```

Alternatively, start `zettabrain-server`, open the web GUI at `https://local.zettabrain.app:7860`, and paste prompts from the guide directly into the chat.

### Sample prompts from the guide

**Financial Services — Apex Financial Group**

- *"What is the pre-clearance process for personal securities trades and how long does approval last?"*
- *"When do I need to file a Suspicious Activity Report and what is the deadline for filing?"*
- *"What is the maximum hotel rate I can expense in New York City?"*
- *"What happens when a risk event has a financial impact of over $10 million — who needs to be notified and how quickly?"*

**Healthcare — Riverside Medical Center**

- *"What should I do if I suspect a PHI breach — who do I contact and what is the notification timeline?"*
- *"Which medications require an independent double-check by a second nurse before administration?"*
- *"A patient received the wrong medication — what are the steps I need to take to report it?"*
- *"What are the emergency response codes and what action should staff take for each?"*

The full guide includes 20 prompts per dataset plus cross-document and adversarial prompts.

---

## Configuration

All settings can be overridden via environment variables or `/opt/zettabrain/src/zettabrain.env`:

| Variable | Default | Description |
|---|---|---|
| `ZETTABRAIN_DOCS` | `/opt/zettabrain/data` | Documents folder |
| `ZETTABRAIN_CHROMA` | `/opt/zettabrain/src/zettabrain_vectorstore` | ChromaDB path |
| `ZETTABRAIN_LLM_MODEL` | `llama3.1:8b` | Ollama LLM model |
| `ZETTABRAIN_EMBED_MODEL` | `nomic-embed-text` | Ollama embedding model |
| `ZETTABRAIN_CHUNK_SIZE` | `1000` (PDF) / `800` (TXT) | Chunk size (adaptive) |
| `ZETTABRAIN_CHUNK_OVERLAP` | `150` (PDF) / `100` (TXT) | Chunk overlap (adaptive) |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama API endpoint |
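
The override behaviour described above could look roughly like this. It is a sketch under stated assumptions: the README does not say which source wins when both are set, so the precedence used here (process environment, then env file, then default) is a guess, and `parse_env_file` and `resolve_settings` are hypothetical names. The adaptive chunk settings are left out because their defaults vary by document type.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "ZETTABRAIN_DOCS": "/opt/zettabrain/data",
    "ZETTABRAIN_CHROMA": "/opt/zettabrain/src/zettabrain_vectorstore",
    "ZETTABRAIN_LLM_MODEL": "llama3.1:8b",
    "ZETTABRAIN_EMBED_MODEL": "nomic-embed-text",
    "OLLAMA_HOST": "http://localhost:11434",
}


def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_settings(env_file_text: str = "", environ=None) -> dict:
    """Assumed precedence: process environment > env file > built-in default."""
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)
    file_values = parse_env_file(env_file_text)
    settings.update({k: v for k, v in file_values.items() if k in DEFAULTS})
    settings.update({k: environ[k] for k in DEFAULTS if k in environ})
    return settings
```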

---

## Diagnostics

```bash
# Full status — version, certs, vector store stats
zettabrain-status

# Verify ChromaDB is working
python3 /opt/zettabrain/src/01_chromadb_setup.py

# Verify embedding model is working
python3 /opt/zettabrain/src/02_embeddings_test.py

# Check Ollama is running
curl http://localhost:11434

# List downloaded models
ollama list

# View server logs
journalctl -u zettabrain -f
```
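
The Ollama check can also be done from Python, for example in a monitoring script. This is a minimal sketch using only the standard library; `ollama_is_up` is a hypothetical helper, and it assumes the root endpoint answers with HTTP 200 when the service is running.

```python
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if a GET on the Ollama endpoint answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

Pass a different `base_url` if you have changed `OLLAMA_HOST`.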

---

## Uninstall

### pipx install
```bash
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain
```

### One-line installer
```bash
# Stop and disable the service first, then remove the package and files
sudo systemctl disable --now zettabrain 2>/dev/null || true
pipx uninstall zettabrain-rag
sudo rm -rf /opt/zettabrain /var/log/zettabrain-install.log
```

---

## Contributors

| | |
|---|---|
| **[@zettabrain](https://github.com/zettabrain)** | Creator & maintainer |

---

## License

MIT — © ZettaBrain
