Metadata-Version: 2.4
Name: CASSIA
Version: 1.3.9
Summary: CASSIA (Cell type Annotation using Specialized System with Integrated AI) is a Python package for automated cell type annotation in single-cell RNA sequencing data using large language models.
Home-page: https://github.com/elliotxe/CASSIA
Author: Elliot Yixuan Xie
Author-email: xie227@wisc.edu
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: openai>=1.0.0
Requires-Dist: anthropic>=0.3.0
Requires-Dist: requests>=2.25.0
Requires-Dist: matplotlib>=3.3.0
Requires-Dist: seaborn>=0.11.0
Requires-Dist: mygene>=3.2.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# CASSIA

**CASSIA** (Collaborative Agent System for Single-cell Interpretable Annotation) is a Python and R package designed for **automated, accurate, and interpretable single-cell RNA-seq cell type annotation** using a modular **multi-agent LLM framework**.

📖 [Read our paper in Nature Communications](https://doi.org/10.1038/s41467-025-67084-x)

## Highlights

- 🔬 **Reference-free and interpretable** LLM-based cell type annotation
- 🧠 Multi-agent architecture with dedicated agents for annotation, validation, formatting, quality scoring, and reporting
- 📈 **Quality scores (0–100)** and optional consensus scoring to quantify annotation reliability
- 📊 Detailed **HTML reports** with reasoning and marker validation
- 💬 Supports OpenAI, Anthropic, OpenRouter, DeepSeek, and any OpenAI-compatible API (including local LLMs)
- 🧬 Compatible with markers from Seurat (`FindAllMarkers`) and Scanpy (`tl.rank_genes_groups`)
- 🚀 Optional agents: Annotation Boost, Subclustering, RAG (retrieval-augmented generation), Uncertainty Quantification
- 🌎 Cross-species annotation capabilities, validated across human, mouse, and non-model organisms
- 🧪 Web UI also available: [cassia.bio](https://www.cassia.bio/)

## Installation

```bash
pip install CASSIA
```

To enable optional RAG functionality:

```bash
pip install CASSIA_rag
```

**Note**: For R users, see the R package on [GitHub](https://github.com/ElliotXie/CASSIA-SingleCell-LLM-Annotation).

## Set Up API Key

**You only need one API key to use CASSIA.** We recommend OpenRouter since it provides access to most models (OpenAI, Anthropic, Google, etc.) through a single API key.

```python
import CASSIA

# For OpenRouter (recommended: access most models with one key)
CASSIA.set_api_key("your_openrouter_api_key", provider="openrouter")

# For OpenAI
CASSIA.set_api_key("your_openai_api_key", provider="openai")

# For Anthropic
CASSIA.set_api_key("your_anthropic_api_key", provider="anthropic")

# For custom OpenAI-compatible APIs (e.g., DeepSeek)
CASSIA.set_api_key("your_deepseek_api_key", provider="https://api.deepseek.com")
```
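
To keep keys out of scripts and notebooks, one option is to read them from the environment and pass the result to `CASSIA.set_api_key`. The following is a minimal sketch: the environment variable names and the `find_api_key` helper are assumptions made for illustration, not names that CASSIA is documented to read.

```python
import os

# Environment variable names are assumptions for this sketch; use whatever
# names your own shell or secrets manager defines.
ENV_VARS = [
    ("openrouter", "OPENROUTER_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
]


def find_api_key(env=None):
    """Return (provider, key) for the first variable that is set, else None."""
    env = os.environ if env is None else env
    for provider, var in ENV_VARS:
        key = env.get(var, "").strip()
        if key:
            return provider, key
    return None


# Usage (requires CASSIA to be installed):
#   import CASSIA
#   found = find_api_key()
#   if found:
#       provider, key = found
#       CASSIA.set_api_key(key, provider=provider)
```

The list order puts OpenRouter first, matching the recommendation above; reorder it if you prefer a different provider.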

## Quick Start

```python
import CASSIA

# Load example marker data
unprocessed_markers = CASSIA.load_example_markers(processed=False)

# Run the full CASSIA pipeline (annotation + scoring + boost + report)
CASSIA.runCASSIA_pipeline(
    output_file_name="MyAnalysis",
    tissue="large intestine",
    species="human",
    marker=unprocessed_markers,
    max_workers=4,
    overall_provider="openrouter",
    annotation_model="anthropic/claude-sonnet-4.6",
    score_model="anthropic/claude-sonnet-4.6",
    score_threshold=75
)
```

> **Quick annotation only?** Use `CASSIA.runCASSIA_batch()` for fast batch annotation without scoring or boosting.
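
Before running the pipeline on your own markers instead of the bundled example, it can help to confirm that the table has the expected columns. The sketch below assumes a CSV export of Seurat `FindAllMarkers` output (Seurat v4 or later, which names the fold-change column `avg_log2FC`). The `check_marker_table` helper and its choice of required columns are hypothetical and independent of CASSIA's own validation.

```python
import csv
import io
from collections import Counter

# Columns written by Seurat's FindAllMarkers (v4 and later). Treating these
# four as required is an assumption of this sketch, not CASSIA's rule.
REQUIRED = {"cluster", "gene", "avg_log2FC", "p_val_adj"}


def check_marker_table(handle):
    """Return (missing_columns, markers_per_cluster) for a marker CSV."""
    reader = csv.DictReader(handle)
    missing = sorted(REQUIRED - set(reader.fieldnames or []))
    counts = Counter()
    if not missing:
        for row in reader:
            counts[row["cluster"]] += 1
    return missing, dict(counts)


# Tiny in-memory table standing in for a real FindAllMarkers export.
example = io.StringIO(
    "p_val,avg_log2FC,pct.1,pct.2,p_val_adj,cluster,gene\n"
    "0,2.1,0.9,0.1,0,0,CD3D\n"
    "0,1.8,0.8,0.1,0,0,CD3E\n"
    "0,3.0,0.9,0.0,0,1,MS4A1\n"
)
missing, per_cluster = check_marker_table(example)
print(missing, per_cluster)
```

For a file on disk, pass `open("markers.csv", newline="")` in place of the `io.StringIO` object.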

## CLI Quick Start

The Python package installs a `cassia` command. In addition to API providers, it
can call local agent CLIs such as Claude Code, Codex CLI, Cursor Agent, or any
custom shell command.

```bash
cassia doctor
cassia backends list
cassia examples --out cassia_example
cassia validate markers.csv

cassia annotate \
  --input markers.csv \
  --backend codex-cli \
  --tissue brain \
  --species human \
  --out runs/brain_codex

cassia boost query \
  --markers raw_findallmarkers.csv \
  --cluster 3 \
  --genes CD3D,CD3E,TRAC

cassia boost run \
  --run runs/brain_codex \
  --markers raw_findallmarkers.csv \
  --cluster 3 \
  --backend codex-cli

cassia boost auto \
  --run runs/brain_codex \
  --markers raw_findallmarkers.csv \
  --backend codex-cli \
  --max-clusters 5

cassia subcluster run \
  --markers cd8_subcluster_markers.csv \
  --major-cluster-info "CD8 T cell in human tumor" \
  --backend codex-cli \
  --out runs/cd8_subcluster

cassia consensus \
  --inputs runs/brain_codex/summary.csv runs/brain_claude/summary.csv \
  --out runs/brain_consensus.csv
```

Agent CLI backends reuse the local tool's own authentication, so they do not
require CASSIA API keys. The other commands work as follows:

- `cassia examples` creates a runnable mini project with marker tables, consensus inputs, shell scripts, and an offline toy agent.
- `cassia validate` checks marker CSV structure, inferred columns, ranking columns, and prepared marker counts before running annotation.
- `cassia boost auto` automatically prioritizes low-confidence, mixed, or ambiguous clusters and writes aggregate CSV/HTML reports under `RUN/boost/_auto`.
- `cassia subcluster run` annotates subclusters inside one parent cluster from a subcluster marker table and writes CSV/HTML reports in the requested output directory.
- `cassia consensus` deterministically votes across multiple CASSIA summary/subcluster CSVs and writes CSV/HTML consensus reports without calling an LLM.
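
To illustrate what deterministic voting without an LLM can look like, here is a generic majority vote over per-run cluster labels. It is a sketch only: the voting rule that `cassia consensus` actually applies is not described here, and the `consensus_vote` helper, the alphabetical tie-break, and the agreement fraction are assumptions chosen for illustration.

```python
from collections import Counter


def consensus_vote(labels_by_run):
    """Majority vote per cluster; ties go to the alphabetically first label."""
    clusters = sorted({c for run in labels_by_run for c in run})
    result = {}
    for cluster in clusters:
        votes = Counter(run[cluster] for run in labels_by_run if cluster in run)
        top = max(votes.values())
        winner = min(label for label, n in votes.items() if n == top)
        # Store the winning label and the fraction of runs that agreed with it.
        result[cluster] = (winner, top / sum(votes.values()))
    return result


# Each dict maps cluster ID to the label assigned in one annotation run.
runs = [
    {"0": "T cell", "1": "B cell"},
    {"0": "T cell", "1": "Plasma cell"},
    {"0": "NK cell", "1": "B cell"},
]
for cluster, (label, agreement) in consensus_vote(runs).items():
    print(cluster, label, round(agreement, 2))
```

Because the tie-break depends only on the labels themselves, the same inputs always produce the same output regardless of the order in which runs are listed. The sketch compares labels as exact strings and does not normalize synonyms.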

## Supported Models

You can choose any model for annotation and scoring. CASSIA also supports custom providers and local open-source models.

| Provider | Model | Notes |
|----------|-------|-------|
| OpenRouter | `anthropic/claude-sonnet-4.6` | Best-performing (Recommended) |
| OpenRouter | `openai/gpt-5.4` | Best-performing |
| OpenRouter | `google/gemini-3-flash-preview` | Best low-cost option |
| OpenRouter | `x-ai/grok-4.20-beta` | Best low-cost option |
| OpenAI | `gpt-5.4` | Balanced option |
| Anthropic | `claude-sonnet-4-6` | Latest best-performing |
| DeepSeek | `deepseek-chat` | Very affordable |
| Local | Any Ollama model | Zero cost, full privacy |
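
Note that the same model has a different identifier depending on the provider: `anthropic/claude-sonnet-4.6` through OpenRouter, but `claude-sonnet-4-6` through Anthropic directly. A small lookup can keep the provider and model name paired. In this sketch, `DEFAULT_MODELS` and `pipeline_model_args` are hypothetical names; the keyword names come from the Quick Start call, and using one model for both annotation and scoring mirrors that example.

```python
# Provider/model pairs copied from the table above.
DEFAULT_MODELS = {
    "openrouter": "anthropic/claude-sonnet-4.6",
    "openai": "gpt-5.4",
    "anthropic": "claude-sonnet-4-6",
}


def pipeline_model_args(provider):
    """Build the model-related keyword arguments used in the Quick Start call."""
    try:
        model = DEFAULT_MODELS[provider]
    except KeyError:
        raise ValueError(f"no default model listed for provider {provider!r}") from None
    return {
        "overall_provider": provider,
        "annotation_model": model,
        "score_model": model,
    }


print(pipeline_model_args("anthropic"))
```

The returned dictionary can be unpacked into the pipeline call alongside the other Quick Start arguments, for example `CASSIA.runCASSIA_pipeline(..., **pipeline_model_args("openrouter"))`.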

## Documentation

📚 [Complete Documentation & Vignettes](https://docs.cassia.bio/en)

🤖 [LLM Annotation Benchmark](https://sc-llm-benchmark.com/methods/cassia)

## Citation

Xie, E., Cheng, L., Shireman, J. et al. CASSIA: a multi-agent large language model for automated and interpretable cell annotation. *Nat Commun* (2025). https://doi.org/10.1038/s41467-025-67084-x

## Contributing

We welcome contributions! Please submit pull requests or open issues via [GitHub](https://github.com/ElliotXie/CASSIA/issues).

## License

MIT License © 2025 Elliot Xie and contributors.

## Support

Open an issue on [GitHub](https://github.com/ElliotXie/CASSIA/issues) or email **xie227@wisc.edu** for help.
