Metadata-Version: 2.4
Name: topiary
Version: 5.15.0
Summary: Predict, filter, and rank MHC-presented peptides from any antigen source
Author: Alex Rubinsteyn, Tavi Nathanson
Project-URL: Homepage, https://github.com/openvax/topiary
Project-URL: Bug Tracker, https://github.com/openvax/topiary/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<3.0.0,>=2.0.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: mhctools>=3.14.1
Requires-Dist: varcode>=4.18.0
Requires-Dist: gtfparse>=0.0.4
Requires-Dist: mhcnames
Requires-Dist: argcomplete>=1.10
Dynamic: license-file

[![Tests](https://github.com/openvax/topiary/actions/workflows/tests.yml/badge.svg)](https://github.com/openvax/topiary/actions/workflows/tests.yml)
<a href="https://coveralls.io/github/openvax/topiary?branch=master">
    <img src="https://coveralls.io/repos/openvax/topiary/badge.svg?branch=master&service=github" alt="Coverage Status" />
</a>
<a href="https://pypi.python.org/pypi/topiary/">
    <img src="https://img.shields.io/pypi/v/topiary.svg?maxAge=1000" alt="PyPI" />
</a>

# Topiary

Topiary predicts, filters, and ranks MHC-presented peptides from any antigen source. It wraps [mhctools](https://github.com/openvax/mhctools) binding predictors (NetMHCpan, MHCflurry, etc.) with a composable filtering and ranking DSL, expression-aware prioritization, and wide/long serialization.

Applications include personalized cancer vaccine design, viral epitope mapping, and characterizing T-cell responses.

## Installation

Requires Python ≥ 3.9.

```bash
pip install topiary
```

For variant annotation, gene lookups, and `SelfProteome`, download
Ensembl reference data (any release that matches your reference genome
works — GRCh38 releases are 76+; GRCh37 is release 75):

```bash
pyensembl install --release 112 --species human
```

For cancer-testis antigen and tissue expression features:

```bash
pip install pirlygenes
```

**MHC predictors** (NetMHCpan, mhcflurry, etc.) are installed
separately — topiary calls them through
[mhctools](https://github.com/openvax/mhctools). mhcflurry installs
via pip (`pip install mhcflurry && mhcflurry-downloads fetch`);
NetMHCpan and other DTU tools require a license from
[DTU Health Tech](https://services.healthtech.dtu.dk/).

## Predicting MHC binding

The simplest use case: score a handful of peptides against one or more HLA alleles.

```python
from topiary import TopiaryPredictor
from mhctools import NetMHCpan

predictor = TopiaryPredictor(
    models=[NetMHCpan],
    alleles=["HLA-A*02:01"],
)

df = predictor.predict_from_named_peptides({
    "peptide_1": "YLQLVFGIEV",
    "peptide_2": "LLFNILGGWV",
    "peptide_3": "GILGFVFTL",
})
```

The result is a DataFrame with one row per peptide-allele-kind combination. Each row has `score` (normalized, higher = better), `value` (raw prediction, e.g. IC50 nM), and `percentile_rank` (lower = better).

To scan full-length proteins with a sliding window instead:

```python
df = predictor.predict_from_named_sequences({
    "BRAF_V600E": "MAALSGGGGG...LATEKSRWSG",
})
```

Multiple models can run together. The DataFrame will have rows for each model's prediction kinds:

```python
from mhctools import NetMHCpan, MHCflurry

predictor = TopiaryPredictor(
    models=[NetMHCpan, MHCflurry],
    alleles=["HLA-A*02:01", "HLA-B*07:02"],
)
```

## MHC prediction models

Specify one or more predictors with `--mhc-predictor` and alleles with `--mhc-alleles`:

```bash
--mhc-predictor netmhcpan mhcflurry --mhc-alleles HLA-A*02:01,HLA-B*07:02
```

All predictors come from [mhctools](https://github.com/openvax/mhctools).

With `mhctools 3.7.0+`, upstream predictor parsing supports multiple
predictors in one CLI invocation, so commands like
`--mhc-predictor netmhcpan42 bigmhc-el` are supported directly. Topiary keeps
its higher-level `--filter-by` / `--sort-by` DSL on top of that lower-level
predictor interface. Topiary's ranking/filtering DSL is also compatible with
the simplified `mhctools 3.7.0+` kind constants API. NetChop and Pepsickle
behavior follows the upstream changes as well: improved NetChop error handling
and Pepsickle's epitope-focused model selection.

| CLI name | Predicts |
|----------|----------|
| `netmhcpan` | affinity + presentation (auto-detects installed version) |
| `netmhcpan4` / `netmhcpan41` / `netmhcpan42` | specific NetMHCpan 4.x versions |
| `netmhcpan4-ba` / `netmhcpan4-el` | single-mode (binding affinity or eluted ligand) |
| `mhcflurry` | affinity + presentation + processing |
| `mixmhcpred` | presentation |
| `netmhciipan` / `netmhciipan4` / `netmhciipan43` | MHC class II |
| `bigmhc` / `bigmhc-el` / `bigmhc-im` | presentation / immunogenicity |
| `netmhcstabpan` | pMHC stability |
| `pepsickle` / `netchop` | proteasomal cleavage |
| `netmhcpan-iedb` / `netmhccons-iedb` / `smm-iedb` | IEDB web API (no local install) |
| `random` | random predictions (for testing) |

Peptide lengths: `--mhc-epitope-lengths 8,9,10,11` (defaults come from the predictor).

## Filtering and ranking

Topiary has an expression language for filtering and ranking predictions. It works identically as Python objects and as CLI strings.

### Filters

Keep only peptides that pass a threshold:

```python
from topiary import TopiaryPredictor, Affinity, Presentation
from mhctools import NetMHCpan

predictor = TopiaryPredictor(
    models=[NetMHCpan],
    alleles=["HLA-A*02:01"],
    filter_by=Affinity <= 500,                       # IC50 nM cutoff
)

# Or with presentation score:
predictor = TopiaryPredictor(
    models=[NetMHCpan],
    alleles=["HLA-A*02:01"],
    filter_by=(Affinity <= 500) | (Presentation.rank <= 2.0),  # keep if either passes
)
```

### Prediction kinds and fields

| Accessor | Aliases | What it measures |
|----------|---------|------------------|
| `Affinity` | `ba`, `aff`, `ic50` | Binding affinity (IC50 nM) |
| `Presentation` | `el` | Eluted ligand / presentation score |
| `Stability` | | pMHC complex stability |
| `Processing` | | Antigen processing / cleavage |

Each kind has three fields:

- `.value` — raw prediction (e.g. IC50 in nM). Default when no field is specified, so `Affinity <= 500` means `Affinity.value <= 500`.
- `.rank` — percentile rank (lower = better).
- `.score` — normalized score (higher = better).

### Ranking with transforms

Sort surviving peptides with `sort_by`. Transforms normalize heterogeneous scores to a common scale:

```python
from topiary import TopiaryPredictor, Affinity, Presentation, mean
from mhctools import NetMHCpan, MHCflurry

predictor = TopiaryPredictor(
    models=[NetMHCpan, MHCflurry],
    alleles=["HLA-A*02:01"],
    filter_by=Affinity <= 500,
    sort_by=mean(
        Affinity["netmhcpan"].logistic(350, 150),
        Affinity["mhcflurry"].logistic(350, 150),
    ),
)
```

When multiple models produce the same kind, qualify with brackets: `Affinity["netmhcpan"]`.

Available transforms:

| Transform | What it does |
|-----------|-------------|
| `.descending_cdf(mean, std)` | Lower input -> higher output (for IC50, rank) |
| `.ascending_cdf(mean, std)` | Higher input -> higher output (for scores) |
| `.logistic(midpoint, width)` | Sigmoid normalization |
| `.clip(lo, hi)` | Clamp to range |
| `.hinge()` | `max(0, x)` |
| `.log()` / `.log2()` / `.log10()` / `.log1p()` | Logarithms |
| `.sqrt()` / `.exp()` | Square root, exponential |
| `abs(...)` | Absolute value |

Aggregations: `mean()`, `geomean()`, `minimum()`, `maximum()`, `median()`

### String expressions

Every filter and ranking expression has a string form for use at the CLI or in configuration:

| Python | String |
|--------|--------|
| `Affinity <= 500` | `affinity <= 500` / `ba <= 500` |
| `Affinity.rank <= 2` | `affinity.rank <= 2` |
| `Presentation.score >= 0.5` | `el.score >= 0.5` |
| `(Affinity <= 500) \| (Presentation.rank <= 2)` | `affinity <= 500 \| el.rank <= 2` |
| `Affinity["netmhcpan"] <= 500` | `netmhcpan:affinity <= 500` |
| `Affinity["netmhcpan", "4.1b"].score` | `netmhcpan[4.1b]:affinity.score` |
| `0.5 * Affinity.score + 0.5 * Presentation.score` | `0.5 * affinity.score + 0.5 * presentation.score` |
| `Affinity.logistic(350, 150)` | `affinity.logistic(350, 150)` |
| `mean(Affinity.score, Presentation.score)` | `mean(affinity.score, presentation.score)` |
| `Column("gene_tpm") >= 5` | `gene_tpm >= 5` |
| `Column("gene_tpm").log()` | `gene_tpm.log()` |

For method-qualified predictions, prefer `model:kind` and
`model[version]:kind`; `model-version:kind` is accepted for
numeric-leading versions such as `netmhcpan-4.1b:affinity`. See
[docs/ranking.md](docs/ranking.md) for the full syntax and alias table.

### Column references

Any DataFrame column can be used directly in expressions — expression data, peptide properties, custom annotations. Unknown identifiers are automatically treated as column references:

```python
from topiary import Affinity, Column

# As a filter — bare name or Column() both work
filter_by = Column("gene_tpm") >= 5.0

# As a sorting signal
sort_by = 0.5 * Affinity.logistic(350, 150) - 0.1 * Column("cysteine_count")

# Transforms work on columns too
sort_by = Column("gene_tpm").log1p()
```

In CLI strings, bare names work directly:

```bash
--filter-by "ba <= 500 & gene_tpm >= 5"
--sort-by "affinity.logistic(350, 150) + 0.1 * gene_tpm.log1p()"
```

The explicit `column(name)` syntax also works: `gene_tpm >= 5`.

Misspelled column names get a helpful error at evaluation time: `Column 'hydrophobicty' not found. Did you mean: ['hydrophobicity']?`

### Scope prefixes

The `wt.` prefix reads predictions for the wildtype peptide at the same position (variant workflows only). Use it for differential binding:

```python
from topiary import Affinity, wt

sort_by = Affinity.score - wt.Affinity.score  # mutant advantage
```

For CLI variant runs, add `--predict-wt` before using `wt.*` in
`--sort-by`:

```bash
--predict-wt --sort-by "affinity.score - wt.affinity.score"
```

`shuffled.` and `self.` prefixes work the same way for shuffled-decoy and self-proteome contexts.

### Peptide-level expressions

`Len()` reads the peptide length. `Count("C")` counts amino acid occurrences:

```python
from topiary import Len, Count, wt

sort_by = Count("C") - wt.Count("C")   # gained/lost cysteines vs wildtype
```

## Expression data

Load gene-level, transcript-level, or variant-level quantification to annotate predictions. Expression values become DataFrame columns that you can reference in ranking and filtering.

### How it works

Each `--*-expression` flag takes a spec string: `[name:]file[:id_col[:val_col]]`.

1. The file is auto-detected (Salmon `.sf`, Kallisto `abundance.tsv`, RSEM `.genes.results` / `.isoforms.results`, StringTie `.gtf`, Cufflinks `.fpkm_tracking`) or treated as generic TSV/CSV.
2. The loaded columns are joined onto the prediction DataFrame by `gene_id`, `transcript_id`, or `variant` respectively.
3. Value columns are prefixed with the name (default: `gene`, `transcript`, or `variant`) and lowercased. For example, Salmon's `TPM` column loaded with `--gene-expression` becomes `gene_tpm`.

### Python API

```python
import pandas as pd
from topiary import TopiaryPredictor, Affinity, Column
from topiary.rna import load_expression_from_spec
from mhctools import NetMHCpan
from varcode import load_vcf

# Load expression data
gene_name, gene_id_col, gene_df = load_expression_from_spec(
    "salmon_quant.sf", default_name="gene"
)
# gene_df has columns: ["Name", "TPM"]
# gene_name = "gene", gene_id_col = "Name"

expression_data = {
    "gene": [(gene_name, gene_id_col, gene_df)],
    "transcript": [],
    "variant": [],
}

predictor = TopiaryPredictor(
    models=[NetMHCpan],
    alleles=["HLA-A*02:01"],
    filter_by=(Affinity <= 500) & (Column("gene_tpm") >= 5.0),
)

variants = load_vcf("somatic.vcf")
df = predictor.predict_from_variants(
    variants,
    expression_data=expression_data,
)
# df now has a "gene_tpm" column from the Salmon file
```

### Column naming

The name prefix and the original column name are combined as `{prefix}_{column}`, both lowercased:

| Flag | File | Original column | Result column |
|------|------|----------------|---------------|
| `--gene-expression quant.sf` | Salmon | `TPM` | `gene_tpm` |
| `--gene-expression quant.sf` | Salmon | `NumReads` | `gene_numreads` |
| `--transcript-expression abundance.tsv` | Kallisto | `tpm` | `transcript_tpm` |
| `--variant-expression isovar.tsv` | generic | `num_alt_reads` | `variant_num_alt_reads` |
| `--gene-expression mygene:quant.sf` | Salmon | `TPM` | `mygene_tpm` |

The join keys are implicit: gene-level data joins on `gene_id`, transcript-level on `transcript_id`, variant-level on `variant`. These columns are present in variant-pipeline output.

### Transcript selection

When transcript-level expression data is provided, Topiary uses it to select the highest-expressed transcript per variant (instead of the default priority-based selection). This matches the behavior of the legacy `--rna-transcript-fpkm-tracking-file` flag.

### Auto-detected formats

| Extension / pattern | Format | Default ID column | Default value column |
|---------------------|--------|-------------------|---------------------|
| `.sf` | Salmon | `Name` | `TPM` |
| `abundance*.tsv` (with `target_id` + `tpm` header) | Kallisto | `target_id` | `tpm` |
| `.genes.results` | RSEM (gene) | `gene_id` | `TPM` |
| `.isoforms.results` | RSEM (transcript) | `transcript_id` | `TPM` |
| `.gtf` | StringTie | `reference_id` | `TPM` (or `FPKM`) |
| `.fpkm_tracking` | Cufflinks | `tracking_id` | `FPKM` |
| anything else | generic | first column | all numeric columns |

Override defaults with the full spec: `name:file:id_col:val_col`.

## CLI usage

### Score peptides

```bash
topiary \
  --peptide-csv peptides.csv \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --output-csv results.csv
```

The CSV needs a `peptide` column (and optionally `name`). Each peptide is scored as-is.

### Scan proteins

```bash
topiary \
  --fasta proteins.fasta \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01,HLA-B*07:02 \
  --ic50-cutoff 500 \
  --output-csv results.csv
```

### Filter and rank

```bash
# Filter: keep if affinity OR presentation passes
topiary \
  --fasta proteins.fasta \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --filter-by "ba <= 500 | el.rank <= 2" \
  --output-csv results.csv

# Rank: composite score from two models
topiary \
  --fasta antigens.fasta \
  --mhc-predictor netmhcpan mhcflurry \
  --mhc-alleles HLA-A*02:01 \
  --filter-by "ba <= 500" \
  --sort-by "mean(netmhcpan:affinity.logistic(350, 150), mhcflurry:affinity.logistic(350, 150))" \
  --output-csv results.csv
```

### Neoantigen discovery from somatic variants

```bash
topiary \
  --vcf somatic.vcf \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01,HLA-B*07:02 \
  --filter-by "ba <= 500 | el.rank <= 2" \
  --sort-by "0.6 * affinity.descending_cdf(500, 200) + 0.4 * presentation.score" \
  --only-novel-epitopes \
  --output-csv epitopes.csv
```

### Add expression data

```bash
# Gene-level expression from Salmon (auto-detected)
topiary \
  --vcf somatic.vcf \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --gene-expression salmon_quant.sf \
  --filter-by "ba <= 500 & gene_tpm >= 5" \
  --only-novel-epitopes \
  --output-csv results.csv

# Transcript-level from Kallisto
topiary \
  --vcf somatic.vcf \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --transcript-expression kallisto/abundance.tsv \
  --filter-by "ba <= 500" \
  --sort-by "affinity.descending_cdf(500, 200) + 0.1 * transcript_tpm.log1p()" \
  --output-csv results.csv

# Variant-level read support from isovar
topiary \
  --vcf somatic.vcf \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --variant-expression isovar_output.tsv \
  --filter-by "ba <= 500 & variant_num_alt_reads >= 3" \
  --output-csv results.csv

# Combine all three levels
topiary \
  --vcf somatic.vcf \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --gene-expression salmon_quant.sf \
  --transcript-expression kallisto/abundance.tsv \
  --variant-expression isovar_output.tsv \
  --filter-by "ba <= 500 & gene_tpm >= 5 & variant_num_alt_reads >= 3" \
  --output-csv results.csv

# Override auto-detection with explicit spec
topiary \
  --vcf somatic.vcf \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --gene-expression mygene:custom_quant.tsv:ensembl_id:tpm_value \
  --output-csv results.csv
```

Expression flags are only valid with the variant pipeline (`--vcf`, `--maf`, `--variant`). They are rejected with direct sequence inputs (`--fasta`, `--peptide-csv`, etc.) because those outputs lack the `gene_id` / `transcript_id` / `variant` columns needed for joining.

### Gene and transcript lookups

Pull protein sequences from Ensembl:

```bash
topiary --gene-names BRAF TP53 EGFR --mhc-predictor netmhcpan --mhc-alleles HLA-A*02:01
topiary --gene-ids ENSG00000157764 --mhc-predictor netmhcpan --mhc-alleles HLA-A*02:01
topiary --transcript-ids ENST00000288602 --mhc-predictor netmhcpan --mhc-alleles HLA-A*02:01
topiary --ensembl-proteome --mhc-predictor netmhcpan --mhc-alleles HLA-A*02:01
topiary --cta --mhc-predictor netmhcpan --mhc-alleles HLA-A*02:01  # cancer-testis antigens
```

### Restrict to protein regions

```bash
topiary \
  --fasta proteins.fasta \
  --mhc-predictor netmhcpan \
  --mhc-alleles HLA-A*02:01 \
  --regions spike:319-541 nucleocapsid:0-50 \
  --output-csv results.csv
```

Format: `NAME:START-END` (0-indexed, half-open).

### Exclusion filtering

Remove peptides found in reference proteomes — for tumor-specific or pathogen-specific peptide selection:

```bash
--exclude-ensembl                    # human Ensembl proteome
--exclude-non-cta                    # non-CTA proteins (requires pirlygenes)
--exclude-tissues heart_muscle lung  # genes expressed in these tissues
--exclude-fasta reference.fasta      # custom reference sequences
--exclude-mode substring             # "substring" (default) or "exact"
```

## Output

```bash
--output-csv results.csv
--output-html results.html
--output-csv-sep "\t"
--subset-output-columns peptide allele affinity
--rename-output-column value ic50
```

**All predictions:** `source_sequence_name`, `peptide`, `peptide_offset`, `peptide_length`, `allele`, `kind`, `score`, `value`, `affinity`, `percentile_rank`, `prediction_method_name`

**ProteinFragment predictions add** (both `predict_from_fragments` and the variant-based methods that build fragments internally): `fragment_id`, `source_type`, `overlaps_target`, `wt_peptide`, `wt_peptide_length`, plus any fragment-level annotations flattened to columns. With `TopiaryPredictor(predict_wt=True)`, they also add WT prediction columns such as `wt_value`, `wt_score`, `wt_affinity`, and `wt_percentile_rank`.

**Variant predictions additionally add:** `variant`, `gene`, `gene_id`, `transcript_id`, `transcript_name`, `effect`, `effect_type`, `contains_mutant_residues`, `mutation_start_in_peptide`, `mutation_end_in_peptide`

**Expression data adds:** columns named `{prefix}_{column}` as described in [Column naming](#column-naming), e.g. `gene_tpm`, `transcript_tpm`, `variant_num_alt_reads`.

## Peptide properties

Compute amino acid properties and use them in ranking:

```python
from topiary import TopiaryPredictor, Affinity, Column
from topiary.properties import add_peptide_properties

df = predictor.predict_from_named_sequences(seqs)
df = add_peptide_properties(df, groups=["manufacturability"])

# Properties become ranking signals
score = Affinity.logistic(350, 150) - 0.1 * Column("cysteine_count")
# CLI: --sort-by "affinity.logistic(350, 150) - 0.1 * cysteine_count"
```

Named groups:
- `"core"` — charge, hydrophobicity, aromaticity, molecular_weight
- `"manufacturability"` — core + cysteine_count, instability_index, max_7mer_hydrophobicity, difficult_nterm/cterm, asp_pro_bonds
- `"immunogenicity"` — core + tcr_charge, tcr_aromaticity, tcr_hydrophobicity (TCR-facing positions for MHC-I)

## Protein sources (Python API)

```python
from topiary.sources import (
    ensembl_proteome,
    sequences_from_gene_names,
    cta_sequences,
    non_cta_sequences,
    tissue_expressed_sequences,
    available_tissues,
)

# All return dict[name -> amino_acid_sequence]
seqs = sequences_from_gene_names(["BRAF", "TP53", "EGFR"])
cta = cta_sequences()                          # cancer-testis antigens
heart = tissue_expressed_sequences(["heart_muscle"])
print(available_tissues())                      # list tissue names
```

## Cached predictions

Skip the live predictor by loading pre-computed scores into a
`CachedPredictor`, then passing it to `TopiaryPredictor(models=…)`.
Useful for reproducibility, iterating on filters/ranking without
paying the predictor cost, or ingesting predictions from a tool
topiary doesn't natively run.

Load from a prior topiary run (Parquet or TSV):

```python
from topiary import CachedPredictor, TopiaryPredictor

cache = CachedPredictor.from_topiary_output("prior_run.parquet")
predictor = TopiaryPredictor(models=cache)
df = predictor.predict_from_variants(variants)
```

Load from mhcflurry output (all three kinds — affinity, presentation,
processing — are preserved):

```python
cache = CachedPredictor.from_mhcflurry("mhcflurry_predictions.csv")
```

Load from NetMHCpan stdout capture (both binding-affinity and
eluted-ligand kinds from a `-BA` run):

```python
cache = CachedPredictor.from_netmhcpan_stdout("netmhcpan_run.out")
```

Load from a generic TSV/CSV with column mapping (requires a `kind`
column in the file):

```python
cache = CachedPredictor.from_tsv(
    "third_party.tsv",
    columns={"affinity": "ic50_nM", "percentile_rank": "rank"},
    prediction_method_name="netchop",
    predictor_version="3.1",
)
```

Merge shards from parallel prediction jobs:

```python
cache = CachedPredictor.from_directory("caches/", pattern="*.parquet")
```

Every cache holds exactly one `(predictor_name, predictor_version)`
pair — mixing versions is rejected. See
[docs/cached.md](docs/cached.md) for full detail, including
mhcflurry composite versions, cache-plus-fallback mode, and
opt-in version equivalence.

### From the CLI

Format is sniffed from file content — no `--mhc-cache-format` needed
for NetMHC-family, mhcflurry, or topiary-output files:

```bash
topiary --peptide-csv peptides.csv \
    --mhc-cache-file netmhcpan_run.out \
    --output-csv results.csv
```

Sharded caches from parallel jobs:

```bash
topiary --peptide-csv peptides.csv \
    --mhc-cache-directory ./caches \
    --output-csv results.csv
```

`--mhc-predictor` and `--mhc-alleles` become optional when a cache
supplies predictions.
