Metadata-Version: 2.4
Name: autoslm-train
Version: 0.1.0
Summary: Managed consumer-GPU LoRA post-train package on RunPod Flash
Project-URL: Homepage, https://github.com/Eric-Mao06/autoslm
Project-URL: Repository, https://github.com/Eric-Mao06/autoslm
Project-URL: Documentation, https://github.com/Eric-Mao06/autoslm/tree/main/docs
Project-URL: Issues, https://github.com/Eric-Mao06/autoslm/issues
Project-URL: Changelog, https://github.com/Eric-Mao06/autoslm/blob/main/CHANGELOG.md
Author: AutoSLM contributors
Maintainer: AutoSLM contributors
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: fine-tuning,grpo,llm,lora,post-train,rlhf,runpod,sft
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: <3.13,>=3.11
Requires-Dist: datasets>=2.19
Requires-Dist: huggingface-hub>=0.25
Requires-Dist: runpod-flash
Provides-Extra: all
Requires-Dist: accelerate; extra == 'all'
Requires-Dist: fastapi; extra == 'all'
Requires-Dist: peft; extra == 'all'
Requires-Dist: torch==2.8.0; extra == 'all'
Requires-Dist: transformers<5,>=4.55; extra == 'all'
Requires-Dist: trl<0.24,>=0.23; extra == 'all'
Requires-Dist: uvicorn; extra == 'all'
Requires-Dist: vllm<0.11; extra == 'all'
Provides-Extra: gpu
Requires-Dist: accelerate; extra == 'gpu'
Requires-Dist: peft; extra == 'gpu'
Requires-Dist: torch==2.8.0; extra == 'gpu'
Requires-Dist: transformers<5,>=4.55; extra == 'gpu'
Requires-Dist: trl<0.24,>=0.23; extra == 'gpu'
Requires-Dist: vllm<0.11; extra == 'gpu'
Provides-Extra: server
Requires-Dist: fastapi; extra == 'server'
Requires-Dist: uvicorn; extra == 'server'
Description-Content-Type: text/markdown

# AutoSLM — managed consumer-GPU LoRA post-train

[![CI](https://github.com/Eric-Mao06/autoslm/actions/workflows/ci.yml/badge.svg)](https://github.com/Eric-Mao06/autoslm/actions/workflows/ci.yml) <!-- pragma: allowlist secret -->
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.11%20%7C%203.12-blue.svg)](pyproject.toml)

AutoSLM is a **managed fine-tuning package** for LoRA fine-tuning (SFT + GRPO) of small models.
It gives a **focused developer experience** — discover an environment, point a TOML
at it, train, evaluate, deploy — while the infrastructure is fully managed: every run is
assigned its **own dedicated GPU** on **[RunPod Flash](https://docs.runpod.io/flash/overview)**
(serverless, **no Docker**), restricted to **RTX 4090 / RTX 5090**.

- **Managed infra, one GPU per run.** No backend to configure, no Docker images, no cluster.
  Flash provisions the GPU, installs deps, runs your job, and scales to zero.
- **Composable, interoperable environments.** A simple TOML/config mental model; environments
  interoperate with the `verifiers` / Environments Hub ecosystem.
- **`uv`-native.** Install and run everything with [uv](https://docs.astral.sh/uv/).

## Install

AutoSLM is `uv`-native, but it's a normal Python package (`autoslm-train` on PyPI, imports as `autoslm`, CLI entry point `slm`),
so any of these work:

```bash
# As a CLI tool (recommended for just using AutoSLM):
uv tool install autoslm-train  # installs the `slm` command
# or: pipx install autoslm-train / pip install autoslm-train

# From source (for development / latest main):
git clone https://github.com/Eric-Mao06/autoslm   # pragma: allowlist secret
cd autoslm   # pragma: allowlist secret
uv sync                            # create the venv from the lockfile
uv run slm --help
```

> The heavy GPU fine-tuning stack (`torch`/`vllm`/`trl`/`peft`) is **worker-side** — RunPod
> Flash installs it on the GPU for each run, so you don't need it locally. If you want to run
> the worker yourself, install the extra: `uv sync --extra gpu` (or `pip install
> "autoslm-train[gpu]"`).

Requires Python 3.11 or 3.12 and a [RunPod](https://runpod.io) account.

(Examples below use `uv run slm …`; if you installed the CLI with `uv tool`/`pipx`/`pip`,
just drop the `uv run` prefix and call `slm …`.)

## Quickstart

```bash
uv run slm login --api-key $RUNPOD_API_KEY     # store your RunPod key (~/.autoslm/config.json)
export HF_REPO=your-org/autoslm-runs               # HF dataset repo for adapters/checkpoints
export HUGGINGFACE_TOKEN=...                   # write access to HF_REPO

uv run slm lab setup                            # scaffold configs/ + environments/
uv run slm models                               # list curated consumer-GPU models
uv run slm train configs/examples/gsm8k_grpo.toml --dry-run   # validate without provisioning
uv run slm train configs/examples/gsm8k_grpo.toml --background
uv run slm status <run_id>
uv run slm eval configs/examples/gsm8k_grpo.toml --adapter <run_id>   # eval base + adapter
uv run slm deploy <run_id>                        # warm a managed GPU serving the adapter
uv run slm chat <run_id> -m "What is 17 * 23?"    # OpenAI-shaped chat via the managed GPU
```

A config is a single TOML:

```toml
model = "Qwen/Qwen3-4B-Instruct-2507"
algorithm = "grpo"            # or "sft"

[environment]
id = "gsm8k"                   # built-in: gsm8k | math | tests_pass (or path = "...")

[train]
steps = 150
lora_rank = 32
seeds = [0]

[gpu]
type = "RTX 5090"             # RTX 4090 or RTX 5090 only (defaults from model size)
```

## How it works

```
slm train cfg.toml
  └─ autoslm.orchestrator           # one dedicated GPU per run/seed
       └─ autoslm.flash.train       # @Endpoint on RunPod Flash (RTX 4090/5090, no Docker)
            └─ autoslm.engine.worker   # SFT/GRPO + eval on the GPU (TRL + colocated vLLM, PEFT LoRA)
HF dataset repo                # adapter + checkpoints (resume) stream here; metrics return inline
```

The fine-tuning worker runs two fresh processes on the GPU — train, then eval — so the
TRL trainer and the vLLM eval engine never share a GPU allocation. Adapters and checkpoints
are streamed to a Hugging Face dataset repo (`HF_REPO`) for serving and preemption-resilient
resume; metrics are returned directly from the Flash call.

## Environments

Built-ins implement a small task interface (`dataset`, `prompt_messages`, `sft_target`,
`reward`, `grade`):

- **`gsm8k`** — GSM8K verifiable math with a shared boxed/`####` grader.
- **`math`** — competition math (DeepScaleR train → MATH-500 eval), LaTeX/numeric grading.
- **`tests_pass`** — coding tasks: apply a generated diff and reward on a test command.

Scaffold a custom one with `uv run slm env init my-env`, then point a config at it via
`[environment] path = "environments/my_env"`.

**Prime Hub / `verifiers` interop.** AutoSLM runs Prime Intellect `verifiers` environments
unchanged (single-turn). Install one and point a config at it:

```bash
uv run slm env install owner/my-env       # via the prime CLI or pip; recorded in ~/.autoslm/envs.json
uv run slm env list                        # shows built-in + installed + local envs
```
```toml
[environment]
id = "owner/my-env"                       # loaded through the verifiers adapter
```
The GPU worker automatically installs `verifiers` + the env package for that run.
Multi-turn/tool environments are a follow-up.

## Layout

```
autoslm/
  cli/             `slm` CLI (login/train/eval/deploy/status/logs/env/models)
  catalog.py       curated consumer-GPU model catalog
  config_schema.py TOML -> JobSpec (validates RTX 4090/5090)
  orchestrator.py  drives RunPod Flash (one dedicated GPU per run)
  flash/           RunPod Flash integration (gpus, train endpoints, serving, auth, preflight)
  engine/          substrate-neutral internals: recipe, data, grading, accounting, worker
  envs/            environment interface + built-ins (gsm8k, math, tests_pass)
  serve/           adapter serving via Flash (deploy + OpenAI-shaped chat)
  server/          optional FastAPI control plane
  mcp/             stdio tool bridge for coding agents
  _logging.py      package logging helpers (quiet by default; `-v` / `AUTOSLM_LOG_LEVEL`)
tests/             CPU tests (no GPU/network required)
```

## Docs

- [Config reference](docs/config-reference.md) — TOML schema, `--set`/`--config`, env vars.
- [Environments](docs/environments.md) — authoring + Prime Hub / `verifiers` interop.

## Cold starts

The first run on a fresh worker pulls the image + installs deps (a few minutes) + downloads
the model (~8 GB). Set `AUTOSLM_WORKER_IMAGE` to a **prebuilt image** with the deps **and the base
model** baked in to skip both:

```bash
export AUTOSLM_WORKER_IMAGE=your-registry/autoslm-worker:cu128
```

Build it (on any Docker host — no GPU needed to build; see [`docker/`](docker/)):

```bash
docker/build.sh your-registry/autoslm-worker:cu128 Qwen/Qwen3-4B-Instruct-2507 --push
```

The worker still **scales to zero when idle** (`workers=(0,1)`) — the image only speeds the
cold start, it does **not** keep a GPU online. The baked model lives in the image's `HF_HOME`
cache, so it loads locally instead of downloading.

## Troubleshooting

- **`error: cannot start a managed run — missing required configuration`** — set your RunPod
  key (`slm login --api-key <key>` or `RUNPOD_API_KEY`) and export `HF_REPO` +
  `HUGGINGFACE_TOKEN`. Add `--dry-run` to validate a config without provisioning a GPU.
- **`error: unsupported model …`** — the model isn't in the catalog. Run `slm models`
  (add `--experimental` to see the bleeding-edge tier). The default,
  `Qwen/Qwen3-4B-Instruct-2507`, is a proven dense model that loads on the pinned worker
  stack.
- **Experimental models fail to load on the worker** — the `Qwen3.5*` / `Qwen3.6*` entries
  use newer architectures than the pinned `vllm`/`transformers`. Bake a newer stack into a
  `AUTOSLM_WORKER_IMAGE` to use them.
- **vLLM OOM / KV-cache errors during GRPO** — see the colocated-GRPO memory recipe in the
  [config reference](docs/config-reference.md#colocated-grpo-on-one-consumer-gpu-memory-recipe).
- **Cold starts are slow** — bake a prebuilt `AUTOSLM_WORKER_IMAGE` (see [Cold starts](#cold-starts)).
- **See more detail** — add `-v` (info) or `-vv` (debug) to any command, or set
  `AUTOSLM_LOG_LEVEL=DEBUG`. Use `--debug` (or `AUTOSLM_DEBUG=1`) to get a full traceback on error.
- **Check the version** — `slm version` (or `slm --version`).

## Development

```bash
uv sync                       # install dev deps from the lockfile
AUTOSLM_SKIP_NET=1 uv run pytest tests -q   # CPU unit tests (AUTOSLM_SKIP_NET=1 skips live HF)
uv run ruff check .           # lint
uv run ruff format .          # format
```

See [CONTRIBUTING.md](CONTRIBUTING.md) for the full guide. CI runs the tests (Python 3.11 +
3.12) and the ruff checks on every push/PR.

## License

[Apache-2.0](LICENSE).
