Metadata-Version: 2.4
Name: realtimestt
Version: 1.0.1
Summary: A fast Voice Activity Detection and Transcription System
Home-page: https://github.com/KoljaB/RealTimeSTT
Author: Kolja Beigel
Author-email: kolja.beigel@web.de
License: MIT
Keywords: real-time,audio,transcription,speech-to-text,voice-activity-detection,VAD,real-time-transcription,ambient-noise-detection,microphone-input,faster_whisper,speech-recognition,voice-assistants,audio-processing,buffered-transcription,pyaudio,ambient-noise-level,voice-deactivity
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: PyAudio==0.2.14
Requires-Dist: webrtcvad-wheels==2.0.14
Requires-Dist: halo==0.0.31
Requires-Dist: torch
Requires-Dist: torchaudio
Requires-Dist: scipy==1.17.1
Requires-Dist: websockets==16.0
Requires-Dist: websocket-client==1.9.0
Requires-Dist: soundfile==0.13.1
Provides-Extra: minimal
Provides-Extra: faster-whisper
Requires-Dist: faster-whisper==1.2.1; extra == "faster-whisper"
Provides-Extra: whisper-cpp
Requires-Dist: pywhispercpp; extra == "whisper-cpp"
Provides-Extra: whispercpp
Requires-Dist: pywhispercpp; extra == "whispercpp"
Provides-Extra: openai-whisper
Requires-Dist: openai-whisper; extra == "openai-whisper"
Provides-Extra: sherpa-onnx
Requires-Dist: sherpa-onnx; extra == "sherpa-onnx"
Provides-Extra: sherpa
Requires-Dist: sherpa-onnx; extra == "sherpa"
Provides-Extra: silero-vad
Requires-Dist: silero-vad>=6.2.1; python_version >= "3.8" and extra == "silero-vad"
Provides-Extra: silero
Requires-Dist: silero-vad>=6.2.1; python_version >= "3.8" and extra == "silero"
Provides-Extra: silero-onnx
Requires-Dist: silero-vad[onnx-cpu]>=6.2.1; python_version >= "3.8" and extra == "silero-onnx"
Provides-Extra: silero-onnx-cpu
Requires-Dist: silero-vad[onnx-cpu]>=6.2.1; python_version >= "3.8" and extra == "silero-onnx-cpu"
Provides-Extra: vad-onnx
Requires-Dist: silero-vad[onnx-cpu]>=6.2.1; python_version >= "3.8" and extra == "vad-onnx"
Provides-Extra: silero-onnx-gpu
Requires-Dist: silero-vad[onnx-gpu]>=6.2.1; python_version >= "3.8" and extra == "silero-onnx-gpu"
Provides-Extra: vad-onnx-gpu
Requires-Dist: silero-vad[onnx-gpu]>=6.2.1; python_version >= "3.8" and extra == "vad-onnx-gpu"
Provides-Extra: transformers
Requires-Dist: transformers; extra == "transformers"
Provides-Extra: moonshine
Requires-Dist: transformers; extra == "moonshine"
Provides-Extra: granite
Requires-Dist: transformers; extra == "granite"
Provides-Extra: cohere
Requires-Dist: transformers; extra == "cohere"
Provides-Extra: parakeet
Requires-Dist: nemo_toolkit[asr]; extra == "parakeet"
Provides-Extra: nvidia-parakeet
Requires-Dist: nemo_toolkit[asr]; extra == "nvidia-parakeet"
Provides-Extra: omnilingual-asr
Requires-Dist: torch==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "omnilingual-asr"
Requires-Dist: torchaudio==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "omnilingual-asr"
Requires-Dist: omnilingual-asr>=0.2.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "omnilingual-asr"
Provides-Extra: omnilingual
Requires-Dist: torch==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "omnilingual"
Requires-Dist: torchaudio==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "omnilingual"
Requires-Dist: omnilingual-asr>=0.2.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "omnilingual"
Provides-Extra: meta-omnilingual-asr
Requires-Dist: torch==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "meta-omnilingual-asr"
Requires-Dist: torchaudio==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "meta-omnilingual-asr"
Requires-Dist: omnilingual-asr>=0.2.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "meta-omnilingual-asr"
Provides-Extra: qwen
Requires-Dist: qwen-asr; extra == "qwen"
Provides-Extra: qwen3-asr
Requires-Dist: qwen-asr; extra == "qwen3-asr"
Provides-Extra: qwen-vllm
Requires-Dist: qwen-asr[vllm]; extra == "qwen-vllm"
Provides-Extra: kroko-builder
Requires-Dist: huggingface_hub; extra == "kroko-builder"
Provides-Extra: porcupine
Requires-Dist: pvporcupine==1.9.5; extra == "porcupine"
Provides-Extra: pvporcupine
Requires-Dist: pvporcupine==1.9.5; extra == "pvporcupine"
Provides-Extra: pvp
Requires-Dist: pvporcupine==1.9.5; extra == "pvp"
Provides-Extra: openwakeword
Requires-Dist: openwakeword>=0.6.0; extra == "openwakeword"
Provides-Extra: oww
Requires-Dist: openwakeword>=0.6.0; extra == "oww"
Provides-Extra: wakewords
Requires-Dist: pvporcupine==1.9.5; extra == "wakewords"
Requires-Dist: openwakeword>=0.6.0; extra == "wakewords"
Provides-Extra: wake-words
Requires-Dist: pvporcupine==1.9.5; extra == "wake-words"
Requires-Dist: openwakeword>=0.6.0; extra == "wake-words"
Provides-Extra: recommended
Requires-Dist: faster-whisper==1.2.1; extra == "recommended"
Requires-Dist: silero-vad[onnx-cpu]>=6.2.1; python_version >= "3.8" and extra == "recommended"
Provides-Extra: default
Requires-Dist: faster-whisper==1.2.1; extra == "default"
Requires-Dist: silero-vad[onnx-cpu]>=6.2.1; python_version >= "3.8" and extra == "default"
Provides-Extra: all
Requires-Dist: faster-whisper==1.2.1; extra == "all"
Requires-Dist: pywhispercpp; extra == "all"
Requires-Dist: openai-whisper; extra == "all"
Requires-Dist: sherpa-onnx; extra == "all"
Requires-Dist: silero-vad[onnx-cpu]>=6.2.1; python_version >= "3.8" and extra == "all"
Requires-Dist: transformers; extra == "all"
Requires-Dist: nemo_toolkit[asr]; extra == "all"
Requires-Dist: torch==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "all"
Requires-Dist: torchaudio==2.8.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "all"
Requires-Dist: omnilingual-asr>=0.2.0; (python_version >= "3.10" and python_version < "3.12" and platform_system != "Windows") and extra == "all"
Requires-Dist: qwen-asr; extra == "all"
Requires-Dist: huggingface_hub; extra == "all"
Requires-Dist: pvporcupine==1.9.5; extra == "all"
Requires-Dist: openwakeword>=0.6.0; extra == "all"
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary


RealtimeSTT lets you choose the transcription and wake-word dependencies you
want to install.

Recommended default local Whisper install:

    pip install "realtimestt[recommended]"

Main ASR backend only, without the faster packaged Silero ONNX Runtime VAD:

    pip install "realtimestt[faster-whisper]"

Core package only, without a transcription engine or wake-word backend:

    pip install realtimestt

Install multiple extras by separating them with commas:

    pip install "realtimestt[faster-whisper,porcupine]"
    pip install "realtimestt[whisper-cpp,openwakeword]"

Available extras include:

- faster-whisper: default CTranslate2 Whisper backend
- whisper-cpp: whisper.cpp backend through pywhispercpp
- openai-whisper: original OpenAI Whisper Python backend
- sherpa-onnx: sherpa-onnx CPU backends
- silero-vad: packaged Silero model assets and PyTorch wrapper
- silero-onnx/silero-onnx-cpu: fastest Silero VAD CPU ONNX Runtime backend
- silero-onnx-gpu: installs Silero's ONNX GPU runtime extra for experiments
- parakeet: NVIDIA NeMo Parakeet backend
- omnilingual/omnilingual-asr: Meta Omnilingual ASR backend for Linux/WSL2 with Python 3.11.x only; uses omnilingual-asr>=0.2.0 with matching torch/torchaudio builds
- transformers: shared Transformers dependency for Moonshine, Granite, and Cohere
- moonshine, granite, cohere: aliases for the Transformers dependency set
- qwen: Qwen ASR backend
- qwen-vllm: Qwen ASR with vLLM extras
- kroko-builder: helper command for building/installing Kroko-ONNX plus Hugging Face model downloads
- porcupine: Porcupine wake-word backend
- openwakeword: OpenWakeWord wake-word backend
- wakewords: both wake-word backends
- recommended/default: faster-whisper backend plus fast Silero CPU ONNX VAD
- all: all PyPI-installable optional backends

WebRTC VAD is installed with the core package. AudioToTextRecorder also
initializes a Silero VAD path. Install the recommended/default or
silero-onnx-cpu extra for a self-contained local Silero ONNX Runtime backend.

Meta Omnilingual ASR install note: use Linux or WSL2 with Python 3.11.x.
Native Windows cannot run the Omnilingual runtime because fairseq2n has no
Windows wheel, and Python 3.12.x currently cannot resolve omnilingual-asr>=0.2.0
from PyPI because the upstream package metadata excludes normal 3.12 patch
releases.

For live Kroko-ONNX usage, install the builder helper and then build Kroko in
the same Python environment:

    pip install "realtimestt[kroko-builder,silero-onnx-cpu]"
    stt-install-kroko --build

The silero-onnx-cpu extra is not needed to build Kroko-ONNX itself, but
recorder-based Kroko smoke tests and live AudioToTextRecorder use need a local
VAD backend.

On Windows, use Python 3.12 x64 and start Docker Desktop before running the
builder. Check that Docker's Linux engine is available with:

    python --version
    git --version
    docker version

`docker version` must show a Server section. `docker --version` only checks
that the Docker CLI is installed.

If the default builder cache is not writable, use a project-local work
directory:

    stt-install-kroko --build --work-dir .\kroko-builder-work

The kroko-builder extra includes huggingface_hub. Download a public Community
model after the builder finishes:

    mkdir test-model-cache\kroko-onnx
    python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='Banafo/Kroko-ASR', filename='Kroko-EN-Community-64-L-Streaming-001.data', local_dir='test-model-cache/kroko-onnx')"

# RealtimeSTT

RealtimeSTT is a Python speech-to-text library for applications that need
voice activity detection, fast transcription, optional realtime text updates,
wake words, and direct access to audio streams. It is designed for assistants,
dictation tools, browser streaming servers, and prototypes that need to turn
speech into text with only a few lines of code.

The recommended default path uses `faster_whisper`. Other engines are available
through install extras when their optional dependencies and models are present.

## Demo

https://github.com/user-attachments/assets/797e6552-27cd-41b1-a7f3-e5cbc72094f5

[CLI demo code (reproduces the video above)](tests/realtimestt_test.py)

## Featured Integration: Kroko/Banafo ASR

RealtimeSTT 1.0.1 adds native support for `kroko_onnx`, the local streaming ASR
engine from the Kroko/Banafo team.

This integration has been on my wishlist for a long time. Kroko is a strong fit
for RealtimeSTT's goals: fast, accurate local speech recognition.

Start with the public Community models for local testing, or see Kroko/Banafo's
commercial model options if you need production licensing and higher-end models.

```bash
pip install "RealtimeSTT[kroko-builder,silero-onnx-cpu]"
stt-install-kroko --build
```

The `silero-onnx-cpu` extra gives `AudioToTextRecorder` a local VAD backend for
recorder-based smoke tests and live microphone use.

See the [Kroko-ONNX engine guide](docs/engines/kroko-onnx.md),
[Kroko ASR docs](https://docs.kroko.ai/on-premise/), and
[kroko-onnx on GitHub](https://github.com/kroko-ai/kroko-onnx).

## Install

Use Python 3.11 or newer for the current pinned dependency set.

```bash
pip install "RealtimeSTT[faster-whisper]"
```

On Linux, install PortAudio headers before installing the package:

```bash
sudo apt-get update
sudo apt-get install python3-dev portaudio19-dev
```

On macOS:

```bash
brew install portaudio
```

For CUDA, platform notes, and optional engine stacks, see
[docs/installation.md](docs/installation.md).

## Microphone Example

This waits for speech, stops after the detected utterance, and prints the final
transcript:

```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    with AudioToTextRecorder() as recorder:
        print("Speak now")
        print(recorder.text())
```

Use the `if __name__ == "__main__":` guard when running scripts, especially on
Windows, because RealtimeSTT uses multiprocessing for model work.

## Automatic Recording Loop

For continuous dictation, pass a callback to `text()` so transcription work can
complete asynchronously while your loop keeps listening:

```python
from RealtimeSTT import AudioToTextRecorder


def process_text(text):
    print(text)


if __name__ == "__main__":
    recorder = AudioToTextRecorder()

    while True:
        recorder.text(process_text)
```

## External Audio

Set `use_microphone=False` when audio comes from a file, stream, websocket, or
another process. Feed 16-bit mono PCM chunks at 16 kHz, or pass the original
sample rate so RealtimeSTT can resample:

```python
from RealtimeSTT import AudioToTextRecorder

if __name__ == "__main__":
    recorder = AudioToTextRecorder(use_microphone=False)

    with open("audio_chunk.pcm", "rb") as audio_file:
        recorder.feed_audio(audio_file.read(), original_sample_rate=16000)

    print(recorder.text())
    recorder.shutdown()
```

More examples are in [docs/quick-start.md](docs/quick-start.md) and
[docs/external-audio.md](docs/external-audio.md).

## Configuration Reference

Every `AudioToTextRecorder` constructor parameter is documented in
[docs/configuration.md](docs/configuration.md), including model/engine
selection, realtime transcription, VAD timing, wake words, callbacks, external
audio, logging, and executor injection.

## Features

- Voice activity detection with WebRTC VAD and Silero VAD.
- Final and realtime transcription with selectable engines.
- Optional wake word activation through Porcupine or OpenWakeWord.
- Direct microphone input or application-fed audio chunks.
- Event callbacks for recording, VAD, realtime text, transcription, and wake
  word state.
- A FastAPI browser streaming server example with multi-user session isolation,
  shared inference resources, metrics, and health endpoints.

## Documentation

- [Quick start](docs/quick-start.md): shortest demos and common recording
  patterns.
- [Installation](docs/installation.md): platform setup, CUDA notes, and optional
  dependencies.
- [Configuration](docs/configuration.md): complete `AudioToTextRecorder`
  parameter reference.
- [Transcription engines](docs/transcription-engines.md): engine selection and
  setup links.
- [Wake words](docs/wake-words.md): Porcupine and OpenWakeWord setup.
- [External audio](docs/external-audio.md): feeding audio without a microphone.
- [Testing](docs/testing.md): maintained unit and opt-in golden test workflow.
- [Test scripts](docs/test-scripts.md): demos, manual tests, regressions, and
  legacy experiments under `tests/`.
- [FastAPI server](docs/fastapi-server.md): browser server configuration,
  protocol, metrics, and deployment notes.
- [Troubleshooting](docs/troubleshooting.md): common install, audio, CUDA,
  model, dependency, and runtime errors.
- [Engine licenses](docs/licenses.md): license notes for optional engine
  runtimes and model families.

Engine-specific references:

- [faster-whisper](docs/engines/faster-whisper.md)
- [whisper.cpp](docs/engines/whisper-cpp.md)
- [OpenAI Whisper](docs/engines/openai-whisper.md)
- [Moonshine](docs/engines/moonshine.md)
- [sherpa-onnx](docs/engines/sherpa-onnx.md)
- [Kroko-ONNX](docs/engines/kroko-onnx.md)
- [Parakeet NeMo](docs/engines/parakeet-nemo.md)
- [Meta Omnilingual ASR](docs/engines/omnilingual-asr.md)
- [Granite/Qwen Transformers engines](docs/engines/hf-transformers.md)
- [Cohere Transcribe](docs/engines/cohere.md)

## Server Example

The browser FastAPI reference server lives in `example_fastapi_server` and is
intended for source checkouts. It is not installed by the PyPI wheel; keeping it
source-only keeps the wheel lean and avoids adding web-server dependencies for
users who only need the recorder/API library.

```bash
python -m pip install -r example_fastapi_server/requirements.txt
python example_fastapi_server/server.py --host 0.0.0.0 --port 8010
```

For pip-only installs, use the Python recorder/API examples instead. If you
want the FastAPI reference server, clone the repository or install from Git.

Open `http://localhost:8010`. See [docs/fastapi-server.md](docs/fastapi-server.md)
for engine recipes, websocket protocol details, health checks, and metrics.

## Contributing

Focused tests and small changes are easiest to review. The project keeps fast
unit tests separate from opt-in real-model tests; see [docs/testing.md](docs/testing.md).

## License

MIT

## Author

Kolja Beigel
