Web UI for local speech-to-text using voxtral.c on Apple Silicon.
Prerequisites: macOS with Apple Silicon, ffmpeg, Python 3.10+, uv
git clone https://github.com/r-dh/voxtral-app
cd voxtral-app
# Build voxtral.c
cd voxtral.c
make mps
./download_model.sh # ~8.9 GB
cd ..
# Run
uv sync
uv run uvicorn server:app --host 0.0.0.0 --port 8000
Open http://localhost:8000.
- Drag & drop audio files (WAV, MP3, OGG, M4A, FLAC, AAC) or record from the microphone
- Tokens stream to the browser as the model generates them
- Model stays loaded in GPU memory between transcriptions (~8.2 GB)
- Cancel mid-transcription without unloading the model (SIGUSR1)
- Adjustable speed/accuracy tradeoff (WER 6.7% to 12.6%)
- Single HTML file, no build tools
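Token streaming relies on the Server-Sent Events wire format, where each message is a few `field: value` lines terminated by a blank line. The following is a minimal sketch of that framing, not the code in server.py; the event names `token` and `done` and the `{"text": ...}` payload shape are assumptions made for illustration.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    """Format one Server-Sent Events message.

    An SSE message is a group of "field: value" lines ended by a blank line.
    JSON-encoding the payload keeps it on a single "data:" line even if the
    token text itself contains a newline.
    """
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_tokens(tokens):
    """Yield one SSE message per token, then a final "done" message.

    The event names "token" and "done" are hypothetical.
    """
    for tok in tokens:
        yield sse_event("token", {"text": tok})
    yield sse_event("done", {})
```

In FastAPI, a generator like this can be returned through `StreamingResponse(stream_tokens(...), media_type="text/event-stream")`, and the browser can read it with `EventSource` or a streaming `fetch`.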
Browser <-- SSE --> FastAPI <-- stdin/stdout --> voxtral --server
                                                        |
                                                  voxtral-model/
                                               (8.9 GB, Metal GPU)
The server manages a persistent voxtral --server process. The model loads once into GPU memory and stays resident across transcriptions. Each request sends a WAV path over stdin, receives tokens on stdout and progress on stderr. Server-Sent Events relay everything to the browser.
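The persistent-process pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual server.py: a small Python child stands in for `voxtral --server` so the example runs anywhere, and the `<END>` marker that ends each transcription is invented for this sketch (the real end-of-result signalling of voxtral.c is not documented here). Progress on stderr is left out for brevity.

```python
import subprocess
import sys

# Stand-in for `voxtral --server`: reads one path per line and answers with a
# few "tokens" followed by an end marker. The "<END>" marker is an assumption
# made for this sketch, not voxtral.c's actual protocol.
FAKE_WORKER = r'''
import sys
for line in sys.stdin:
    path = line.strip()
    for tok in ("transcript", "of", path):
        print(tok, flush=True)
    print("<END>", flush=True)
'''


class PersistentWorker:
    """Keep one child process alive and reuse it for every request."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered text pipes
        )

    def transcribe(self, wav_path):
        """Send one path on stdin and yield tokens until the end marker."""
        self.proc.stdin.write(wav_path + "\n")
        self.proc.stdin.flush()
        while True:
            line = self.proc.stdout.readline()
            if not line:  # child exited unexpectedly
                break
            token = line.rstrip("\n")
            if token == "<END>":
                break
            yield token

    def close(self):
        """Close stdin so the child sees EOF, then wait for it to exit."""
        self.proc.stdin.close()
        self.proc.wait()


def demo():
    worker = PersistentWorker([sys.executable, "-c", FAKE_WORKER])
    first = list(worker.transcribe("a.wav"))
    second = list(worker.transcribe("b.wav"))  # same process, nothing reloaded
    alive = worker.proc.poll() is None
    worker.close()
    return first, second, alive
```

Because the child is started once and only its pipes are reused, any startup cost (for voxtral, loading the model) is paid a single time.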
Cancelling sends SIGUSR1 to abort the current transcription without killing the process. The model stays loaded and the next transcription starts immediately.
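A minimal sketch of signal-based cancellation, assuming a Unix-like system. The child below is a hypothetical stand-in for `voxtral --server`: it installs a SIGUSR1 handler that sets a flag, checks the flag between work steps, and goes back to waiting for the next job. The `READY`, `STARTED`, `CANCELLED`, and `DONE` lines are made up for this example and are not part of voxtral.c.

```python
import signal
import subprocess
import sys

# Hypothetical worker: each input line is a number of 50 ms work steps.
# SIGUSR1 aborts the current job but leaves the process running.
FAKE_WORKER = r'''
import signal, sys, time
cancelled = False
def on_usr1(signum, frame):
    global cancelled
    cancelled = True
signal.signal(signal.SIGUSR1, on_usr1)
print("READY", flush=True)
for line in sys.stdin:
    steps = int(line)
    cancelled = False
    print("STARTED", flush=True)
    for _ in range(steps):
        if cancelled:
            break
        time.sleep(0.05)
    print("CANCELLED" if cancelled else "DONE", flush=True)
'''


def demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", FAKE_WORKER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )

    def send(line):
        proc.stdin.write(line + "\n")
        proc.stdin.flush()

    def recv():
        return proc.stdout.readline().strip()

    recv()                              # "READY": the handler is installed
    send("200")                         # long job (about 10 s if not cancelled)
    recv()                              # "STARTED"
    proc.send_signal(signal.SIGUSR1)    # abort the job, not the process
    first = recv()
    still_running = proc.poll() is None
    send("1")                           # the next job starts right away
    recv()                              # "STARTED"
    second = recv()
    proc.stdin.close()
    proc.wait()
    return first, still_running, second
```

Waiting for `READY` before sending the signal matters in this sketch: the default action for SIGUSR1 is to terminate the process, so the signal must not arrive before the handler is installed.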
Tested on M1 Max (32-core GPU, 64 GB unified memory):
| Audio | Time | Speed | Quality setting |
|---|---|---|---|
| 7s clip | ~18s | 0.4x realtime | Balanced |
| 60s clip | ~150s | 0.4x realtime | Balanced |
| 60s clip | ~250s | 0.2x realtime | Most accurate |
First run includes ~25s to load the model. Subsequent transcriptions start immediately.
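The Speed column is the audio duration divided by the processing time. A small sketch of that arithmetic, with hypothetical helper names, which can also be used to estimate how long a clip will take:

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Speed relative to realtime: 1.0 means the clip is transcribed as fast as it plays."""
    return audio_seconds / processing_seconds


def estimated_seconds(audio_seconds: float, factor: float,
                      model_load_seconds: float = 0.0) -> float:
    """Rough processing time for a clip at a given realtime factor.

    Pass model_load_seconds (about 25 s on the machine above) for the first
    run only; later runs reuse the loaded model.
    """
    return audio_seconds / factor + model_load_seconds
```

For example, `realtime_factor(60, 150)` gives 0.4, matching the Balanced row, and a first-run 60s clip at 0.4x with a 25s model load comes out to roughly 175s.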
This is a 4-billion-parameter model running locally, so transcription is slower than realtime on the hardware above. For faster local transcription with smaller models, see whisper.cpp.
server.py            FastAPI server, process management, SSE streaming
static/index.html    Single-file frontend (vanilla HTML/CSS/JS)
voxtral.c/           antirez's voxtral.c (pure C, Metal GPU)
- Apple Silicon Mac (M1, M2, M3, or M4)
- ~10 GB disk for model weights
- ffmpeg (brew install ffmpeg)
- Python 3.10+ (uv sync or pip install fastapi uvicorn python-multipart)
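
ffmpeg is presumably needed because uploads in formats such as MP3 or M4A have to become WAV before their path is handed to voxtral. A hedged sketch of such a conversion step: the function names are hypothetical, and the 16 kHz mono 16-bit PCM target is an assumption rather than a documented requirement of voxtral.c.

```python
import subprocess


def ffmpeg_wav_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts any supported input to mono PCM WAV.

    The 16 kHz mono target is an assumption for this sketch.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input file; format is auto-detected
        "-ac", "1",              # downmix to one channel
        "-ar", str(sample_rate), # resample
        "-c:a", "pcm_s16le",     # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_wav_command(src, dst), check=True, capture_output=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with uploaded filenames.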
