GigaAM v3 RNNT — MLX (Apple Silicon)
GigaAM v3 RNNT (Conformer, 16 layers, 768d + RNNT Joint & Decoder) converted to MLX for native inference on Apple Silicon.
48× realtime on M4: transcribes 11 seconds of Russian speech in 230 ms. Compared to the CTC version, RNNT offers ~9% lower Word Error Rate (WER) across benchmarks, because its decoder conditions each prediction on the previously emitted tokens. The trade-off is slightly slower, sequential decoding.
Original model: ai-sage/GigaAM
Quick Start
pip install mlx safetensors numpy huggingface_hub
from huggingface_hub import snapshot_download
model_dir = snapshot_download("al-bo/gigaam-v3-rnnt-mlx")
Or use with the inference code from GigaAM MLX:
from gigaam_mlx import load_model, load_audio
model = load_model("./gigaam-v3-rnnt-mlx")
text = model.transcribe(load_audio("audio.wav"))
print(text)
# → ничьих не требуя похвал счастлив уж я надеждой сладкой
Architecture
Audio (16kHz) → Log-Mel Spectrogram (64 bins)
→ Conv1d Subsampling (4× stride)
→ 16× Conformer Layers:
├─ FFN₁ (half-step residual)
├─ RoPE Multi-Head Self-Attention (16 heads)
├─ Convolution Module (GLU + depthwise conv)
└─ FFN₂ (half-step residual)
→ RNNT Head (Joint + LSTM Decoder)
→ Greedy Decode
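The last two stages of the pipeline above can be illustrated with a minimal greedy RNNT decoding sketch. It is not the code shipped with this model: `predict`, `joint`, the toy stand-ins and the `max_symbols` cap are hypothetical names chosen for illustration, and the real LSTM decoder and joint network are replaced by tiny NumPy functions so the loop can be run on its own.

```python
import numpy as np

def rnnt_greedy_decode(enc_out, predict, joint, blank_id, max_symbols=10):
    """Greedy RNNT decoding over encoder frames.

    enc_out: array of shape (T, D), one encoder vector per frame.
    predict(token, state) -> (dec_vec, new_state): hypothetical decoder step.
    joint(enc_vec, dec_vec) -> logits over vocabulary + blank (hypothetical).
    """
    tokens = []
    dec_vec, state = predict(blank_id, None)  # start from the blank symbol
    for enc_vec in enc_out:
        # Emit symbols for this frame until blank wins or the cap is hit.
        for _ in range(max_symbols):
            k = int(np.argmax(joint(enc_vec, dec_vec)))
            if k == blank_id:
                break  # blank: advance to the next encoder frame
            tokens.append(k)
            # The decoder only advances when a non-blank symbol is emitted.
            dec_vec, state = predict(k, state)
    return tokens

# Toy stand-ins (assumptions, not the GigaAM networks): 3 symbols + blank.
BLANK = 3

def toy_predict(token, state):
    dec_vec = np.zeros(4)
    if token != BLANK:
        dec_vec[token] = -10.0  # discourage repeating the symbol just emitted
    return dec_vec, state

def toy_joint(enc_vec, dec_vec):
    return enc_vec + dec_vec

frames = np.array([[2.0, 0.0, 0.0, 1.0],
                   [0.0, 0.0, 0.0, 3.0],
                   [0.0, 2.0, 0.0, 1.0]])
decoded = rnnt_greedy_decode(frames, toy_predict, toy_joint, blank_id=BLANK)
print(decoded)  # [0, 1]
```

The inner loop is what makes RNNT decoding sequential: each emitted symbol changes the decoder state before the next joint evaluation, whereas a CTC head scores all frames independently.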
Performance (Apple M4)
| Metric | Value |
|---|---|
| Batch (11s audio) | 230ms (48× realtime) |
| Model size | 423 MB (fp16) |
| Parameters | ~222M |
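The figures in the table can be cross-checked with a few lines of arithmetic. This is only a consistency sketch: the parameter count is approximate, and reading "423 MB" as binary megabytes (MiB) is an assumption that happens to fit the numbers.

```python
# Realtime factor: seconds of audio processed per second of compute.
audio_seconds = 11.0
processing_seconds = 0.230
speedup = audio_seconds / processing_seconds
print(f"realtime factor: {speedup:.1f}x")

# fp16 stores each parameter in 2 bytes.
params = 222e6          # approximate count from the table
bytes_per_param = 2
size_mib = params * bytes_per_param / 2**20
print(f"fp16 size: {size_mib:.0f} MiB")
```

Both results round to the values reported above (48× and 423).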
Files
- model.safetensors: weights (fp16, 423 MB)
- config.json: model config + vocabulary (34 Russian characters)
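To see which tensors a weights file contains without loading it into memory, the safetensors header can be read directly. The sketch below relies only on the published file layout (an 8-byte little-endian header length followed by a JSON header); the helper name `read_safetensors_header` is made up for this example, and the tensor names inside this particular model are not assumed.

```python
import json
import struct

def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a file.

    Only the JSON header is read; tensor data is left untouched.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header

def summarize(path):
    """Print one line per tensor: name, dtype, shape."""
    for name, info in sorted(read_safetensors_header(path).items()):
        print(name, info["dtype"], info["shape"])
```

Calling `summarize` on the downloaded `model.safetensors` lists every tensor with its dtype and shape, which is a quick way to confirm that the weights are stored as fp16.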
Conversion
Converted from PyTorch using convert_gigaam_to_mlx.py.
LSTM weights are transformed from PyTorch (weight_ih, weight_hh, bias_ih, bias_hh) to MLX layout (Wx, Wh, bias).
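A minimal sketch of what such a transformation could look like, written in NumPy so it runs without PyTorch or MLX. It assumes that both layouts stack the four gates in the same order, so the matrices carry over unchanged and only the two bias vectors are summed into one. The actual conversion script may do more (for example renaming per-layer keys or casting to fp16); `convert_lstm_weights` is a hypothetical name, not a function from that script.

```python
import numpy as np

def convert_lstm_weights(weight_ih, weight_hh, bias_ih, bias_hh):
    """Map PyTorch-style LSTM tensors to (Wx, Wh, bias).

    Assumption: identical gate ordering in both layouts, so only the
    biases need merging.
    """
    return {"Wx": weight_ih, "Wh": weight_hh, "bias": bias_ih + bias_hh}

def gates_two_biases(x, h, w_ih, w_hh, b_ih, b_hh):
    # Pre-activation gate values with separate input and hidden biases.
    return x @ w_ih.T + b_ih + h @ w_hh.T + b_hh

def gates_merged_bias(x, h, p):
    # Same computation with a single merged bias.
    return x @ p["Wx"].T + h @ p["Wh"].T + p["bias"]

# Random toy tensors: hidden size 4, input size 3, so 4 * 4 = 16 gate rows.
rng = np.random.default_rng(0)
hidden, inp = 4, 3
w_ih = rng.standard_normal((4 * hidden, inp))
w_hh = rng.standard_normal((4 * hidden, hidden))
b_ih = rng.standard_normal(4 * hidden)
b_hh = rng.standard_normal(4 * hidden)
x = rng.standard_normal(inp)
h = rng.standard_normal(hidden)

converted = convert_lstm_weights(w_ih, w_hh, b_ih, b_hh)
same = np.allclose(gates_two_biases(x, h, w_ih, w_hh, b_ih, b_hh),
                   gates_merged_bias(x, h, converted))
print(same)
```

Summing the biases is safe because they are only ever added together before the gate nonlinearities, so the merged form produces the same gate pre-activations.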
License
MLX conversion code: MIT. Model weights: see ai-sage/GigaAM license.