google/fleurs
How to use Scrya/whisper-medium-id-augmented with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="Scrya/whisper-medium-id-augmented")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
processor = AutoProcessor.from_pretrained("Scrya/whisper-medium-id-augmented")
model = AutoModelForSpeechSeq2Seq.from_pretrained("Scrya/whisper-medium-id-augmented")

This model is a fine-tuned version of openai/whisper-medium on the following datasets:
It achieves the following results on the evaluation set (Common Voice 11.0):
Training:
Evaluation:
Datasets were augmented on the fly with audiomentations, using the PitchShift, AddGaussianNoise and TimeStretch transformations, each applied with probability p=0.3.
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | WER | CER |
|---|---|---|---|---|---|
| 0.3002 | 1.9 | 1000 | 0.1659 | 8.1850 | 2.5333 |
| 0.0514 | 3.8 | 2000 | 0.1818 | 8.0559 | 2.5244 |
| 0.0145 | 5.7 | 3000 | 0.2150 | 7.8945 | 2.5281 |
| 0.0037 | 7.6 | 4000 | 0.2248 | 7.7100 | 2.3738 |
| 0.0016 | 9.51 | 5000 | 0.2402 | 7.6224 | 2.3591 |
| 0.0009 | 11.41 | 6000 | 0.2525 | 7.7654 | 2.3952 |
| 0.0005 | 13.31 | 7000 | 0.2609 | 7.5994 | 2.3487 |
| 0.0008 | 15.21 | 8000 | 0.2682 | 7.5855 | 2.3347 |
| 0.0002 | 17.11 | 9000 | 0.2756 | 7.6178 | 2.3288 |
| 0.0002 | 19.01 | 10000 | 0.2788 | 7.6132 | 2.3332 |
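
The card does not say how the Wer and Cer columns were computed. A common definition, shown here as a sketch under that assumption, is the Levenshtein edit distance between reference and hypothesis divided by the reference length, counted over words for WER and over characters for CER, and expressed as a percentage. The function names and the example sentences are illustrative only, and no text normalization is applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(references, hypotheses, unit="word"):
    """Corpus-level error rate in percent; unit is "word" (WER) or "char" (CER)."""
    total_edits = 0
    total_length = 0
    for ref, hyp in zip(references, hypotheses):
        # Characters include spaces here, which is an assumption about the metric.
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        total_edits += edit_distance(ref_units, hyp_units)
        total_length += len(ref_units)
    return 100.0 * total_edits / total_length


references = ["saya pergi ke pasar"]
hypotheses = ["saya pergi ke pasr"]  # one misspelled word, one missing character

wer = error_rate(references, hypotheses, unit="word")  # 1 of 4 words wrong
cer = error_rate(references, hypotheses, unit="char")  # 1 of 19 characters wrong
```

Because a single wrong character makes the whole word count as an error, WER is typically several times larger than CER, which matches the gap between the two columns in the table.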