---
base_model: openai/whisper-large-v3
datasets:
- gl
language: gl
library_name: transformers
license: apache-2.0
model-index:
- name: Finetuned openai/whisper-large-v3 on Galician
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech-to-Text
    dataset:
      name: Common Voice (Galician)
      type: common_voice
    metrics:
    - type: wer
      value: 5.143
---

# Finetuned openai/whisper-large-v3 on 116954 Galician training audio samples from cv-corpus-21.0-2025-03-14/gl.

This model was created from the Mozilla.ai Blueprint: [speech-to-text-finetune](https://github.com/mozilla-ai/speech-to-text-finetune).

## Evaluation results on 29239 audio samples of Galician:

### Baseline model (before finetuning) on Galician
- Word Error Rate (Normalized): 20.140
- Word Error Rate (Orthographic): 25.293
- Character Error Rate (Normalized): 7.427
- Character Error Rate (Orthographic): 6.224
- Loss: 1.905
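
For readers unfamiliar with the metrics above, the following is a minimal sketch of how word and character error rates are typically computed, and of why "Normalized" and "Orthographic" scores differ. It is not the evaluation code used to produce these numbers. The `normalize` helper is a simplified stand-in (an assumption), since the exact normalizer is not stated in this card, and the example sentences are made up for illustration.

```python
import string


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text):
    """Simplified normalizer (assumption): lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the number of reference words."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the number of reference characters."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference transcript and model output, for illustration only.
reference = "Bos días, Galicia!"
hypothesis = "bos dias galicia"

orthographic_wer = wer(reference, hypothesis)                      # raw text, case and punctuation count
normalized_wer = wer(normalize(reference), normalize(hypothesis))  # only the missing accent counts
```

In this toy example every word differs orthographically (case or punctuation), while after normalization only the missing accent in "dias" remains an error, which is why normalized scores are usually lower for word-level metrics.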

### Finetuned model (after finetuning) on Galician
- Word Error Rate (Normalized): 5.143
- Word Error Rate (Orthographic): 8.320
- Character Error Rate (Normalized): 1.865
- Character Error Rate (Orthographic): 2.446
- Loss: 0.126
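
As a reading aid, the relative error reduction from finetuning can be derived from the baseline and finetuned figures reported in this card. The short sketch below only restates those published numbers; the dictionary keys and the `relative_reduction` helper are illustrative names, not part of the Blueprint.

```python
def relative_reduction(before, after):
    """Relative reduction in percent: how much of the baseline error was removed."""
    return 100.0 * (before - after) / before


# Error rates copied from the Common Voice (Galician) results in this card.
baseline = {"wer_norm": 20.140, "wer_ortho": 25.293, "cer_norm": 7.427, "cer_ortho": 6.224}
finetuned = {"wer_norm": 5.143, "wer_ortho": 8.320, "cer_norm": 1.865, "cer_ortho": 2.446}

reductions = {
    name: relative_reduction(baseline[name], finetuned[name])
    for name in baseline
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative reduction")
```

Under these figures, the normalized word error rate drops by roughly three quarters relative to the baseline.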

### Finetuned model (after finetuning) on the Galician FLEURS test set (total of 927 samples)
- Word Error Rate (Normalized): 9.804
- Word Error Rate (Orthographic): 13.147
- Character Error Rate (Normalized): 5.827
- Character Error Rate (Orthographic): 5.007
- Loss: 0.383