Decoding of 'mp3' failed
Hey, I'm trying to run the French-to-French example posted in the model card (see below) on a Google Colab free-tier GPU:
...
# load dummy dataset and read soundfiles
ds = load_dataset("common_voice", "fr", split="test", streaming=True)
ds = ds.cast_column("audio", datasets.Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]["array"]
It fails at the last step, giving me the error below:
RuntimeError: Decoding of 'mp3' failed, probably because of streaming mode (librosa cannot decode 'mp3' file-like objects, only path-like)
Has anybody seen this? Am I missing a dependency or something?
Thank you
Hey! Not really sure why, but I think it is related to librosa. I just tried it on a local machine and it works properly, but I did hit this bug on Colab.
I found the cause: the mp3 files are corrupted. You should validate the mp3 files and separate out the corrupted ones.
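For anyone who wants to try that validation step, here is a minimal sketch. It assumes the audio files are already on local disk and that the `soundfile` package (with a libsndfile build that can read your format) is installed. The helper name `split_decodable` and its `decode` parameter are hypothetical, made up for this illustration and not part of `datasets` or any other library.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def split_decodable(
    paths: Iterable[str],
    decode: Optional[Callable[[str], object]] = None,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Try to decode every file; return (good_paths, [(bad_path, error), ...])."""
    if decode is None:
        # Assumption: soundfile is installed and its libsndfile build
        # supports the format being checked (mp3 needs a recent build).
        import soundfile as sf

        decode = sf.read

    good: List[str] = []
    bad: List[Tuple[str, str]] = []
    for path in paths:
        try:
            decode(path)  # result is discarded; we only care whether it raises
            good.append(path)
        except Exception as exc:  # any decoding failure marks the file as bad
            bad.append((path, f"{type(exc).__name__}: {exc}"))
    return good, bad
```

Calling `split_decodable(list_of_paths)` gives you the files that decode cleanly plus the failing ones with their error text, so the failing ones can be moved aside before building the dataset. Passing a different `decode` callable (for example `librosa.load`) lets you check against whichever decoder your pipeline actually uses.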
Datasets got a big update to only use soundfile + librosa now, regardless of the file type: https://github.com/huggingface/datasets/pull/5573
There should be a better error message from soundfile.read
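Since the behaviour depends on which versions of these packages are installed, it can help to print them when comparing a local machine against Colab. The snippet below is only a small sketch using the standard library; the function name `installed_versions` is hypothetical, and the thread does not say which `datasets` release contains the linked change, so check that yourself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("datasets", "soundfile", "librosa")):
    """Return {package_name: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both environments and posting the output makes it easier to tell whether a missing or outdated dependency is the difference.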