Audio Classification
Keras
speech_emotion_recognition
Mel-Frequency Cepstral Coefficients
wav2vec2
bi-lstm
cnn
How to use Sharath45/SPEECH_EMOTION_RECOGNITION with Keras:
# Available backend options are: "jax", "torch", "tensorflow".
import os
os.environ["KERAS_BACKEND"] = "jax"
import keras
model = keras.saving.load_model("hf://Sharath45/SPEECH_EMOTION_RECOGNITION")
A newer version of this model is available: facebook/wav2vec2-base
license: apache-2.0
tags:
- audio
- mfcc
- speech-recognition
- classification
Updated MFCC Model
Model Description
This model uses updated Mel-Frequency Cepstral Coefficient (MFCC) features for audio analysis. It is designed for tasks such as audio classification or speech recognition, and relies on MFCCs to capture the spectral properties of audio signals, including in noisy conditions (see Limitations for very high noise levels).
Intended Use
- Primary Use: Audio classification, speech recognition, or any audio analysis tasks.
- Target Users: Researchers, developers, and hobbyists working in audio processing and machine learning.
- Out-of-Scope Use: Not intended for real-time processing in highly dynamic environments without further adaptation, or for applications that require precise speech-to-text conversion in multiple languages.
Model Architecture
- Base Architecture: (e.g., Convolutional Neural Network, Recurrent Neural Network, Transformer, etc.)
- Input: Preprocessed audio signals represented as updated MFCC features.
- Output: Depending on the task, the model outputs class probabilities or transcriptions.
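The MFCC input described above can be sketched with NumPy alone. This is a minimal illustration, not the extraction pipeline used to train this model: the sample rate, FFT size, hop length, number of mel bands, and number of coefficients below are all assumed values, since the card does not state them.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(signal, sr=16000, n_mfcc=13, n_fft=512, hop_length=160, n_mels=40):
    """Return MFCCs with shape (n_frames, n_mfcc). All defaults are assumptions."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        # Zero-pad short clips so at least one full frame exists
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)                # small floor avoids log(0)
    # Unnormalized DCT-II over the mel axis, keeping the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct_basis.T

# One second of a 440 Hz tone stands in for real speech here
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (97, 13)
```

In practice a library routine such as `librosa.feature.mfcc` is the more common choice; the sketch only makes the individual steps (framing, windowing, power spectrum, mel filterbank, log, DCT) visible.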
Training Data
- Dataset(s): CREMA-D, RAVDESS
- Preprocessing: Audio normalization, MFCC extraction parameters (e.g., number of coefficients, window size, hop length).
- Splits: Details on training, validation, and testing splits.
- Augmentation: Random pitch shifting and noise addition.
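The normalization and augmentation steps listed above might look roughly like the following sketch. The function names, the SNR range, and the maximum pitch shift are hypothetical, since the card gives no parameters. The pitch shift here is a naive resampling that also changes the clip duration, swapped in to keep the example dependency-free; a duration-preserving shift would normally come from an audio library.

```python
import numpy as np

def peak_normalize(signal, peak=1.0):
    """Scale the waveform so its largest absolute sample equals `peak`."""
    signal = np.asarray(signal, dtype=np.float64)
    max_abs = np.max(np.abs(signal))
    return signal if max_abs == 0 else signal * (peak / max_abs)

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def naive_pitch_shift(signal, n_steps):
    """Shift pitch by `n_steps` semitones via resampling (duration changes too)."""
    signal = np.asarray(signal, dtype=np.float64)
    rate = 2.0 ** (n_steps / 12.0)
    new_len = max(1, int(round(len(signal) / rate)))
    src_positions = np.arange(new_len) * rate
    return np.interp(src_positions, np.arange(len(signal)), signal)

def augment(signal, rng, max_steps=2.0, snr_range=(10.0, 30.0)):
    """Normalize, then apply a random pitch shift and random-SNR noise (assumed ranges)."""
    steps = rng.uniform(-max_steps, max_steps)
    snr_db = rng.uniform(*snr_range)
    shifted = naive_pitch_shift(peak_normalize(signal), steps)
    return add_noise(shifted, snr_db, rng)

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clip = 0.5 * np.sin(2 * np.pi * 220.0 * t)   # synthetic stand-in for a speech clip
augmented = augment(clip, rng)
```

Passing an explicit `rng` keeps the augmentation reproducible, which helps when comparing training runs.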
Evaluation Metrics
- Accuracy:
- Precision/Recall/F1-Score:
- Additional Metrics: (e.g., ROC-AUC, confusion matrices, etc.)
- Benchmarking: (Optional – describe how your model compares against baselines.)
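The metrics named above can be computed from a confusion matrix, as in the sketch below. The label names and predictions are made-up illustration data, not results for this model; the card reports no scores.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix

def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def per_class_scores(matrix, labels):
    """Precision, recall, and F1 for each class; 0.0 where undefined."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp   # predicted as i, but wrong
        fn = sum(matrix[i]) - tp                  # truly i, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores

# Hypothetical labels and predictions, for illustration only
labels = ["angry", "happy", "sad"]
y_true = ["angry", "angry", "happy", "happy", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "happy", "sad", "angry"]

matrix = confusion_matrix(y_true, y_pred, labels)
scores = per_class_scores(matrix, labels)
print(matrix)            # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(matrix))  # 4 of 6 correct
```

Per-class scores matter for emotion data because class sizes are often uneven, so overall accuracy alone can hide a class the model rarely predicts.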
Limitations
- Sensitivity to very high levels of background noise.
- Potential performance degradation on audio types not represented in the training data.
- (Any other model-specific limitations or failure modes.)
Ethical Considerations
- Ensure privacy and consent when processing audio data.
- Consider potential biases if the training data is not diverse.
- Avoid deploying in contexts where misclassifications could have serious consequences without thorough validation.
How to Use
Below is an example code snippet to load and use the model:
import os
# Select a Keras backend before importing keras: "jax", "torch", or "tensorflow"
os.environ["KERAS_BACKEND"] = "jax"
import keras
# Load the saved Keras model from the Hugging Face Hub
model = keras.saving.load_model("hf://Sharath45/SPEECH_EMOTION_RECOGNITION")
# Example: processing an audio file
# audio_input = ... (your audio processing code to extract MFCC features)
# outputs = model(audio_input)
# print(outputs)
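When the model outputs class probabilities, turning them into a label is a small post-processing step. The sketch below assumes a label list and ordering that are purely hypothetical; the card does not document the class order, so it would need to be taken from the training code.

```python
# Hypothetical label order: NOT documented in this card, replace with the real one
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad"]

def decode_prediction(probabilities, labels=EMOTIONS):
    """Return (label, probability) for the highest-scoring class."""
    probabilities = list(probabilities)
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]

# Made-up probability vector standing in for one row of model output
label, confidence = decode_prediction([0.05, 0.05, 0.1, 0.6, 0.15, 0.05])
print(label, confidence)  # happy 0.6
```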
Model tree for Sharath45/SPEECH_EMOTION_RECOGNITION
Base model
facebook/wav2vec2-base-960h