
πŸ—£οΈ StutteredSpeechASR Research Demo

A Gradio-based research demonstration showcasing StutteredSpeechASR, a Whisper model fine-tuned specifically for stuttered speech recognition (Mandarin). Compare its performance against baseline Whisper models to see the improvement on stuttered speech patterns.

🎯 Features

  • StutteredSpeechASR Research: Showcases fine-tuned Whisper model specifically designed for stuttered speech
  • Comparative Analysis: Side-by-side comparison with baseline Whisper models
  • Audio Input Flexibility: Record via microphone or upload audio files
  • Specialized for Stuttered Speech: Better handling of repetitions, prolongations, and blocks
  • Clean Interface: Organized model cards with clear transcription results
  • Lightweight Deployment: All inference via Hugging Face APIs - no GPU required
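
Because all inference goes through hosted APIs, a single transcription request reduces to an authenticated HTTP POST of raw audio bytes. The sketch below uses only the standard library to show the shape of such a request; the function names and the `text` field of the response are illustrative assumptions, not the app's actual code:

```python
import json
import urllib.request


def build_headers(api_key: str) -> dict:
    """Headers for an authenticated Hugging Face Inference request."""
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "audio/wav"}


def transcribe(audio_path: str, endpoint: str, api_key: str) -> str:
    """POST raw audio bytes to an inference endpoint and return the text."""
    with open(audio_path, "rb") as f:
        payload = f.read()
    req = urllib.request.Request(endpoint, data=payload, headers=build_headers(api_key))
    with urllib.request.urlopen(req) as resp:
        # Whisper-style ASR endpoints typically return {"text": "..."}.
        return json.loads(resp.read()).get("text", "")
```

Since no model weights are loaded locally, the client stays this small, which is why the demo needs no GPU.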

πŸ€– Models Included

| Model | Type | Description |
| --- | --- | --- |
| πŸ—£οΈ StutteredSpeechASR | Fine-tuned Research Model | Whisper fine-tuned specifically for stuttered speech (Mandarin) |
| πŸŽ™οΈ Whisper Large V3 | Baseline Model | OpenAI's Whisper Large V3 model via the HF Inference API |
| πŸ”Š Whisper Large V3 Turbo | Baseline Model | OpenAI's Whisper Large V3 Turbo (faster) via the HF Inference API |

πŸ“‹ Requirements

  • Python 3.9+
  • Hugging Face API key
  • Docker (optional, for containerized deployment)

πŸ”‘ Environment Setup

Create a .env file in the project root with your Hugging Face credentials:

    HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
    HF_API_KEY=hf_your_api_key_here

| Variable | Description |
| --- | --- |
| HF_ENDPOINT | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| HF_API_KEY | Your Hugging Face API token (create one at huggingface.co/settings/tokens) |
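
The app presumably reads these values at startup; a library such as python-dotenv can load the `.env` file into the process environment first. A minimal, fail-fast loader might look like the following (the `load_hf_config` helper and its key names are illustrative, not the app's actual code):

```python
import os


def load_hf_config() -> dict:
    """Read the two required settings and fail fast if either is missing."""
    config = {
        "endpoint": os.environ.get("HF_ENDPOINT"),
        "api_key": os.environ.get("HF_API_KEY"),
    }
    missing = [name for name, value in config.items() if not value]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return config
```

Failing at startup with a named list of missing variables is friendlier than a cryptic HTTP 401 on the first transcription attempt.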

πŸš€ Quick Start

Option 1: Run with Docker (Recommended)

  1. Create your .env file with Hugging Face credentials (see above)

  2. Build and run with Docker Compose

    docker compose up --build
    
  3. Open your browser and navigate to http://localhost:7860

Option 2: Run Locally

  1. Clone the repository

    git clone <your-repo-url>
    cd asr_demo
    
  2. Create a virtual environment (recommended)

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # Linux/macOS
    source venv/bin/activate
    
  3. Install dependencies

    pip install -r requirements.txt
    
  4. Create your .env file with Hugging Face credentials (see Environment Setup above)

  5. Run the application

    python app.py
    
  6. Open your browser and navigate to http://localhost:7860
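
Once every model has responded, the side-by-side comparison boils down to pairing each model name with its transcription. A hypothetical formatting helper is sketched below (`format_comparison` is illustrative; the real app.py renders results through Gradio components rather than plain text):

```python
def format_comparison(results: dict) -> str:
    """Render per-model transcriptions as an aligned plain-text report."""
    width = max(len(name) for name in results)
    lines = [f"{name.ljust(width)} | {text}" for name, text in results.items()]
    return "\n".join(lines)
```

A helper like this keeps the presentation logic separate from the API calls, so the same results can feed either a console report or the Gradio model cards.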

πŸ§ͺ Research Notes

  • Target Language: The StutteredSpeechASR model is specifically trained for Mandarin Chinese
  • Use Cases: Research demonstration, stuttered speech analysis, comparative ASR evaluation
  • Best Results: Use clear audio recordings for optimal model performance
  • Baseline Comparison: The Whisper models may struggle with stuttered speech patterns that StutteredSpeechASR handles well
