
πŸ—£οΈ StutteredSpeechASR Research Demo

A Gradio-based research demonstration showcasing StutteredSpeechASR, a Whisper model fine-tuned specifically for stuttered speech recognition (Mandarin). Compare its performance against baseline Whisper models to see the improvement on stuttered speech patterns.

🎯 Features

  • StutteredSpeechASR Research: Showcases fine-tuned Whisper model specifically designed for stuttered speech
  • Comparative Analysis: Side-by-side comparison with baseline Whisper models
  • Audio Input Flexibility: Record via microphone or upload audio files
  • Specialized for Stuttered Speech: Better handling of repetitions, prolongations, and blocks
  • Clean Interface: Organized model cards with clear transcription results
  • Lightweight Deployment: All inference via Hugging Face APIs - no GPU required
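
Because all inference goes through hosted APIs, a single transcription request reduces to an authenticated HTTP POST of raw audio bytes. The sketch below uses only the standard library to show the shape of such a request; the function names and the `text` field of the response are illustrative assumptions, not the app's actual code:

```python
import json
import urllib.request


def build_headers(api_key: str) -> dict:
    """Headers for an authenticated Hugging Face Inference request."""
    return {"Authorization": f"Bearer {api_key}", "Content-Type": "audio/wav"}


def transcribe(audio_path: str, endpoint: str, api_key: str) -> str:
    """POST raw audio bytes to an inference endpoint and return the text."""
    with open(audio_path, "rb") as f:
        payload = f.read()
    req = urllib.request.Request(endpoint, data=payload, headers=build_headers(api_key))
    with urllib.request.urlopen(req) as resp:
        # Whisper-style ASR endpoints typically return {"text": "..."}.
        return json.loads(resp.read()).get("text", "")
```

Since no model weights are loaded locally, the client stays this small, which is why the demo needs no GPU.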

πŸ€– Models Included

| Model | Type | Description |
| --- | --- | --- |
| πŸ—£οΈ StutteredSpeechASR | Fine-tuned Research Model | Whisper fine-tuned specifically for stuttered speech (Mandarin) |
| πŸŽ™οΈ Whisper Large V3 | Baseline Model | OpenAI's Whisper Large V3 model via the HF Inference API |
| πŸ”Š Whisper Large V3 Turbo | Baseline Model | OpenAI's Whisper Large V3 Turbo (faster) via the HF Inference API |

πŸ“‹ Requirements

  • Python 3.9+
  • Hugging Face API key
  • Docker (optional, for containerized deployment)

πŸ”‘ Environment Setup

Create a .env file in the project root with your Hugging Face credentials:

    HF_ENDPOINT=https://your-endpoint-url.aws.endpoints.huggingface.cloud
    HF_API_KEY=hf_your_api_key_here

| Variable | Description |
| --- | --- |
| HF_ENDPOINT | Your dedicated Hugging Face Inference Endpoint URL for StutteredSpeechASR |
| HF_API_KEY | Your Hugging Face API token (create one at huggingface.co/settings/tokens) |
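
The app presumably reads these values at startup; a library such as python-dotenv can load the `.env` file into the process environment first. A minimal, fail-fast loader might look like the following (the `load_hf_config` helper and its key names are illustrative, not the app's actual code):

```python
import os


def load_hf_config() -> dict:
    """Read the two required settings and fail fast if either is missing."""
    config = {
        "endpoint": os.environ.get("HF_ENDPOINT"),
        "api_key": os.environ.get("HF_API_KEY"),
    }
    missing = [name for name, value in config.items() if not value]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return config
```

Failing at startup with a named list of missing variables is friendlier than a cryptic HTTP 401 on the first transcription attempt.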

πŸš€ Quick Start

Option 1: Run with Docker (Recommended)

  1. Create your .env file with Hugging Face credentials (see above)

  2. Build and run with Docker Compose

    docker compose up --build
    
  3. Open your browser and navigate to http://localhost:7860

Option 2: Run Locally

  1. Clone the repository

    git clone <your-repo-url>
    cd asr_demo
    
  2. Create a virtual environment (recommended)

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # Linux/macOS
    source venv/bin/activate
    
  3. Install dependencies

    pip install -r requirements.txt
    
  4. Create your .env file with Hugging Face credentials (see Environment Setup above)

  5. Run the application

    python app.py
    
  6. Open your browser and navigate to http://localhost:7860
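
Once every model has responded, the side-by-side comparison boils down to pairing each model name with its transcription. A hypothetical formatting helper is sketched below (`format_comparison` is illustrative; the real app.py renders results through Gradio components rather than plain text):

```python
def format_comparison(results: dict) -> str:
    """Render per-model transcriptions as an aligned plain-text report."""
    width = max(len(name) for name in results)
    lines = [f"{name.ljust(width)} | {text}" for name, text in results.items()]
    return "\n".join(lines)
```

A helper like this keeps the presentation logic separate from the API calls, so the same results can feed either a console report or the Gradio model cards.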

πŸ§ͺ Research Notes

  • Target Language: The StutteredSpeechASR model is specifically trained for Mandarin Chinese
  • Use Cases: Research demonstration, stuttered speech analysis, comparative ASR evaluation
  • Best Results: Use clear audio recordings for optimal model performance
  • Baseline Comparison: The Whisper models may struggle with stuttered speech patterns that StutteredSpeechASR handles well
