MiniLM Hate Speech Detector

Fine-tuned MiniLM-L12 model for binary hate speech detection, optimized for browser deployment.

Model Description

This model classifies text as either hate speech or not hate speech (binary classification). It was fine-tuned on the HateXplain dataset and exported to ONNX format with INT8 quantization for efficient on-device inference in browser extensions.
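To give a feel for what INT8 quantization does to the weights, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a simplified illustration only: the actual export tooling and quantization scheme used for this model are not stated here, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)         # integer codes, each storable in one byte
print(restored)  # close to the original weights, within half a scale step
```

Each weight is stored in one byte instead of four (float32), which is where most of the size reduction comes from; the price is a small rounding error per weight.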

Training Data

  • Dataset: HateXplain (~20K samples)
  • Sources: Twitter, Gab
  • Label Mapping:
    • hate (1): Hate speech
    • not_hate (0): HateXplain's Offensive and Normal classes, merged
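The label mapping above can be sketched as a small lookup. The class strings `"hatespeech"`, `"offensive"`, and `"normal"` are assumptions about how the HateXplain labels are named after loading; the real identifiers depend on the loader and preprocessing used.

```python
# Assumed string names for the three HateXplain classes (hypothetical;
# adjust to whatever your copy of the dataset actually uses).
HATEXPLAIN_TO_BINARY = {
    "hatespeech": 1,  # hate
    "offensive": 0,   # merged into not_hate
    "normal": 0,      # merged into not_hate
}

# Binary ids and their names, as listed in the label mapping.
ID2LABEL = {0: "not_hate", 1: "hate"}


def to_binary(label: str) -> int:
    """Collapse a three-way HateXplain label into the binary target."""
    return HATEXPLAIN_TO_BINARY[label]


for name in ("hatespeech", "offensive", "normal"):
    print(name, "->", ID2LABEL[to_binary(name)])
```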

Evaluation Results

| Metric     | Score  |
|------------|--------|
| F1 (Macro) | 0.8302 |
| Accuracy   | 0.8567 |
| Precision  | 0.8342 |
| Recall     | 0.8266 |
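For readers who want to reproduce this kind of report on their own predictions, the sketch below computes accuracy and macro-averaged precision, recall, and F1 in plain Python. It assumes precision and recall are macro-averaged like F1 (the card does not say), and the toy labels are made up for illustration; they are not the model's evaluation data.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,  # mean of the per-class F1 scores
    }


# Toy example: 1 = hate, 0 = not_hate (illustrative values only).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

Macro averaging weights both classes equally, which matters here because hate speech is the minority class once Offensive and Normal are merged.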

Model Comparison (RQ3)

| Model         | F1    | Size (Quantized) |
|---------------|-------|------------------|
| MiniLM (this) | 0.830 | 32.6 MB          |
| DistilBERT    | 0.825 | 64 MB            |
| TinyBERT      | 0.816 | 15 MB            |

Intended Use

  • Primary: Browser extension for real-time hate speech detection
  • Framework: Transformers.js for on-device inference
  • Privacy: All inference runs client-side; no text is sent to a server

Usage

With Transformers.js (Browser)

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'text-classification',
  'TaiwoOgun/minilm-hate-speech-onnx'
);

const result = await classifier('Some text to classify');
console.log(result);
// Example output: [{ label: 'hate', score: 0.85 }]

With Python

from transformers import pipeline

classifier = pipeline('text-classification', 'TaiwoOgun/minilm-hate-speech')
result = classifier('Some text to classify')
print(result)  # a list with one dict containing 'label' and 'score'
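A caller usually needs a yes/no decision rather than a raw score. The helper below is a minimal sketch of one way to threshold the pipeline output; `is_hate` and the `threshold` value are hypothetical, and it assumes the positive label is named `'hate'` as in the example output shown for Transformers.js.

```python
def is_hate(predictions, threshold=0.5):
    """Return True if the top prediction is 'hate' with enough confidence.

    `predictions` has the shape returned by a text-classification
    pipeline for a single input: [{'label': ..., 'score': ...}].
    """
    top = predictions[0]
    return top["label"] == "hate" and top["score"] >= threshold


# Hand-written outputs in the pipeline's format (illustrative values).
print(is_hate([{"label": "hate", "score": 0.85}]))
print(is_hate([{"label": "hate", "score": 0.85}], threshold=0.9))
print(is_hate([{"label": "not_hate", "score": 0.97}]))
```

Raising the threshold trades recall for fewer false positives, which may be preferable when flagged text is hidden from the user.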

Limitations

  • Language: English only
  • Binary: Does not distinguish between types of hate speech
  • Context: May miss context-dependent hate speech or sarcasm
  • Dataset Bias: Trained on Twitter and Gab data, so it may not generalize to other platforms

Citation

If you use this model, please cite:

@misc{ogunbanwo2025hatespeech,
  author = {Ogunbanwo, Taiwo},
  title = {Real-Time Hate Speech Detection Browser Extension},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/TaiwoOgun/minilm-hate-speech-onnx}
}

License

MIT License
