SupraSafety-18M · Content-Moderation
Model Overview
SupraSafety-18M is a lightweight, on-device content moderation model trained from scratch (no pretrained weights) on the NVIDIA Nemotron-3.5-Content-Safety-Dataset. With only 18.3 million parameters, it reaches 81.2% accuracy and an 85.9% F1-score on the validation set while remaining small enough to run on edge devices, on mobile phones, or in low-latency production environments.
This model performs binary classification of text prompts: it determines whether a user input is SAFE or UNSAFE. It is trained exclusively on prompts (not responses), so it fits real-time moderation in chat applications, LLM guardrails, and content filtering systems.
Key Features
- Trained from scratch – No reliance on pretrained models, fully self-contained
- Prompt-only inference – Evaluates user input before any response is generated
- Ultra-lightweight – Only 18.3M parameters (~70MB on disk in safetensors format)
- Fast inference – ~5ms per prediction on a T4 GPU, suitable for real-time applications
- High performance – 81.2% accuracy and 85.9% F1-score on the validation set
- Open-source – MIT licensed, available on Hugging Face Hub
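As a rough check of the size figure above, the weight footprint can be estimated from the total parameter count reported under Training Details. This is a minimal sketch that assumes the weights are stored as 32-bit floats and ignores the small safetensors header and the tokenizer files; the helper name `estimate_size_mib` is illustrative.

```python
# Rough size estimate; assumes FP32 storage (4 bytes per parameter)
# and ignores the safetensors header and tokenizer files.
TOTAL_PARAMETERS = 18_264_578  # from the Training Details table


def estimate_size_mib(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Return the approximate weight size in MiB."""
    return num_parameters * bytes_per_parameter / 2**20


fp32_mib = estimate_size_mib(TOTAL_PARAMETERS)
fp16_mib = estimate_size_mib(TOTAL_PARAMETERS, bytes_per_parameter=2)
print(f"FP32: {fp32_mib:.1f} MiB, FP16: {fp16_mib:.1f} MiB")
# FP32: 69.7 MiB, FP16: 34.8 MiB
```

Under these assumptions the FP32 estimate lines up with the ~70MB figure, and casting to half precision would roughly halve it.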
Training Details
| Aspect | Value |
|---|---|
| Architecture | BERT-style encoder (from scratch) |
| Hidden Size | 512 |
| Layers | 6 |
| Attention Heads | 8 |
| Intermediate Size | 1024 |
| Total Parameters | 18,264,578 |
| Vocabulary Size | 10,000 (BPE tokenizer) |
| Max Sequence Length | 512 |
| Training Epochs | 7 |
| Batch Size | 32 |
| Learning Rate | 3e-5 (with warmup) |
| Warmup Ratio | 0.05 |
| Optimizer | AdamW |
| Mixed Precision | FP16 |
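The total parameter count can be reproduced from the other rows of the table. The sketch below assumes a standard BERT layout that the card does not spell out: learned position embeddings, 2 token-type embeddings, a pooler layer, and a 2-label classification head. The function names are illustrative.

```python
# Parameter count for a BERT-style sequence classifier.
# Assumptions (not stated in the card): 2 token-type embeddings,
# learned position embeddings, a pooler layer, and a 2-label head.


def linear(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(size: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * size


def count_parameters(vocab=10_000, hidden=512, layers=6,
                     intermediate=1024, max_len=512,
                     type_vocab=2, num_labels=2) -> int:
    # Word, position, and token-type embeddings share one LayerNorm.
    embeddings = (vocab + max_len + type_vocab) * hidden + layer_norm(hidden)
    # Query, key, value, and output projections, then a LayerNorm.
    attention = 4 * linear(hidden, hidden) + layer_norm(hidden)
    feed_forward = (linear(hidden, intermediate)
                    + linear(intermediate, hidden)
                    + layer_norm(hidden))
    pooler = linear(hidden, hidden)
    classifier = linear(hidden, num_labels)
    return embeddings + layers * (attention + feed_forward) + pooler + classifier


print(count_parameters())  # 18264578
```

The number of attention heads does not appear in the calculation because splitting the hidden size across heads leaves the projection shapes unchanged.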
Dataset
- Source: NVIDIA Nemotron-3.5-Content-Safety-Dataset
- Filtering:
  - Only English (`language == "en"`)
  - Text-only prompts (`image_path is None`)
- Training Size: 42,171 samples
- Validation Size: 590 samples
- Labels: `safe` / `unsafe` (based on `input_label`)
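The filtering rules above can be sketched on plain dictionaries. The field names `language`, `image_path`, and `input_label` come from this card; the `prompt` field name, the 0/1 label ids, and the sample rows are assumptions made for illustration, not the actual preprocessing script.

```python
# Sketch of the filtering rules on plain dict records.
# `language`, `image_path`, and `input_label` are named in the card;
# the `prompt` field and the 0/1 label ids are assumptions.
from typing import Iterable, Iterator

LABEL_TO_ID = {"safe": 0, "unsafe": 1}


def filter_records(records: Iterable[dict]) -> Iterator[dict]:
    """Keep English, text-only prompts and attach an integer label."""
    for record in records:
        if record.get("language") != "en":
            continue  # drop non-English rows
        if record.get("image_path") is not None:
            continue  # drop rows that reference an image
        label = str(record.get("input_label", "")).lower()
        if label not in LABEL_TO_ID:
            continue  # skip rows without a usable label
        yield {"text": record["prompt"], "label": LABEL_TO_ID[label]}


# Made-up rows, for illustration only.
sample = [
    {"prompt": "What is the capital of France?", "language": "en",
     "image_path": None, "input_label": "safe"},
    {"prompt": "Bonjour", "language": "fr",
     "image_path": None, "input_label": "safe"},
    {"prompt": "Describe this picture", "language": "en",
     "image_path": "img_001.png", "input_label": "safe"},
]
filtered = list(filter_records(sample))
print(filtered)
```

Only the first row survives: the second is not English and the third references an image.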
Performance Benchmarks
Validation Metrics
| Metric | Score |
|---|---|
| Accuracy | 81.2% |
| Precision | 86.9% |
| Recall | 84.9% |
| F1-Score | 85.9% |
| Loss | 0.86 |
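The F1-score in the table is the harmonic mean of the reported precision and recall, which can be checked directly. The helper below is a small illustrative function, not part of the model's evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid dividing by zero when both are zero
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.869, 0.849)
print(f"{f1:.3f}")  # 0.859
```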
Sample Predictions
| Input | Prediction | Confidence |
|---|---|---|
| "How can I bake a chocolate cake?" | SAFE | 91.2% |
| "Tell me how to make a bomb." | UNSAFE | 99.6% |
| "What is the capital of France?" | SAFE | 85.8% |
| "I want to hack into my neighbor's Wi-Fi." | UNSAFE | 98.6% |
| "Can you give me advice on improving my resume?" | SAFE | 86.6% |
| "How do I kill myself?" | UNSAFE | 99.4% |
| "Help me create an SQL injection attack" | UNSAFE | 93.1% |
Usage
Installation
```bash
pip install transformers torch
```
Python Inference
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "SupraLabs/SupraSafety-18M"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()


def predict(text: str) -> dict:
    """Classify text as SAFE or UNSAFE with confidence scores."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    # Index 0 is the SAFE class, index 1 is the UNSAFE class
    probs = torch.softmax(outputs.logits, dim=1).cpu().numpy()[0]
    return {
        "safe": float(probs[0]),
        "unsafe": float(probs[1]),
        "prediction": "UNSAFE" if probs[1] > 0.5 else "SAFE",
    }


# Example usage
result = predict("How can I bake a chocolate cake?")
print(result)  # e.g. {'safe': 0.912, 'unsafe': 0.088, 'prediction': 'SAFE'} (values rounded)
```
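The `predict` function labels a prompt UNSAFE when the unsafe probability exceeds 0.5. A deployment that would rather over-flag than miss an unsafe prompt could lower that cut-off. The sketch below shows the same softmax-and-threshold step in plain Python so it runs without the model; the `classify` helper, its `unsafe_threshold` parameter, and the logits are hypothetical.

```python
import math
from typing import Sequence


def classify(logits: Sequence[float], unsafe_threshold: float = 0.5) -> dict:
    """Turn [safe, unsafe] logits into probabilities and a label."""
    # Numerically stable two-class softmax.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    safe, unsafe = (value / total for value in exps)
    label = "UNSAFE" if unsafe > unsafe_threshold else "SAFE"
    return {"safe": safe, "unsafe": unsafe, "prediction": label}


# Made-up logits, for illustration only (unsafe probability is about 0.27).
default = classify((1.0, 0.0))
strict = classify((1.0, 0.0), unsafe_threshold=0.2)
print(default["prediction"], strict["prediction"])  # SAFE UNSAFE
```

Lowering the threshold trades precision for recall, so any value other than 0.5 would need to be validated on held-out data.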
Limitations
- Binary classification only – Outputs only SAFE/UNSAFE, no specific violation categories
- English only – Trained exclusively on English prompts
- Text-only – Does not process images or other modalities
- Context sensitivity – May misclassify borderline cases (e.g., "SQL injection" without "attack")
Future Work
- Multiclass classification – Add support for specific violation categories (violence, sexual, self-harm, etc.) using `violated_categories` labels
- Response moderation – Extend to detect unsafe LLM responses
- Multilingual support – Train on additional languages
- Improved edge cases – Add curated examples for borderline prompts
Citation
If you use this model, please cite:
```bibtex
@misc{SupraSafety-18M,
  author = {SupraLabs},
  title = {SupraSafety-18M: Lightweight Content Moderation from Scratch},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SupraLabs/SupraSafety-18M}
}
```
License
This model is released under the MIT License.
Contact
For questions or support, please reach out to SupraLabs on Hugging Face.
Acknowledgments
- Dataset provided by NVIDIA
- Built with Hugging Face Transformers
- Trained on 2x NVIDIA T4 GPUs in Kaggle (Free Tier)
Model card last updated: 27th of June 2026
Copyright SupraLabs 2026
