PROMPT GUARD SECBERT

PROMPT GUARD SECBERT is a security-focused NLP model built on top of BERT to detect prompt injection, jailbreak attempts, and unsafe prompts in Large Language Model (LLM) applications.

It is designed for AI safety, enterprise LLM deployments, and production security pipelines.

πŸ” Problem Overview

Modern LLM applications are vulnerable to:

  • Prompt injection attacks
  • Jailbreak attempts
  • Unsafe or malicious user prompts

These attacks can bypass system instructions, leak sensitive information, and produce unsafe outputs.

PROMPT GUARD SECBERT provides an automated defense layer by classifying user prompts before they reach an LLM.

πŸš€ Key Features

  • BERT-based text-classification model
  • Detects safe vs malicious prompts
  • Covers multiple attack categories
  • Hugging Face–ready training & inference
  • Modular, production-grade codebase
  • Apache-2.0 licensed

🧠 Model Details

Model Name: PROMPT GUARD SECBERT
Base Architecture: BERT (Sequence Classification)
Pipeline Tag: text-classification
Task: Prompt security classification

Supported Classes: safe, jailbreak, injection, unsafe

πŸ“œ License

Apache License 2.0

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support