---
license: mit
tags:
  - spam
  - text-classification
  - scikit-learn
  - tfidf
  - spaCy
  - logistic-regression
language: en
datasets: custom
model-index:
  - name: Spam Classifier (Scikit-learn + spaCy)
    results: []
---

# 📧 Spam Classifier (Scikit-learn + spaCy)

This model classifies messages as **spam** or **ham** using a classical NLP pipeline: spaCy preprocessing, TF-IDF features, and logistic regression.

## 🧠 Model Details

- **Preprocessing**: Tokenization and lemmatization with spaCy
- **Vectorization**: TF-IDF over unigrams and bigrams (`ngram_range=(1, 2)`)
- **Feature Selection**: Chi-squared (`chi2`) test, keeping the top 1,000 features
- **Model**: Logistic Regression (`class_weight="balanced"`, `max_iter=1000`)
- **Performance**: ~87% accuracy on a balanced test set (800 spam and 800 ham messages)
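
The steps listed above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the original training script: the toy messages, the `build_pipeline` helper, and the small `k=10` are assumptions made so the example runs on a handful of sentences, and the spaCy lemmatization step is omitted (the texts are assumed to be already preprocessed).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder


def build_pipeline(k=1000):
    """Hypothetical helper: TF-IDF -> chi2 feature selection -> logistic regression."""
    return Pipeline([
        # Unigrams and bigrams, as described in the model details
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        # Keep the k features with the highest chi2 score against the labels
        ("select", SelectKBest(chi2, k=k)),
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])


# Toy data invented for illustration; assumed to be lemmatized already
texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "please send me the report",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Encode string labels as integers (ham -> 0, spam -> 1)
le = LabelEncoder()
y = le.fit_transform(labels)

# k=10 only because the toy vocabulary is tiny; the model card states 1000
pipeline = build_pipeline(k=10)
pipeline.fit(texts, y)

# Predict and map the integer class back to its string label
pred = le.inverse_transform(pipeline.predict(["free prize offer"]))
print(pred[0])
```

Note that `SelectKBest` needs `k` to be no larger than the vocabulary size, which is why the toy example lowers it; on a realistic corpus the stated value of 1,000 would apply.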

## 📦 Files

- `spam_classifier_bundle.joblib`: A single joblib bundle containing the trained model, TF-IDF vectorizer, label encoder, and feature selector

## 📥 Load Model (Example)

```python
from huggingface_hub import hf_hub_download
import joblib

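# Download the bundle listed under "Files" (cached locally after the first call)
bundle_path = hf_hub_download("mageshcruz/spam-classifier-scikit", "spam_classifier_bundle.joblib")
bundle = joblib.load(bundle_path)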
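
# Unpack the fitted components
model = bundle["model"]           # logistic regression classifier
vector = bundle["vectorizer"]     # TF-IDF vectorizer
selector = bundle["selector"]     # chi2 feature selector
le = bundle["label_encoder"]      # maps class indices back to "spam" / "ham"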