---
license: mit
tags:
- spam
- text-classification
- scikit-learn
- tfidf
- spaCy
- logistic-regression
language: en
datasets: custom
model-index:
- name: Spam Classifier (Scikit-learn + spaCy)
  results: []
---

# 📧 Spam Classifier (Scikit-learn + spaCy)

This model classifies messages as **spam** or **ham** using traditional NLP techniques.

## 🧠 Model Details

- **Preprocessing**: Tokenization and lemmatization with spaCy
- **Vectorization**: TF-IDF over unigrams and bigrams (1-2 grams)
- **Feature Selection**: Chi-squared (chi2), keeping the top 1000 features
- **Model**: Logistic Regression (`class_weight="balanced"`, `max_iter=1000`)
- **Performance**: ~87% accuracy on a balanced test set (800 spam, 800 ham)

## 📦 Files

- `spam_classifier_bundle.joblib`: Includes the trained model, TF-IDF vectorizer, label encoder, and feature selector

## 📥 Load Model (Example)

```python
from huggingface_hub import hf_hub_download
import joblib

# Download the serialized bundle from the Hub (cached locally after the first call).
# The filename below is the one given in this example; the Files section lists
# `spam_classifier_bundle.joblib`, so use whichever name exists in the repository.
bundle_path = hf_hub_download("mageshcruz/spam-classifier-scikit", "spam_classifier.joblib")
bundle = joblib.load(bundle_path)

# The bundle is a dict holding each fitted component.
model = bundle["model"]
vector = bundle["vectorizer"]
selector = bundle["selector"]
le = bundle["label_encoder"]
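
# --- Illustrative sketch, not part of the original example ---
# Assumption: the loaded components are applied in the order listed under
# "Model Details": TF-IDF vectorizer -> chi2 feature selector -> classifier,
# and the label encoder maps predicted class indices back to label strings.
# `predict_labels` is a hypothetical helper name. It also assumes the texts
# were already preprocessed the same way as the training data (spaCy
# tokenization + lemmatization); that step is not reproduced here.
def predict_labels(texts, vectorizer, selector, model, label_encoder):
    features = vectorizer.transform(texts)    # raw text -> TF-IDF matrix
    selected = selector.transform(features)   # keep only the chi2-selected columns
    encoded = model.predict(selected)         # encoded class predictions
    # Map encoded classes back to their original label strings.
    return list(label_encoder.inverse_transform(encoded))

# Hypothetical usage with the objects loaded above:
# labels = predict_labels(["some preprocessed message"], vector, selector, model, le)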