# Cervical Cancer Multimodal Classifier

## Model Description
This multimodal model classifies cervical cell samples using two complementary inputs:
- Visual features from histopathological images (Vision Transformer)
- Morphological features from tabular data (20 hand-crafted features)
## Model Architecture

```
┌──────────────────┐      ┌────────────────────┐
│    Histopath.    │      │  Tabular Features  │
│   Image (BMP)    │      │   (20 features)    │
└────────┬─────────┘      └─────────┬──────────┘
         │                          │
         ▼                          ▼
┌──────────────────┐      ┌────────────────────┐
│     ViT-base     │      │        MLP         │
│    (768 dims)    │      │     (64 dims)      │
└────────┬─────────┘      └─────────┬──────────┘
         │                          │
         └────────────┬─────────────┘
                      │
                      ▼
            ┌──────────────────┐
            │   Fusion Layer   │
            │   (512 -> 256)   │
            └─────────┬────────┘
                      │
                      ▼
            ┌──────────────────┐
            │    Output (7)    │
            │     Classes      │
            └──────────────────┘
```
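The fusion stage can be sketched in PyTorch as follows. Note this is an illustrative sketch, not the released implementation: the projection from the concatenated 768 + 64 = 832-dim vector down to the 512-dim fusion input, the layer names, and the activation choices are all assumptions.

```python
import torch
import torch.nn as nn


class MultimodalClassifier(nn.Module):
    """Sketch of the fusion head: ViT image embedding + tabular MLP -> 7 classes."""

    def __init__(self, vit_dim=768, tab_in=20, tab_dim=64, num_classes=7):
        super().__init__()
        # Tabular branch: 20 raw morphological features -> 64-dim embedding
        self.tab_mlp = nn.Sequential(nn.Linear(tab_in, tab_dim), nn.ReLU())
        # Concatenated 768 + 64 = 832 dims, projected to the 512-dim fusion input
        # (this intermediate projection is an assumption)
        self.proj = nn.Linear(vit_dim + tab_dim, 512)
        # Fusion layer from the diagram: 512 -> 256
        self.fusion = nn.Sequential(nn.ReLU(), nn.Linear(512, 256), nn.ReLU())
        # 7-way classification head
        self.head = nn.Linear(256, num_classes)

    def forward(self, img_embed, tab_feats):
        fused = torch.cat([img_embed, self.tab_mlp(tab_feats)], dim=-1)
        return self.head(self.fusion(self.proj(fused)))
```

Here `img_embed` is the pooled ViT-base output (CLS embedding) and `tab_feats` is the standardized 20-feature vector.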
## Supported Classes
- `carcinoma_in_situ` - Carcinoma in situ
- `light_dysplastic` - Light dysplastic
- `moderate_dysplastic` - Moderate dysplastic
- `normal_columnar` - Normal columnar
- `normal_intermediate` - Normal intermediate
- `normal_superficiel` - Normal superficial
- `severe_dysplastic` - Severe dysplastic
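For reference, a label-to-index mapping over these classes. The alphabetical index order is an assumption; the checkpoint's actual class order should be confirmed against its config.

```python
# The seven classes; index order (alphabetical) is an assumption.
CLASSES = [
    "carcinoma_in_situ",
    "light_dysplastic",
    "moderate_dysplastic",
    "normal_columnar",
    "normal_intermediate",
    "normal_superficiel",
    "severe_dysplastic",
]
label2id = {name: i for i, name in enumerate(CLASSES)}
id2label = {i: name for name, i in label2id.items()}
```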
## Performance

| Metric | Value |
|---|---|
| Test Accuracy | 0.6594 |
| Test F1-Score | 0.6571 |
| Weighted Precision | 0.6558 |
## Training Details
- Dataset: [Smear2005 (Herlev Pap smear dataset)](https://mde-lab.aegean.gr/index.php/downloads/)
- Vision Backbone: google/vit-base-patch16-224
- Training Epochs: 50 (early stopping with patience 10)
- Batch Size: 16
- Learning Rate: 2e-5 (AdamW)
- Scheduler: CosineAnnealingLR
- Hardware: NVIDIA T4 GPU on Google Colab
```
num_epochs = 50
best_val_accuracy = 0.6376811594202898
patience = 10
patience_counter = 10
```
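The logged `patience_counter = 10` shows the run ended via early stopping rather than exhausting all 50 epochs. A minimal sketch of that bookkeeping (the patience value comes from the log above; the function name is illustrative):

```python
def early_stopping(val_accs, patience=10):
    """Return (stop_epoch, best_accuracy) for a sequence of validation accuracies.

    Training halts once validation accuracy fails to improve for
    `patience` consecutive epochs.
    """
    best, counter = float("-inf"), 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best, counter = acc, 0  # new best: reset the patience counter
        else:
            counter += 1
        if counter >= patience:
            return epoch, best  # patience exhausted: stop here
    return len(val_accs), best  # ran the full schedule
```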
## Tabular Features
The model uses 20 morphological features extracted from nuclei analysis:
- Nucleus Area: `Kerne_A`
- Cytoplasm Area: `Cyto_A`
- Nucleus-Cytoplasm Ratio: `K/C`
- Y-coordinates: `Kerne_Ycol`, `Cyto_Ycol`
- Morphological indices: `KerneShort`, `KerneLong`, `KerneElong`, `KerneRund`
- Perimeter: `KernePeri`, `CytoPeri`
- Size ratios: `KerneMax`, `KerneMin`, `CytoMax`, `CytoMin`
- Position: `KernePos`
Features are StandardScaler normalized using training set statistics.
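StandardScaler normalization is the z-score transform; a pure-Python sketch, where the mean/std values in the usage line are hypothetical (the real ones come from the scaler fitted on the training set):

```python
def standardize(x, train_mean, train_std):
    # z-score using training-set statistics, as sklearn's StandardScaler does
    return (x - train_mean) / train_std

# Hypothetical training statistics for the nucleus-area feature Kerne_A
z = standardize(803.5, train_mean=700.0, train_std=150.0)
```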
## Usage

### Installation

```shell
pip install torch transformers pillow scikit-learn
```
### Quick Start

```python
import torch
from PIL import Image

# Load the serialized model checkpoint
model = torch.load('multimodal_cervical_model.pt', map_location='cpu')
model.eval()

# Your image and tabular data
image = Image.open('sample.BMP')
tabular_features = {
    'Kerne_A': 803.5,
    'Cyto_A': 27804.125,
    # ... 18 more features
}

# Predict with the inference helper
predictions = predict_multimodal(image, tabular_features, ...)
```
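If `predict_multimodal` returns raw logits for the 7 classes (an assumption), they can be turned into class probabilities with a standard softmax, sketched here in pure Python:

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```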
## Advantages

- ✅ Multimodal Fusion: Combines spatial-visual features with quantitative morphological data
- ✅ Robustness: Less prone to overfitting than single-modality models
- ✅ Interpretability: Features are human-interpretable (sizes, ratios, etc.)
- ✅ Scalability: Can add more modalities (ultrasound, genetic data, etc.)
## Limitations

- ⚠️ Limited to 7 classes (specific dataset)
- ⚠️ Requires both image and tabular data for inference
- ⚠️ Image input must be histopathological cervical samples
## Citation

If you use this model, please cite:

```bibtex
@misc{cervical_multimodal_2025,
  title = {Cervical Cancer Multimodal Classifier},
  author = {Sastelvio MANUEL},
  year = {2025},
  howpublished = {\url{https://huggingface.co/sastelvio/cervical-cancer-multimodal-vit}}
}
```
## Disclaimer

**⚠️ Medical Use Only Under Professional Supervision**
This model is for research and educational purposes. It should NOT be used for clinical diagnosis without:
- Validation by medical professionals
- Proper regulatory approval
- Thorough clinical testing
- Integration with clinical workflows
## Author

Sastelvio MANUEL

Portfolio: <https://github.com/sastelvio>
## License
MIT License - See LICENSE file for details
Last updated: 19 December 2025