## Summary

A fine-tuned ModernBERT-base model for multi-label subject classification of educational web text. Given a passage of text, it predicts which of 17 academic/professional subject categories apply.
## Model Details

| Property | Value |
|---|---|
| Base model | `answerdotai/ModernBERT-base` |
| Architecture | `ModernBertForSequenceClassification` |
| Task | Multi-label classification |
| Number of labels | 17 |
| Max input length | 512 tokens |
| Hidden size | 768 |
| Attention heads | 12 |
| Transformer layers | 22 (alternating full + sliding-window attention) |
| Pooling | Mean pooling |
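Mean pooling averages the final-layer token embeddings while masking out padding positions. A minimal sketch of the operation in pure Python (the actual model applies this to 768-dimensional hidden states inside the forward pass):

```python
def mean_pool(hidden_states, attention_mask):
    """Average token vectors, counting only non-padding positions.

    hidden_states: list of token vectors (each a list of floats)
    attention_mask: list of 0/1 ints, same length as hidden_states
    """
    dim = len(hidden_states[0])
    totals = [0.0] * dim
    count = 0
    for vec, mask in zip(hidden_states, attention_mask):
        if mask:
            count += 1
            for i, v in enumerate(vec):
                totals[i] += v
    return [t / count for t in totals]

# Two real tokens plus one padding token (mask 0): only the first two
# vectors are averaged, giving [2.0, 3.0].
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
```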
## Labels

| Index | Field | Display Name |
|---|---|---|
| 0 | `mathematics_statistics` | Mathematics Statistics |
| 1 | `computer_science_software_engineering` | Computer Science Software Engineering |
| 2 | `machine_learning_ai` | Machine Learning AI |
| 3 | `physical_sciences` | Physical Sciences |
| 4 | `life_sciences_biology` | Life Sciences Biology |
| 5 | `medicine_health` | Medicine Health |
| 6 | `engineering_technology` | Engineering Technology |
| 7 | `business_economics` | Business Economics |
| 8 | `law_government` | Law Government |
| 9 | `social_sciences` | Social Sciences |
| 10 | `history_geography` | History Geography |
| 11 | `philosophy_ethics` | Philosophy Ethics |
| 12 | `education_pedagogy` | Education Pedagogy |
| 13 | `language_writing` | Language Writing |
| 14 | `arts_humanities` | Arts Humanities |
| 15 | `environmental_science_energy` | Environmental Science Energy |
| 16 | `personal_finance_practical_life` | Personal Finance Practical Life |
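Because the head is multi-label, each of the 17 logits passes through an independent sigmoid and every label whose probability clears a threshold is emitted. A sketch of the decoding step (the 0.5 threshold is an assumption; the card does not state the threshold used in evaluation):

```python
import math

LABELS = [
    "mathematics_statistics", "computer_science_software_engineering",
    "machine_learning_ai", "physical_sciences", "life_sciences_biology",
    "medicine_health", "engineering_technology", "business_economics",
    "law_government", "social_sciences", "history_geography",
    "philosophy_ethics", "education_pedagogy", "language_writing",
    "arts_humanities", "environmental_science_energy",
    "personal_finance_practical_life",
]

def decode(logits, threshold=0.5):
    """Map 17 raw logits to the subset of label fields that apply."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [LABELS[i] for i, p in enumerate(probs) if p >= threshold]

# Positive logits map to probabilities above 0.5, so only indices 0 and 16
# survive the threshold here.
example_logits = [3.0] + [-4.0] * 15 + [2.0]
predicted = decode(example_logits)
```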
## Training Data

- Source: HuggingFaceFW/fineweb-edu (CC-MAIN-2021-04 shard) plus ~50K rows from HuggingFaceFW/fineweb (10BT sample)
- Labels were generated by gpt-5-nano via the OpenAI Batch API (~$80 in batch credits)
- Data was split 80% train / 10% val / 10% test (random seed 42)
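The 80/10/10 split with a fixed seed can be reproduced along these lines (a sketch under the stated seed; the card does not publish the actual split code):

```python
import random

def split_indices(n, seed=42):
    """Shuffle row indices with a fixed seed, then slice 80/10/10."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # deterministic for a given seed
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# For 1,000 rows this yields 800 / 100 / 100 disjoint index sets.
train, val, test = split_indices(1000)
```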
## Training Configuration

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Max token length | 512 |
| Optimizer | AdamW |
| Scheduler | Linear with warmup |
| AMP | bf16 (on CUDA) |
| Gradient clipping | max norm 1.0 |
The model checkpoint was saved at the epoch with the best validation micro-F1 (epoch 2).
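The linear-with-warmup schedule ramps the learning rate from 0 to the peak over the first 10% of steps, then decays it linearly to 0. A sketch of the schedule matching the hyperparameters above:

```python
def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then linear decay back to zero."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# The peak is reached exactly when warmup ends; the rate hits zero
# at the final step.
peak = lr_at_step(100, 1000)   # 2e-5
start = lr_at_step(0, 1000)    # 0.0
end = lr_at_step(1000, 1000)   # 0.0
```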
## Test Set Performance

| Metric | Score |
|---|---|
| Micro F1 | 0.8545 |
| Macro F1 | 0.8264 |
| Precision (micro) | 0.8799 |
| Recall (micro) | 0.8304 |
| Loss | 0.1222 |
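As a sanity check, micro F1 is the harmonic mean of micro precision and recall, and the reported values are mutually consistent to within rounding (a quick arithmetic check, not a new evaluation):

```python
# Reported micro-averaged precision and recall from the table above.
precision, recall = 0.8799, 0.8304

# Micro F1 is the harmonic mean of micro precision and micro recall.
micro_f1 = 2 * precision * recall / (precision + recall)

# micro_f1 comes out to ~0.8544, matching the reported 0.8545 to within
# the rounding already present in precision and recall.
```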