Article Topic Service SciBERT

A SciBERT-based text classifier that predicts the topic of a scientific article from its title and abstract.

Labels

  • Artificial Intelligence
  • Natural Language Processing
  • Computer Vision
  • Machine Learning
  • Computer Science Theory and Algorithms
  • Mathematics
  • Statistics
  • Electrical Engineering
  • Astrophysics
  • Condensed Matter Physics
  • Quantum Physics
  • Quantitative Biology

Dataset

Balanced 12-class subset built from librarian-bots/arxiv-metadata-snapshot.

  • Train: 30,000 examples
  • Validation: 3,600 examples
  • Test: 3,600 examples
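
The split sizes above imply simple per-class counts if each split is itself balanced across the 12 classes. The card only states that the subset is balanced, so exact per-split balance is an assumption here, and the helper name `per_class_counts` is illustrative. A minimal sketch of the arithmetic:

```python
# Split sizes as reported in the Dataset section.
SPLITS = {"train": 30_000, "validation": 3_600, "test": 3_600}
NUM_CLASSES = 12


def per_class_counts(splits, num_classes):
    """Return examples per class for each split, assuming exact balance."""
    counts = {}
    for name, size in splits.items():
        if size % num_classes != 0:
            # An uneven split cannot be exactly balanced.
            raise ValueError(f"{name} split of {size} is not divisible by {num_classes}")
        counts[name] = size // num_classes
    return counts


counts = per_class_counts(SPLITS, NUM_CLASSES)
print(counts)  # {'train': 2500, 'validation': 300, 'test': 300}
```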

Metrics

  • Validation accuracy: 0.8350
  • Validation macro F1: 0.8351
  • Test accuracy: 0.8356
  • Test macro F1: 0.8351
  • Title-only test accuracy: 0.7522
  • Title-only test macro F1: 0.7495
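
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of its size. The sketch below computes accuracy and macro F1 in plain Python on a tiny made-up example. The toy labels are illustrative only and are not drawn from the model's evaluation, and the card does not say which library produced the reported numbers.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no support scores 0."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data for illustration only.
labels = ["Mathematics", "Statistics", "Astrophysics"]
y_true = ["Mathematics", "Statistics", "Statistics", "Astrophysics"]
y_pred = ["Mathematics", "Statistics", "Mathematics", "Astrophysics"]

print(accuracy(y_true, y_pred))                    # 0.75
print(round(macro_f1(y_true, y_pred, labels), 4))  # 0.7778
```

On a balanced test set such as the one described above, accuracy and macro F1 tend to land close together, which is consistent with the reported 0.8356 and 0.8351.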

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_id = "Ian-Khalzov/article-topic-service-scibert"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Input is the title and abstract joined in a single string.
text = "Title: Large language models for scientific document classification\n\nAbstract: We study..."
# Inputs longer than 256 tokens are truncated.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.inference_mode():
    # Softmax over the logits gives one probability per label.
    probs = torch.softmax(model(**inputs).logits[0], dim=-1)

# Map the highest-probability class index back to its label name.
predicted_label = model.config.id2label[int(probs.argmax())]
print(predicted_label)
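
The snippet above hard-codes the input string and prints only the top label. The sketch below adds two small helpers, both hypothetical names that are not part of the model or of Transformers: `build_input` assembles the "Title: ... Abstract: ..." string in the same shape as the example, and `top_k` ranks labels by probability. Whether the training data used exactly this template, and how title-only inputs were formatted, is not stated in the card, so treat the template as an assumption.

```python
def build_input(title, abstract=""):
    """Join title and abstract in the shape used by the usage example.

    Assumption: the "Title: ...\n\nAbstract: ..." template mirrors the
    example string; the title-only form is a guess.
    """
    title = " ".join(title.split())        # collapse stray whitespace
    abstract = " ".join(abstract.split())
    if not abstract:
        return f"Title: {title}"
    return f"Title: {title}\n\nAbstract: {abstract}"


def top_k(probs, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[index], prob) for index, prob in ranked[:k]]


# Standalone demo with made-up probabilities and a made-up label map.
demo_labels = {0: "Mathematics", 1: "Statistics", 2: "Machine Learning"}
print(build_input("  Large language   models ", "We study\nclassification."))
print(top_k([0.1, 0.7, 0.2], demo_labels, k=2))
```

With the model loaded as in the usage snippet, the same helper could be called as `top_k(probs.tolist(), model.config.id2label)`.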

Notes

The current baseline is strongest on physics-heavy classes and weakest on the broad Machine Learning category, where topical overlap with AI, NLP, CV, and Statistics remains high.
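
One way to inspect this kind of overlap is to count which (true, predicted) label pairs account for the most errors. The sketch below is a minimal, self-contained version of that idea. The function name `most_confused_pairs` is hypothetical and the toy labels are invented for illustration; they are not the model's actual errors.

```python
from collections import Counter


def most_confused_pairs(y_true, y_pred, top_n=3):
    """Return the most frequent (true, predicted) error pairs with counts."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top_n)


# Toy data for illustration only.
y_true = ["Machine Learning", "Machine Learning", "Machine Learning", "Quantum Physics"]
y_pred = ["Statistics", "Statistics", "Artificial Intelligence", "Quantum Physics"]

for (true_label, predicted_label), count in most_confused_pairs(y_true, y_pred):
    print(f"{true_label} -> {predicted_label}: {count}")
```

Running the same function over real test-set predictions would show whether Machine Learning errors concentrate in one neighbouring class or spread across several.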
