---
language:
- eng
license: apache-2.0
tags:
- text-classification
pipeline_tag: text-classification
---

# xlmr-large-classifier-pinocchio_it_tra1-eng - MT/HT Classifier

This model is a fine-tuned version of [`FacebookAI/xlm-roberta-large`](https://huggingface.co/FacebookAI/xlm-roberta-large) for distinguishing between machine-translated (MT) and human-translated (HT) text, or between two human translators (HT1 and HT2) when the data come from two different human translations.

Training data:

* Train: 1490 examples (745 per label)
* Validation: 164 examples (82 per label)
* Test: 214 examples (107 per label)

Results on the held-out test set:

* Accuracy: 0.9065
* F1-Score: 0.9099
* Precision: 0.8783
* Recall: 0.9439

## Label mapping

* Label MT: 0
* Label PE: 1 (this is the human translator)

## Info

Upload date: 2025-04-30 00:00

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "DanielSc4/xlmr-large-classifier-pinocchio_it_tra1-eng"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()  # disable dropout for inference

inp = tokenizer("This is a test", return_tensors="pt", truncation=True).to(device)
with torch.no_grad():  # no gradients needed for prediction
    out = model(**inp)

probs = out.logits.softmax(dim=-1)
pred = probs.argmax(dim=-1).item()
print("Predicted class: " + str(pred))  # 0 for MT, 1 for PE
```
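The four reported test scores can be cross-checked against the size of the test split. The sketch below is an illustration, not part of the original evaluation. It assumes label 1 (PE) is the positive class, and the confusion-matrix counts it uses (101 true positives, 14 false positives) are not published in this card: they are simply the only integer counts, out of 107 test examples per label, that reproduce all four scores to four decimal places.

```python
# Hypothetical reconstruction of the test-set confusion matrix.
# Assumptions: label 1 (PE) is the positive class; 107 test examples per label.
# The counts below are inferred from the reported metrics, not published values.
N_PER_LABEL = 107

tp = 101                  # PE examples predicted as PE
fn = N_PER_LABEL - tp     # PE examples predicted as MT (6)
fp = 14                   # MT examples predicted as PE
tn = N_PER_LABEL - fp     # MT examples predicted as MT (93)

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

for name, value in [("Accuracy", accuracy), ("F1-Score", f1),
                    ("Precision", precision), ("Recall", recall)]:
    print(f"{name}: {value:.4f}")
```

Running it prints the same four values listed under the test results, which suggests that precision and recall were computed with PE as the positive class; the classifier appears to err more often by labelling MT text as PE than the reverse.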