---
language:
- eng
license: apache-2.0
tags:
- text-classification
pipeline_tag: text-classification
---


# xlmr-large-classifier-pinocchio_it_tra1-eng - MT/HT Classifier

This model is a fine-tuned version of [`FacebookAI/xlm-roberta-large`](https://huggingface.co/FacebookAI/xlm-roberta-large) that distinguishes machine-translated (MT) text from human-translated (HT) text
(or HT1 from HT2 when the data comes from two different human translators).


Data splits (number of examples, balanced across the two labels):
* Train: 1490 (745 per label)
* Validation: 164 (82 per label)
* Test: 214 (107 per label)


Results on the held-out test set:
* Accuracy: 0.9065
* F1-Score: 0.9099
* Precision: 0.8783
* Recall: 0.9439
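
The four scores above are consistent with a single confusion matrix over the 214 test examples. The sketch below reconstructs it from the reported precision and recall. It assumes label 1 (PE) is the positive class, which the card does not state, so the counts are inferred rather than reported.

```python
# Reported values from the held-out test set (107 examples per label).
positives = 107      # assumption: label 1 (PE) is the positive class
negatives = 107
reported_recall = 0.9439
reported_precision = 0.8783

# Recover integer counts from the rounded scores.
tp = round(reported_recall * positives)       # true positives
fn = positives - tp                           # false negatives
fp = round(tp / reported_precision) - tp      # false positives
tn = negatives - fp                           # true negatives

# Recompute every metric from the inferred counts.
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Under that assumption, the recomputed scores round to the same four values listed above.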

## Label mapping
* MT: 0
* PE: 1 (this is the human translator)

## Info
Upload date: 2025-04-30 00:00

## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "DanielSc4/xlmr-large-classifier-pinocchio_it_tra1-eng"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Run on GPU when available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()  # disable dropout for inference

inp = tokenizer('This is a test', return_tensors='pt').to(device)

# No gradients are needed for prediction
with torch.no_grad():
    out = model(**inp)

logits = out.logits
probs = logits.softmax(dim=-1)
pred = probs.argmax(dim=-1).item()
print("Predicted class: " + str(pred)) # 0 for MT, 1 for PE
```
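
To report label names instead of raw class ids, the logits can be decoded with the label mapping from this card. The helper below is a minimal sketch in plain Python. `ID2LABEL` and `decode_predictions` are hypothetical names introduced here for illustration, and the input is assumed to be a list of per-example logit rows such as `out.logits.tolist()`.

```python
import math

# Label mapping taken from this card: 0 = MT, 1 = PE (human translator).
ID2LABEL = {0: "MT", 1: "PE"}

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def decode_predictions(logits_rows):
    """Return a (label, probability) pair for each row of logits."""
    results = []
    for row in logits_rows:
        probs = softmax(row)
        idx = max(range(len(probs)), key=probs.__getitem__)
        results.append((ID2LABEL[idx], probs[idx]))
    return results

# Example with made-up logits for two sentences (not real model output).
for label, prob in decode_predictions([[2.0, -1.0], [0.0, 3.0]]):
    print(f"{label}: {prob:.3f}")
```

With the usage snippet above, calling `decode_predictions(out.logits.tolist())` would give one `(label, probability)` pair per input sentence.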