iitb-en-indic-only-punct

This model is a fine-tuned version of ai4bharat/indictrans2-en-indic-dist-200M for English-to-Marathi translation, specifically optimized for punctuation robustness.

It was introduced in the paper Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation.

Model description

Traditional Machine Translation (MT) systems often struggle with punctuation-ambiguous text (e.g., "Let's eat Grandma" vs "Let's eat, Grandma"). This model addresses this issue by being fine-tuned on punctuation-varied data derived from the IITB-ENG-MAR dataset.

It corresponds to Approach 2 (Direct Fine-tuning) described in the research, where the base MT model is trained to implicitly learn context and resolve semantic and structural ambiguities caused by missing or inconsistent punctuation in the source English text.
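The exact augmentation recipe is described in the paper; the snippet below is only a minimal, illustrative Python sketch of how punctuation-varied English sources could be derived from a parallel corpus. The helper names and the random-drop probability are assumptions for illustration, not the paper's procedure.

```python
import random
import string

PUNCT = set(string.punctuation)

def strip_punctuation(text: str) -> str:
    """Remove every punctuation character from an English source sentence."""
    return "".join(ch for ch in text if ch not in PUNCT)

def drop_punctuation(text: str, p: float = 0.5, seed: int = 0) -> str:
    """Drop each punctuation character independently with probability p."""
    rng = random.Random(seed)
    return "".join(ch for ch in text if ch not in PUNCT or rng.random() > p)

def punctuation_variants(src: str, tgt: str):
    """Pair the unchanged Marathi reference with punctuation-varied English sources."""
    yield src, tgt
    yield strip_punctuation(src), tgt
    yield drop_punctuation(src), tgt

# Example with the ambiguity from the model description (placeholder target).
for en, mr in punctuation_variants("Let's eat, Grandma.", "<Marathi reference>"):
    print(en, "->", mr)
```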

Intended uses & limitations

This model is intended for translating English sentences into Marathi, particularly when the source text might have missing punctuation that changes the intended meaning.
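A minimal inference sketch is shown below. It assumes the standard IndicTrans2 usage pattern: loading the model with trust_remote_code=True and using the IndicTransToolkit's IndicProcessor for language tagging and pre/post-processing. The import path and generation settings may differ slightly depending on the toolkit version installed.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from IndicTransToolkit import IndicProcessor  # pip install IndicTransToolkit

model_name = "thenlpresearcher/iitb-en-indic-only-punct"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True).to(device)

ip = IndicProcessor(inference=True)

# Punctuation-ambiguous source: no comma before "Grandma".
sentences = ["Lets eat Grandma"]

# Add eng_Latn -> mar_Deva language tags and normalize (IndicTrans2 convention).
batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="mar_Deva")
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt").to(device)

with torch.no_grad():
    generated = model.generate(**inputs, max_length=256, num_beams=5, num_return_sequences=1)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True, clean_up_tokenization_spaces=True)
translations = ip.postprocess_batch(decoded, lang="mar_Deva")
print(translations[0])
```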

Training and evaluation data

The model was fine-tuned using the IITB-ENG-MAR dataset. The performance was evaluated on the Virām benchmark, which consists of 54 manually curated, punctuation-ambiguous instances.

Training results

It achieves the following results on the evaluation set:

  • Loss: 0.3627
  • BLEU: 10.7712
  • chrF++: 33.1021
  • COMET: 0.5425
  • Gen Len: 20.8714

| Training Loss | Epoch  | Step  | Validation Loss | BLEU    | chrF++  | COMET  | BLEURT | Gen Len |
|---------------|--------|-------|-----------------|---------|---------|--------|--------|---------|
| 0.4323        | 0.5059 | 6000  | 0.4048          | 9.7554  | 31.7923 | 0.5339 | None   | 20.8746 |
| 0.3522        | 1.0119 | 12000 | 0.3882          | 10.0952 | 32.1519 | 0.5367 | None   | 20.8721 |
| 0.3608        | 1.5178 | 18000 | 0.3779          | 10.2006 | 32.4109 | 0.5373 | None   | 20.875  |
| 0.3362        | 2.0238 | 24000 | 0.3711          | 10.3061 | 32.5527 | 0.5392 | None   | 20.8721 |
| 0.3196        | 2.5297 | 30000 | 0.3700          | 10.4817 | 32.7072 | 0.5395 | None   | 20.8731 |
| 0.3029        | 3.0357 | 36000 | 0.3676          | 10.5911 | 32.8459 | 0.5397 | None   | 20.8746 |
| 0.3049        | 3.5416 | 42000 | 0.3647          | 10.5533 | 32.8685 | 0.5415 | None   | 20.8727 |
| 0.2705        | 4.0476 | 48000 | 0.3644          | 10.6712 | 32.9543 | 0.5417 | None   | 20.8692 |
| 0.2819        | 4.5535 | 54000 | 0.3622          | 10.6249 | 32.9145 | 0.5414 | None   | 20.8706 |
| 0.2567        | 5.0594 | 60000 | 0.3646          | 10.6345 | 32.9606 | 0.5414 | None   | 20.8705 |
| 0.2783        | 5.5654 | 66000 | 0.3607          | 10.6848 | 33.046  | 0.5425 | None   | 20.8697 |
| 0.2589        | 6.0713 | 72000 | 0.3633          | 10.7223 | 33.0218 | 0.542  | None   | 20.8711 |
| 0.2702        | 6.5773 | 78000 | 0.3613          | 10.7778 | 33.0402 | 0.542  | None   | 20.8717 |
| 0.256         | 7.0832 | 84000 | 0.3628          | 10.7432 | 33.0965 | 0.5425 | None   | 20.8703 |
| 0.2512        | 7.5892 | 90000 | 0.3627          | 10.7712 | 33.1021 | 0.5425 | None   | 20.8714 |

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 8
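
The training script itself is not part of this card; the following is a hedged sketch of how the hyperparameters above might be expressed as Hugging Face Seq2SeqTrainingArguments. output_dir and predict_with_generate are assumptions, not values reported on this card.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the listed hyperparameters onto standard
# transformers arguments; values not listed on this card are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="iitb-en-indic-only-punct",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",         # betas=(0.9, 0.999) and epsilon=1e-08 are the adamw_torch defaults
    lr_scheduler_type="linear",
    num_train_epochs=8,
    predict_with_generate=True,  # assumed: needed to compute BLEU/chrF++ during evaluation
)
```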

Framework versions

  • Transformers 4.53.2
  • Pytorch 2.4.0a0+f70bd71a48.nv24.06
  • Datasets 2.21.0
  • Tokenizers 0.21.4