iitb-en-indic-only-punct
This model is a fine-tuned version of ai4bharat/indictrans2-en-indic-dist-200M for English-to-Marathi translation, specifically optimized for punctuation robustness.
It was introduced in the paper Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation.
Model description
Traditional Machine Translation (MT) systems often struggle with punctuation-ambiguous text (e.g., "Let's eat Grandma" vs. "Let's eat, Grandma"). This model addresses the issue by fine-tuning on punctuation-varied data derived from the IITB-ENG-MAR dataset.
It corresponds to Approach 2 (Direct Fine-tuning) described in the paper, in which the base MT model is trained to implicitly use context to resolve the semantic and structural ambiguities caused by missing or inconsistent punctuation in the English source text.
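As an illustrative sketch of what "punctuation-varied data" can look like (not the authors' exact augmentation pipeline), each English source sentence can be paired with a punctuation-stripped variant while keeping the same Marathi reference:

```python
import string

# Illustrative sketch only: one simple way to derive punctuation-varied
# source sentences from parallel data. The exact augmentation used for
# this model is described in the paper and may differ.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def punctuation_variants(en_sentence: str) -> list[str]:
    """Return the original sentence plus a punctuation-stripped variant."""
    stripped = en_sentence.translate(PUNCT_TABLE).strip()
    return [en_sentence, stripped]

# Both variants are paired with the same Marathi reference, so the model
# learns to translate text with or without punctuation.
print(punctuation_variants("Let's eat, Grandma."))
# ["Let's eat, Grandma.", "Lets eat Grandma"]
```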
- Paper: Assessing and Improving Punctuation Robustness in English-Marathi Machine Translation
- GitHub Repository: Viram_Marathi
- Language Pair: English (Latin script) to Marathi (Devanagari script)
Intended uses & limitations
This model is intended for translating English sentences into Marathi, particularly when the source text might have missing punctuation that changes the intended meaning.
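A minimal inference sketch, following the usage pattern documented for the base IndicTrans2 models: it assumes the IndicTransToolkit package (for IndicProcessor) is installed, and the import path may vary by toolkit version.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# In older IndicTransToolkit versions the import is:
#   from IndicTransToolkit import IndicProcessor
from IndicTransToolkit.processor import IndicProcessor

model_name = "thenlpresearcher/iitb-en-indic-only-punct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, trust_remote_code=True)
ip = IndicProcessor(inference=True)

sentences = ["lets eat grandma", "Let's eat, Grandma."]

# Normalize the input and add the language tags IndicTrans2 expects.
batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="mar_Deva")
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, max_length=256, num_beams=5)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
translations = ip.postprocess_batch(decoded, lang="mar_Deva")
print(translations)
```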
Training and evaluation data
The model was fine-tuned using the IITB-ENG-MAR dataset. The performance was evaluated on the Virām benchmark, which consists of 54 manually curated, punctuation-ambiguous instances.
Training results
It achieves the following results on the evaluation set:
- Loss: 0.3627
- BLEU: 10.7712
- chrF++: 33.1021
- COMET: 0.5425
- Gen Len: 20.8714
| Training Loss | Epoch | Step | Validation Loss | BLEU | chrF++ | COMET | BLEURT | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 0.4323 | 0.5059 | 6000 | 0.4048 | 9.7554 | 31.7923 | 0.5339 | None | 20.8746 |
| 0.3522 | 1.0119 | 12000 | 0.3882 | 10.0952 | 32.1519 | 0.5367 | None | 20.8721 |
| 0.3608 | 1.5178 | 18000 | 0.3779 | 10.2006 | 32.4109 | 0.5373 | None | 20.875 |
| 0.3362 | 2.0238 | 24000 | 0.3711 | 10.3061 | 32.5527 | 0.5392 | None | 20.8721 |
| 0.3196 | 2.5297 | 30000 | 0.3700 | 10.4817 | 32.7072 | 0.5395 | None | 20.8731 |
| 0.3029 | 3.0357 | 36000 | 0.3676 | 10.5911 | 32.8459 | 0.5397 | None | 20.8746 |
| 0.3049 | 3.5416 | 42000 | 0.3647 | 10.5533 | 32.8685 | 0.5415 | None | 20.8727 |
| 0.2705 | 4.0476 | 48000 | 0.3644 | 10.6712 | 32.9543 | 0.5417 | None | 20.8692 |
| 0.2819 | 4.5535 | 54000 | 0.3622 | 10.6249 | 32.9145 | 0.5414 | None | 20.8706 |
| 0.2567 | 5.0594 | 60000 | 0.3646 | 10.6345 | 32.9606 | 0.5414 | None | 20.8705 |
| 0.2783 | 5.5654 | 66000 | 0.3607 | 10.6848 | 33.046 | 0.5425 | None | 20.8697 |
| 0.2589 | 6.0713 | 72000 | 0.3633 | 10.7223 | 33.0218 | 0.542 | None | 20.8711 |
| 0.2702 | 6.5773 | 78000 | 0.3613 | 10.7778 | 33.0402 | 0.542 | None | 20.8717 |
| 0.256 | 7.0832 | 84000 | 0.3628 | 10.7432 | 33.0965 | 0.5425 | None | 20.8703 |
| 0.2512 | 7.5892 | 90000 | 0.3627 | 10.7712 | 33.1021 | 0.5425 | None | 20.8714 |
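The BLEU, chrF++, and COMET columns above can be reproduced with standard tooling. Here is a minimal scoring sketch (not the paper's evaluation script), assuming the sacrebleu and unbabel-comet packages; `hyps`, `refs`, and `srcs` are hypothetical lists of system outputs, Marathi references, and English sources.

```python
import sacrebleu
from comet import download_model, load_from_checkpoint

hyps = ["..."]  # system translations (Marathi)
refs = ["..."]  # reference translations (Marathi)
srcs = ["..."]  # English source sentences

# Corpus-level BLEU and chrF++ (word_order=2 turns chrF into chrF++).
bleu = sacrebleu.corpus_bleu(hyps, [refs])
chrfpp = sacrebleu.corpus_chrf(hyps, [refs], word_order=2)
print(f"BLEU: {bleu.score:.4f}  chrF++: {chrfpp.score:.4f}")

# COMET (e.g., wmt22-comet-da) additionally uses the source sentences.
comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
comet_out = comet_model.predict(
    [{"src": s, "mt": h, "ref": r} for s, h, r in zip(srcs, hyps, refs)],
    batch_size=8,
)
print(f"COMET: {comet_out.system_score:.4f}")
```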
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 8
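These settings map onto Hugging Face Seq2SeqTrainingArguments roughly as follows. This is an illustrative sketch, not the exact training script; the eval_steps value is inferred from the 6000-step evaluation interval in the table above.

```python
from transformers import Seq2SeqTrainingArguments

# Rough mapping of the hyperparameters above; output_dir and eval settings
# are assumptions for illustration.
training_args = Seq2SeqTrainingArguments(
    output_dir="iitb-en-indic-only-punct",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=8,
    predict_with_generate=True,  # needed for BLEU/chrF++-style generation metrics
    eval_strategy="steps",
    eval_steps=6000,
)
```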
Framework versions
- Transformers 4.53.2
- Pytorch 2.4.0a0+f70bd71a48.nv24.06
- Datasets 2.21.0
- Tokenizers 0.21.4