de6f03b7d4e227b9e62a116b6b3f1276

This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [it-pt] dataset. It achieves the following results on the evaluation set:

Loss: 2.0852
Data Size: 1.0
Epoch Runtime: 10.3061
Bleu: 7.3224
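
A minimal sketch of loading this checkpoint for inference with the Transformers seq2seq classes. Several things here are assumptions rather than documented facts: that the translation direction is Italian to Portuguese (inferred from the it-pt pair name), that no task prefix was used on the inputs during fine-tuning, and the generation settings (`max_new_tokens`, `num_beams`), which are illustrative. The helper name `translate` is hypothetical. Given the BLEU score above, outputs should be expected to be rough.

```python
MODEL_ID = "contemmcm/de6f03b7d4e227b9e62a116b6b3f1276"


def translate(text: str, max_new_tokens: int = 64) -> str:
    """Translate one sentence; assumes Italian input and Portuguese output."""
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Assumption: the raw source sentence is fed without any task prefix.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


if __name__ == "__main__":
    # Downloads the weights on first use.
    print(translate("Il gatto dorme sul divano."))
```

The tokenizer and model are reloaded on every call to keep the sketch short; for repeated use, load them once and reuse them.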

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
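
The listed values can be collected into keyword arguments and the total batch sizes checked with simple arithmetic. This is a sketch, not the original training script: the key names follow `transformers.TrainingArguments`, `output_dir` is a hypothetical placeholder, and gradient accumulation is assumed to be 1 because none is listed.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4

# Key names follow transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "mt5-base-opus-books-it-pt",  # hypothetical placeholder
    "learning_rate": 5e-05,
    "per_device_train_batch_size": per_device_train_batch_size,
    "per_device_eval_batch_size": per_device_eval_batch_size,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

# Total batch size = per-device batch size x number of devices
# (assuming no gradient accumulation).
total_train_batch_size = per_device_train_batch_size * num_devices
total_eval_batch_size = per_device_eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)
```

With `transformers` installed, the dictionary could be passed as `Seq2SeqTrainingArguments(**training_kwargs)`; the multi-GPU setup itself comes from how the script is launched, not from these arguments.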

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	17.7007	0	1.5933	0.0275
No log	1	29	17.3946	0.0078	1.5225	0.0263
No log	2	58	16.8919	0.0156	2.0908	0.0341
No log	3	87	16.4773	0.0312	3.0697	0.0392
No log	4	116	15.6018	0.0625	3.1574	0.0460
No log	5	145	14.2311	0.125	3.5642	0.0385
1.4716	6	174	11.9765	0.25	4.6805	0.0481
1.4716	7	203	11.3884	0.5	6.4484	0.0618
1.4716	8.0	232	8.8137	1.0	10.3205	0.0628
7.7475	9.0	261	8.3286	1.0	10.6687	0.0267
7.7475	10.0	290	6.9823	1.0	10.7486	0.0336
8.7476	11.0	319	5.6225	1.0	11.9420	0.0131
8.7476	12.0	348	4.5765	1.0	9.3897	0.0641
6.5533	13.0	377	2.8908	1.0	10.0688	2.1317
4.2683	14.0	406	2.5515	1.0	10.7481	3.2022
4.2683	15.0	435	2.3799	1.0	11.1623	3.8545
3.2478	16.0	464	2.2810	1.0	11.3724	4.1447
3.2478	17.0	493	2.2309	1.0	11.7569	4.7750
2.8874	18.0	522	2.1997	1.0	12.3932	4.9295
2.6719	19.0	551	2.1806	1.0	13.3495	4.9502
2.6719	20.0	580	2.1659	1.0	9.4874	5.1724
2.5169	21.0	609	2.1288	1.0	9.7360	5.3784
2.5169	22.0	638	2.1352	1.0	10.3320	5.4165
2.4281	23.0	667	2.1226	1.0	10.7826	5.6347
2.4281	24.0	696	2.1197	1.0	11.8878	6.0645
2.3424	25.0	725	2.1225	1.0	11.6442	6.2068
2.2712	26.0	754	2.1028	1.0	11.6528	6.6706
2.2712	27.0	783	2.0924	1.0	12.6181	6.8102
2.1942	28.0	812	2.0966	1.0	9.7565	6.8578
2.1942	29.0	841	2.0858	1.0	9.8874	6.9900
2.1328	30.0	870	2.0917	1.0	10.3068	7.0286
2.1328	31.0	899	2.0874	1.0	11.5526	7.0950
2.0659	32.0	928	2.0902	1.0	11.5729	7.1123
1.9982	33.0	957	2.0778	1.0	11.5225	7.2337
1.9982	34.0	986	2.0830	1.0	11.8639	7.3778
1.9268	35.0	1015	2.0735	1.0	12.0117	7.4790
1.9268	36.0	1044	2.0832	1.0	13.9013	7.4273
1.906	37.0	1073	2.0853	1.0	9.4742	7.1218
1.8273	38.0	1102	2.0927	1.0	9.8980	7.4091
1.8273	39.0	1131	2.0852	1.0	10.3061	7.3224
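
The reported evaluation results correspond to the last logged row (epoch 39), which is not the best row in the table. The sketch below copies the last seven rows and picks out the best BLEU and lowest validation loss among them; the variable names are illustrative.

```python
# (epoch, step, validation_loss, bleu), copied from the last rows of the table above.
rows = [
    (33, 957, 2.0778, 7.2337),
    (34, 986, 2.0830, 7.3778),
    (35, 1015, 2.0735, 7.4790),
    (36, 1044, 2.0832, 7.4273),
    (37, 1073, 2.0853, 7.1218),
    (38, 1102, 2.0927, 7.4091),
    (39, 1131, 2.0852, 7.3224),
]

best_bleu = max(rows, key=lambda r: r[3])  # row with the highest BLEU
best_loss = min(rows, key=lambda r: r[2])  # row with the lowest validation loss
final = rows[-1]                           # last logged row

print(f"best BLEU: {best_bleu[3]} at epoch {best_bleu[0]}")
print(f"lowest validation loss: {best_loss[2]} at epoch {best_loss[0]}")
print(f"final row: epoch {final[0]}, loss {final[2]}, BLEU {final[3]}")
```

Both the highest BLEU (7.4790) and the lowest validation loss (2.0735) fall at epoch 35, slightly ahead of the final row.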

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Safetensors

Model size: 1.0B params
Tensor type: F32

Model tree for contemmcm/de6f03b7d4e227b9e62a116b6b3f1276

Base model: google/mt5-base