train_svamp_42_1763998316

This model is a PEFT adapter fine-tuned from meta-llama/Llama-3.2-1B-Instruct on the svamp dataset. It achieves the following results on the evaluation set (the reported loss matches the epoch-3.0 row of the training results table below, the minimum validation loss reached during training):

  • Loss: 0.0801
  • Num input tokens seen: 716448
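
As a PEFT adapter, it is typically loaded on top of the base model with the peft library. The following is a minimal sketch, assuming the adapter repo id is rbelanec/train_svamp_42_1763998316 and that the standard transformers/peft loading path applies; the prompt is a hypothetical SVAMP-style word problem:

```python
# Minimal inference sketch (assumptions: standard transformers + peft
# loading; the adapter repo id is inferred from this card's title/owner).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "rbelanec/train_svamp_42_1763998316"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)

# SVAMP is a math word-problem dataset, so a word problem is the natural input.
prompt = "John has 3 apples and buys 5 more. How many apples does he have?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```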

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto TrainingArguments):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
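
For reference, here is a minimal sketch of how the settings above map onto transformers TrainingArguments. The output_dir name is an assumption taken from the model name, and the PEFT/LoRA config and dataset preprocessing are omitted because the card does not record them:

```python
# Sketch only: restates the hyperparameters above as TrainingArguments.
# output_dir is an assumption; the LoRA settings and data pipeline are
# not documented in this card and are therefore left out.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_svamp_42_1763998316",  # assumed from the model name
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```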

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.4134        | 0.5   | 79   | 0.3151          | 36256             |
| 0.1026        | 1.0   | 158  | 0.1312          | 71568             |
| 0.0666        | 1.5   | 237  | 0.1124          | 107504            |
| 0.0631        | 2.0   | 316  | 0.0935          | 143232            |
| 0.0566        | 2.5   | 395  | 0.0903          | 178848            |
| 0.0849        | 3.0   | 474  | 0.0801          | 214912            |
| 0.0247        | 3.5   | 553  | 0.0857          | 250784            |
| 0.1055        | 4.0   | 632  | 0.0897          | 286448            |
| 0.0262        | 4.5   | 711  | 0.0917          | 322448            |
| 0.0410        | 5.0   | 790  | 0.0956          | 358176            |
| 0.0311        | 5.5   | 869  | 0.0972          | 394336            |
| 0.0191        | 6.0   | 948  | 0.1051          | 429728            |
| 0.0031        | 6.5   | 1027 | 0.1163          | 465376            |
| 0.0171        | 7.0   | 1106 | 0.1116          | 501504            |
| 0.0098        | 7.5   | 1185 | 0.1117          | 537248            |
| 0.0370        | 8.0   | 1264 | 0.1199          | 573120            |
| 0.0117        | 8.5   | 1343 | 0.1234          | 609248            |
| 0.0040        | 9.0   | 1422 | 0.1229          | 644944            |
| 0.0022        | 9.5   | 1501 | 0.1233          | 680880            |
| 0.0036        | 10.0  | 1580 | 0.1245          | 716448            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4