train_svamp_42_1763998316

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the svamp dataset. Validation loss on the evaluation set is reported per checkpoint in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
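
To make the scheduler settings above concrete, here is a minimal sketch of a linear-warmup, half-cosine-decay learning-rate curve. It is an illustration under stated assumptions, not the training code itself: the total of 1580 steps is taken from the last row of the training results table, the warmup length is assumed to be the total step count times lr_scheduler_warmup_ratio rounded up, and the names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are hypothetical.

```python
import math

# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1

# Assumption: total optimizer steps, read from the last row of the results table.
TOTAL_STEPS = 1580

# Assumption: warmup length is the total step count times the ratio, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak learning rate down to 0.
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 79, 158, 869, 1580):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup would cover roughly the first epoch, and the learning rate would decay toward zero by the final step.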

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4134	0.5	79	0.3151	36256
0.1026	1.0	158	0.1312	71568
0.0666	1.5	237	0.1124	107504
0.0631	2.0	316	0.0935	143232
0.0566	2.5	395	0.0903	178848
0.0849	3.0	474	0.0801	214912
0.0247	3.5	553	0.0857	250784
0.1055	4.0	632	0.0897	286448
0.0262	4.5	711	0.0917	322448
0.041	5.0	790	0.0956	358176
0.0311	5.5	869	0.0972	394336
0.0191	6.0	948	0.1051	429728
0.0031	6.5	1027	0.1163	465376
0.0171	7.0	1106	0.1116	501504
0.0098	7.5	1185	0.1117	537248
0.037	8.0	1264	0.1199	573120
0.0117	8.5	1343	0.1234	609248
0.004	9.0	1422	0.1229	644944
0.0022	9.5	1501	0.1233	680880
0.0036	10.0	1580	0.1245	716448
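
The table above can be scanned for the checkpoint with the lowest validation loss. The following is a small sketch that hard-codes the rows as tuples; the `Row` type and the variable names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


# Rows copied from the training results table.
rows = [
    Row(0.4134, 0.5, 79, 0.3151, 36256),
    Row(0.1026, 1.0, 158, 0.1312, 71568),
    Row(0.0666, 1.5, 237, 0.1124, 107504),
    Row(0.0631, 2.0, 316, 0.0935, 143232),
    Row(0.0566, 2.5, 395, 0.0903, 178848),
    Row(0.0849, 3.0, 474, 0.0801, 214912),
    Row(0.0247, 3.5, 553, 0.0857, 250784),
    Row(0.1055, 4.0, 632, 0.0897, 286448),
    Row(0.0262, 4.5, 711, 0.0917, 322448),
    Row(0.041, 5.0, 790, 0.0956, 358176),
    Row(0.0311, 5.5, 869, 0.0972, 394336),
    Row(0.0191, 6.0, 948, 0.1051, 429728),
    Row(0.0031, 6.5, 1027, 0.1163, 465376),
    Row(0.0171, 7.0, 1106, 0.1116, 501504),
    Row(0.0098, 7.5, 1185, 0.1117, 537248),
    Row(0.037, 8.0, 1264, 0.1199, 573120),
    Row(0.0117, 8.5, 1343, 0.1234, 609248),
    Row(0.004, 9.0, 1422, 0.1229, 644944),
    Row(0.0022, 9.5, 1501, 0.1233, 680880),
    Row(0.0036, 10.0, 1580, 0.1245, 716448),
]

# Checkpoint with the lowest validation loss, and the final checkpoint.
best = min(rows, key=lambda r: r.val_loss)
final = rows[-1]

# Optimizer steps per epoch, derived from the final row.
steps_per_epoch = round(final.step / final.epoch)

if __name__ == "__main__":
    print(f"best:  epoch {best.epoch}, step {best.step}, val loss {best.val_loss}")
    print(f"final: epoch {final.epoch}, step {final.step}, val loss {final.val_loss}")
    print(f"steps per epoch: {steps_per_epoch}")
```

In this table the validation loss is lowest at epoch 3.0 (step 474) and rises afterwards while the training loss keeps falling, a pattern that usually suggests overfitting in the later epochs.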
