Llama-3.1-8B-Instruct-KTO-1000

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_1000 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2016
  • Rewards/chosen: -0.2178
  • Logps/chosen: -18.1049
  • Logits/chosen: -3360374.5684
  • Rewards/rejected: -7.9685
  • Logps/rejected: -99.2175
  • Logits/rejected: -5993503.6952
  • Rewards/margins: 7.7507
  • Kl: 0.0
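
This repository contains a PEFT adapter rather than full model weights, so it is loaded on top of the base model. The sketch below is a minimal loading example under that assumption; the repo id chchen/Llama-3.1-8B-Instruct-KTO-1000 and the example prompt are illustrative, not prescribed by the card.

```python
# Minimal loading sketch (assumes the adapter is published as
# chchen/Llama-3.1-8B-Instruct-KTO-1000; adjust the repo id if it differs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-1000"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the KTO-trained adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Summarize KTO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```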

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
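
The card does not state which training framework produced these settings. As an illustration only, the hyperparameters above map roughly onto a TRL KTOConfig/KTOTrainer setup as sketched below; the output path, dataset loading call, and PEFT config are placeholders, not the actual training script.

```python
# Illustrative mapping of the hyperparameters above onto TRL's KTO trainer.
# This is an assumed reconstruction, not the training script used for this model.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

training_args = KTOConfig(
    output_dir="llama-3.1-8b-instruct-kto-1000",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = effective batch size of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    seed=42,
)

# bct_non_cot_kto_1000 is the dataset named in the card; the loading call below
# is a placeholder for however that data is actually stored (KTO expects
# prompt/completion/label style examples).
dataset = load_dataset("json", data_files="bct_non_cot_kto_1000.json")["train"]

trainer = KTOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # placeholder adapter config
)
trainer.train()
```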

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4996 | 0.4444 | 50 | 0.4996 | 0.0049 | -15.8775 | -5421844.2105 | 0.0013 | -19.5191 | -7437597.2571 | 0.0036 | 5.5406 |
| 0.4926 | 0.8889 | 100 | 0.4927 | 0.0680 | -15.2462 | -5315816.4211 | 0.0091 | -19.4418 | -7413740.4952 | 0.0590 | 4.7451 |
| 0.3935 | 1.3333 | 150 | 0.3965 | 0.2993 | -12.9332 | -4366973.9789 | -0.5726 | -25.2580 | -6947738.2095 | 0.8719 | 0.6684 |
| 0.288 | 1.7778 | 200 | 0.2868 | 0.4599 | -11.3269 | -3715966.9895 | -1.8666 | -38.1983 | -6637966.6286 | 2.3265 | 0.0 |
| 0.2304 | 2.2222 | 250 | 0.2456 | 0.2811 | -13.1157 | -3821936.5053 | -4.1256 | -60.7884 | -6486972.9524 | 4.4067 | 0.0 |
| 0.2265 | 2.6667 | 300 | 0.2277 | 0.2055 | -13.8714 | -3639481.9368 | -5.1055 | -70.5871 | -6323365.7905 | 5.3110 | 0.0 |
| 0.1787 | 3.1111 | 350 | 0.2252 | 0.0093 | -15.8332 | -3060385.6842 | -6.2024 | -81.5565 | -5682171.1238 | 6.2117 | 0.0 |
| 0.1818 | 3.5556 | 400 | 0.2285 | 0.0137 | -15.7897 | -2924462.4842 | -6.4299 | -83.8315 | -5623589.1810 | 6.4436 | 0.0 |
| 0.1921 | 4.0 | 450 | 0.2127 | -0.0080 | -16.0069 | -3297428.2105 | -6.8889 | -88.4215 | -5958031.8476 | 6.8809 | 0.0 |
| 0.1945 | 4.4444 | 500 | 0.2114 | -0.0668 | -16.5945 | -3297794.6947 | -7.2699 | -92.2313 | -5972243.5048 | 7.2031 | 0.0 |
| 0.2105 | 4.8889 | 550 | 0.2067 | -0.0350 | -16.2766 | -3147055.1579 | -7.0596 | -90.1288 | -5862926.6286 | 7.0246 | 0.0 |
| 0.1921 | 5.3333 | 600 | 0.2064 | -0.0570 | -16.4969 | -3241836.4632 | -7.2054 | -91.5865 | -5997722.8190 | 7.1484 | 0.0 |
| 0.1614 | 5.7778 | 650 | 0.2070 | -0.1499 | -17.4258 | -3228708.7158 | -7.6014 | -95.5464 | -5918022.0952 | 7.4515 | 0.0 |
| 0.1896 | 6.2222 | 700 | 0.2123 | -0.2047 | -17.9736 | -3418014.3158 | -7.6625 | -96.1570 | -6086026.9714 | 7.4577 | 0.0 |
| 0.1631 | 6.6667 | 750 | 0.2076 | -0.1804 | -17.7305 | -3385464.2526 | -7.6603 | -96.1349 | -6043348.1143 | 7.4798 | 0.0 |
| 0.1704 | 7.1111 | 800 | 0.2064 | -0.1567 | -17.4936 | -3383563.1158 | -7.6349 | -95.8816 | -6061806.3238 | 7.4782 | 0.0 |
| 0.1902 | 7.5556 | 850 | 0.2029 | -0.2018 | -17.9440 | -3373625.6 | -7.8793 | -98.3253 | -6032148.7238 | 7.6775 | 0.0 |
| 0.174 | 8.0 | 900 | 0.2016 | -0.2178 | -18.1049 | -3360374.5684 | -7.9685 | -99.2175 | -5993503.6952 | 7.7507 | 0.0 |
| 0.2268 | 8.4444 | 950 | 0.2036 | -0.2365 | -18.2911 | -3331174.4 | -8.0276 | -99.8082 | -5953203.8095 | 7.7911 | 0.0 |
| 0.1646 | 8.8889 | 1000 | 0.2038 | -0.2586 | -18.5126 | -3326715.9579 | -8.0877 | -100.4094 | -5970805.6381 | 7.8291 | 0.0 |
| 0.1964 | 9.3333 | 1050 | 0.2038 | -0.2629 | -18.5557 | -3347635.5368 | -8.0931 | -100.4632 | -5967138.1333 | 7.8302 | 0.0 |
| 0.1483 | 9.7778 | 1100 | 0.2076 | -0.2689 | -18.6153 | -3328483.0316 | -8.0719 | -100.2517 | -5965142.5524 | 7.8031 | 0.0 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3