Datasets and models used in the paper "Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data".
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Paper • arXiv:2504.09895
mzhaoshuai/Llama-3.3-70B-Inst-awq_ultrafeedback_1in3
Dataset • 61.1k rows
mzhaoshuai/Llama-3.3-70B-Inst-awq_SafeRLHF
Dataset
mzhaoshuai/NQ-Subset-500
Dataset • 500 rows