high-quality Chinese training datasets
Collection
A suite of high-quality Chinese datasets for pretraining, fine-tuning, or preference alignment, along with the models trained on these datasets. • 13 items
opencsg/csg-wukong-2b-smoltalk-chinese, using ultrafeedback-chinese-binarized as the DPO dataset.
Base model: opencsg/csg-wukong-2b-chinese-fineweb-edu