Improve model card: Add comprehensive metadata, paper link, and GitHub details
#1 opened by nielsr (HF Staff)
This PR significantly enhances the model card for the GT-GRPO: Qwen3-4B-Base trained on OpenRS model.
Key improvements include:
- Metadata: Added `pipeline_tag: text-generation` for proper categorization and discoverability on the Hub. Included `library_name: transformers` to enable the automated "How to use" widget, as evidenced by the `config.json` file showing `Qwen3ForCausalLM` and `transformers_version`. Additional tags like `qwen`, `reasoning`, `self-supervised-learning`, and `reinforcement-learning` have been added to further describe the model and its training methodology (a sketch of the resulting front matter follows this list).
- Training Dataset: Specified `datasets: TMLR-Group-HF/Co-rewarding-RephrasedOpenRS` to provide direct context about the training data, as referenced in the accompanying GitHub repository.
- Paper Integration: The model card now prominently features the paper title, "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models", along with a summary of its abstract to give users immediate insight into the model's research context.
- GitHub Repository: Made the link to the official GitHub repository, https://github.com/tmlr-group/Co-rewarding, more prominent for easy access to code and additional information.
- Citation: Included the BibTeX citation provided in the GitHub README for proper academic attribution.
- Sample Usage: A sample usage code snippet has not been included, as no inference code for this specific trained model was found in the provided GitHub README; this keeps with the project's guidelines against generating custom code.
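
For reference, here is a minimal sketch of the YAML front matter implied by the metadata changes above. Field order is illustrative, and any fields not mentioned in this PR (e.g., a license) are intentionally omitted:

```yaml
# Illustrative model card front matter reflecting the metadata described in this PR
pipeline_tag: text-generation
library_name: transformers
datasets:
- TMLR-Group-HF/Co-rewarding-RephrasedOpenRS
tags:
- qwen
- reasoning
- self-supervised-learning
- reinforcement-learning
```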
These changes aim to make the model more informative, discoverable, and user-friendly for the Hugging Face community.
Geraldxm changed pull request status to merged