Improve model card: Add comprehensive metadata, paper link, and GitHub details
#1 opened by nielsr (HF Staff)
This PR significantly enhances the model card for the GT-GRPO: Qwen3-4B-Base trained on OpenRS model.
Key improvements include:
- Metadata: Added `pipeline_tag: text-generation` for proper categorization and discoverability on the Hub. Included `library_name: transformers` to enable the automated "How to use" widget, as evidenced by the `config.json` file showing `Qwen3ForCausalLM` and `transformers_version`. Additional tags like `qwen`, `reasoning`, `self-supervised-learning`, and `reinforcement-learning` have been added to further describe the model and its training methodology (a sketch of the resulting front matter follows this list).
- Training Dataset: Specified `datasets: TMLR-Group-HF/Co-rewarding-RephrasedOpenRS` to provide direct context about the training data, as referenced in the accompanying GitHub repository.
- Paper Integration: The model card now prominently features the paper title, "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models", along with a summary of its abstract to give users immediate insight into the model's research context.
- GitHub Repository: Made the link to the official GitHub repository, https://github.com/tmlr-group/Co-rewarding, more prominent for easy access to code and additional information.
- Citation: Included the BibTeX citation provided in the GitHub README for proper academic attribution.
- Sample Usage: A sample usage code snippet has not been included, as no inference code for this specific trained model was found in the provided GitHub README; this keeps with the project's guidelines against generating custom code.
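
For reference, here is a minimal sketch of the YAML front matter implied by the metadata changes above. Field order is illustrative, and any fields not mentioned in this PR (e.g., a license) are intentionally omitted:

```yaml
# Illustrative model card front matter reflecting the metadata described in this PR
pipeline_tag: text-generation
library_name: transformers
datasets:
- TMLR-Group-HF/Co-rewarding-RephrasedOpenRS
tags:
- qwen
- reasoning
- self-supervised-learning
- reinforcement-learning
```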
These changes aim to make the model more informative, discoverable, and user-friendly for the Hugging Face community.
Geraldxm changed pull request status to merged