Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Paper: arXiv:2506.05316
This model has been pushed to the Hub using the PyTorchModelHubMixin integration: