The Art of Scaling Reinforcement Learning Compute for LLMs Paper • 2510.13786 • Published Oct 15 • 30
Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting Paper • 2510.08696 • Published Oct 9 • 14
Rethinking Thinking Tokens: LLMs as Improvement Operators Paper • 2510.01123 • Published Oct 1 • 5 • 2
What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT Paper • 2509.19284 • Published Sep 23 • 22
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails Paper • 2502.05163 • Published Feb 7 • 23
PILAF: Optimal Human Preference Sampling for Reward Modeling Paper • 2502.04270 • Published Feb 6 • 11
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 50
A Tale of Tails: Model Collapse as a Change of Scaling Laws Paper • 2402.07043 • Published Feb 10, 2024 • 16