- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 59
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
  Paper • 2309.00267 • Published • 52
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data
  Paper • 2505.03335 • Published • 188
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 187
Collections
Collections including paper arxiv:2506.01939

- Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
  Paper • 2505.19443 • Published • 15
- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
  Paper • 2506.19290 • Published • 52
- CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
  Paper • 2105.12655 • Published
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 151

- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
  Paper • 2506.07044 • Published • 114
- ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
  Paper • 2506.09513 • Published • 100
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 104

- Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
  Paper • 2505.21115 • Published • 140
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 187
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 142

- Large Reasoning Models Learn Better Alignment from Flawed Thinking
  Paper • 2510.00938 • Published • 58
- What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT
  Paper • 2509.19284 • Published • 22
- Learning to Reason as Action Abstractions with Scalable Mid-Training RL
  Paper • 2509.25810 • Published • 5
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266

- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 87
- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 97
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
  Paper • 2506.06941 • Published • 15
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 187

- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 187
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 142
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
  Paper • 2505.22617 • Published • 131

- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 262
- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
  Paper • 2506.06395 • Published • 133
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
  Paper • 2506.05176 • Published • 74
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 276