- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 142
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  Paper • 2504.13837 • Published • 138
- Learning to Reason under Off-Policy Guidance
  Paper • 2504.14945 • Published • 88
Collections including paper arxiv:2510.04871
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 535
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 491
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 266
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
  Paper • 2510.04618 • Published • 123

- ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
  Paper • 2510.22037 • Published • 19
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 491
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 535
- Scaling Language-Centric Omnimodal Representation Learning
  Paper • 2510.11693 • Published • 100

- Reasoning with Sampling: Your Base Model is Smarter Than You Think
  Paper • 2510.14901 • Published • 47
- VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
  Paper • 2505.23359 • Published • 39
- OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
  Paper • 2506.02397 • Published • 35
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 142

- DoPE: Denoising Rotary Position Embedding
  Paper • 2511.09146 • Published • 92
- DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
  Paper • 2511.19365 • Published • 63
- Latent Collaboration in Multi-Agent Systems
  Paper • 2511.20639 • Published • 111
- Video Generation Models Are Good Latent Reward Models
  Paper • 2511.21541 • Published • 44

- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 535
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 491
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 345
- Intern-S1: A Scientific Multimodal Foundation Model
  Paper • 2508.15763 • Published • 256