Stabilizing Rubric Integration Training via Decoupled Advantage Normalization Paper • 2603.26535 • Published 16 days ago • 3
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published Feb 27 • 97
Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL Paper • 2602.03773 • Published Feb 3 • 13