EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control • Paper 2511.15248 • Published Nov 19
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation • Paper 2511.06307 • Published Nov 9