XSkill: Continual Learning from Experience and Skills in Multimodal Agents Paper • 2603.12056 • Published 15 days ago • 32
AT²PO: Agentic Turn-based Policy Optimization via Tree Search Paper • 2601.04767 • Published Jan 8 • 28
Evaluating Parameter Efficient Methods for RLVR Paper • 2512.23165 • Published Dec 29, 2025 • 28
Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning? Paper • 2510.06036 • Published Oct 7, 2025 • 7
OpenCUA: Open Foundations for Computer-Use Agents Paper • 2508.09123 • Published Aug 12, 2025 • 33