EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published 8 days ago • 76
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 18 days ago • 194
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI Paper • 2605.08678 • Published 21 days ago • 9
Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces Paper • 2605.02801 • Published 26 days ago • 8
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 291
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 326