Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 6 days ago • 139
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers? Paper • 2606.24530 • Published 6 days ago • 61
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions Paper • 2606.23654 • Published 7 days ago • 79
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 18 days ago • 92
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents Paper • 2606.12087 • Published 19 days ago • 77
ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics Paper • 2606.10479 • Published 20 days ago • 19
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects Paper • 2605.19587 • Published May 19 • 10
Draft-OPD: On-Policy Distillation for Speculative Draft Models Paper • 2605.29343 • Published May 28 • 36
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published May 29 • 120
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published May 19 • 108
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published May 18 • 50
Post-Trained MoE Can Skip Half Experts via Self-Distillation Paper • 2605.18643 • Published May 18 • 30
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 165
DFlash Collection Block Diffusion for Flash Speculative Decoding • 23 items • Updated about 15 hours ago • 139