PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published 4 days ago • 38
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published Aug 14, 2025 • 19
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams Paper • 2508.06851 • Published Aug 9, 2025
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Paper • 2508.16072 • Published Aug 22, 2025 • 4
Dialogue as Discovery: Navigating Human Intent Through Principled Inquiry Paper • 2510.27410 • Published Oct 31, 2025
SVBench: Evaluation of Video Generation Models on Social Reasoning Paper • 2512.21507 • Published Dec 25, 2025 • 8
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published Dec 26, 2025 • 61
ProSoftArena: Benchmarking Hierarchical Capabilities of Multimodal Agents in Professional Software Environments Paper • 2601.02399 • Published Dec 30, 2025
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences Paper • 2601.07251 • Published Jan 12 • 11
World Craft: Agentic Framework to Create Visualizable Worlds via Text Paper • 2601.09150 • Published Jan 14 • 19
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces Paper • 2602.14337 • Published Feb 15 • 14
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG Paper • 2603.23497 • Published 6 days ago • 86