Cosmos 3: Omnimodal World Models for Physical AI Paper • 2606.02800 • Published about 1 month ago • 138
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published May 28 • 146
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published May 28 • 79
VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph Paper • 2602.12735 • Published Feb 13 • 8
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models Paper • 2510.01304 • Published Oct 1, 2025 • 11
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published Jan 29 • 155
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published Jan 29 • 155
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10 • 54
ArenaRL Collection Scaling RL for Open-Ended Agents via Tournamentbased Relative Ranking • 3 items • Updated Mar 2 • 6
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10 • 54