CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published 13 days ago • 339
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models Paper • 2603.25716 • Published 16 days ago • 153
Repurposing Geometric Foundation Models for Multi-view Diffusion Paper • 2603.22275 • Published 19 days ago • 47
PEARL: Personalized Streaming Video Understanding Model Paper • 2603.20422 • Published 22 days ago • 40
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models Paper • 2603.22212 • Published 19 days ago • 126
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection Paper • 2603.21944 • Published 20 days ago • 26
Versatile Editing of Video Content, Actions, and Dynamics without Training Paper • 2603.17989 • Published 24 days ago • 17
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 25 days ago • 109
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published 25 days ago • 108
Video-CoE: Reinforcing Video Event Prediction via Chain of Events Paper • 2603.14935 • Published 27 days ago • 91
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 25 days ago • 137
OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics Paper • 2512.08625 • Published Dec 9, 2025 • 1
MosaicMem: Hybrid Spatial Memory for Controllable Video World Models Paper • 2603.17117 • Published 25 days ago • 87
InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion Paper • 2512.17504 • Published Dec 19, 2025 • 99
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published 27 days ago • 30
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing Paper • 2603.19224 • Published 23 days ago • 18
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models Paper • 2603.18002 • Published 24 days ago • 13