Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs Paper • 2603.16932 • Published Mar 14 • 87
HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering Paper • 2603.18558 • Published Mar 19 • 10