VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection Paper • 2505.20289 • Published May 26, 2025
MAOAM: Unified Object and Material Selection with Vision-Language Models Paper • 2606.04880 • Published 29 days ago
From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing Paper • 2605.15181 • Published May 14
Exploration and Exploitation Errors Are Measurable for Language Model Agents Paper • 2604.13151 • Published Apr 14
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models Paper • 2512.02231 • Published Dec 1, 2025
Visual Instruction Inversion: Image Editing via Visual Prompting Paper • 2307.14331 • Published Jul 26, 2023
Yo'LLaVA: Your Personalized Language and Vision Assistant Paper • 2406.09400 • Published Jun 13, 2024
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published Apr 29, 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models Paper • 2504.20996 • Published Apr 29, 2025