Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model (Paper 2407.07053, published Jul 9, 2024)
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step (Paper 2411.10440, published Jul 21, 2025)