LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks (arXiv:2506.00411, published May 31, 2025)
Improved Visual-Spatial Reasoning via R1-Zero-Like Training (arXiv:2504.00883, published Apr 1, 2025)
SIFT: Grounding LLM Reasoning in Contexts via Stickers (arXiv:2502.14922, published Feb 19, 2025)
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation (arXiv:2502.05415, published Feb 8, 2025)