Collections including paper arxiv:2506.21539
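Each group below is a separate community collection on the Hugging Face Hub that includes WorldVLA (arxiv:2506.21539); entries show each paper's arXiv ID and upvote count. A minimal sketch of how such a listing could be queried with the huggingface_hub client follows; the "paper/<arxiv id>" item-filter format and the sort value are assumptions about the Hub API rather than anything stated on this page.

```python
# Minimal sketch: list community collections that contain a given paper.
# Assumption: papers are addressed as "paper/<arxiv id>" in the `item` filter.
from huggingface_hub import list_collections

PAPER_ITEM = "paper/2506.21539"  # WorldVLA: Towards Autoregressive Action World Model

# Iterate over community collections that contain the paper.
for collection in list_collections(item=PAPER_ITEM, sort="trending", limit=20):
    print(f"{collection.title} ({collection.slug})")
    # The list endpoint may return only a preview of each collection's items;
    # get_collection(collection.slug) fetches the full item list if needed.
    for entry in collection.items:
        if entry.item_type == "paper":
            print(f"  - Paper • {entry.item_id}")
```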
- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 27
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 143
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 11
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1

- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
  Paper • 2509.09372 • Published • 239
- VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
  Paper • 2510.01623 • Published • 10
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 225
- WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
  Paper • 2511.09515 • Published • 18

- WorldVLA: Towards Autoregressive Action World Model
  Paper • 2506.21539 • Published • 40
- Fast and Simplex: 2-Simplicial Attention in Triton
  Paper • 2507.02754 • Published • 26
- IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
  Paper • 2507.02025 • Published • 35
- Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
  Paper • 2507.00951 • Published • 24

- SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
  Paper • 2504.09081 • Published • 16
- PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
  Paper • 2504.13180 • Published • 19
- Sekai: A Video Dataset towards World Exploration
  Paper • 2506.15675 • Published • 65
- WorldVLA: Towards Autoregressive Action World Model
  Paper • 2506.21539 • Published • 40

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 29
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 49

- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
  Paper • 2506.07044 • Published • 114
- ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
  Paper • 2506.09513 • Published • 100
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 104

- WorldVLA: Towards Autoregressive Action World Model
  Paper • 2506.21539 • Published • 40
- LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
  Paper • 2509.05263 • Published • 10
- VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators
  Paper • 2510.00406 • Published • 65
- GigaBrain-0: A World Model-Powered Vision-Language-Action Model
  Paper • 2510.19430 • Published • 48

- MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations
  Paper • 2504.07830 • Published • 18
- WORLDMEM: Long-term Consistent World Simulation with Memory
  Paper • 2504.12369 • Published • 35
- Towards a Unified Copernicus Foundation Model for Earth Vision
  Paper • 2503.11849 • Published • 4
- VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
  Paper • 2506.18903 • Published • 22