Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2511.09515

Multimodal Agent

about 4 hours ago

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

Continual Learning - VLA

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11 • 239
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2 • 10
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 225
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published 25 days ago • 17

Chain of thought

NousResearch/Hermes-4-70B-FP8

Text Generation • 71B • Updated Sep 12 • 300 • 25
NousResearch/Hermes-4-405B-FP8

Text Generation • 406B • Updated Sep 2 • 1.04k • 21
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

Paper • 2508.21365 • Published Aug 29 • 29
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Paper • 2509.15566 • Published Sep 19 • 14

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 62
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 53

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published 25 days ago • 17
Robot Learning from a Physical World Model

Paper • 2511.07416 • Published 27 days ago • 29
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Paper • 2512.02425 • Published 6 days ago • 22

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26 • 29
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30 • 47
GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1 • 89
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

Paper • 2510.04996 • Published Oct 6 • 15

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5 • 10
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1 • 65
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22 • 47

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Multimodal Agent

about 4 hours ago

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published 25 days ago • 17
Robot Learning from a Physical World Model

Paper • 2511.07416 • Published 27 days ago • 29
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Paper • 2512.02425 • Published 6 days ago • 22

Continual Learning - VLA

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

Paper • 2509.09372 • Published Sep 11 • 239
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2 • 10
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 225
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published 25 days ago • 17

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26 • 29
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30 • 47
GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1 • 89
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

Paper • 2510.04996 • Published Oct 6 • 15

Chain of thought

NousResearch/Hermes-4-70B-FP8

Text Generation • 71B • Updated Sep 12 • 300 • 25
NousResearch/Hermes-4-405B-FP8

Text Generation • 406B • Updated Sep 2 • 1.04k • 21
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

Paper • 2508.21365 • Published Aug 29 • 29
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Paper • 2509.15566 • Published Sep 19 • 14

WorldVLA: Towards Autoregressive Action World Model

Paper • 2506.21539 • Published Jun 26 • 40
LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Paper • 2509.05263 • Published Sep 5 • 10
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Paper • 2510.00406 • Published Oct 1 • 65
GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22 • 47

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 62
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 53

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs