Collections including paper arxiv:2509.07979

- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
  Paper • 2510.08540 • Published • 109
- Diffusion Transformers with Representation Autoencoders
  Paper • 2510.11690 • Published • 165
- Spotlight on Token Perception for Multimodal Reinforcement Learning
  Paper • 2510.09285 • Published • 36
- Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
  Paper • 2510.17354 • Published • 33

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- Language Models Can Learn from Verbal Feedback Without Scalar Rewards
  Paper • 2509.22638 • Published • 70
- Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
  Paper • 2510.05034 • Published • 48

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
  Paper • 2509.05263 • Published • 10
- Symbolic Graphics Programming with Large Language Models
  Paper • 2509.05208 • Published • 46
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 104

- Apriel-1.5-15b-Thinker
  Paper • 2510.01141 • Published • 117
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
  Paper • 2509.21268 • Published • 103
- LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
  Paper • 2509.00676 • Published • 84
- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83

- POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
  Paper • 2509.01215 • Published • 50
- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
  Paper • 2509.09595 • Published • 48
- Reconstruction Alignment Improves Unified Multimodal Models
  Paper • 2509.07295 • Published • 40

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 101
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 211
- Why Language Models Hallucinate
  Paper • 2509.04664 • Published • 193