Collections including paper arxiv:2509.07979

- MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
  Paper • 2510.08540 • Published • 109
- Diffusion Transformers with Representation Autoencoders
  Paper • 2510.11690 • Published • 165
- Spotlight on Token Perception for Multimodal Reinforcement Learning
  Paper • 2510.09285 • Published • 36
- Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation
  Paper • 2510.17354 • Published • 33

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- Language Models Can Learn from Verbal Feedback Without Scalar Rewards
  Paper • 2509.22638 • Published • 70
- Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
  Paper • 2510.05034 • Published • 48

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
  Paper • 2509.05263 • Published • 10
- Symbolic Graphics Programming with Large Language Models
  Paper • 2509.05208 • Published • 46
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 104

- Apriel-1.5-15b-Thinker
  Paper • 2510.01141 • Published • 117
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
  Paper • 2509.21268 • Published • 103
- LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
  Paper • 2509.00676 • Published • 84
- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83

- POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
  Paper • 2509.01215 • Published • 50
- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
  Paper • 2509.09595 • Published • 48
- Reconstruction Alignment Improves Unified Multimodal Models
  Paper • 2509.07295 • Published • 40

- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
  Paper • 2509.07980 • Published • 101
- Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
  Paper • 2509.03867 • Published • 211
- Why Language Models Hallucinate
  Paper • 2509.04664 • Published • 193