-
LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Paper • 2512.15745 • Published • 76 -
inclusionAI/LLaDA2.0-flash
Text Generation • 103B • Updated • 428 • 58 -
inclusionAI/LLaDA2.0-mini
Text Generation • 16B • Updated • 5.15k • 48 -
inclusionAI/LLaDA2.0-flash-preview
Text Generation • 103B • Updated • 125 • 69
Collections
Discover the best community collections!
Collections including paper arxiv:2512.15745
-
TiDAR: Think in Diffusion, Talk in Autoregression
Paper • 2511.08923 • Published • 117 -
Diffusion Language Models are Super Data Learners
Paper • 2511.03276 • Published • 127 -
What Makes Diffusion Language Models Super Data Learners?
Paper • 2510.04071 • Published -
LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Paper • 2512.15745 • Published • 76
-
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper • 2505.22618 • Published • 44 -
DINGO: Constrained Inference for Diffusion LLMs
Paper • 2505.23061 • Published • 31 -
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Paper • 2506.13759 • Published • 43 -
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
Paper • 2506.14429 • Published • 44
-
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 221 -
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published • 22 -
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
Paper • 2511.22176 • Published • 4 -
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Paper • 2511.22265 • Published • 1
-
Fast-dLLM v2: Efficient Block-Diffusion LLM
Paper • 2509.26328 • Published • 54 -
Attention Is All You Need for KV Cache in Diffusion LLMs
Paper • 2510.14973 • Published • 40 -
Attention Sinks in Diffusion Language Models
Paper • 2510.15731 • Published • 48 -
Diffusion Language Models are Super Data Learners
Paper • 2511.03276 • Published • 127
-
Large Language Diffusion Models
Paper • 2502.09992 • Published • 123 -
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 74 -
MMaDA: Multimodal Large Diffusion Language Models
Paper • 2505.15809 • Published • 97 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 54
-
LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Paper • 2512.15745 • Published • 76 -
inclusionAI/LLaDA2.0-flash
Text Generation • 103B • Updated • 428 • 58 -
inclusionAI/LLaDA2.0-mini
Text Generation • 16B • Updated • 5.15k • 48 -
inclusionAI/LLaDA2.0-flash-preview
Text Generation • 103B • Updated • 125 • 69
-
Scaling Latent Reasoning via Looped Language Models
Paper • 2510.25741 • Published • 221 -
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
Paper • 2511.23319 • Published • 22 -
Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
Paper • 2511.22176 • Published • 4 -
FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
Paper • 2511.22265 • Published • 1
-
TiDAR: Think in Diffusion, Talk in Autoregression
Paper • 2511.08923 • Published • 117 -
Diffusion Language Models are Super Data Learners
Paper • 2511.03276 • Published • 127 -
What Makes Diffusion Language Models Super Data Learners?
Paper • 2510.04071 • Published -
LLaDA2.0: Scaling Up Diffusion Language Models to 100B
Paper • 2512.15745 • Published • 76
-
Fast-dLLM v2: Efficient Block-Diffusion LLM
Paper • 2509.26328 • Published • 54 -
Attention Is All You Need for KV Cache in Diffusion LLMs
Paper • 2510.14973 • Published • 40 -
Attention Sinks in Diffusion Language Models
Paper • 2510.15731 • Published • 48 -
Diffusion Language Models are Super Data Learners
Paper • 2511.03276 • Published • 127
-
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
Paper • 2505.22618 • Published • 44 -
DINGO: Constrained Inference for Diffusion LLMs
Paper • 2505.23061 • Published • 31 -
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Paper • 2506.13759 • Published • 43 -
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
Paper • 2506.14429 • Published • 44
-
Large Language Diffusion Models
Paper • 2502.09992 • Published • 123 -
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 74 -
MMaDA: Multimodal Large Diffusion Language Models
Paper • 2505.15809 • Published • 97 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 54