---
base_model: JetLM/SDAR-8B-Chat
language:
- en
- zh
license: apache-2.0
tags:
- math
- reasoning
- diffusion
model_type: sdar
pipeline_tag: text-generation
library_name: transformers
---

# DiRL-8B-Instruct

[Paper on arXiv](https://arxiv.org/abs/2512.22234) | [GitHub Code](https://github.com/OpenMOSS/DiRL)

## Introduction

**DiRL-8B-Instruct** is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is trained with the [DiRL](https://github.com/OpenMOSS/DiRL) framework, starting from [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat). Through two-stage training (SFT + RL), DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks, even outperforming 32B models on most tasks.

> **Highlights**
>
> * **SOTA Performance:** Achieves **83.05%** on MATH500, **20.63%** on AIME2024, and **20.83%** on AIME2025, surpassing all 8B baselines.
> * **Training Framework:** Trained with [DiRL](https://github.com/OpenMOSS/DiRL), an efficient training framework for diffusion language models.
> * **Strong Baseline:** Built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat), gaining **+11.20%** on MATH500 and **+11.46%** on AIME2024.

## Inference

### Using LMDeploy

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer

model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prepare prompts and apply the chat template
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure the backend for diffusion-LLM (DLLM) inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,        # block size for blockwise diffusion decoding
    dllm_denoising_steps=4,     # denoising steps per block
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.9,
)

# Create inference pipeline
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding
        max_new_tokens=8192,
    )
    outputs = pipe(prompts, gen_config=gen_config)
    for output in outputs:
        print(output.text)
```

## Performance

| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|-------|---------|-------|----------|----------|---------------|---------|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| **DiRL-8B-Instruct** | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{zhu2025dirl,
      title={DiRL: An Efficient Post-Training Framework for Diffusion Language Models},
      author={Zhu, Ying and Wan, Jiaxin and Liu, Xiaoran and He, Siyanag and Wang, Qiqi and Guo, Xu and Liang, Tianyi and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
      year={2025},
      eprint={2512.22234},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.22234}
}
```
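If you want to score generations on math benchmarks like those in the Performance table, you usually need to pull the final answer out of each completion. The helper below is a minimal sketch that assumes the model follows the common convention of placing its final answer inside `\boxed{...}` (an assumption, not something this card documents); `extract_boxed_answer` is a hypothetical helper name, not part of DiRL or LMDeploy.

```python
# Minimal sketch: extract the contents of the last \boxed{...} from a completion.
# Assumption: the model follows the common \boxed{...} convention for final answers.
def extract_boxed_answer(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in `text`, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are just inside the opening brace of \boxed{
    chars = []
    while i < len(text) and depth > 0:
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break
        chars.append(c)
        i += 1
    return "".join(chars) if depth == 0 else None


# Example usage on a generated solution string (output.text from the pipeline above)
solution = r"Subtracting 5 from both sides gives x = 7, so the answer is \boxed{7}."
print(extract_boxed_answer(solution))  # -> "7"
```

The brace-matching loop (rather than a plain regex) keeps nested expressions such as `\boxed{\frac{1}{2}}` intact.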