---
base_model: JetLM/SDAR-8B-Chat
language:
- en
- zh
license: apache-2.0
tags:
- math
- reasoning
- diffusion
model_type: sdar
pipeline_tag: text-generation
library_name: transformers
---
# DiRL-8B-Instruct
## Introduction
**DiRL-8B-Instruct** is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat) and trained with the [DiRL](https://github.com/OpenMOSS/DiRL) framework in two stages (SFT followed by RL). DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks and outperforms 32B models on most tasks.
> **Highlights**
>
> * **SOTA Performance:** Achieves **83.05%** on MATH500, **20.63%** on AIME2024, and **20.83%** on AIME2025, surpassing all 8B baselines.
> * **Training Framework:** Trained with [DiRL](https://github.com/OpenMOSS/DiRL), an efficient training framework for diffusion language models.
> * **Strong Baseline:** Built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat), improving on it by **+11.20** points on MATH500 and **+11.46** points on AIME2024.
## Inference
### Using LMDeploy
```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer
model_path = "OpenMOSS-Team/DiRL-8B-Instruct"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Prepare prompts
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure backend for DLLM (diffusion LLM) inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,                               # tokens per diffusion block
    dllm_denoising_steps=4,                            # denoising steps per block
    dllm_unmasking_strategy="low_confidence_dynamic",  # unmask tokens by confidence
    dllm_confidence_threshold=0.9,
)

# Create inference pipeline
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding
        max_new_tokens=8192,
    )
    outputs = pipe(prompts, gen_config=gen_config)
    for output in outputs:
        print(output.text)
```
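For quick checks on math prompts, the pipeline output can be post-processed directly. The snippet below is a minimal sketch (not part of any official evaluation code) that extracts the final `\boxed{...}` answer from the generated text, assuming the model follows the usual boxed-answer convention; the `extract_boxed_answer` helper is hypothetical.

```python
import re

def extract_boxed_answer(text):
    """Return the content of the last \\boxed{...} in `text`, or None.

    Sketch only: handles one level of nested braces, which covers most
    math answers; it is not a full LaTeX parser.
    """
    matches = re.findall(r"\\boxed\{((?:[^{}]|\{[^{}]*\})*)\}", text)
    return matches[-1] if matches else None

# Example with the pipeline output from above (assumed format):
# answer = extract_boxed_answer(outputs[0].text)
# print(answer)  # e.g. "7" for the "x + 5 = 12" prompt
```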
## Performance
| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|-------|---------|-------|----------|----------|---------------|---------|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| **DiRL-8B-Instruct** | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |
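The Average column appears to be the plain arithmetic mean of the five benchmark scores (all reported in %). A quick check on the DiRL-8B-Instruct row:

```python
# Scores from the DiRL-8B-Instruct row of the table above
scores = {
    "MATH500": 83.05,
    "GSM8K": 93.03,
    "AIME2024": 20.63,
    "AIME2025": 20.83,
    "OlympiadBench": 46.40,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 52.79, matching the Average column
```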
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{zhu2025dirl,
      title={DiRL: An Efficient Post-Training Framework for Diffusion Language Models},
      author={Zhu, Ying and Wan, Jiaxin and Liu, Xiaoran and He, Siyang and Wang, Qiqi and Guo, Xu and Liang, Tianyi and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
      year={2025},
      eprint={2512.22234},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.22234}
}
```