---
base_model: JetLM/SDAR-8B-Chat
language:
- en
- zh
license: apache-2.0
tags:
- math
- reasoning
- diffusion
model_type: sdar
pipeline_tag: text-generation
library_name: transformers
---

# DiRL-8B-Instruct

[Paper on arXiv](https://arxiv.org/abs/2512.22234) | [GitHub Code](https://github.com/OpenMOSS/DiRL)

## Introduction

**DiRL-8B-Instruct** is an 8B-parameter diffusion language model specialized for mathematical reasoning. It is trained with the [DiRL](https://github.com/OpenMOSS/DiRL) framework, starting from [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat). Through two-stage training (SFT + RL), DiRL-8B-Instruct achieves state-of-the-art results at the 8B scale on mathematical reasoning benchmarks, even outperforming 32B models on most tasks.

> **Highlights**
>
> * **SOTA Performance:** Achieves **83.05%** on MATH500, **20.63%** on AIME2024, and **20.83%** on AIME2025, surpassing all 8B baselines.
> * **Training Framework:** Trained with [DiRL](https://github.com/OpenMOSS/DiRL), an efficient training framework for diffusion language models.
> * **Strong Baseline:** Built on [SDAR-8B-Chat](https://huggingface.co/JetLM/SDAR-8B-Chat), gaining **+11.20%** on MATH500 and **+11.46%** on AIME2024.

## Inference

### Using LMDeploy

```python
from lmdeploy import pipeline, PytorchEngineConfig, GenerationConfig
from transformers import AutoTokenizer

model_path = "OpenMOSS-Team/DiRL-8B-Instruct"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Prepare prompts and apply the chat template
prompts = [
    [{"role": "user", "content": "Solve: If x + 5 = 12, what is x?"}],
]
prompts = tokenizer.apply_chat_template(prompts, tokenize=False, add_generation_prompt=True)

# Configure the backend for diffusion-LLM (DLLM) inference
backend_config = PytorchEngineConfig(
    dtype="float16",
    max_prefill_token_num=8192,
    cache_max_entry_count=0.8,
    dllm_block_length=4,        # block size for blockwise diffusion decoding
    dllm_denoising_steps=4,     # denoising steps per block
    dllm_unmasking_strategy="low_confidence_dynamic",
    dllm_confidence_threshold=0.9,
)

# Create inference pipeline
with pipeline(model_path, backend_config=backend_config) as pipe:
    gen_config = GenerationConfig(
        top_p=1.0,
        top_k=50,
        temperature=1.0,
        do_sample=False,  # greedy decoding
        max_new_tokens=8192,
    )
    outputs = pipe(prompts, gen_config=gen_config)
    for output in outputs:
        print(output.text)
```

## Performance

| Model | MATH500 | GSM8K | AIME2024 | AIME2025 | OlympiadBench | Average |
|-------|---------|-------|----------|----------|---------------|---------|
| Qwen2.5-7B-Instruct | 73.78 | 89.78 | 8.96 | 5.63 | 36.58 | 42.95 |
| Qwen2.5-32B-Instruct | 81.13 | **94.03** | 12.92 | 11.88 | 45.65 | 49.12 |
| SDAR-8B-Chat | 71.85 | 89.87 | 9.17 | 9.38 | 36.03 | 43.26 |
| Trado-8B-Instruct | 75.59 | 91.06 | 11.67 | 15.00 | 40.32 | 46.73 |
| **DiRL-8B-Instruct** | **83.05** | 93.03 | **20.63** | **20.83** | **46.40** | **52.79** |

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{zhu2025dirl,
      title={DiRL: An Efficient Post-Training Framework for Diffusion Language Models},
      author={Zhu, Ying and Wan, Jiaxin and Liu, Xiaoran and He, Siyanag and Wang, Qiqi and Guo, Xu and Liang, Tianyi and Huang, Zengfeng and He, Ziwei and Qiu, Xipeng},
      year={2025},
      eprint={2512.22234},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.22234}
}
```
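If you want to score generations on math benchmarks like those in the Performance table, you usually need to pull the final answer out of each completion. The helper below is a minimal sketch that assumes the model follows the common convention of placing its final answer inside `\boxed{...}` (an assumption, not something this card documents); `extract_boxed_answer` is a hypothetical helper name, not part of DiRL or LMDeploy.

```python
# Minimal sketch: extract the contents of the last \boxed{...} from a completion.
# Assumption: the model follows the common \boxed{...} convention for final answers.
def extract_boxed_answer(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in `text`, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are just inside the opening brace of \boxed{
    chars = []
    while i < len(text) and depth > 0:
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break
        chars.append(c)
        i += 1
    return "".join(chars) if depth == 0 else None


# Example usage on a generated solution string (output.text from the pipeline above)
solution = r"Subtracting 5 from both sides gives x = 7, so the answer is \boxed{7}."
print(extract_boxed_answer(solution))  # -> "7"
```

The brace-matching loop (rather than a plain regex) keeps nested expressions such as `\boxed{\frac{1}{2}}` intact.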