Create README.md
README.md
ADDED
|
@@ -0,0 +1,114 @@
| 1 |
+
---
|
| 2 |
+
base_model:
|
| 3 |
+
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
|
| 4 |
+
---
|
| 5 |
+
<div align="center">
|
| 6 |
+
|
| 7 |
+
# 🤔 Skywork-OR1 (Open Reasoner 1)
|
| 8 |
+
|
| 9 |
+
<div>
|
| 10 |
+
✊ Unleashing the Power of Reinforcement Learning for Math and Code Reasoners 🤖
|
| 11 |
+
</div>
|
| 12 |
+
|
| 13 |
+
</div>
|
| 14 |
+
<div>
|
| 15 |
+
<br>
</div>
|
| 16 |
+
|
| 17 |
+
<div align="center">
|
| 18 |
+
|
| 19 |
+
[](https://huggingface.co/collections/Skywork/skywork-or1-67fa1bcb41b436ef2def76b9)
|
| 20 |
+
[](https://huggingface.co/datasets/Skywork/Skywork-OR1-RL-Data)
|
| 21 |
+
[](https://github.com/SkyworkAI/Skywork-OR1)
|
| 22 |
+
[](https://yourname.notion.site/my-awesome-blog)
|
| 23 |
+
|
| 24 |
+
[](https://github.com/SkyworkAI/Skywork-OR1/stargazers)
|
| 25 |
+
[](https://github.com/SkyworkAI/Skywork-OR1/fork)
|
| 26 |
+
|
| 27 |
+
</div>
|
| 28 |
+
|
| 29 |
+
## 🔥 News
|
| 30 |
+
|
| 31 |
+
- **April 13, 2025**: We release the **`Skywork-OR1`** (Open Reasoner 1) series of models, including **`Skywork-OR1-Math-7B`**, **`Skywork-OR1-32B-Preview`**, and **`Skywork-OR1-7B-Preview`**. We open-source:
|
| 32 |
+
- 🤗 Model weights: [`Skywork-OR1-Math-7B`](https://huggingface.co/Skywork/Skywork-OR1-Math-7B), [`Skywork-OR1-32B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-32B-Preview), [`Skywork-OR1-7B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-7B-Preview)
|
| 33 |
+
- 🤗 Training data: [`Skywork-OR1-RL-Data`](https://huggingface.co/datasets/Skywork/Skywork-OR1-RL-Data)
|
| 34 |
+
- 🧑‍💻 Code: [`Skywork-OR1`](https://github.com/SkyworkAI/Skywork-OR1)
|
| 35 |
+
- We also release a [Notion Blog](https://yourname.notion.site/my-awesome-blog) that shares detailed training recipes, extensive experimental results, analysis, and insights, to help the community research, understand, and advance open reasoning models.
|
| 36 |
+
|
| 37 |
+
## 📖 Overview
|
| 38 |
+
|
| 39 |
+
<div align="center">
|
| 40 |
+
<img src="./assets/skywork-or1-math-7b-multi-stage.png" width="60%"/>
|
| 41 |
+
|
| 42 |
+
<sub>The AIME24 scores versus training steps of Skywork-OR1-Math-7B in our multi-stage training pipeline.</sub>
|
| 43 |
+
</div>
|
| 44 |
+
|
| 45 |
+
The **`Skywork-OR1`** (Open Reasoner 1) model series consists of powerful math and code reasoning models trained using large-scale rule-based reinforcement learning with carefully designed datasets and training recipes. This series includes two general-purpose reasoning models—**`Skywork-OR1-7B-Preview`** and **`Skywork-OR1-32B-Preview`**—along with a math-specialized model, **`Skywork-OR1-Math-7B`**.
|
| 46 |
+
|
| 47 |
+
- **[`Skywork-OR1-Math-7B`](https://huggingface.co/Skywork/Skywork-OR1-Math-7B)** is specifically optimized for mathematical reasoning, scoring **69.8** on AIME24 and **52.3** on AIME25 (Avg@32), well ahead of all models of similar size.
|
| 48 |
+
- **[`Skywork-OR1-32B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-32B-Preview)** performs on par with the 671B-parameter DeepSeek-R1 on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
|
| 49 |
+
- **[`Skywork-OR1-7B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-7B-Preview)** outperforms all similarly sized models in both math and coding scenarios.
|
| 50 |
+
|
| 51 |
+
## 📊 Evaluation
|
| 52 |
+
|
| 53 |
+
<div align="center">
|
| 54 |
+
<div style="display: flex; justify-content: center; gap: 20px;">
|
| 55 |
+
<img src="./assets/32b_perf.png" width="75%"/>
|
| 56 |
+
<img src="./assets/7b_perf.png" width="75%"/>
|
| 57 |
+
</div>
|
| 58 |
+
</div>
|
| 59 |
+
<br>
|
| 60 |
+
|
| 61 |
+
We evaluate our models on AIME24, AIME25, and LiveCodeBench. Instead of using Pass@1, which is common in prior work, we introduce Avg@K as the primary metric. This metric robustly measures a model's average performance across K independent attempts, reducing the impact of randomness and enhancing the reliability of the results. We believe that Avg@K provides a better reflection of a model's stability and reasoning consistency.
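
As a rough illustration of the metric, the sketch below computes Avg@K from a grid of per-attempt correctness flags. The function name `avg_at_k` and the input layout are assumptions made for this example, not part of the released evaluation code.

```python
from statistics import mean


def avg_at_k(results: list[list[bool]]) -> float:
    """Return Avg@K as a percentage.

    results[i][j] is True when the j-th of K independent attempts
    on problem i was judged correct.
    """
    if not results:
        raise ValueError("no problems given")
    k = len(results[0])
    if k == 0 or any(len(attempts) != k for attempts in results):
        raise ValueError("every problem needs the same nonzero number of attempts")
    # Per-problem accuracy over K attempts, then the mean over problems.
    per_problem = [sum(attempts) / k for attempts in results]
    return 100.0 * mean(per_problem)


# Hypothetical outcomes for 3 problems with K = 4 attempts each.
example = [
    [True, True, False, True],
    [False, False, False, False],
    [True, True, True, True],
]
score = avg_at_k(example)
print(f"Avg@4 = {score:.1f}")
```

With K = 1 this reduces to ordinary single-sample accuracy, so a larger K mainly reduces the variance that comes from sampling.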
|
| 62 |
+
|
| 63 |
+
We include the detailed results in the following table.
|
| 64 |
+
|
| 65 |
+
| Model | AIME24 (Avg@32) | AIME25 (Avg@32) | LiveCodeBench (8/1/24-2/1/25) (Avg@4) |
|-------|---------|---------|--------------|
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | 37.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | 39.5 |
| DeepSeek-R1-Distill-Qwen-32B | 72.9 | 59.0 | 57.2 |
| TinyR1-32B-Preview | 78.1 | 65.3 | 61.6 |
| QwQ-32B | 79.5 | 65.3 | 61.6 |
| DeepSeek-R1 | 79.8 | 70.0 | 65.9 |
| **Skywork-OR1-Math-7B** | 69.8 | 52.3 | 43.6 |
| **Skywork-OR1-7B-Preview** | 63.6 | 45.8 | 43.9 |
| **Skywork-OR1-32B-Preview** | 79.7 | 69.0 | 63.9 |
|
| 76 |
+
|
| 77 |
+
## ⚙️ Training Recipe
|
| 78 |
+
|
| 79 |
+
We offer a brief overview of our data and training pipeline below. For more details, please refer to our Notion Blog [here]().
|
| 80 |
+
|
| 81 |
+
### Data
|
| 82 |
+
|
| 83 |
+
- We select, clean, and curate **a dataset of 110K verifiable, challenging, and diverse math problems and 14K coding questions** from open-source datasets.
|
| 84 |
+
- We perform **model-aware difficulty estimation** for each problem and model and conduct **rigorous quality assessment prior to training** to ensure training efficiency and effectiveness.
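
One plausible reading of model-aware difficulty estimation is to sample several responses from the model being trained and use the empirical pass rate as the difficulty signal. The sketch below follows that reading. The helper names, the `attempt` callback, the sample count, and the thresholds are all assumptions for illustration and are not taken from the Skywork-OR1 code.

```python
from typing import Callable, Iterable


def estimate_pass_rate(problem: str, attempt: Callable[[str], bool],
                       n_samples: int = 16) -> float:
    """Fraction of n_samples attempts that the verifier accepts."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return sum(attempt(problem) for _ in range(n_samples)) / n_samples


def filter_by_difficulty(problems: Iterable[str], attempt: Callable[[str], bool],
                         n_samples: int = 16, low: float = 0.0,
                         high: float = 1.0) -> dict[str, float]:
    """Keep problems whose pass rate lies strictly between low and high."""
    kept = {}
    for problem in problems:
        rate = estimate_pass_rate(problem, attempt, n_samples)
        if low < rate < high:
            kept[problem] = rate
    return kept


def make_stub(successes: dict[str, int]) -> Callable[[str], bool]:
    """Deterministic stand-in for 'sample from the model, then verify'."""
    calls: dict[str, int] = {}

    def attempt(problem: str) -> bool:
        i = calls.get(problem, 0)
        calls[problem] = i + 1
        # Succeed on the first successes[problem] calls only.
        return i < successes[problem]

    return attempt


stub = make_stub({"easy": 16, "medium": 8, "hard": 0})
kept = filter_by_difficulty(["easy", "medium", "hard"], stub)
print(kept)
```

In a real pipeline `attempt` would generate a response with the target model and check it with a rule-based verifier. Because the pass rate depends on which model is sampled, the same problem can be kept for one model and dropped for another.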
|
| 85 |
+
|
| 86 |
+
### Training
|
| 87 |
+
|
| 88 |
+
We develop a customized version of GRPO that leverages both data-wise and training-wise improvements:
|
| 89 |
+
|
| 90 |
+
- We perform both **offline and online difficulty-based filtering** and **rejection sampling** to improve training efficiency.
|
| 91 |
+
- We incorporate a **multi-stage training pipeline** coupled with **adaptive entropy control** and other techniques to enhance exploration and stability.
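
To make the filtering idea concrete, the sketch below shows the group-relative advantage used by standard GRPO together with one simple form of online filtering: dropping prompts whose sampled responses all receive the same reward, since their advantages are zero and carry no learning signal. This is a generic illustration, not the customized variant described above. The function names, the use of the population standard deviation, and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize the rewards of one prompt's responses within the group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std, an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def filter_informative_groups(groups: dict[str, list[float]]) -> dict[str, list[float]]:
    """Drop prompts with identical rewards, return advantages for the rest."""
    return {
        prompt: group_advantages(rewards)
        for prompt, rewards in groups.items()
        if len(set(rewards)) > 1
    }


# Hypothetical binary rewards for three prompts, four responses each.
batch = {
    "p1": [1.0, 1.0, 1.0, 1.0],  # always solved: no signal
    "p2": [1.0, 0.0, 0.0, 1.0],  # mixed outcomes: informative
    "p3": [0.0, 0.0, 0.0, 0.0],  # never solved: no signal
}
advantages = filter_informative_groups(batch)
print(advantages)
```

Only `p2` survives the filter here. Offline filtering can apply the same criterion before training, using pass rates estimated in advance.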
|
| 92 |
+
|
| 93 |
+
## 📄 Technical Report
|
| 94 |
+
|
| 95 |
+
Our technical report will be released soon. Stay tuned!
|
| 96 |
+
|
| 97 |
+
## 🙏 Acknowledgements
|
| 98 |
+
|
| 99 |
+
- Our models are trained on top of [`DeepSeek-R1-Distill-Qwen-7B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) and [`DeepSeek-R1-Distill-Qwen-32B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).
|
| 100 |
+
- All models are trained using [a custom fork](https://github.com/SkyworkAI/Skywork-OR1) of the wonderful [`verl`](https://github.com/volcengine/verl) project.
|
| 101 |
+
|
| 102 |
+
## 📚 Citation
|
| 103 |
+
|
| 104 |
+
We will update the citation once the technical report is released. In the meantime, please cite the following:
|
| 105 |
+
|
| 106 |
+
```bibtex
|
| 107 |
+
@misc{skywork-or1-2025,
|
| 108 |
+
title={},
|
| 109 |
+
author={},
|
| 110 |
+
howpublished={\url{}},
|
| 111 |
+
note={Notion Blog},
|
| 112 |
+
year={2025}
|
| 113 |
+
}
|
| 114 |
+
```
|