---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---
<div align="center">

# 🤔 Skywork-OR1 (Open Reasoner 1)

<div>
✊ Unleashing the Power of Reinforcement Learning for Math and Code Reasoners 🤖
</div>

</div>
<div>
<br>

<div align="center">

[![Models](https://img.shields.io/badge/Models-4d5eff?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/collections/Skywork/skywork-or1-67fa1bcb41b436ef2def76b9)
[![Data](https://img.shields.io/badge/Data-4d5eff?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor)](https://huggingface.co/datasets/Skywork/Skywork-OR1-RL-Data)
[![Github](https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/SkyworkAI/Skywork-OR1)
[![Notion](https://img.shields.io/badge/Notion_Blog-000000?style=for-the-badge&logo=notion&logoColor=white)](https://yourname.notion.site/my-awesome-blog)

[![GitHub Stars](https://img.shields.io/github/stars/SkyworkAI/Skywork-OR1?style=for-the-badge&logo=github&logoColor=white&label=Stars&color=000000)](https://github.com/SkyworkAI/Skywork-OR1/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/SkyworkAI/Skywork-OR1?style=for-the-badge&logo=github&logoColor=white&label=Forks&color=000000)](https://github.com/SkyworkAI/Skywork-OR1/fork)

</div>

## 🔥 News

- **April 13, 2025**: We release the **`Skywork-OR1`** (Open Reasoner 1) series of models, including **`Skywork-OR1-Math-7B`**, **`Skywork-OR1-32B-Preview`**, and **`Skywork-OR1-7B-Preview`**. We open-source
  - 🤗 Model weights: [`Skywork-OR1-Math-7B`](https://huggingface.co/Skywork/Skywork-OR1-Math-7B), [`Skywork-OR1-32B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-32B-Preview), [`Skywork-OR1-7B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-7B-Preview)
  - 🤗 Training data: [`Skywork-OR1-RL-Data`](https://huggingface.co/datasets/Skywork/Skywork-OR1-RL-Data)
  - 🧑‍💻 Code: [`Skywork-OR1`](https://github.com/SkyworkAI/Skywork-OR1)
- We also release a [Notion Blog](https://yourname.notion.site/my-awesome-blog) sharing detailed training recipes and extensive experimental results, analysis, and insights, dedicated to helping the community better research, understand, and push the frontier of open reasoning models.

## 📖 Overview

<div align="center">
<img src="./assets/skywork-or1-math-7b-multi-stage.png" width="60%"/>

<sub>AIME24 scores of Skywork-OR1-Math-7B versus training steps in our multi-stage training pipeline.</sub>
</div>

The **`Skywork-OR1`** (Open Reasoner 1) model series consists of powerful math and code reasoning models trained with large-scale rule-based reinforcement learning on carefully designed datasets and training recipes. The series includes two general-purpose reasoning models, **`Skywork-OR1-7B-Preview`** and **`Skywork-OR1-32B-Preview`**, along with a math-specialized model, **`Skywork-OR1-Math-7B`**. A minimal inference sketch is shown after the list below.

- **[`Skywork-OR1-Math-7B`](https://huggingface.co/Skywork/Skywork-OR1-Math-7B)** is specifically optimized for mathematical reasoning, scoring **72.4** on AIME24 and **52.8** on AIME25, well ahead of all models of similar size.
- **[`Skywork-OR1-32B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-32B-Preview)** delivers performance comparable to the 671B-parameter DeepSeek-R1 on math tasks (AIME24 and AIME25) and coding tasks (LiveCodeBench).
- **[`Skywork-OR1-7B-Preview`](https://huggingface.co/Skywork/Skywork-OR1-7B-Preview)** outperforms all similarly sized models in both math and coding scenarios.

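The models can be loaded as standard causal language models. Below is a minimal inference sketch using 🤗 Transformers; the prompt and generation settings are illustrative assumptions on our part, not official recommendations.

```python
# Minimal inference sketch (illustrative; adjust model name, dtype, and sampling settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Skywork/Skywork-OR1-Math-7B"  # or Skywork-OR1-7B-Preview / Skywork-OR1-32B-Preview

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Compute the sum of the first 100 positive integers."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long reasoning traces need a generous token budget.
outputs = model.generate(input_ids, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
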
## 📊 Evaluation

<div align="center">
<div style="display: flex; justify-content: center; gap: 20px;">
<img src="./assets/32b_perf.png" width="75%"/>
<img src="./assets/7b_perf.png" width="75%"/>
</div>
</div>
<br>

We evaluate our models on AIME24, AIME25, and LiveCodeBench. Instead of Pass@1, which is common in prior work, we report Avg@K as the primary metric: the model's average score across K independent attempts per problem. This reduces the impact of sampling randomness and, we believe, better reflects a model's stability and reasoning consistency.

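Concretely, Avg@K is the mean per-attempt score over K sampled completions for each problem, averaged over the benchmark. A minimal sketch of the computation is shown below; `generate_answer` and `is_correct` are hypothetical placeholders for model sampling and answer checking, not part of any released API.

```python
# Minimal sketch of Avg@K: mean accuracy over K independent attempts per problem,
# averaged over the benchmark. `generate_answer` and `is_correct` are hypothetical.
from statistics import mean

def avg_at_k(problems, generate_answer, is_correct, k: int = 32) -> float:
    per_problem = []
    for problem in problems:
        attempts = [generate_answer(problem) for _ in range(k)]             # K independent samples
        per_problem.append(mean(is_correct(problem, a) for a in attempts))  # fraction correct
    return mean(per_problem)
```
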
We include the detailed results in the table below.

| Model | AIME24 (Avg@32) | AIME25 (Avg@32) | LiveCodeBench (Aug 1, 2024 - Feb 1, 2025) (Avg@4) |
|-------|-----------------|-----------------|---------------------------------------------------|
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 39.2 | 37.6 |
| Light-R1-7B-DS | 59.1 | 44.3 | 39.5 |
| DeepSeek-R1-Distill-Qwen-32B | 72.9 | 59.0 | 57.2 |
| TinyR1-32B-Preview | 78.1 | 65.3 | 61.6 |
| QwQ-32B | 79.5 | 65.3 | 61.6 |
| DeepSeek-R1 | 79.8 | 70.0 | 65.9 |
| **Skywork-OR1-Math-7B** | 69.8 | 52.3 | 43.6 |
| **Skywork-OR1-7B-Preview** | 63.6 | 45.8 | 43.9 |
| **Skywork-OR1-32B-Preview** | 79.7 | 69.0 | 63.9 |

## ⚙️ Training Recipe

We offer a brief overview of our data and training pipeline below. For more details, please refer to our Notion Blog [here]().

### Data

- We select, clean, and curate **a dataset of 110K verifiable, challenging, and diverse math problems and 14K coding questions** from open-source datasets.
- We perform **model-aware difficulty estimation** for each problem and model, and conduct **rigorous quality assessment prior to training** to ensure training efficiency and effectiveness (an illustrative sketch follows this list).

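As a rough illustration of what model-aware difficulty estimation and filtering can look like (a sketch under our own assumptions, not the exact Skywork-OR1 pipeline; `sample_solution` and `verify` are hypothetical stand-ins for model inference and the rule-based verifier):

```python
# Illustrative sketch of model-aware difficulty filtering (not the exact Skywork-OR1 code).
def estimate_pass_rate(problem, sample_solution, verify, n_samples: int = 16) -> float:
    """Fraction of sampled solutions that the rule-based verifier accepts."""
    correct = sum(verify(problem, sample_solution(problem)) for _ in range(n_samples))
    return correct / n_samples

def filter_by_difficulty(problems, sample_solution, verify,
                         min_rate: float = 0.0, max_rate: float = 1.0):
    """Keep problems that are neither always solved (too easy) nor never solved (unlearnable)."""
    kept = []
    for problem in problems:
        rate = estimate_pass_rate(problem, sample_solution, verify)
        if min_rate < rate < max_rate:  # only problems with a useful learning signal remain
            kept.append(problem)
    return kept
```
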
### Training

We develop a customized version of GRPO that leverages both data-wise and training-wise improvements:

- We perform both **offline and online difficulty-based filtering** and **rejection sampling** to improve training efficiency.
- We incorporate a **multi-stage training pipeline** coupled with **adaptive entropy control** and other techniques to enhance exploration and stability (see the sketch after this list).

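To make two of these ingredients concrete, here is a compact sketch under our own assumptions (not the released trainer): GRPO-style group-normalized advantages with online rejection of uninformative rollout groups, plus an entropy-bonus coefficient that adapts toward a target entropy.

```python
# Illustrative sketch only; names and hyperparameters are assumptions, not Skywork-OR1's code.
import statistics

def group_advantages(rewards, eps: float = 1e-6):
    """GRPO-style advantages: normalize each reward within its group of K rollouts."""
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards)
    if std_r < eps:            # every rollout equally right/wrong: no learning signal,
        return None            # so the whole group is rejected (online filtering)
    return [(r - mean_r) / (std_r + eps) for r in rewards]

class AdaptiveEntropyCoef:
    """Nudge the entropy-bonus coefficient so that policy entropy tracks a target value."""

    def __init__(self, target: float, coef: float = 1e-3, factor: float = 1.1,
                 lo: float = 1e-5, hi: float = 1e-2):
        self.target, self.coef, self.factor, self.lo, self.hi = target, coef, factor, lo, hi

    def update(self, entropy: float) -> float:
        if entropy < self.target:
            self.coef = min(self.coef * self.factor, self.hi)  # entropy too low: boost the bonus
        else:
            self.coef = max(self.coef / self.factor, self.lo)  # entropy high enough: ease off
        return self.coef

# Hypothetical use inside a training step:
#   adv = group_advantages(rewards_for_one_prompt)            # None -> skip this group
#   loss = policy_loss(adv) - ent_ctrl.update(batch_entropy) * batch_entropy
```
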
## 📄 Technical Report

Our technical report will be released soon. Stay tuned!

## 🙏 Acknowledgements

- Our models are trained on top of [`DeepSeek-R1-Distill-Qwen-7B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) and [`DeepSeek-R1-Distill-Qwen-32B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).
- All models are trained using [a custom fork](https://github.com/SkyworkAI/Skywork-OR1) of the wonderful [`verl`](https://github.com/volcengine/verl) project.

## 📚 Citation

We will update the citation once the technical report is released. In the meantime, please cite the following:

```bibtex
@misc{skywork-or1-2025,
  title={},
  author={},
  howpublished={\url{}},
  note={Notion Blog},
  year={2025}
}
```