---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- Satori-reasoning/Satori_FT_data
base_model:
- Qwen/Qwen2.5-Math-7B
---
**Satori-7B-SFT** is the SFT model checkpoint used to train our RL model [Satori-7B-Round2](https://huggingface.co/Satori-reasoning/Satori-7B-Round2). **Satori-7B-SFT** is trained only with a small-scale format tuning (FT) stage, which helps the base LLM internalize the Chain-of-Action-Thought (COAT) reasoning format.
# **Usage**
```python
from vllm import LLM, SamplingParams


def generate(question_list, model_path):
    # Load the model with vLLM on a single GPU.
    llm = LLM(
        model=model_path,
        trust_remote_code=True,
        tensor_parallel_size=1,
    )
    # Greedy decoding (temperature 0), one completion per prompt.
    sampling_params = SamplingParams(
        max_tokens=4096,
        temperature=0.0,
        n=1,
        skip_special_tokens=True,  # hide special tokens such as "<|continue|>", "<|reflect|>", and "<|explore|>"
    )
    outputs = llm.generate(question_list, sampling_params, use_tqdm=True)
    # One list of completion strings per prompt.
    completions = [[output.text for output in output_item.outputs] for output_item in outputs]
    return completions


def prepare_prompt(question):
    # Wrap the question in the chat template expected by the model.
    prompt = f"<|im_start|>user\nSolve the following math problem efficiently and clearly.\nPlease reason step by step, and put your final answer within \\boxed{{}}.\nProblem: {question}<|im_end|>\n<|im_start|>assistant\n"
    return prompt


def run():
    model_path = "Satori-reasoning/Satori-7B-SFT"
    all_problems = [
        "which number is larger? 9.11 or 9.9?",
    ]
    completions = generate(
        [prepare_prompt(problem_data) for problem_data in all_problems],
        model_path,
    )
    for completion in completions:
        print(completion[0])


if __name__ == "__main__":
    run()
```
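Because the prompt asks the model to put its final answer within `\boxed{}`, it can be convenient to pull that answer out of each completion. The following is a minimal sketch, not part of the Satori codebase: the helper name `extract_boxed_answer` is hypothetical, and it assumes the last `\boxed{...}` in the text holds the final answer.

```python
def extract_boxed_answer(completion):
    """Return the contents of the last \\boxed{...} in the text, or None.

    Hypothetical helper for illustration; assumes the final answer is the
    last boxed expression in the completion.
    """
    marker = "\\boxed{"
    start = completion.rfind(marker)
    if start == -1:
        return None  # no boxed answer present

    begin = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(begin, len(completion)):
        ch = completion[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found; nested braces are preserved.
                return completion[begin:i]
    return None  # braces never balanced (e.g. truncated output)
```

With the script above, this could be applied as `extract_boxed_answer(completion[0])` inside the final loop. Counting braces rather than using a simple regular expression keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact.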
# **Resources**
We provide our training datasets:
- [Full format tuning dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_FT_data) with 300K unique questions.
- [RL dataset](https://huggingface.co/datasets/Satori-reasoning/Satori_RL_data) with 550K unique questions.
Please refer to our blog and research paper for more technical details of Satori.
- [Blog](https://satori-reasoning.github.io/blog/satori/)
- [Paper](https://arxiv.org/pdf/2502.02508)
For code, see https://github.com/Satori-reasoning/Satori
# **Citation**
If you find our model and data helpful, please cite our paper:
```
@misc{shen2025satorireinforcementlearningchainofactionthought,
title={Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search},
author={Maohao Shen and Guangtao Zeng and Zhenting Qi and Zhang-Wei Hong and Zhenfang Chen and Wei Lu and Gregory Wornell and Subhro Das and David Cox and Chuang Gan},
year={2025},
eprint={2502.02508},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.02508},
}
```