No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
Paper: arXiv:2602.02103
This is the In-Domain LLM adopted in the paper No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs.
The model is trained with GRPO on top of Qwen2.5-7B-Instruct and learns task-aware reasoning behaviors across 12 tasks (see https://huggingface.co/datasets/lxucs/tele-lens). Its CoT trajectories are substantially shorter than those produced by Qwen3 models.
This model should always be used with the following system prompt:
You are a helpful assistant. Now the user asks you to solve a reasoning problem. You need to first think about the solving process in the mind and then provide the user with the answer. The thinking process is enclosed within <think> </think> tags, i.e., <think> thinking process here </think> final answer.
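Below is a minimal inference sketch using the Hugging Face transformers chat API, assuming the model ships with a standard chat template. The MODEL_ID and the example question are placeholders, not values from this card; substitute the actual repo id of this model.

```python
# Minimal sketch: generate with the required system prompt via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/this-model"  # placeholder, replace with the real repo id

SYSTEM_PROMPT = (
    "You are a helpful assistant. Now the user asks you to solve a reasoning problem. "
    "You need to first think about the solving process in the mind and then provide the "
    "user with the answer. The thinking process is enclosed within <think> </think> tags, "
    "i.e., <think> thinking process here </think> final answer."
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "If 3x + 7 = 22, what is x?"},  # placeholder question
]

# Apply the chat template so the system prompt is passed through unchanged.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Decoding settings such as max_new_tokens above are illustrative defaults and are not prescribed by this card.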
More details on the model and data are provided in the accompanying GitHub repository.
Citation
@misc{xu2026globalplanchainofthoughtuncover,
      title={No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs},
      author={Liyan Xu and Mo Yu and Fandong Meng and Jie Zhou},
      year={2026},
      eprint={2602.02103},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.02103},
}