AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

  • Yuliang Liu ,
  • Junjie Lu ,
  • Zhaoling Chen ,
  • Chaofeng Qu ,
  • Jason Klein Liu ,
  • Chonghan Liu ,
  • Zefan Cai ,
  • Yunhui Xia ,
  • Jiang Bian ,
  • Wei Shen ,
  • Zhouhan Lin

ICML 2025 |

Current approaches to training Process Reward Models (PRMs) typically break responses into multiple reasoning steps using rule-based techniques, such as inserting predefined placeholder tokens or fixing each reasoning step at a set length. These approaches overlook the fact that specific words rarely mark true decision points in a text. To address this, we propose AdaptiveStep, a method that divides reasoning steps based on the model’s confidence in predicting the next word. This division provides more decision-making information at each step, enhancing downstream tasks such as reward model learning, and requires no manual annotation. We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs on mathematical reasoning and code generation tasks. Experimental results show that the resulting PRM achieves state-of-the-art Best-of-N performance, surpassing the greedy search strategy with token-level value-guided decoding, while reducing construction costs by over 30% compared to existing open-source PRMs. In addition, we provide a thorough analysis and case study of the PRM’s performance, transferability, and generalization capabilities.
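To make the core idea concrete, here is a minimal, illustrative sketch of confidence-based step division. It is not the paper's implementation: the function name, the per-token confidence values, and the fixed threshold are all assumptions for demonstration; in practice the confidences would come from a language model's next-token probabilities.

```python
from typing import List

def split_by_confidence(tokens: List[str],
                        confidences: List[float],
                        threshold: float = 0.8) -> List[List[str]]:
    """Split a token sequence into reasoning steps, starting a new step
    wherever the model's next-token confidence drops below `threshold`
    (a low-confidence token is treated as a decision point).

    Note: hypothetical helper for illustration only, not the authors' code.
    """
    steps: List[List[str]] = []
    current: List[str] = []
    for tok, conf in zip(tokens, confidences):
        if conf < threshold and current:
            # Close the current step just before the decision point.
            steps.append(current)
            current = []
        current.append(tok)
    if current:
        steps.append(current)
    return steps

# Toy example: low confidence before "Next" and "factor" triggers splits.
tokens = ["2", "+", "2", "=", "4", ".", "Next", "we", "factor", "..."]
confs = [0.99, 0.97, 0.95, 0.90, 0.85, 0.90, 0.40, 0.92, 0.50, 0.90]
steps = split_by_confidence(tokens, confs)
# → [['2', '+', '2', '=', '4', '.'], ['Next', 'we'], ['factor', '...']]
```

Unlike rule-based splitting on placeholder tokens or fixed lengths, the step boundaries here adapt to wherever the model is genuinely uncertain, which is the property the abstract argues makes the resulting steps more informative for reward modeling.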