(2025) - Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang
Notes
TL;DR: RL actually worsens pass@k at large k, even though it improves pass@k at small k (e.g. pass@1). The base model already has the reasoning paths that the RL-trained models use.
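To see how a model can win at pass@1 and still lose at large k, here is a small sketch of the standard unbiased pass@k estimator (Chen et al., 2021): draw n samples per problem, count c correct, and estimate pass@k = 1 - C(n-c, k) / C(n, k). This is the usual way pass@k is computed in this line of work. The helper names (`pass_at_k`, `mean_pass_at_k`) and the per-problem counts are hypothetical toy numbers chosen for illustration, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples drawn, c: how many of them were correct, k: attempt budget.
    Computes 1 - C(n - c, k) / C(n, k), i.e. the probability that at least
    one of k samples (drawn without replacement from the n) is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, each sampled n times."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical toy counts (NOT data from the paper): correct answers out of
# N = 16 samples on 4 problems.
N = 16
base_counts = [2, 1, 1, 3]   # rarely right, but solves every problem at least once
rl_counts = [12, 14, 0, 0]   # usually right on two problems, never solves the other two

for k in (1, 16):
    base = mean_pass_at_k(base_counts, N, k)
    rl = mean_pass_at_k(rl_counts, N, k)
    print(f"k={k:2d}  base={base:.3f}  rl={rl:.3f}")
```

With these made-up counts the "RL-like" model is ahead at k=1 but behind at k=16, because sharpening the distribution onto a few problems leaves others with zero correct samples, which no amount of extra sampling can recover.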