(2025) - Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

Notes

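TL;DR: RL improves Pass@K at small K but worsens it at large K, where the base model overtakes the RL-trained model. The reasoning paths the RL-trained model uses are already present in the base model; RL mostly makes them more likely to be sampled rather than adding new ones.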
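A minimal sketch of why both halves of the TL;DR can hold at once. It assumes the standard unbiased Pass@K estimator (Chen et al., 2021): for one problem with n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The per-problem correct counts below are hypothetical toy numbers, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the n samples contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over a benchmark, one correct-count per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical toy data: number of correct samples out of n = 256 per problem.
n = 256
base = [20, 10, 5, 2]    # rarely right, but solves every problem sometimes
rl = [200, 150, 0, 0]    # much sharper on two problems, never solves the others

for k in (1, 256):
    print(f"k={k}: base={mean_pass_at_k(base, n, k):.3f} rl={mean_pass_at_k(rl, n, k):.3f}")
```

At k=1 the toy "RL" model wins easily because it concentrates probability on paths that work. At k=256 the toy "base" model wins, since any problem with at least one correct sample eventually counts as solved, while problems the RL model never solves stay at zero.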