Notes TL;DR: RL actually worsens pass@k metrics. The base model already contains the reasoning paths that RL-trained models use.
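
For reference, a minimal sketch of how pass@k is commonly computed, assuming the standard unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them are correct. The function names and the per-problem counts below are made up purely for illustration; they are not results from any experiment. The toy numbers only show how a model can win at k=1 yet lose at large k.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, given the correct-sample count per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical counts of correct samples out of n=10, for two problems.
rl_like = [8, 0]    # concentrates on problem 1, never solves problem 2
base_like = [3, 1]  # less reliable, but solves both at least once

for k in (1, 10):
    print(k, mean_pass_at_k(rl_like, 10, k), mean_pass_at_k(base_like, 10, k))
```

In this toy setup the RL-like model has the higher pass@1, while the base-like model has the higher pass@10, because it occasionally solves the problem the RL-like model never does.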