Run RL Training

created Mar 08, 2026; modified Mar 22, 2026; 1 min read

running

Need to figure out the OOM issues. ✅ 2026-03-09
- Something is leaking memory between steps.
  - Trying to fix this with explicit garbage collection in prime.py and ray_trainer.py. Currently running on grpo-train-yuki.
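
For reference, a minimal sketch of the kind of between-step cleanup meant above. The helper name `release_memory_between_steps` is hypothetical and is not taken from prime.py or ray_trainer.py; it assumes the leak is cyclic garbage or cached CUDA blocks, and only touches PyTorch if it happens to be installed.

```python
import gc


def release_memory_between_steps() -> int:
    """Hypothetical helper: force a full GC pass, then release cached GPU blocks.

    Returns the number of unreachable objects the collector found.
    """
    # Collect reference cycles that reference counting alone cannot free.
    collected = gc.collect()

    # PyTorch is optional here; skip the CUDA part if it is not installed.
    try:
        import torch
    except ImportError:
        return collected

    if torch.cuda.is_available():
        # Hand cached, currently unused blocks back to the CUDA driver.
        torch.cuda.empty_cache()
    return collected
```

Calling this once at the end of each step is the intended use. It only helps if the objects are actually unreachable: if something still holds a live reference (for example, a list that accumulates per-step tensors), this will not free it and the reference has to be tracked down instead.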
Look into the evaluation harness ✅ 2026-03-15
- We are using the unbiased low-variance estimator, as per @yueDoesReinforcementLearning2025.
- I think it is running properly now.
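
A sketch of the estimator, assuming the one meant above is the unbiased pass@k estimator (1 - C(n-c, k) / C(n, k), with n samples per problem of which c are correct). That identification is my assumption from the cited reference, and the function name `pass_at_k` is illustrative rather than the harness's actual API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: total samples drawn, c: samples that were correct, k: budget to score.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Probability that a random size-k subset of the n samples has no correct one.
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist.
    p_all_wrong = comb(n - c, k) / comb(n, k)
    return 1.0 - p_all_wrong
```

The benchmark score would then be the mean of `pass_at_k` over all problems. Sampling n > k completions and using this formula gives lower variance than drawing exactly k samples and checking whether any of them passes.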

