Running: STEM presentation (16 Mar), meeting with Isabelle (17 Mar)
How much the model favours the context, not how much it uses it.
- Need to run OLMO, 3 steps each, on the DRUID dataset. ✅ 2026-03-10
- This time, I just want w_evidence for pre-trained, SFT, DPO, and RLVR. Few-shot only.
- Get this done by Monday.
- Run the same thing on Llama ✅ 2026-03-10
- Running this slowly — need to check if I need the chat template for inference
- Probably need to sanity-check the results. Why do gold sources score well now? They were not reinforced back then. I am currently running the base model again.
- We probably need to do proper sampling rather than running on all the rows; this could be problematic. I hope the results don’t really change…
- Wait, I just realized that we can’t say it “reinforces” following gold sources… that doesn’t make sense. We don’t even know if it moves in the direction of the original answer, so we can’t determine whether the model is repulsed by or attracted to the message of the evidence.
- See if the training effects are more uniform between the two models when the training data is the same (Tulu). ✅ 2026-03-11
- Yes! They are definitely more similar than not! The extent can be a bit different — but the general trend is the same.
- Need to deal with seeing if it is following the evidence or not. ✅ 2026-03-11
- Solved by reverting to the original ACU score → measuring how much the answer moves towards the evidence stance.
- Run the middle training in-between biggest delta for Llama and OLMO ✅ 2026-03-16
- This looks like SFT; running on 7B might be a bit too much, so I should run it on 1B first as a proof of concept. ✅
- I need to save checkpoints in between training. ✅
- Find ways to characterize the post-training.
- The idea is that we use the same context characteristics, then we control one element at a time to see the difference.
- This possibly takes too long.
- Or we can swap the training data between the two models (need to run SFT twice) to see whether the effects swap too.
- Yes, I think we need to do a run with 1B model trained on the newer Dolci dataset to see if it will become more similar to the Olmo3B model.
- But this does not help with the second hypothesis lmao.
- Read papers given by Zain and Sarah. ✅ 2026-03-17
- Sarah’s paper is not out yet.
- Siddhesh mentioned something about safety, instruct, fine-tuned OLMo models? ❌ 2026-03-17
- Can’t find what he’s referring to.
- Run the difference between ACU scores (normal) too ✅ 2026-03-13
- Running this now.
- What does it mean that the results don’t hold here…?
- Haeun said it’s okay for it to be different as long as there is a meaningful correlation. What correlation can we find here?
- Work on perplexity counting for Olmo.
- Evaluate each step of SFT ✅ 2026-03-17
- Measure the accuracy too. Can we stop early?
Questions
- The heatmaps being “similar” is very vibe-based. Is there a way to make this more factual?
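One way to make the “similar heatmaps” claim less vibe-based might be to correlate the cells directly. A minimal sketch, assuming the two heatmaps are same-shape NumPy arrays (the function name and example values here are hypothetical):

```python
import numpy as np

def heatmap_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two same-shape heatmaps, flattened."""
    assert a.shape == b.shape
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Hypothetical 3x4 heatmaps of per-stage deltas for two models.
olmo = np.array([[0.1, 0.3, -0.2, 0.0],
                 [0.5, 0.4, -0.1, 0.2],
                 [0.0, 0.1, 0.3, 0.6]])
llama = olmo + 0.05  # a uniformly shifted copy correlates perfectly
print(round(heatmap_similarity(olmo, llama), 3))  # 1.0
```

Spearman (rank) correlation would also work if only the ordering of cells matters, not their magnitudes.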
Plan
- Monday: presentation, prepare for the presentation with Isabelle, run preliminary context characteristics / settle some ideas from the presentation.
- Tuesday: Haeun meeting, Isabelle meeting, start writing the introduction.
Something sus is going on: my SFT-ed model and the Hugging Face SFT model are different. Need to investigate this further, possibly by looking at the Hugging Face discussions.
Results
- Training dataset matters; if it is different, then the results will definitely be different.
- I realized my metric measures “stubbornness” rather than an actual tendency toward a certain bias; measuring the latter requires the evidence_stance. So, we returned to the original ACU score.
- Size may matter too, albeit to a lesser extent.
Claim-evidence similarity
- Jaccard similarity: it is definitely reinforced, especially for supporting evidence ✅ ❓
- Claim-evidence overlap: also more or less reinforced…? The values get higher ❓ Not too sure… for CounterFact support, yes; otherwise, not really. ❓
- Repeats claims: Quite weak; unsure ✅ ✅
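For reference, the Jaccard similarity above is presumably token-set overlap between claim and evidence; a minimal sketch under that assumption (whitespace tokenisation is a simplification and may not match the actual pipeline):

```python
def jaccard(claim: str, evidence: str) -> float:
    """Jaccard similarity of lowercased token sets: |A & B| / |A | B|."""
    a, b = set(claim.lower().split()), set(evidence.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

print(jaccard("the sky is blue", "the sky looks blue today"))  # 0.5
```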
Difficult to understand
- Flesch reading ease score: Quite weak ✅ ❓ Quite strong signals from ConflictQA during DPO
- Claim length: Kinda weak too. If anything, the strong correlation is removed between DPO and instruction fine-tuning. For CounterFact, the claims are too short to make a meaningful observation, I fear.
- ✅ It is pretty weak too for OLMo, but it is also kinda positive
- ✅ Same weak signal as OLMo 7B
- Evidence length: It was initially negative (shorter evidence is preferred), then it gets better afterwards ❌ Too weak ❌
- Perplexity: There is a decreasing trend (weak, though). So lower perplexity is preferred; this is familiarity bias indeed ✅ Same trend ✅
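As a reminder for the perplexity-counting task: perplexity is the exponential of the mean negative log-probability per token. A model-agnostic sketch, assuming the per-token log-probs have already been extracted from the model (the example values are made up):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-probability over the sequence."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical natural-log probabilities for a 4-token passage.
print(round(perplexity([-1.2, -0.4, -2.0, -0.8]), 3))  # 3.004
```

A lower value means the model found the text more familiar, which is what the familiarity-bias reading assumes.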
Implicit
- Claim-entity overlap: There is a strong positive correlation for supporting evidence, but a negative one for refuting evidence. If the evidence supports → higher overlap is better; if the evidence refutes → lower overlap is better… why? Interesting observation here. It doesn’t really change across post-training.
- ✅ Same trend
- ✅ Kinda same trend too
Additional properties
- Contains “False”: This is quite funny since it is a weak positive correlation for refuting evidence, but very low negative correlation for supportive evidence (this is a sign of trusting superficial features… maybe) ✅ Same trend ✅ But gone after Ins.
- Sources: Definitely reinforced; although RL helps quite a bit. ✅ Same trend
- Published after claim: Not sure how they would know this…? Maybe should ask Haeun / investigate later. But this is reinforced too, and not much difference between post-training stages.
- ✅ Same trend
- ❓Gone after ins. too…
- Fact-check verdict: Need to see what this metric means. ✅ Same trend ✅
- The probabilities: lowkey unsure why these are here. High probs → low ACU (we don’t know whether the values are small or negative).
FYIs
- It seems like the greatest delta is between pre-training and SFT. There are not a lot of differences between SFT and RL, ngl.
- They have the checkpoints for RLVR (last step; in between DPO and Ins).
- These have step checkpoints.
- https://huggingface.co/allenai/Olmo-Hybrid-Instruct-SFT-7B
- https://huggingface.co/allenai/Olmo-3-7B-Think-SFT
- But not the non-think Instruct-SFT version
Tulu Dataset for SFT
- Contains 939k rows (for 7B models) and 866k rows (for 1B model)
- Multilingual support: there are a lot of different languages. Maybe we can filter these out?
- There are safety datasets too
Ideas?
- Why do the ACU diff and the confidence score look so different?
- Random weights? For the baseline.
- Leave SFT, just do DPO.
- Look at the checkpoints: do they learn one thing after another? Does the model learn easy things first or harder things first? Or abstract things first?
- Factual acquisition. Acquisition of context usage. Learning dynamics.