Training Compass
The Why
- Before training any model, always check if prompting / fine-tuning does the job well.
- If answer to both of them are no, then train the model strictly under the below categories:
- Research — make the hypothesis as concrete as possible, then think if scaling up increases the chance of success.
- Production — only when your domain is very specific, or your deployment is very restrictive
- Strategic open-sourced — concrete goals with gap in the open-sourced community
The What
- Next, decide what model type, architecture, etc that you want to train.
- Connect each constraint from your “why” to concrete specifications in your “what”.
- Principle of derisking: never change anything unless you’ve tested that it helps.
- Ask “Will this help my specific use case” and “Will this optimize my training” before testing any modification.
- Modify one thing at a time. Track parameter count, modify other hyperparameters to make the model sizes roughly the same.
- Rules of engagement
- Be paranoid, make sure you can reproduce published results. Inspect the answers yourself.
- Test every change, no matter how small.
- Train on enough tokens and use sufficient evaluations. Do not cut corners!