done How does the training process of judge model affect their judgement?
- Running
big-qwen-base. Still dealing with some multi-GPU and VLLM issues. ✅ 2026-03-05 - Running
thinking-qwen-base. Maybe thinking models solve positional bias?
Results
- No, it is still really bad. Using 30B model, the positional bias is 24.60%
- Maybe it would be better if we use thinking models…