Things To Do
Thoughts
- How do we measure reasoning coherence?
  - An easy way out is to have an LLM judge it, then cross-reference its verdicts with human evaluation.
  - Is there a more concrete approach, perhaps based on knowledge graphs, to quantify reasoning correctness?
- Should we instead use different tasks to measure reasoning accuracy and context utilization?
  - Use a more straightforward “what is the conclusion” task for reasoning accuracy.
  - Maintain the same method to determine context utilization.
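
One possible way to make the "cross-reference with human evaluation" idea concrete is to compute chance-corrected agreement (Cohen's kappa) between the LLM judge's coherence labels and the human labels on the same items. The sketch below is only an illustration under assumptions: the function name `cohen_kappa` and the binary coherent/incoherent labels are hypothetical and not part of these notes, and kappa is one choice of agreement statistic among several.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two raters' labels for the same items.

    Hypothetical helper: labels_a could be LLM-judge verdicts and
    labels_b human verdicts (e.g. 1 = coherent, 0 = incoherent).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of items where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    # Both raters used a single identical label: agreement is trivial.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up labels for illustration only, not real results.
    llm_judge = [1, 1, 0, 0]
    human = [1, 1, 0, 1]
    print(cohen_kappa(llm_judge, human))
```

A kappa near 1 would suggest the LLM judge tracks human judgment closely, while a value near 0 would suggest agreement no better than chance, in which case the judge's scores probably should not stand in for human evaluation.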