In decision trees, we can prioritize splits with higher information gain.

Let the total number of samples be $N$, and let $N_k$ be the number of samples in class $k$, so that the class proportions are $p_k = N_k / N$.

We can then define the entropy of a node as
$$H = -\sum_k p_k \log_2 p_k.$$

Next, suppose a split partitions the $N$ samples into subsets of sizes $n_1, \dots, n_m$ with entropies $H_1, \dots, H_m$.

The information gain is then the decrease in impurity achieved by the split:
$$IG = H - \sum_{j=1}^{m} \frac{n_j}{N} H_j.$$
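As a minimal sketch of these quantities (the function names `entropy` and `information_gain` are my own, not from any particular library), entropy and information gain can be computed directly from class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Decrease in entropy from splitting `parent` into the `children` subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# A balanced binary parent has 1 bit of entropy; a pure split
# drives both child entropies to 0, so the gain is the full 1 bit.
parent = [0, 0, 1, 1]
gain = information_gain(parent, [[0, 0], [1, 1]])
print(gain)  # → 1.0
```

A greedy tree builder would evaluate this gain for every candidate split and choose the one with the largest value.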