In decision trees, we can prioritize splits with higher information gain. Let the total number of samples be $N$ and the number of samples in class 1 be $N_1$, so the fraction of class-1 samples is $p = N_1 / N$.

We can then define the entropy $H = -p \log_2 p - (1 - p) \log_2 (1 - p)$ as the entropy before we split.
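As a minimal sketch of this formula (the function name `binary_entropy` and the use of NumPy are assumptions, not from the original text), the pre-split entropy of a node can be computed directly from its class counts:

```python
import numpy as np

def binary_entropy(n1: int, n: int) -> float:
    """Entropy H of a node with n samples, n1 of which are in class 1."""
    if n == 0:
        return 0.0
    p = n1 / n
    # A pure node (p = 0 or p = 1) contributes zero entropy.
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)
```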

Next, $H_{\text{split}} = \sum_{j=1}^{k} \frac{N_j}{N} H_j$ is the entropy after we split on feature $X$, which has $k$ distinct values and therefore produces $k$ child nodes; here $N_j$ is the number of samples in child $j$ and $N_{1,j}$ is the number of samples in class 1 of child $j$, so each child entropy $H_j$ is computed from $p_j = N_{1,j} / N_j$.
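Continuing the sketch above (the helper name `split_entropy` is hypothetical, and it reuses `binary_entropy` from the previous snippet), the post-split entropy weights each child's entropy by its share of the samples:

```python
def split_entropy(children: list[tuple[int, int]]) -> float:
    """Weighted entropy after a split.

    `children` is a list of (n1_j, n_j) pairs: the class-1 count and the
    total sample count for each child node produced by the split.
    """
    n_total = sum(n_j for _, n_j in children)
    return sum(
        (n_j / n_total) * binary_entropy(n1_j, n_j)
        for n1_j, n_j in children
    )
```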

So, we have Information Gain, $IG = H - H_{\text{split}}$, which is the decrease in impurity if we split.
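Putting the two pieces together (still a sketch; the counts in the usage example are chosen only for illustration), information gain is simply the difference between the parent entropy and the weighted child entropy:

```python
def information_gain(n1: int, n: int, children: list[tuple[int, int]]) -> float:
    """Decrease in entropy from splitting a node into the given children."""
    return binary_entropy(n1, n) - split_entropy(children)

# Example: a node with 10 samples (5 in class 1) split into two children,
# one of which is pure. Entropy drops from 1.0 to about 0.39, so IG is about 0.61.
print(information_gain(5, 10, [(4, 4), (1, 6)]))
```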