In decision trees, we can prioritize splits with higher information gain. Let the total number of samples be $N$ and the number of samples in class 1 be $N_1$, so the fraction of class-1 samples is $p = N_1 / N$.

We can then define the entropy $H = -p \log_2 p - (1 - p) \log_2 (1 - p)$ as the entropy before we split.
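As a minimal sketch of this formula (the function name `binary_entropy` and the use of NumPy are assumptions, not from the original text), the pre-split entropy of a node can be computed directly from its class counts:

```python
import numpy as np

def binary_entropy(n1: int, n: int) -> float:
    """Entropy H of a node with n samples, n1 of which are in class 1."""
    if n == 0:
        return 0.0
    p = n1 / n
    # A pure node (p = 0 or p = 1) contributes zero entropy.
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)
```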

Next, $H_{\text{split}} = \sum_{j=1}^{k} \frac{N_j}{N} H_j$ is the entropy after we split on feature $X$, which has $k$ distinct values and therefore produces $k$ child nodes; here $N_j$ is the number of samples in child $j$ and $N_{1,j}$ is the number of samples in class 1 of child $j$, so each child entropy $H_j$ is computed from $p_j = N_{1,j} / N_j$.
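Continuing the sketch above (the helper name `split_entropy` is hypothetical, and it reuses `binary_entropy` from the previous snippet), the post-split entropy weights each child's entropy by its share of the samples:

```python
def split_entropy(children: list[tuple[int, int]]) -> float:
    """Weighted entropy after a split.

    `children` is a list of (n1_j, n_j) pairs: the class-1 count and the
    total sample count for each child node produced by the split.
    """
    n_total = sum(n_j for _, n_j in children)
    return sum(
        (n_j / n_total) * binary_entropy(n1_j, n_j)
        for n1_j, n_j in children
    )
```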

So, we have Information Gain, $IG = H - H_{\text{split}}$, which is the decrease in impurity if we split.
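Putting the two pieces together (still a sketch; the counts in the usage example are chosen only for illustration), information gain is simply the difference between the parent entropy and the weighted child entropy:

```python
def information_gain(n1: int, n: int, children: list[tuple[int, int]]) -> float:
    """Decrease in entropy from splitting a node into the given children."""
    return binary_entropy(n1, n) - split_entropy(children)

# Example: a node with 10 samples (5 in class 1) split into two children,
# one of which is pure. Entropy drops from 1.0 to about 0.39, so IG is about 0.61.
print(information_gain(5, 10, [(4, 4), (1, 6)]))
```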