If we want to do feature selection, we can't just try every feature subset and pick the one with the lowest error ⇒ the number of subsets grows exponentially, too expensive!
So, we have to determine the separability of the features. The more separable they are, the better classification works, and therefore the higher the accuracy. Measuring this requires some metrics, and there are several families of them.
Metrics based on distributions
To measure the overlapping of two distributions:
There are many candidate equations for this, but they are unnecessary for now.
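As a concrete illustration of one such overlap measure (not necessarily the one intended in these notes), here is a sketch of the Bhattacharyya coefficient for two discrete distributions; the function name is mine:

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Overlap between two discrete distributions p and q (each sums to 1).
    Returns 1.0 for identical distributions and 0.0 for disjoint ones."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

# Identical distributions overlap fully:
# bhattacharyya_coefficient([0.5, 0.5], [0.5, 0.5]) -> 1.0
# Disjoint distributions don't overlap at all:
# bhattacharyya_coefficient([1.0, 0.0], [0.0, 1.0]) -> 0.0
```

A feature whose per-class distributions have a low overlap score like this is more separable, hence more useful for classification.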
Metrics based on information theory
We have Shannon entropy: for a feature X taking values x with probability p(x), H(X) = −Σ p(x) log₂ p(x).
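The entropy formula above can be sketched for an observed feature column, estimating p(x) from value counts (the helper name is mine):

```python
import math
from collections import Counter

def shannon_entropy(values):
    """H(X) = -sum_x p(x) * log2 p(x), with p(x) estimated from counts."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A feature split evenly between two values carries 1 bit:
# shannon_entropy([0, 1, 0, 1]) -> 1.0
# A constant feature carries no information:
# shannon_entropy([0, 0, 0, 0]) -> 0.0
```

A constant feature has zero entropy and can be discarded immediately; entropy-based criteria such as mutual information build on this quantity to score how informative a feature is about the class label.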