If we want to do feature selection, we can’t just do trial and error to find the features that correspond to the lowest error ⇒ too expensive!
So, we have to determine the separability of the features. The more separable they are, the better classification works and therefore the higher the accuracy. Measuring this requires some metrics, and there are a bunch of them.
Metrics based on distributions
To measure the overlap of two distributions:

$$J = \int f\big(p(\mathbf{x} \mid \omega_1),\; p(\mathbf{x} \mid \omega_2),\; P(\omega_1),\; P(\omega_2)\big)\, d\mathbf{x}$$

where $\mathbf{x}$ is the feature vector; $\omega_1, \omega_2$ are the two classes; $P(\omega_i)$ is the probability of class $\omega_i$; and $f$ is the function that computes the overlap between the distributions.
There are many possible choices for $f$, but their equations are unnecessary for now.
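To make the idea concrete, here is a minimal sketch of one possible $f$: the Bayes-error overlap $\min\big(p(x \mid \omega_1)P(\omega_1),\, p(x \mid \omega_2)P(\omega_2)\big)$, evaluated for two 1-D Gaussian class-conditional densities. Both the choice of $f$ and the Gaussians are assumptions for illustration, not fixed by these notes.

```python
# A minimal sketch, assuming f is the Bayes-error overlap
# f = min(p(x|w1) P(w1), p(x|w2) P(w2)) and 1-D Gaussian class-conditional
# densities -- both are illustrative assumptions, not fixed by these notes.
import numpy as np
from scipy.stats import norm

def overlap(p1, p2, prior1=0.5, prior2=0.5, lo=-10.0, hi=10.0, n=10_000):
    """Numerically approximate J = integral of min(p1(x)*P1, p2(x)*P2) dx."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    return float(np.sum(np.minimum(p1(x) * prior1, p2(x) * prior2)) * dx)

# Well-separated class distributions => small overlap => a good feature.
print(overlap(norm(0, 1).pdf, norm(5, 1).pdf))  # ~0.006
# Heavily overlapping distributions => large overlap => a weak feature.
print(overlap(norm(0, 1).pdf, norm(1, 1).pdf))  # ~0.31
```

The smaller the integral, the less the two class distributions overlap along that feature, so the feature is more separable.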
Metrics based on information theory
We have the Shannon entropy, for feature $x$:

$$H(x) = -\sum_{i} P(\omega_i \mid x)\, \log_2 P(\omega_i \mid x)$$
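As a quick illustration, here is a minimal sketch of evaluating this entropy, assuming the class posteriors $P(\omega_i \mid x)$ for a given feature value have already been estimated; the function name and the example probabilities are made up for illustration.

```python
# A minimal sketch, assuming the usual definition H = -sum_i p_i * log2(p_i),
# applied to class probabilities P(w_i | x) already estimated for one feature
# value -- the inputs below are made-up illustrative numbers.
import numpy as np

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution p (zero entries skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(shannon_entropy([0.5, 0.5]))    # 1.0 bit: classes maximally mixed
print(shannon_entropy([0.99, 0.01]))  # ~0.08 bits: nearly pure => good feature
```

Low entropy means the feature value almost determines the class; high entropy means the classes are mixed, so the feature tells us little.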