Classification is similar to linear regression, insofar as we are fitting a line (more generally, a hyperplane) in a high-dimensional space. But instead of using this line to predict a continuous value, we use it to separate different classes.

The classification equation is

$$g(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + w_0$$

where if $g(\mathbf{x}) > 0$ then $\mathbf{x} \in \omega_1$, and vice versa.
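As a minimal sketch of this decision rule (assuming the usual linear discriminant $g(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + w_0$; the names `classify`, `w`, and `w0` are illustrative, not from the course):

```python
import numpy as np

def classify(x, w, w0):
    """Decision rule: class 1 if g(x) = w^T x + w0 > 0, else class 2."""
    return 1 if w @ x + w0 > 0 else 2

# Example: the hyperplane x1 + x2 = 1 in two dimensions.
w = np.array([1.0, 1.0])
w0 = -1.0
print(classify(np.array([1.0, 1.0]), w, w0))  # g = 1 > 0, so class 1
print(classify(np.array([0.0, 0.0]), w, w0))  # g = -1 < 0, so class 2
```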

In this course, we learn to derive the optimal equation using Linear Discriminant Analysis and the Fisher Criterion.

However, both of them only explain how to get the optimal direction of projection (i.e. the value of $\mathbf{w}$), while we still need the value of $w_0$.
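One common closed form from the Fisher Criterion is $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$, where $S_W$ is the within-class scatter matrix. A sketch under the assumption that the two classes are stored as row-matrices `X1` and `X2` (the function name is illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher criterion direction: w ∝ S_W^{-1} (m1 - m2),
    where S_W is the within-class scatter matrix."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two classes' scatter matrices.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)  # return a unit-length direction
```

Note that this gives only the direction of projection; the offset $w_0$ still has to be chosen separately, as discussed next.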

We have a few options for the value of $w_0$.

You have to choose based on (1) the domain situation, and (2) the threshold read off the ROC curve.
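One way to use the ROC curve is to sweep candidate thresholds over the projected scores, compute the (FPR, TPR) pair at each, and pick the threshold that best trades them off. The sketch below uses Youden's J statistic (TPR − FPR) as the selection rule; this is one common choice, not the only one, and the function names are illustrative:

```python
import numpy as np

def roc_points(scores_pos, scores_neg, thresholds):
    """(FPR, TPR) pairs; predict positive when score > threshold."""
    pts = []
    for t in thresholds:
        tpr = np.mean(scores_pos > t)  # true positive rate
        fpr = np.mean(scores_neg > t)  # false positive rate
        pts.append((fpr, tpr))
    return pts

def best_threshold(scores_pos, scores_neg, thresholds):
    """Pick the threshold maximising Youden's J = TPR - FPR."""
    pts = roc_points(scores_pos, scores_neg, thresholds)
    j = [tpr - fpr for fpr, tpr in pts]
    return thresholds[int(np.argmax(j))]
```

The chosen threshold then plays the role of $-w_0$: a sample is assigned to class 1 when its projected score exceeds it.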

Note that you can’t just classify everything and anything; it depends on whether the classes are Linearly Separable or not. Using the Normalized Augmented Feature Vector, we can derive the cost function for a Perceptron to help with classification.
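A sketch of the batch Perceptron on normalized augmented feature vectors: each sample is augmented with a leading 1 (absorbing $w_0$ into the weight vector), and class-2 samples are negated so that every correctly classified sample $\mathbf{y}$ satisfies $\mathbf{a}^\top \mathbf{y} > 0$. The gradient step below minimises the perceptron cost $J_p(\mathbf{a}) = \sum_{\text{misclassified}} (-\mathbf{a}^\top \mathbf{y})$; variable names are illustrative:

```python
import numpy as np

def perceptron(X1, X2, lr=1.0, max_iter=1000):
    """Batch perceptron on normalised augmented feature vectors."""
    # Augment with a leading 1; negate class-2 samples so the
    # goal becomes a^T y > 0 for every sample y.
    Y1 = np.hstack([np.ones((len(X1), 1)), X1])
    Y2 = -np.hstack([np.ones((len(X2), 1)), X2])
    Y = np.vstack([Y1, Y2])
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        mis = Y[Y @ a <= 0]           # misclassified samples
        if len(mis) == 0:
            return a                  # converged: all samples separated
        a = a + lr * mis.sum(axis=0)  # gradient step on J_p
    return a  # may not have converged if data are not linearly separable
```

If the data are linearly separable, this procedure is guaranteed to terminate; if not, it loops until `max_iter`, which motivates the MSE approach below.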

So what should we do if we have linearly non-separable cases? We have to use models based on the Minimum Squared Error (MSE) criterion.
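One standard MSE formulation solves $\mathbf{a} = Y^{+}\mathbf{b}$, the pseudoinverse solution that minimises $\|Y\mathbf{a} - \mathbf{b}\|^2$ for a vector of target margins $\mathbf{b}$ (all ones here). The sketch reuses the normalized augmented feature vectors from the Perceptron setup and computes the least-squares solution with `np.linalg.lstsq`; names are illustrative:

```python
import numpy as np

def mse_classifier(X1, X2):
    """Minimum Squared Error solution a = Y^+ b via least squares."""
    # Same normalisation as the perceptron: augment with a leading 1,
    # negate class-2 samples.
    Y1 = np.hstack([np.ones((len(X1), 1)), X1])
    Y2 = -np.hstack([np.ones((len(X2), 1)), X2])
    Y = np.vstack([Y1, Y2])
    b = np.ones(len(Y))  # target margins
    a, *_ = np.linalg.lstsq(Y, b, rcond=None)
    return a
```

Unlike the Perceptron, this always yields a solution, even when the classes are not linearly separable, at the cost of no longer guaranteeing zero training errors on separable data.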