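Following LDA, we can then formalize Fisher's criterion for best separation as the ratio of between-class to within-class scatter of the projected data:

$$J(w) = \frac{(\tilde{m}_1 - \tilde{m}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \frac{w^T S_B w}{w^T S_W w}$$

where

$$S_B = (m_1 - m_2)(m_1 - m_2)^T, \qquad S_W = \sum_{i=1}^{2} \sum_{x \in C_i} (x - m_i)(x - m_i)^T$$

Here $m_i$ is the mean of class $C_i$, $S_B$ is the between-class scatter matrix, $S_W$ is the within-class scatter matrix, and the tilde marks quantities measured after projection onto $w$.

So the optimal $w$ is

$$w^* = \arg\max_w J(w)$$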

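Note that the derivation comes from substituting the projection $y = w^T x$, which gives the projected class means $\tilde{m}_i = w^T m_i$ and the projected class scatters $\tilde{s}_i^2 = \sum_{x \in C_i} (w^T x - \tilde{m}_i)^2$.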

Some things to note about the value of $w$:

- It is not unique: if we change the scale of $w$, the value of $J(w)$ won't change.
- We can therefore fix the denominator, $w^T S_W w = 1$, and then maximise the numerator $w^T S_B w$.
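The scale invariance is easy to check numerically. The sketch below assumes the usual Fisher ratio $J(w) = w^T S_B w / w^T S_W w$; the function name `fisher_criterion` and the 2-D matrices are hypothetical, chosen only for illustration.

```python
import numpy as np

def fisher_criterion(w, S_B, S_W):
    """Ratio of between-class to within-class scatter along direction w."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)

# Hypothetical class means and within-class scatter (illustrative values only).
m1 = np.array([0.0, 0.0])
m2 = np.array([2.0, 1.0])
diff = m1 - m2
S_B = np.outer(diff, diff)                  # between-class scatter, rank 1
S_W = np.array([[2.0, 0.5], [0.5, 1.0]])    # symmetric positive definite

w = np.array([1.0, -0.3])
J_original = fisher_criterion(w, S_B, S_W)
J_rescaled = fisher_criterion(5.0 * w, S_B, S_W)   # same direction, different scale

# Rescale w so that the denominator w^T S_W w equals 1.
w_unit = w / np.sqrt(w @ S_W @ w)
J_constrained = fisher_criterion(w_unit, S_B, S_W)
```

All three values of $J$ agree up to floating-point error, which is why fixing the denominator loses nothing: under the constraint, $J$ is just the numerator $w^T S_B w$.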

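We define a Lagrangian function:

$$L(w, \lambda) = w^T S_B w - \lambda \left( w^T S_W w - 1 \right)$$

Let $\partial L / \partial w = 0$:

$$2 S_B w - 2 \lambda S_W w = 0 \quad \Longrightarrow \quad S_W^{-1} S_B w = \lambda w$$

which means that $w$ is an eigenvector of the matrix $S_W^{-1} S_B$ (assuming $S_W$ is invertible).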
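As a quick numerical check of the eigenvector condition, the sketch below builds small hypothetical scatter matrices $S_B$ and $S_W$ (illustrative values, not from the text), takes the leading eigenvector of $S_W^{-1} S_B$, and verifies that it satisfies $S_B w = \lambda S_W w$. It assumes $S_W$ is invertible.

```python
import numpy as np

# Hypothetical class means and within-class scatter (illustrative values only).
m1 = np.array([0.0, 0.0])
m2 = np.array([2.0, 1.0])
diff = m1 - m2
S_B = np.outer(diff, diff)                  # between-class scatter
S_W = np.array([[2.0, 0.5], [0.5, 1.0]])    # within-class scatter

# Form S_W^{-1} S_B by solving a linear system instead of inverting S_W.
M = np.linalg.solve(S_W, S_B)
eigvals, eigvecs = np.linalg.eig(M)

# Pick the eigenpair with the largest eigenvalue.
top = int(np.argmax(eigvals.real))
lam = float(eigvals[top].real)
w = eigvecs[:, top].real

# Stationarity condition of the Lagrangian: S_B w = lambda * S_W w.
residual = S_B @ w - lam * (S_W @ w)

# The eigenvalue equals the Fisher ratio evaluated at its eigenvector.
J = float(w @ S_B @ w) / float(w @ S_W @ w)
```

The residual is zero up to rounding, and `lam` matches `J`, so choosing the largest eigenvalue is the same as maximising the criterion.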

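Substituting $S_B = (m_1 - m_2)(m_1 - m_2)^T$, and noting that $(m_1 - m_2)^T w$ is a scalar so $S_B w$ always points along $m_1 - m_2$, we eventually will have

$$w \propto S_W^{-1} (m_1 - m_2)$$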

Remember that $w$ is the direction of projection.
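To make the projection concrete, here is a minimal sketch on synthetic two-class data. It assumes the standard closed form $w \propto S_W^{-1}(m_1 - m_2)$ for the binary case; the data, the random seed, and the variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical Gaussian classes in 2-D, 50 samples each.
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X2 = rng.normal(loc=[3.0, 2.0], scale=1.0, size=(50, 2))

# Class means and within-class scatter matrix.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction: solve S_W w = (m1 - m2) rather than inverting S_W.
w = np.linalg.solve(S_W, m1 - m2)
w = w / np.linalg.norm(w)    # the scale is arbitrary, so normalise for convenience

# Project every sample onto the direction w to get 1-D scores.
y1 = X1 @ w
y2 = X2 @ w

def fisher_criterion(d):
    """Fisher ratio of the synthetic data along direction d."""
    diff = m1 - m2
    S_B = np.outer(diff, diff)
    return float(d @ S_B @ d) / float(d @ S_W @ d)
```

Evaluating `fisher_criterion(w)` against the same function at any other direction should never give a larger value for the other direction, and the projected scores `y1` and `y2` are the one-dimensional data a threshold would then separate.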

Note that this is the binary linear discriminant. The multiclass version is slightly different.