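Following LDA, we can then formalize Fisher's Criterion for best separation:

$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$

where $S_B = (m_1 - m_2)(m_1 - m_2)^T$ is the between-class scatter matrix and $S_W = \sum_{k=1}^{2} \sum_{x \in C_k} (x - m_k)(x - m_k)^T$ is the within-class scatter matrix, with $m_k$ the mean of class $C_k$.

So the optimal $w$,

$$w^* = \arg\max_w J(w)$$

Note that the derivation comes from substituting the projection $y = w^T x$ into the ratio of the squared distance between the projected class means to the sum of the projected within-class scatters.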
Some things to note with the value of $w$:

- Not unique, so if we change the scale of $w$, the value of the criterion $J(w)$ won't change.
- Of course, we can fix the denominator, $w^T S_W w = 1$, and then maximise the numerator $w^T S_B w$.
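The two notes above can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up 2-D example: the class means, the matrix `S_W`, and the helper name `fisher_criterion` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fisher_criterion(w, S_B, S_W):
    """Ratio (w^T S_B w) / (w^T S_W w) for a candidate direction w."""
    return float(w @ S_B @ w) / float(w @ S_W @ w)


# Hypothetical class means and within-class scatter (not from the text).
m1 = np.array([0.0, 0.0])
m2 = np.array([2.0, 1.0])
d = m1 - m2
S_B = np.outer(d, d)                      # between-class scatter
S_W = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric positive definite

w = np.array([1.0, -0.3])

# Note 1: rescaling w leaves the criterion unchanged.
J_original = fisher_criterion(w, S_B, S_W)
J_scaled = fisher_criterion(5.0 * w, S_B, S_W)

# Note 2: rescale w so that the denominator w^T S_W w equals 1;
# the criterion then reduces to the numerator w^T S_B w.
w_unit = w / np.sqrt(w @ S_W @ w)
denominator = float(w_unit @ S_W @ w_unit)
numerator = float(w_unit @ S_B @ w_unit)
```

Because the criterion depends only on the direction of `w`, constraining the denominator loses nothing, which is what makes the constrained formulation convenient.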
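We define a Lagrangian function for maximising $w^T S_B w$ subject to $w^T S_W w = 1$, where $S_W$ is the within-class scatter matrix:

$$L(w, \lambda) = w^T S_B w - \lambda \left( w^T S_W w - 1 \right)$$

Let $\frac{\partial L}{\partial w} = 0$,

$$2 S_B w - 2 \lambda S_W w = 0 \quad \Longrightarrow \quad S_W^{-1} S_B w = \lambda w$$

which means that $w$ is the eigenvector of matrix $S_W^{-1} S_B$ (assuming $S_W$ is invertible). Since $w^T S_B w = \lambda\, w^T S_W w$ at such a point, the criterion equals $\lambda$, so we want the eigenvector with the largest eigenvalue.

Substituting $S_B = (m_1 - m_2)(m_1 - m_2)^T$, the product $S_B w = (m_1 - m_2)\left[(m_1 - m_2)^T w\right]$ always points along $m_1 - m_2$, so we eventually will have

$$w \propto S_W^{-1} (m_1 - m_2)$$

Remember that $w$ is the direction of projection.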
Note that this is the binary linear discriminant. The multiclass version is slightly different.
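As a rough illustration of the binary case, the sketch below computes the projection direction in closed form and cross-checks it against the leading eigenvector of $S_W^{-1} S_B$. It assumes NumPy; the synthetic Gaussian data and the function name `fisher_direction` are hypothetical and not part of the original derivation.

```python
import numpy as np


def fisher_direction(X1, X2):
    """Unit vector proportional to S_W^{-1} (m1 - m2) for two classes."""
    m1 = X1.mean(axis=0)
    m2 = X2.mean(axis=0)
    # Within-class scatter: sum of the per-class scatter matrices.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    # Solve S_W w = (m1 - m2) rather than forming the inverse explicitly.
    w = np.linalg.solve(S_W, m1 - m2)
    return w / np.linalg.norm(w), S_W, m1, m2


# Synthetic two-class data (an assumption made for this example only).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(200, 2))

w, S_W, m1, m2 = fisher_direction(X1, X2)

# Cross-check: leading eigenvector of S_W^{-1} S_B.
S_B = np.outer(m1 - m2, m1 - m2)
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
top = np.real(eigvecs[:, np.argmax(np.real(eigvals))])

# |cosine| between the two directions; the sign of an eigenvector is arbitrary.
alignment = abs(top @ w) / np.linalg.norm(top)

# Project both classes onto w to obtain 1-D features.
y1 = X1 @ w
y2 = X2 @ w
```

Using `np.linalg.solve` avoids an explicit matrix inverse, which tends to be more stable numerically. The eigen-decomposition is included only as a consistency check, since the closed form already gives the direction for two classes.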