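This is mainly about how we deal with non-linear classification.
The idea is to apply a non-linear transformation $\phi$ to the inputs, $\mathbf{x} \mapsto \phi(\mathbf{x})$, and then separate the transformed points with a linear classifier.
Let's say we want to model a 2nd-order polynomial (quadratic) decision function. We then have to transform each input into a feature vector that contains every term of degree at most two.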
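For an input $\mathbf{x} = (x_1, \dots, x_d)$, we would have to map three different kinds of terms:
- Linear → $x_1, \dots, x_d$ (total of $d$ coordinates): to model straight lines
- Square → $x_1^2, \dots, x_d^2$ (total of $d$ coordinates): to model circles / ellipses
- Cross → $x_i x_j$ for $i < j$ (total of $d(d-1)/2$ coordinates): to model curves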
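
The explicit mapping described in the list above can be sketched in a few lines of NumPy. This is only an illustration: the function name `quadratic_features` is hypothetical, and the cross terms are taken over pairs with `i < j`.

```python
import numpy as np
from itertools import combinations

def quadratic_features(x):
    """Explicit 2nd-order feature map: linear, square, and cross terms."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    linear = x                      # d coordinates
    square = x ** 2                 # d coordinates
    # one product per unordered pair (i < j): d * (d - 1) / 2 coordinates
    cross = np.array([x[i] * x[j] for i, j in combinations(range(d), 2)])
    return np.concatenate([linear, square, cross])

z = quadratic_features([1.0, 2.0, 3.0])
print(z)       # linear 1 2 3, then squares 1 4 9, then cross terms 2 3 6
print(len(z))  # 3 + 3 + 3 = 9

# The feature count grows quadratically with the input dimension.
print(len(quadratic_features(np.ones(100))))  # 100 + 100 + 4950 = 5150
```

Even at 100 input coordinates the transformed vector already has more than five thousand entries, which is the cost the kernel trick avoids.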
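This is bad because (1) a lot of computation is needed to build and work with the transformed vectors, and (2) the dimensionality grows quadratically with the number of input coordinates.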
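So we have the kernel trick. Following the Optimal Hyperplane solution, the weight vector is a weighted sum of the transformed training points, $\mathbf{w} = \sum_i \alpha_i y_i \phi(\mathbf{x}_i)$, where $\phi$ is the non-linear transformation. The decision function is then:

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i \, \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}) + b\right)$$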
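Notice that we only ever use inner products of the transformed vectors, never the transformed vectors themselves. We can therefore define a kernel function that returns this inner product directly from the original inputs:

$$K(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}')$$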
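
As a concrete check of this idea, the sketch below compares an explicit quadratic feature map with a kernel evaluated directly on the original inputs. The specific kernel $(1 + \mathbf{x}^\top \mathbf{z})^2$ and the $\sqrt{2}$ scaling of the linear and cross terms are assumptions chosen so that the two sides match exactly; the names `phi` and `poly_kernel` are hypothetical.

```python
import numpy as np
from itertools import combinations

def phi(x):
    """Explicit feature map whose inner product equals (1 + x.z)^2."""
    x = np.asarray(x, dtype=float)
    cross = np.array([x[i] * x[j] for i, j in combinations(range(len(x)), 2)])
    # constant, scaled linear terms, squares, scaled cross terms
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, np.sqrt(2) * cross])

def poly_kernel(x, z):
    """Same quantity computed in the original space with one dot product."""
    return (1.0 + np.dot(x, z)) ** 2

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

explicit = np.dot(phi(x), phi(z))   # inner product in the 10-dimensional feature space
implicit = poly_kernel(x, z)        # never builds the feature vectors

print(explicit, implicit)
print(np.isclose(explicit, implicit))
```

The kernel evaluation costs one dot product in the original space, regardless of how large the implied feature space is.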
Dual Problem with Kernel Trick
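
For reference, a hedged reconstruction of the kernelized dual in its standard hard-margin form (this assumes the Optimal Hyperplane setting; a soft-margin variant would add an upper bound on each $\alpha_i$):

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{subject to} \quad \alpha_i \ge 0, \; \sum_i \alpha_i y_i = 0$$

The only change from the linear dual is that the inner product $\mathbf{x}_i^\top \mathbf{x}_j$ is replaced by $K(\mathbf{x}_i, \mathbf{x}_j)$.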
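We transform the decision function, with the kernel function $K$ taking the place of the inner product, into:

$$f(\mathbf{x}) = \operatorname{sign}\left(\sum_i \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b\right)$$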
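
A minimal sketch of evaluating such a kernelized decision function follows. The function name `decision`, the quadratic kernel, and the XOR-style toy values for the support vectors, labels, multipliers, and bias are illustrative assumptions chosen by hand, not the output of a solver.

```python
import numpy as np

def poly_kernel(x, z):
    """Quadratic kernel (1 + x.z)^2, assumed here for illustration."""
    return (1.0 + np.dot(x, z)) ** 2

def decision(x, support_vectors, alphas, labels, b, kernel):
    """sign( sum_i alpha_i * y_i * K(x_i, x) + b )"""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)

# Hypothetical XOR-style configuration: same-sign corners are +1, mixed-sign are -1.
support_vectors = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
alphas = np.full(4, 0.125)   # hand-picked values
b = 0.0

# With these values the weighted kernel sum simplifies to x1 * x2.
print(decision(np.array([2.0, 0.5]), support_vectors, alphas, labels, b, poly_kernel))
print(decision(np.array([-1.5, 2.0]), support_vectors, alphas, labels, b, poly_kernel))
```

No straight line in the original two-dimensional space separates these four points, yet the kernelized function does so without ever forming the transformed vectors.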
To determine whether the Kernel