In the Perceptron, we learned that there may be multiple separating solutions. To decide which one is best, we introduce the optimal hyperplane.
For a sample set $(x_1, y_1), \ldots, (x_\ell, y_\ell)$ with labels $y_i \in \{-1, +1\}$, a separating hyperplane $w \cdot x + b = 0$ satisfies $y_i (w \cdot x_i + b) > 0$ for all $i$. Since $(w, b)$ can be rescaled freely without moving the hyperplane, we fix the scale by setting the margin of the closest samples to 1:

$$y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, \ell.$$
Optimal Hyperplane
The optimal hyperplane is the one that separates the data with the maximum margin; since the closest samples satisfy $y_i (w \cdot x_i + b) = 1$, the margin equals $2 / \|w\|$, so we solve

$$\min_{w, b} \ \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, \ell.$$

In plain English, among all separating hyperplanes we pick the one farthest from the closest training samples.
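As a sanity check, the canonical scaling and the margin $2/\|w\|$ can be verified numerically; a minimal sketch with NumPy, where the toy points, $w$, and $b$ are made up for illustration:

```python
import numpy as np

# Hypothetical toy 2D data: two linearly separable classes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A candidate separating hyperplane w·x + b = 0 (also made up).
w = np.array([1.0, 1.0])
b = -1.0

# Functional margins y_i (w·x_i + b); under the canonical scaling
# all must be >= 1, and the geometric margin is 2 / ||w||.
margins = y * (X @ w + b)
print("functional margins:", margins)
print("all >= 1:", bool(np.all(margins >= 1)))
print("geometric margin 2/||w||:", 2 / np.linalg.norm(w))
```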
This is a convex optimization problem, and we solve it through the Lagrangian

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{\ell} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right], \quad \alpha_i \ge 0.$$

We must find its saddle point: minimize over $w$ and $b$, maximize over $\alpha$.
Differentiating with respect to $w$ and $b$ and setting the derivatives to zero:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{\ell} \alpha_i y_i x_i = 0, \qquad \frac{\partial L}{\partial b} = -\sum_{i=1}^{\ell} \alpha_i y_i = 0.$$
From here, we get 3 observations:

- For the optimal hyperplane, the vector of coefficients $\alpha = (\alpha_1, \ldots, \alpha_\ell)$ satisfies $\sum_{i=1}^{\ell} \alpha_i y_i = 0$.
- $w$ must be a linear combination of the training samples: $w = \sum_{i=1}^{\ell} \alpha_i y_i x_i$.
- Only support vectors (samples with $y_i (w \cdot x_i + b) = 1$) have a non-zero coefficient $\alpha_i$ in $w$. This is based on the Kuhn-Tucker theorem, whose complementary slackness condition requires $\alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right] = 0$ for every $i$.
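These three observations can be checked against an off-the-shelf solver; a minimal sketch, assuming scikit-learn is available and using a very large `C` to approximate the hard-margin case (the toy points are made up):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data; large C approximates the hard-margin problem.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Observation 1: the signed dual coefficients alpha_i * y_i sum to zero.
print("sum of alpha_i y_i:", clf.dual_coef_.sum())

# Observation 2: w is a linear combination of the training samples.
w = clf.dual_coef_ @ clf.support_vectors_
print("w from the sum:", w, " w from sklearn:", clf.coef_)

# Observation 3: only the support vectors appear in dual_coef_ at all.
print("support vector indices:", clf.support_)
```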
Optimal Hyperplane (Solution)
$$w = \sum_{i \in \mathrm{SV}} \alpha_i y_i x_i$$

We sum over the support vectors only, because they are the only samples with non-zero $\alpha_i$ (3rd observation).
Plugging this back into the Lagrangian $L$, the problem becomes: maximize the dual functional

$$W(\alpha) = \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{\ell} \alpha_i y_i = 0$.
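The dual is a quadratic program in $\alpha$ and can be handed to any constrained optimizer; a minimal sketch using SciPy's SLSQP on a hypothetical toy set, where we expect non-zero $\alpha$ only on the support vectors:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Quadratic-term matrix: H[i, j] = y_i y_j (x_i · x_j).
H = (y[:, None] * X) @ (y[:, None] * X).T

# Maximize W(alpha) = sum(alpha) - 0.5 alpha^T H alpha
# <=> minimize its negative, with alpha >= 0 and sum alpha_i y_i = 0.
def neg_W(a):
    return 0.5 * a @ H @ a - a.sum()

res = minimize(
    neg_W,
    x0=np.zeros(n),
    jac=lambda a: H @ a - np.ones(n),
    bounds=[(0, None)] * n,
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],
    method="SLSQP",
)
alpha = res.x
print("alpha:", np.round(alpha, 4))
```

For this data only the two closest points (one per class) end up with non-zero $\alpha$, matching the 3rd observation.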
The threshold $b$ follows from any support vector $x_s$: since $y_s (w \cdot x_s + b) = 1$ and $y_s = \pm 1$, we have $b = y_s - w \cdot x_s$. Then we take the average over all support vectors (to reduce numerical noise) to find

$$b = \frac{1}{|\mathrm{SV}|} \sum_{s \in \mathrm{SV}} \left( y_s - w \cdot x_s \right).$$
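Putting the pieces together, $w$ and $b$ can be recovered from the dual coefficients; a minimal sketch where both the toy data and its dual solution $\alpha$ are assumed (for this made-up set the support vectors are $x_1$ and $x_3$, each with $\alpha = 1/9$):

```python
import numpy as np

# Hypothetical toy data and its hard-margin dual solution:
# support vectors x_1 = (2, 2) and x_3 = (-1, -1), alpha = 1/9 each.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([1 / 9, 0.0, 1 / 9, 0.0])

sv = alpha > 1e-8  # support-vector mask

# w = sum over support vectors of alpha_i y_i x_i
w = (alpha[sv] * y[sv]) @ X[sv]

# b = y_s - w·x_s for each support vector, averaged to reduce noise
b = np.mean(y[sv] - X[sv] @ w)

print("w:", w)  # (1/3, 1/3) for this data
print("b:", b)  # -1/3 for this data
print("decision values:", X @ w + b)
```

The support vectors land exactly on the margin (decision value $\pm 1$), and all other points lie strictly outside it.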