Building on the Optimal Hyperplane, the generalized optimal hyperplane deals with the non-separable case.

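Actually, we just need to add a slack (error) variable $\xi_i \ge 0$ for each training point to relax the constraint:

$$y_i\big((w \cdot x_i) + b\big) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n$$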

We then define a function of the slack variables $\xi_i$, $F(\xi) = \sum_{i=1}^{n} \xi_i$, which reflects how much the original constraints are violated.
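To make the slack idea concrete, here is a minimal sketch, assuming the relaxed constraint has the usual form $y_i((w \cdot x_i) + b) \ge 1 - \xi_i$, so that the smallest feasible slack is $\xi_i = \max(0, 1 - y_i((w \cdot x_i) + b))$. The toy data, the hyperplane `w`, `b`, and the helper name `slack_variables` are all made up for illustration.

```python
import numpy as np

def slack_variables(X, y, w, b):
    """Smallest slacks xi_i >= 0 satisfying y_i (w . x_i + b) >= 1 - xi_i."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical toy data: the third point is inside the margin,
# the fourth is on the wrong side of the hyperplane.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0, 1.0, 1.0])
w = np.array([1.0, 0.0])  # assumed hyperplane normal
b = 0.0                   # assumed offset

xi = slack_variables(X, y, w, b)
total_violation = xi.sum()  # sum of slacks over all points
print(xi, total_violation)
```

Points that satisfy the original constraint get zero slack, so only margin violations contribute to the total.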

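Generalized Optimal Hyperplane

$$\min_{w,\, b,\, \xi} \; \frac{1}{2}(w \cdot w) + C \sum_{i=1}^{n} \xi_i \qquad \text{subject to} \quad y_i\big((w \cdot x_i) + b\big) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

where the parameter $C > 0$ controls the penalty on errors: a larger $C$ punishes constraint violations more heavily, while a smaller $C$ favors a wider margin.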
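As a rough illustration of how the penalty parameter acts, the sketch below evaluates a soft-margin objective of the form $\frac{1}{2}(w \cdot w) + C \sum_i \xi_i$ for one fixed hyperplane and two values of $C$. The objective form is the standard one and is assumed here; the data and the function name `soft_margin_objective` are hypothetical.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * (sum of slacks), with slacks set to their minimum."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + C * xi.sum()

# Hypothetical toy data with two margin violations.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0, 1.0, 1.0])
w = np.array([1.0, 0.0])
b = 0.0

obj_small_C = soft_margin_objective(w, b, X, y, C=1.0)
obj_large_C = soft_margin_objective(w, b, X, y, C=10.0)
print(obj_small_C, obj_large_C)
```

The margin term is the same in both cases; only the weight on the violations changes, which is why a large penalty pushes the optimizer toward hyperplanes with fewer or smaller errors.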

I am too tired to derive everything the way we did for the Optimal Hyperplane, but the derivation uses the Kuhn-Tucker theorem again, this time with the error term $\xi_i$ included.
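For reference, here is a sketch of the first steps of that derivation, assuming the standard soft-margin objective $\frac{1}{2}(w \cdot w) + C \sum_i \xi_i$ with constraints $y_i((w \cdot x_i) + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$. The multipliers $\alpha_i \ge 0$ and $\mu_i \ge 0$ are notation introduced here, not taken from the original derivation.

$$
\begin{aligned}
L(w, b, \xi, \alpha, \mu) &= \frac{1}{2}(w \cdot w) + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \Big[ y_i\big((w \cdot x_i) + b\big) - 1 + \xi_i \Big] - \sum_{i=1}^{n} \mu_i \xi_i \\
\frac{\partial L}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i \\
\frac{\partial L}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0 \\
\frac{\partial L}{\partial \xi_i} = 0 \;&\Rightarrow\; C - \alpha_i - \mu_i = 0 \;\Rightarrow\; 0 \le \alpha_i \le C
\end{aligned}
$$

The only change from the separable case is the last line: the slack terms put an upper bound $C$ on each multiplier.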

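Here is the revised theorem: the multipliers $\alpha_i$ maximize

$$W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0.$$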

The resulting decision function:

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b \right)$$
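A minimal sketch of evaluating such a decision function, assuming it has the usual form $f(x) = \operatorname{sign}(\sum_i \alpha_i y_i (x_i \cdot x) + b)$. The training points, the multipliers `alphas`, and the offset `b` below are hand-picked hypothetical values, not the output of an actual solver.

```python
import numpy as np

def decision_function(x, alphas, X_train, y_train, b):
    """sign( sum_i alpha_i * y_i * (x_i . x) + b )"""
    return np.sign(np.sum(alphas * y_train * (X_train @ x)) + b)

# Hypothetical two-point training set and multipliers.
X_train = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_train = np.array([1.0, -1.0])
alphas = np.array([0.5, 0.5])  # assumed values with sum(alphas * y) == 0
b = 0.0

pred_pos = decision_function(np.array([3.0, 1.0]), alphas, X_train, y_train, b)
pred_neg = decision_function(np.array([-0.2, 5.0]), alphas, X_train, y_train, b)
print(pred_pos, pred_neg)
```

Only training points with $\alpha_i > 0$ contribute to the sum, so in practice the prediction depends on the support vectors alone.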