Following the Optimal Hyperplane, we have the generalized optimal hyperplane, which deals with non-separable cases.

Actually, we just need to add a slack (error) variable $\xi_{i} \geq 0$ to relax the constraint: $y_{i}((\mathbf{w} \cdot \mathbf{x}_{i}) + b) \geq 1-\xi_{i}$.

We then define a function of the slack variables, $\sum_{i=1}^l \xi_{i}$, which reflects how much the original constraints are violated.
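To make the slack idea concrete, here is a minimal sketch that computes the smallest slack each point needs for a given hyperplane. The 1-D toy data, the values of `w` and `b`, and the helper name `slack_variables` are all assumptions made for illustration.

```python
import numpy as np

def slack_variables(X, y, w, b):
    """Smallest xi_i >= 0 such that y_i * (w . x_i + b) >= 1 - xi_i."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical 1-D toy data (not from the note).
X = np.array([[2.0], [0.5], [-1.0], [0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0])
b = 0.0

xi = slack_variables(X, y, w, b)
total_violation = xi.sum()  # the sum of slacks used as the violation measure
print(xi, total_violation)
```

A slack of zero means the original constraint holds, a slack between 0 and 1 means the point sits inside the margin but on the correct side, and a slack above 1 means the point is misclassified.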

Generalized Optimal Hyperplane

\begin{align} &\min \Phi(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}(\mathbf{w}\cdot \mathbf{w}) + C\left( \sum_{i=1}^l \xi_{i} \right) \quad \text{w.r.t. } \mathbf{w}, b, \boldsymbol{\xi} \\ &\text{s.t. } y_{i}((\mathbf{w} \cdot \mathbf{x}_{i}) + b) \geq 1-\xi_{i}, \quad \xi_{i} \geq 0, \quad i=1,2,\dots,l
\end{align}

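Here the parameter $C > 0$ controls the penalty on errors: a larger $C$ punishes constraint violations more heavily, while a smaller $C$ tolerates more violations in exchange for a wider margin.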
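The trade-off that $C$ controls can be seen in a small sketch that evaluates the objective for two hand-picked candidate hyperplanes. The toy data, the two candidates, and the function name `objective` are assumptions for illustration; nothing here solves the optimization problem.

```python
import numpy as np

def objective(X, y, w, b, C):
    """Phi = 0.5 * (w . w) + C * sum_i xi_i, using the smallest feasible slacks."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * float(w @ w) + C * float(xi.sum())

# Hypothetical 1-D toy data (not from the note).
X = np.array([[2.0], [0.5], [-0.5], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

wide = (np.array([1.0]), 0.0)    # wider margin, two points fall inside it
narrow = (np.array([2.0]), 0.0)  # narrower margin, no violations

results = {}
for C in (0.1, 10.0):
    results[C] = (objective(X, y, *wide, C), objective(X, y, *narrow, C))
    print(f"C={C}: wide={results[C][0]:.2f}, narrow={results[C][1]:.2f}")
```

With the small $C$ the wide-margin candidate has the lower objective, and with the large $C$ the violation-free candidate wins.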

I will skip the full derivation we did in Optimal Hyperplane. It again relies on the Kuhn-Tucker theorem, now with the error term $\boldsymbol{\xi}$ included.
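For reference, a sketch of the standard Lagrangian steps behind this. The multipliers $\alpha_{i} \geq 0$ (for the margin constraints) and $r_{i} \geq 0$ (for $\xi_{i} \geq 0$) are notation assumed here, not taken from the note.

\begin{align} &L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \mathbf{r}) = \frac{1}{2}(\mathbf{w}\cdot \mathbf{w}) + C\sum_{i=1}^l \xi_{i} - \sum_{i=1}^l \alpha_{i}\left[ y_{i}((\mathbf{w} \cdot \mathbf{x}_{i}) + b) - 1 + \xi_{i} \right] - \sum_{i=1}^l r_{i}\xi_{i} \\ &\frac{\partial L}{\partial \mathbf{w}} = 0 \implies \mathbf{w} = \sum_{i=1}^l \alpha_{i}y_{i}\mathbf{x}_{i} \\ &\frac{\partial L}{\partial b} = 0 \implies \sum_{i=1}^l \alpha_{i}y_{i} = 0 \\ &\frac{\partial L}{\partial \xi_{i}} = 0 \implies C - \alpha_{i} - r_{i} = 0 \implies 0 \leq \alpha_{i} \leq C \\ &\alpha_{i}\left[ y_{i}((\mathbf{w} \cdot \mathbf{x}_{i}) + b) - 1 + \xi_{i} \right] = 0, \quad (C - \alpha_{i})\xi_{i} = 0, \quad i=1,2,\dots,l
\end{align}

The last line is the complementary slackness condition: a point with $\xi_{i} > 0$ must have $\alpha_{i} = C$.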

Here is the revised theorem:

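\begin{align} &\max W(\boldsymbol{\alpha}) = \sum_{i=1}^l \alpha_{i} - \frac{1}{2}\sum_{i,j=1}^l \alpha_{i}\alpha_{j}y_{i}y_{j}(\mathbf{x}_{i}\cdot \mathbf{x}_{j}) \\ &\text{s.t. } 0 \leq \alpha_{i} \leq C, \quad i=1,2,\dots,l, \quad \sum_{i=1}^l \alpha_{i}y_{i} = 0
\end{align}

Here $\alpha_{i}$ are the Lagrange multipliers of the margin constraints; the only change from the separable case is the upper bound $\alpha_{i} \leq C$.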

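The decision function of the solution is $f(\mathbf{x}) = \operatorname{sign}((\mathbf{w} \cdot \mathbf{x}) + b)$, where $\mathbf{w}$ and $b$ solve the problem above.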
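As a rough illustration, the decision function can be evaluated from the expansion $\mathbf{w} = \sum_{i} \alpha_{i}y_{i}\mathbf{x}_{i}$ without forming $\mathbf{w}$ explicitly. The multipliers `alpha`, the bias `b`, and the toy data below are hypothetical values picked by hand so that $\sum_{i} \alpha_{i}y_{i} = 0$; they are not the output of a solver.

```python
import numpy as np

def decision_function(x, X_train, y_train, alpha, b):
    """Return (label, score) with score = sum_i alpha_i * y_i * (x_i . x) + b."""
    score = float(np.sum(alpha * y_train * (X_train @ x)) + b)
    return np.sign(score), score

# Hypothetical 1-D training set and multipliers (assumed, not from the note).
X_train = np.array([[2.0], [0.5], [-0.5], [-2.0]])
y_train = np.array([1.0, 1.0, -1.0, -1.0])
alpha = np.array([0.0, 2.0, 2.0, 0.0])  # only the two inner points are support vectors
b = 0.0

label_pos, score_pos = decision_function(np.array([1.0]), X_train, y_train, alpha, b)
label_neg, score_neg = decision_function(np.array([-0.25]), X_train, y_train, alpha, b)
print(label_pos, score_pos, label_neg, score_neg)
```

Only points with $\alpha_{i} > 0$ contribute to the sum, so in practice the loop runs over the support vectors alone.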