Following Optimal Hyperplane, we now look at the generalized optimal hyperplane, which deals with the non-separable case.
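All we need to do is add a slack (error) variable $\xi_{i} \geq 0$ for each training point to relax its constraint:
\begin{align} y_{i}((\mathbf{w} \cdot \mathbf{x}_{i}) + b) \geq 1-\xi_{i}, \quad \xi_{i} \geq 0, \quad i=1,2,\dots,l
\end{align}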
We then define a function $F(\boldsymbol{\xi}) = \sum_{i=1}^l \xi_{i}$, which measures how much the original constraints are violated in total.
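For a fixed hyperplane $(\mathbf{w}, b)$, the smallest slack that satisfies each relaxed constraint is $\xi_{i} = \max(0, 1 - y_{i}((\mathbf{w}\cdot\mathbf{x}_{i})+b))$. Here is a minimal NumPy sketch that computes these slacks and their sum; the function name `slack_variables` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def slack_variables(w, b, X, y):
    """Smallest xi_i >= 0 such that y_i((w . x_i) + b) >= 1 - xi_i holds."""
    margins = y * (X @ w + b)           # y_i((w . x_i) + b) for every point
    return np.maximum(0.0, 1.0 - margins)

# Hypothetical toy data: the point (-0.5, 0) has label +1 but lies on the
# negative side of the hyperplane x_1 = 0.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [-0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([1.0, 0.0])
b = 0.0

xi = slack_variables(w, b, X, y)
print(xi.tolist())       # [0.0, 0.0, 1.5]
print(float(xi.sum()))   # total violation: 1.5
```

A slack $0 < \xi_{i} \leq 1$ means the point sits inside the margin but is still correctly classified, while $\xi_{i} > 1$ means it is misclassified, so $\sum_i \xi_{i}$ is an upper bound on the number of training errors.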
Generalized Optimal Hyperplane
\begin{align} &\min \Phi(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}(\mathbf{w}\cdot \mathbf{w}) + C\left( \sum_{i=1}^l \xi_{i} \right) \quad \text{w.r.t. } \mathbf{w}, b, \boldsymbol{\xi} \\ &\text{s.t. } y_{i}((\mathbf{w} \cdot \mathbf{x}_{i}) + b) \geq 1-\xi_{i}, \quad \xi_{i} \geq 0, \quad i=1,2,\dots,l
\end{align}
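Here the parameter $C > 0$ controls the penalty on errors: a large $C$ punishes constraint violations heavily and tends to fit the training points at the cost of a narrower margin, while a small $C$ tolerates more violations in exchange for a wider margin.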
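To see the trade-off concretely, this sketch evaluates $\Phi(\mathbf{w}, \boldsymbol{\xi})$ for two hand-picked candidate hyperplanes under two values of $C$, always using the minimal slacks. The toy data, the two candidates and the name `soft_margin_objective` are hypothetical; neither candidate is claimed to be the optimum.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Phi(w, xi) = 0.5 * (w . w) + C * sum(xi), using the minimal feasible slacks."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * (w @ w) + C * xi.sum()

X = np.array([[2.0, 0.0], [-2.0, 0.0], [-0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0])

# Candidate A: wide margin, but the point (-0.5, 0) is misclassified (xi = 1.25).
w_a, b_a = np.array([0.5, 0.0]), 0.0
# Candidate B: narrow margin, every point satisfies y_i((w . x_i) + b) >= 1.
w_b, b_b = np.array([2.0, 0.0]), 2.5

for C in (0.5, 10.0):
    phi_a = soft_margin_objective(w_a, b_a, X, y, C)
    phi_b = soft_margin_objective(w_b, b_b, X, y, C)
    print(C, phi_a, phi_b)
# C = 0.5:  Phi_A = 0.75,   Phi_B = 2.0  -> the wide margin is preferred
# C = 10.0: Phi_A = 12.625, Phi_B = 2.0  -> fitting every point is preferred
```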
I'm a bit too tired to derive everything as we did in Optimal Hyperplane, but the approach is the same: we again apply the Kuhn-Tucker theorem, now with the error terms $\xi_{i}$ included.
Here are the revised primal and dual problems:
Primal Problem
\begin{align} &\min \psi(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}(\mathbf{w}\cdot \mathbf{w}) + C\left( \sum_{i=1}^l \xi_{i} \right) \\ &\text{s.t. } y_{i}[(\mathbf{w}\cdot \mathbf{x}_{i})+b] -1 + \xi_{i} \geq 0, \quad \xi_{i} \geq 0, \quad i=1,\dots,l
\end{align}
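Substituting the minimal slacks $\xi_{i} = \max(0, 1 - y_{i}[(\mathbf{w}\cdot\mathbf{x}_{i})+b])$ turns the primal into the unconstrained hinge-loss problem $\min_{\mathbf{w}, b} \frac{1}{2}(\mathbf{w}\cdot\mathbf{w}) + C\sum_{i=1}^l \max(0, 1 - y_{i}[(\mathbf{w}\cdot\mathbf{x}_{i})+b])$. The sketch below solves that form by plain subgradient descent. This is a swapped-in solver for illustration, not the Kuhn-Tucker route; the name `primal_subgradient`, the step schedule $0.1/\sqrt{t}$ and the iteration count are all assumptions.

```python
import numpy as np

def primal_subgradient(X, y, C=1.0, n_iter=5000, lr0=0.1):
    """Minimize 0.5*(w . w) + C * sum_i max(0, 1 - y_i((w . x_i) + b)) by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    best_w, best_b, best_obj = w.copy(), b, np.inf
    for t in range(1, n_iter + 1):
        margins = y * (X @ w + b)
        active = margins < 1.0                      # points with xi_i > 0
        obj = 0.5 * (w @ w) + C * np.sum(1.0 - margins[active])
        if obj < best_obj:                          # subgradient steps are not monotone
            best_w, best_b, best_obj = w.copy(), b, obj
        grad_w = w - C * (y[active] @ X[active])    # subgradient w.r.t. w
        grad_b = -C * np.sum(y[active])             # subgradient w.r.t. b
        lr = lr0 / np.sqrt(t)                       # diminishing step size
        w = w - lr * grad_w
        b = b - lr * grad_b
    return best_w, best_b, best_obj

# Two points whose optimal hyperplane is w = (1, 0), b = 0 with objective 0.5.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
w, b, obj = primal_subgradient(X, y, C=10.0)
print(w, b, obj)   # roughly w = [1, 0], b = 0, objective = 0.5
```

Keeping the best iterate matters because subgradient descent does not decrease the objective at every step; the diminishing step size is what lets it settle near the optimum.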
Dual Problem
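\begin{align} &\max W(\boldsymbol{\alpha}) = \sum_{i=1}^l \alpha_{i} - \frac{1}{2}\sum_{i=1}^l\sum_{j=1}^l \alpha_{i}\alpha_{j}y_{i}y_{j}(\mathbf{x}_{i}\cdot \mathbf{x}_{j}) \\ &\text{s.t. } \sum_{i=1}^l \alpha_{i}y_{i} = 0 \quad \text{and} \quad 0 \leq \alpha_{i} \leq C, \quad i=1,\dots,l
\end{align}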
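The dual maximizes the concave quadratic $W(\boldsymbol{\alpha}) = \sum_i \alpha_{i} - \frac{1}{2}\sum_{i,j}\alpha_{i}\alpha_{j}y_{i}y_{j}(\mathbf{x}_{i}\cdot\mathbf{x}_{j})$ over $\{\boldsymbol{\alpha} : \sum_i \alpha_{i}y_{i} = 0,\ 0 \leq \alpha_{i} \leq C\}$. Practical solvers such as SMO are more involved; as a swapped-in illustration, this sketch runs projected gradient ascent and projects onto the feasible set by bisecting on the multiplier of the equality constraint. The function names, the step size $1/L$ (with $L$ the largest eigenvalue of $Q_{ij} = y_{i}y_{j}(\mathbf{x}_{i}\cdot\mathbf{x}_{j})$) and the iteration counts are assumptions.

```python
import numpy as np

def project_feasible(v, y, C, n_bisect=60):
    """Euclidean projection of v onto {a : sum_i a_i y_i = 0, 0 <= a_i <= C}, y_i in {-1, +1}."""
    # The projection has the form clip(v - mu * y, 0, C); g(mu) below is nonincreasing in mu.
    def g(mu):
        return y @ np.clip(v - mu * y, 0.0, C)
    hi = np.abs(v).max() + C          # g(hi) <= 0
    lo = -hi                          # g(lo) >= 0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return np.clip(v - 0.5 * (lo + hi) * y, 0.0, C)

def solve_dual(X, y, C=1.0, n_iter=2000):
    """Projected gradient ascent on W(a) = sum(a) - 0.5 * a^T Q a."""
    Q = (y[:, None] * y[None, :]) * (X @ X.T)      # Q_ij = y_i y_j (x_i . x_j)
    step = 1.0 / max(np.linalg.eigvalsh(Q).max(), 1e-12)
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        grad = 1.0 - Q @ alpha                      # gradient of W
        alpha = project_feasible(alpha + step * grad, y, C)
    W = alpha.sum() - 0.5 * (alpha @ Q @ alpha)
    return alpha, W

X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha, W = solve_dual(X, y, C=10.0)
print(alpha, W)   # alpha close to [0.5, 0.5], W close to 0.5
```

On this two-point example the dual value $W \approx 0.5$ matches the primal value $\frac{1}{2}(\mathbf{w}\cdot\mathbf{w}) = 0.5$ for $\mathbf{w} = (1, 0)$, as strong duality predicts. Since $W$ is concave and the feasible set is convex, the fixed step $1/L$ is a standard safe choice for projected gradient methods.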
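The solution gives $\mathbf{w} = \sum_{i=1}^l \alpha_{i}y_{i}\mathbf{x}_{i}$ and the decision function
\begin{align} f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^l y_{i}\alpha_{i}(\mathbf{x}_{i}\cdot \mathbf{x}) + b \right)
\end{align}
where only the support vectors ($\alpha_{i} > 0$) contribute, and $b$ can be obtained from any support vector with $0 < \alpha_{k} < C$ via $b = y_{k} - (\mathbf{w}\cdot \mathbf{x}_{k})$.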
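Given a dual solution $\boldsymbol{\alpha}$, this sketch recovers $\mathbf{w} = \sum_i \alpha_{i}y_{i}\mathbf{x}_{i}$ and $b$, then evaluates $f(\mathbf{x}) = \operatorname{sign}(\sum_i y_{i}\alpha_{i}(\mathbf{x}_{i}\cdot\mathbf{x}) + b)$. It assumes $b$ is averaged over all support vectors with $0 < \alpha_{i} < C$ (each satisfies $y_{i}[(\mathbf{w}\cdot\mathbf{x}_{i})+b] = 1$, and averaging is a common choice for numerical stability); the names and the tolerance `tol` are hypothetical.

```python
import numpy as np

def recover_hyperplane(alpha, X, y, C, tol=1e-8):
    """w = sum_i alpha_i y_i x_i; b averaged over support vectors with 0 < alpha_i < C."""
    w = (alpha * y) @ X
    on_margin = (alpha > tol) & (alpha < C - tol)
    if not np.any(on_margin):
        raise ValueError("no support vector with 0 < alpha_i < C to pin down b")
    b = np.mean(y[on_margin] - X[on_margin] @ w)   # b = y_k - (w . x_k)
    return w, b

def decision_function(x, alpha, X, y, b):
    """f(x) = sign(sum_i alpha_i y_i (x_i . x) + b)."""
    return np.sign((alpha * y) @ (X @ x) + b)

# Dual solution of the two-point example: alpha = (0.5, 0.5).
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
w, b = recover_hyperplane(alpha, X, y, C=10.0)
print(w, b)                                                # w = [1, 0], b = 0
print(decision_function(np.array([0.3, 5.0]), alpha, X, y, b))   # 1.0
print(decision_function(np.array([-2.0, 0.0]), alpha, X, y, b))  # -1.0
```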