I'm sure you know what this means conceptually, but to formalize it:
In English: there exists a value for which you can correctly classify all of the samples.
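
One common way to write this condition, given here as a sketch rather than the notation these notes necessarily use: assume $n$ samples $\mathbf{x}_i \in \mathbb{R}^d$ with labels $y_i \in \{-1, +1\}$, a weight vector $\mathbf{w}$, and a bias $b$ (all of these symbols are assumptions introduced for illustration).

```latex
\exists\, \mathbf{w} \in \mathbb{R}^{d},\ b \in \mathbb{R}
\quad \text{such that} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) > 0
\quad \text{for all } i = 1, \dots, n
```

The product $y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)$ is positive exactly when the sign of the linear score agrees with the label, so the condition says that a single hyperplane puts every sample on its correct side.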
How can we tell if the data is linearly separable?
- Observe how the error decreases over time: does it really go to zero?
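
The check above can be sketched with a plain perceptron that records how many mistakes it makes in each pass over the data. This is a minimal illustration, not a definitive test: the function name `perceptron_errors`, the toy AND/XOR datasets, and the epoch budget are all assumptions made for the example. Note that an error count that stays above zero within a finite budget is only evidence of non-separability, since a separable problem with a small margin can also take many epochs.

```python
def perceptron_errors(samples, labels, epochs=50):
    """Train a perceptron and return the number of mistakes made in each epoch."""
    w = [0.0] * len(samples[0])  # weight vector, one entry per feature
    b = 0.0                      # bias term
    history = []
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            # A sample counts as a mistake if it is on the wrong side
            # of the boundary or exactly on it.
            if y * activation <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        history.append(mistakes)
        if mistakes == 0:  # a full pass with no mistakes: the data were separated
            break
    return history


# Hypothetical toy data: the four corners of the unit square.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]   # AND: linearly separable
xor_labels = [-1, 1, 1, -1]    # XOR: not linearly separable

and_history = perceptron_errors(points, and_labels)
xor_history = perceptron_errors(points, xor_labels)

print("AND mistakes per epoch:", and_history)
print("XOR mistakes in the last 5 epochs:", xor_history[-5:])
```

On the AND labels the mistake count reaches zero and training stops early; on the XOR labels every epoch contains at least one mistake, so the loop runs until the budget is exhausted.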
What if the data are not linearly separable?
- We have to choose: continue with linear methods, but allow for errors
- Or design nonlinear methods, as in the kernel trick
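
To make the second option concrete, here is a minimal sketch of a kernel perceptron on the XOR labels, which no linear boundary can separate. The choice of algorithm, the degree-2 polynomial kernel, and the names `poly_kernel`, `kernel_perceptron`, and `predict` are assumptions made for illustration; the notes do not prescribe a particular kernel or learner.

```python
def poly_kernel(x, z, degree=2):
    """Polynomial kernel (1 + x.z)^degree; the constant 1 also acts as a bias."""
    return (1 + sum(a * b for a, b in zip(x, z))) ** degree


def decision_score(x, samples, labels, alphas, kernel):
    """Kernel expansion of the score: sum_i alpha_i * y_i * k(x_i, x)."""
    return sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, labels, samples))


def kernel_perceptron(samples, labels, kernel, epochs=100):
    """Return per-sample mistake counts (alphas) and mistakes per epoch."""
    alphas = [0] * len(samples)
    history = []
    for _ in range(epochs):
        mistakes = 0
        for j, (x, y) in enumerate(zip(samples, labels)):
            if y * decision_score(x, samples, labels, alphas, kernel) <= 0:
                alphas[j] += 1  # the update touches only a coefficient, never w
                mistakes += 1
        history.append(mistakes)
        if mistakes == 0:
            break
    return alphas, history


def predict(x, samples, labels, alphas, kernel):
    """Classify x as +1 or -1 using the learned kernel expansion."""
    return 1 if decision_score(x, samples, labels, alphas, kernel) > 0 else -1


# Hypothetical toy data: XOR on the corners of the unit square.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [-1, 1, 1, -1]

alphas, xor_history = kernel_perceptron(points, xor_labels, poly_kernel)
predictions = [predict(p, points, xor_labels, alphas, poly_kernel) for p in points]

print("mistakes per epoch:", xor_history)
print("predictions:", predictions)
```

The data are never mapped explicitly into a higher-dimensional space: the learner only evaluates the kernel between pairs of samples. The degree-2 kernel implicitly includes the product of the two coordinates as a feature, which is what makes XOR separable here.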