I’m sure you know what this means conceptually, but to formalize it:
In English,
there exist a value
How can we tell if the data is linearly separable?
- Observe the decrease of error with time, does it really go to zero?
What if the data are not linearly separable?
- We have to choose — continue with linear methods, but allow for errors
- Or design nonlinear methods, like in Kernel Trick