Instead of finding weights by updating only on misclassified examples, as the Perceptron does, we can minimize a continuous error measured on the raw linear activation $w^\top x$ before it is thresholded. That is, we change from the Perceptron's cost function

$$J(w) = -\sum_{i \in \mathcal{M}} y^{(i)}\, w^\top x^{(i)}, \qquad \mathcal{M} = \{\text{misclassified examples}\},$$

to the sum-of-squared-errors cost

$$J(w) = \frac{1}{2} \sum_{i} \left( y^{(i)} - w^\top x^{(i)} \right)^2.$$
Differentiating with respect to $w$ for gradient descent gives

$$\nabla_w J(w) = -\sum_{i} \left( y^{(i)} - w^\top x^{(i)} \right) x^{(i)},$$
and so, taking one example at a time, we update the weights with

$$w \leftarrow w + \eta \left( y^{(i)} - w^\top x^{(i)} \right) x^{(i)},$$

where $\eta$ is the learning rate.
This is the ADALINE / LMS algorithm (the Widrow-Hoff rule).
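The per-example update above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the source; the learning rate, epoch count, and toy dataset are assumptions chosen so the rule converges on linearly separable data. The bias is folded in as a constant input $x_0 = 1$.

```python
import numpy as np

def adaline_fit(X, y, eta=0.02, n_epochs=100):
    """Train linear weights with the per-example LMS update:
    w <- w + eta * (y_i - w.x_i) * x_i   (bias folded in as x_0 = 1)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(Xb, y):
            error = y_i - w @ x_i      # error of the raw linear output
            w += eta * error * x_i     # LMS weight update
    return w

def adaline_predict(w, X):
    """Threshold the linear output only at prediction time."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.where(Xb @ w >= 0.0, 1, -1)

# Toy linearly separable data (an assumed example, not from the source)
X = np.array([[0., 0.], [-1., 0.], [0., -1.],
              [2., 1.], [1., 2.], [2., 2.]])
y = np.array([-1, -1, -1, 1, 1, 1])

w = adaline_fit(X, y)
preds = adaline_predict(w, X)
```

Note the contrast with the Perceptron: the error is computed on the continuous activation $w^\top x$, so every example adjusts the weights on every pass, not just the misclassified ones.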