Instead of finding weights that correctly classify all samples, we look for weights that correctly classify as many samples as possible.
One way to do this is to replace the Perceptron's discrete misclassification-based cost with a continuous error term. The equations below are the standard form of this derivation. The Perceptron minimizes

$$J(w) = -\sum_{i \in M} y_i \, w^\top x_i$$

(where $M$ is the set of misclassified samples), whereas here we change to the squared error of the linear output:

$$J(w) = \frac{1}{2} \sum_i \left( y_i - w^\top x_i \right)^2.$$

We then differentiate for gradient descent to get

$$\frac{\partial J}{\partial w} = -\sum_i \left( y_i - w^\top x_i \right) x_i,$$

and so we update the weights for every example $i$ with

$$w \leftarrow w + \eta \left( y_i - w^\top x_i \right) x_i,$$

where $\eta$ is the learning rate.
This is the ADALINE / LMS algorithm.
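The per-example update above can be sketched in code. This is a minimal NumPy illustration, not a reference implementation; the learning rate, epoch count, toy data, and the separate bias term are illustrative assumptions.

```python
import numpy as np

def lms_train(X, y, lr=0.01, epochs=50):
    """ADALINE / LMS: per-example gradient descent on squared error.

    X: (n_samples, n_features) inputs; y: (n_samples,) targets in {-1, +1}.
    lr (eta) and epochs are assumed hyperparameters for this sketch.
    """
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        # Visit samples in a shuffled order each epoch
        for i in rng.permutation(len(X)):
            error = y[i] - (X[i] @ w + b)  # y_i - w^T x_i (continuous error)
            w += lr * error * X[i]         # w <- w + eta * error * x_i
            b += lr * error                # bias updated with the same rule
    return w, b

# Toy linearly separable data (assumed for illustration)
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = lms_train(X, y)
preds = np.sign(X @ w + b)
```

Note that, unlike the Perceptron, every example contributes to the update here, not just the misclassified ones, because the error is measured on the linear output rather than the thresholded prediction.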