Instead of finding weights by updating only on misclassified examples, as the Perceptron does, we can minimize a continuous error measured on the raw linear activation $w^\top x$ before it is thresholded. That is, we change from the Perceptron's cost function

$$J(w) = -\sum_{i \in \mathcal{M}} y^{(i)}\, w^\top x^{(i)}, \qquad \mathcal{M} = \{\text{misclassified examples}\},$$

to the sum-of-squared-errors cost

$$J(w) = \frac{1}{2} \sum_{i} \left( y^{(i)} - w^\top x^{(i)} \right)^2.$$
Differentiating with respect to $w$ for gradient descent gives

$$\nabla_w J(w) = -\sum_{i} \left( y^{(i)} - w^\top x^{(i)} \right) x^{(i)},$$
and so, taking one example at a time, we update the weights with

$$w \leftarrow w + \eta \left( y^{(i)} - w^\top x^{(i)} \right) x^{(i)},$$

where $\eta$ is the learning rate.
This is the ADALINE / LMS algorithm (the Widrow-Hoff rule).
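The per-example update above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the source; the learning rate, epoch count, and toy dataset are assumptions chosen so the rule converges on linearly separable data. The bias is folded in as a constant input $x_0 = 1$.

```python
import numpy as np

def adaline_fit(X, y, eta=0.02, n_epochs=100):
    """Train linear weights with the per-example LMS update:
    w <- w + eta * (y_i - w.x_i) * x_i   (bias folded in as x_0 = 1)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(Xb, y):
            error = y_i - w @ x_i      # error of the raw linear output
            w += eta * error * x_i     # LMS weight update
    return w

def adaline_predict(w, X):
    """Threshold the linear output only at prediction time."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.where(Xb @ w >= 0.0, 1, -1)

# Toy linearly separable data (an assumed example, not from the source)
X = np.array([[0., 0.], [-1., 0.], [0., -1.],
              [2., 1.], [1., 2.], [2., 2.]])
y = np.array([-1, -1, -1, 1, 1, 1])

w = adaline_fit(X, y)
preds = adaline_predict(w, X)
```

Note the contrast with the Perceptron: the error is computed on the continuous activation $w^\top x$, so every example adjusts the weights on every pass, not just the misclassified ones.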