From Normalized Augmented Feature Vector, we get the value that we want to maximize: $\mathbf{a}^T\mathbf{y}$, which is positive for every correctly classified sample $\mathbf{y}$. Since cost functions are usually things you want to minimize, we negate it and sum over the set $\mathcal{Y}$ of misclassified samples, getting the perceptron criterion

$$J_p(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}} \left(-\mathbf{a}^T\mathbf{y}\right)$$

We want to find the weight vector $\mathbf{a}$ that minimizes the above function.

To minimize it, we differentiate with respect to $\mathbf{a}$:

$$\nabla J_p(\mathbf{a}) = \sum_{\mathbf{y} \in \mathcal{Y}} (-\mathbf{y})$$

Taking a gradient-descent step, we get

$$\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k) \sum_{\mathbf{y} \in \mathcal{Y}_k} \mathbf{y}$$

where $\eta(k)$ is the learning rate and $\mathcal{Y}_k$ is the set of samples misclassified by $\mathbf{a}(k)$. This is the gradient descent update rule for the batch perceptron.
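
As a concrete sketch of this batch update (an illustration, not from the original notes; the function name `batch_perceptron`, the zero initialization, and the fixed default $\eta = 1$ are assumptions):

```python
import numpy as np

def batch_perceptron(Y, eta=1.0, max_iters=1000):
    """Gradient descent on J_p(a) = sum of (-a @ y) over misclassified y.

    Y: (n, d) array of normalized augmented feature vectors, so any
    solution vector a must satisfy a @ y > 0 for every row y.
    Names and defaults are illustrative assumptions.
    """
    a = np.zeros(Y.shape[1])              # a(1): arbitrary starting point
    for _ in range(max_iters):
        misclassified = Y[Y @ a <= 0]     # rows with a(k)^T y <= 0
        if len(misclassified) == 0:       # J_p(a) = 0, so a is a solution
            return a
        # a(k+1) = a(k) + eta * sum of currently misclassified samples
        a = a + eta * misclassified.sum(axis=0)
    return a
```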

In the traditional perceptron, the weight vector is updated one sample at a time with the variable-increment rule, in which the learning rate $\eta(k)$ is allowed to change from step to step:

$$\mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\,\mathbf{y}^k \quad \text{whenever } \mathbf{a}(k)^T\mathbf{y}^k \le 0$$
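
A minimal single-sample sketch of this rule, assuming the common schedule $\eta(k) = 1/k$ (the schedule, names, and stopping logic are illustrative assumptions; any $\eta(k)$ satisfying the usual convergence conditions would do):

```python
import numpy as np

def variable_increment_perceptron(Y, max_epochs=100):
    """Single-sample perceptron with variable increment eta(k) = 1/k.

    The 1/k schedule is one common illustrative choice, not the only one.
    """
    a = np.zeros(Y.shape[1])
    k = 1                                  # update counter for eta(k)
    for _ in range(max_epochs):
        updated = False
        for y in Y:                        # present samples one at a time
            if a @ y <= 0:                 # y is misclassified by a(k)
                a = a + (1.0 / k) * y      # a(k+1) = a(k) + eta(k) y^k
                k += 1
                updated = True
        if not updated:                    # a full pass with no mistakes
            return a
    return a
```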

Perceptron Convergence Theorem

If the training samples are linearly separable, the Perceptron Algorithm will converge to a solution vector in a finite number of updates.
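
To see the theorem in action, here is a quick check on a toy separable problem, reusing the `batch_perceptron` sketch from above (the data points are made up for illustration):

```python
import numpy as np

# Two linearly separable classes in 2D (made-up illustrative data).
w1 = np.array([[1.0, 2.0], [2.0, 1.5]])     # class 1 samples
w2 = np.array([[-1.0, -1.0], [-2.0, 0.5]])  # class 2 samples

def augment(X):
    # Append a bias component of 1 to each sample.
    return np.hstack([X, np.ones((len(X), 1))])

# Negate the class-2 samples to get normalized augmented feature vectors.
Y = np.vstack([augment(w1), -augment(w2)])

a = batch_perceptron(Y)    # from the sketch above
assert np.all(Y @ a > 0)   # a solution vector, found after finitely many updates
print(a)
```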