From Normalized Augmented Feature Vector, we get the quantity $\boldsymbol{\alpha}^T\mathbf{y}_{j}$ that we want to maximize. Since cost functions are usually things you want to minimize, we negate it and sum over the set $\mathcal{Y}^k$ of samples misclassified at step $k$, giving the perceptron criterion

$$J_{p}(\boldsymbol{\alpha}) = \sum_{\mathbf{y}_{j}\in\mathcal{Y}^k} (-\boldsymbol{\alpha}^T\mathbf{y}_{j})$$

Since $\boldsymbol{\alpha}^T\mathbf{y}_{j} \le 0$ for a misclassified sample, $J_{p}$ is non-negative and hits zero exactly when nothing is misclassified. We want to find the $\boldsymbol{\alpha}$ that minimizes this function.

Obviously, we differentiate with respect to $\boldsymbol{\alpha}$:

$$\nabla J = \frac{ \partial J_{p}(\boldsymbol{\alpha})}{ \partial \boldsymbol{\alpha} } = \sum_{\mathbf{y}_{j}\in\mathcal{Y}^k} (-\mathbf{y}_{j})$$

As such, stepping against the gradient gives $\boldsymbol{\alpha}(k+1) = \boldsymbol{\alpha}(k) + \rho_{k}\sum_{\mathbf{y}_{j}\in\mathcal{Y}^k}\mathbf{y}_{j}$, where $\rho_{k}$ is the learning rate. This is the batch perceptron update formula, i.e. plain gradient descent on $J_{p}$.
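To make the batch update concrete, here is a minimal NumPy sketch (the function name `batch_perceptron`, the fixed $\rho_{k} = 1$, and the toy data are made up for illustration); it assumes the rows of `Y` are already normalized augmented feature vectors, so a solution satisfies $\boldsymbol{\alpha}^T\mathbf{y}_{j} > 0$ for every row:

```python
import numpy as np

def batch_perceptron(Y, rho=1.0, max_iters=1000):
    """Batch perceptron on normalized augmented feature vectors Y (n x d)."""
    alpha = np.zeros(Y.shape[1])
    for _ in range(max_iters):
        # Y^k: the samples currently misclassified, i.e. alpha^T y_j <= 0
        misclassified = Y[Y @ alpha <= 0]
        if len(misclassified) == 0:
            break  # every sample satisfies alpha^T y_j > 0: done
        # alpha(k+1) = alpha(k) + rho * sum of misclassified y_j
        alpha = alpha + rho * misclassified.sum(axis=0)
    return alpha

# Toy 1-D data: class 1 at x = {2, 3}, class 2 at x = {-1, -0.5};
# augment each x with a leading 1, then negate the class-2 rows
# (the normalization step), so one alpha must score all rows positive.
Y = np.array([[ 1.0, 2.0 ],
              [ 1.0, 3.0 ],
              [-1.0, 1.0 ],
              [-1.0, 0.5 ]])
alpha = batch_perceptron(Y)
print(Y @ alpha)  # all entries positive once a separating alpha is found
```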

In the traditional perceptron, we update the learning rate with the variable increment rule:

$$\rho_{k} = \frac{|\boldsymbol{\alpha}(k)^T\mathbf{y}_{j}|}{\| \mathbf{y}_{j} \|^2}$$
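Here is a sketch of how this rate plugs into the single-sample variant (the function name, `lam`, and the small epsilon guard are illustrative assumptions, not from the notes). Note that with $\rho_{k}$ exactly as above, the corrected $\boldsymbol{\alpha}$ lands precisely on the boundary $\boldsymbol{\alpha}^T\mathbf{y}_{j} = 0$, so relaxation-style implementations typically scale it by a factor $\lambda > 1$ to push past the boundary:

```python
import numpy as np

def variable_increment_perceptron(Y, lam=1.5, max_epochs=100):
    """Single-sample perceptron using rho_k = |alpha^T y_j| / ||y_j||^2.

    lam is an assumed relaxation factor: lam = 1 moves alpha exactly
    onto the boundary alpha^T y_j = 0, so lam > 1 pushes past it.
    """
    alpha = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for y in Y:
            if alpha @ y <= 0:  # y is misclassified by the current alpha
                rho = abs(alpha @ y) / (y @ y)
                # guard: the rule yields rho = 0 when alpha^T y = 0
                # exactly (e.g. at the all-zeros initialization)
                alpha = alpha + lam * max(rho, 1e-8) * y
                errors += 1
        if errors == 0:
            break  # one full error-free pass: converged
    return alpha
```

On the same toy `Y` as the previous sketch, `variable_increment_perceptron(Y)` returns an $\boldsymbol{\alpha}$ with every entry of `Y @ alpha` positive.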