Also called the Widrow-Hoff algorithm, it is as follows:
1. Normalize the augmented feature vectors of all the training samples (refer to Normalized Augmented Feature Vector): $$\mathbf{z}_{i}' = \begin{cases} \mathbf{z}_{i} & \text{if } \mathbf{z}_{i} \in \omega_{1} \\ -\mathbf{z}_{i} & \text{if } \mathbf{z}_{i} \in \omega_{2} \end{cases}$$
2. Initialization: Set $k=0$ and all initial weights to zero, $\boldsymbol{\alpha}(0) = \mathbf{0}$. Set proper target values $b_{i}$ for all samples.
3. Pick a sample $\mathbf{z}_{j}$ from the training set, compute the gradient, and update the weights: $$\boldsymbol{\alpha}(k+1)=\boldsymbol{\alpha}(k) + \rho_{k}(b_{j}-\boldsymbol{\alpha}(k)^{T}\mathbf{z}_{j})\mathbf{z}_{j}$$
4. Let $k = k+1$, and repeat step 3 for all samples until the stopping criterion is met.
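The steps above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the toy samples, the fixed learning rate `rho` (in place of a decreasing $\rho_k$), and the stopping threshold `tol` are all assumptions I made for the example.

```python
import numpy as np

def widrow_hoff(Z, b, rho=0.05, epochs=200, tol=1e-6):
    """Widrow-Hoff (LMS) training.

    Z : (N, d+1) normalized augmented feature vectors
        (class-2 samples already negated, per step 1).
    b : (N,) target values b_i.
    """
    alpha = np.zeros(Z.shape[1])              # step 2: alpha(0) = 0
    for _ in range(epochs):
        max_step = 0.0
        for z_j, b_j in zip(Z, b):            # step 3: per-sample update
            step = rho * (b_j - alpha @ z_j) * z_j
            alpha += step
            max_step = max(max_step, np.abs(step).max())
        if max_step < tol:                    # step 4: stopping criterion
            break
    return alpha

# Hypothetical toy data: class-1 samples augmented with a trailing 1;
# class-2 samples augmented and then negated (step 1 normalization).
Z = np.array([[2.0, 2.0, 1.0],
              [3.0, 3.0, 1.0],
              [2.0, 1.0, -1.0],    # was (-2, -1)
              [3.0, 2.0, -1.0]])   # was (-3, -2)
b = np.ones(len(Z))
alpha = widrow_hoff(Z, b)
print(np.all(Z @ alpha > 0))  # True: all normalized samples on the positive side
```

The per-sample (stochastic) update mirrors the gradient of the squared error $(b_j - \boldsymbol{\alpha}^T\mathbf{z}_j)^2$; a small fixed `rho` keeps every per-sample contraction factor $1-\rho\|\mathbf{z}_j\|^2$ inside $(0,1)$ on this data, so the iteration is stable.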
There are a few options on how to set the target values $b_{i}$.
- If we follow Linear Discriminant Analysis, we can set $b_{i} = N/N_{1}$ for samples in $\omega_{1}$ and $b_{i} = N/N_{2}$ for samples in $\omega_{2}$, where $N_{1}, N_{2}$ are the class sample counts; the MSE solution is then equivalent to Fisher's linear discriminant.
- Otherwise, we can approximate the Bayes discriminant instead by setting $$b_{i} = 1, \quad i = 1, \dots, N$$
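As a small sketch of the two target choices (the helper names are mine, and the LDA-equivalent values $N/N_{1}$, $N/N_{2}$ are the classical MSE targets, which I am assuming match what the notes intended):

```python
import numpy as np

def targets_lda(n1, n2):
    """LDA-equivalent targets: b_i = N/N1 for class 1, N/N2 for class 2."""
    n = n1 + n2
    return np.concatenate([np.full(n1, n / n1), np.full(n2, n / n2)])

def targets_bayes(n1, n2):
    """Uniform targets b_i = 1, approximating the Bayes discriminant."""
    return np.ones(n1 + n2)

print(targets_lda(2, 3))    # [2.5  2.5  1.667  1.667  1.667] (approx.)
print(targets_bayes(2, 3))  # [1. 1. 1. 1. 1.]
```

Either vector can be passed as `b` to the update rule above; only the targets change, not the algorithm.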
This is kinda beyond the course, so I am not gonna try to understand what's going on.