Also called the Widrow-Hoff (or least-mean-squares, LMS) algorithm, it works as follows:

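  1. Normalize the augmented feature vectors of all the training samples (refer to Normalized Augmented Feature Vector). For a sample $\mathbf{x}_i$:

    $$\mathbf{y}_i = \begin{cases} \begin{bmatrix} 1 \\ \mathbf{x}_i \end{bmatrix} & \text{if } \mathbf{x}_i \in \omega_1 \\[4pt] -\begin{bmatrix} 1 \\ \mathbf{x}_i \end{bmatrix} & \text{if } \mathbf{x}_i \in \omega_2 \end{cases}$$
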
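  2. Initialization: set $k = 0$ and all initial weights to zero, $\mathbf{a}(0) = \mathbf{0}$. Set suitable positive target values (margins) $b_i > 0$ for all samples; the choices are discussed below.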
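  3. Pick a sample $\mathbf{y}^k$ from the training set, compute the gradient of its squared error $J_k(\mathbf{a}) = \frac{1}{2}(\mathbf{a}^T\mathbf{y}^k - b_k)^2$, and update the weight vector with learning rate $\eta(k)$:

    $$\nabla J_k = \big(\mathbf{a}(k)^T\mathbf{y}^k - b_k\big)\,\mathbf{y}^k, \qquad \mathbf{a}(k+1) = \mathbf{a}(k) + \eta(k)\,\big(b_k - \mathbf{a}(k)^T\mathbf{y}^k\big)\,\mathbf{y}^k$$
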
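  4. Let $k = k + 1$ and repeat step 3, cycling through the samples, until the stopping criterion is met (for example, until the update $\eta(k)\,(b_k - \mathbf{a}(k)^T\mathbf{y}^k)\,\mathbf{y}^k$ falls below a small threshold).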
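The steps above can be sketched in Python with NumPy. This is a minimal sketch under a few assumptions: the helper names `normalize_augment` and `lms` are hypothetical, the learning rate is held constant instead of varying as $\eta(k)$, samples are visited in a fixed cyclic order, and training stops when the weight vector moves by less than `theta` over one full pass (one possible stopping criterion, not necessarily the one intended in these notes).

```python
import numpy as np

def normalize_augment(X1, X2):
    """Stack normalized augmented vectors: y = [1, x] for class 1, y = -[1, x] for class 2."""
    Y1 = np.hstack([np.ones((len(X1), 1)), X1])
    Y2 = -np.hstack([np.ones((len(X2), 1)), X2])
    return np.vstack([Y1, Y2])

def lms(Y, b, eta=0.01, theta=1e-8, max_epochs=10_000):
    """Widrow-Hoff / LMS updates with a constant learning rate (assumed, see text)."""
    a = np.zeros(Y.shape[1])              # step 2: a(0) = 0
    for _ in range(max_epochs):
        a_start = a.copy()
        for y_k, b_k in zip(Y, b):        # step 3: one sample at a time
            # a(k+1) = a(k) + eta * (b_k - a(k)^T y^k) * y^k
            a = a + eta * (b_k - a @ y_k) * y_k
        # step 4, assumed stopping rule: weights barely moved over a full pass
        if np.linalg.norm(a - a_start) < theta:
            break
    return a

# Made-up, linearly separable 2-D data for illustration only.
X1 = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0]])
X2 = np.array([[-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
Y = normalize_augment(X1, X2)
a = lms(Y, np.ones(len(Y)))   # targets b_i = 1 for every sample
print("a =", a)
print("a^T y > 0 for all samples:", bool(np.all(Y @ a > 0)))
```

With a constant step, the weights settle close to, but not exactly at, the least-squares solution whenever the targets cannot all be met exactly; a decreasing schedule such as $\eta(k) = \eta(1)/k$ is the usual remedy, at the cost of slower progress.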

There are a few options for setting the target vector $\mathbf{b} = [b_1, \dots, b_n]^T$.

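  1. If we follow Linear Discriminant Analysis (Fisher's linear discriminant), we can split the $n$ samples into the $n_1$ samples of $\omega_1$ (rows of $X_1$) and the $n_2$ samples of $\omega_2$ (rows of $X_2$), so that the matrix of normalized augmented samples is

    $$Y = \begin{bmatrix} \mathbf{1}_1 & X_1 \\ -\mathbf{1}_2 & -X_2 \end{bmatrix}$$

    where $\mathbf{1}_i$ is a column of $n_i$ ones. And set

    $$\mathbf{b} = \begin{bmatrix} \frac{n}{n_1}\mathbf{1}_1 \\[4pt] \frac{n}{n_2}\mathbf{1}_2 \end{bmatrix}.$$

    With these targets, the weight part $\mathbf{w}$ of the MSE solution $\mathbf{a} = [w_0, \mathbf{w}^T]^T$ points in the same direction as Fisher's $S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$, and the threshold is $w_0 = -\mathbf{m}^T\mathbf{w}$, where $\mathbf{m}$ is the mean of all samples.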

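  2. Otherwise, we can set $\mathbf{b} = \mathbf{1}_n$. As $n \to \infty$, the MSE solution then approaches a minimum-squared-error approximation to the Bayes discriminant function $g_0(\mathbf{x}) = P(\omega_1 \mid \mathbf{x}) - P(\omega_2 \mid \mathbf{x})$.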
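To compare the two choices of $\mathbf{b}$, the sketch below builds both target vectors and checks the Fisher connection numerically. It assumes NumPy, uses hypothetical helper names (`fisher_targets`, `bayes_targets`, `mse_solution`), computes the MSE solution in batch form with `np.linalg.lstsq` instead of running LMS, and uses randomly generated (made-up) data. For the Fisher targets, the cosine between the learned $\mathbf{w}$ and $S_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2)$ should be 1 up to rounding, and $w_0$ should equal $-\mathbf{m}^T\mathbf{w}$.

```python
import numpy as np

def normalize_augment(X1, X2):
    """y = [1, x] for class 1 and y = -[1, x] for class 2, stacked as rows of Y."""
    Y1 = np.hstack([np.ones((len(X1), 1)), X1])
    Y2 = -np.hstack([np.ones((len(X2), 1)), X2])
    return np.vstack([Y1, Y2])

def fisher_targets(n1, n2):
    """b = [n/n1 repeated n1 times, n/n2 repeated n2 times]."""
    n = n1 + n2
    return np.concatenate([np.full(n1, n / n1), np.full(n2, n / n2)])

def bayes_targets(n1, n2):
    """b = 1_n: every target equal to one."""
    return np.ones(n1 + n2)

def mse_solution(Y, b):
    """Least-squares solution of Y a = b (the batch counterpart of LMS)."""
    return np.linalg.lstsq(Y, b, rcond=None)[0]

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[2.0, 1.0], size=(20, 2))   # made-up class-1 samples
X2 = rng.normal(loc=[-1.0, 0.0], size=(30, 2))  # made-up class-2 samples
Y = normalize_augment(X1, X2)

a = mse_solution(Y, fisher_targets(len(X1), len(X2)))
w0, w = a[0], a[1:]

# Fisher's direction S_W^{-1} (m1 - m2), with S_W the within-class scatter matrix.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(S_W, m1 - m2)

cos = w @ w_fisher / (np.linalg.norm(w) * np.linalg.norm(w_fisher))
m = np.vstack([X1, X2]).mean(axis=0)            # mean of all samples
print("cosine(w, Fisher direction):", cos)
print("w0:", w0, " -m^T w:", -m @ w)

# The b = 1_n choice uses the same solver; its Bayes link is only asymptotic.
a_bayes = mse_solution(Y, bayes_targets(len(X1), len(X2)))
print("a with b = 1_n:", a_bayes)
```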

This is kinda beyond the course, so I'm not gonna try to work through why it holds.