The model is

Then the training data is

For the objective function, we get the likelihood of (trust me):

Then we do maximum likelihood estimation…

Then we differentiate wrt .

And so, the learning algorithm is