The model is
Then the training data is
For objective function,
We get the likelihood of
Then we do maximum likelihood estimation…
Then we differentiate wrt
And so, the learning algorithm is
The model is
Then the training data is
For objective function,
We get the likelihood of
Then we do maximum likelihood estimation…
Then we differentiate wrt
And so, the learning algorithm is