Unlike ordinary logistic regression, which is binary, we want to classify each input into one of multiple classes.

Binary classification:

  • Weighted sum → Logistic function → Compare with threshold → Classification.

Multiclass classification:

  • Weighted sum → Compare among peers (Max function) → Classification.
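The two pipelines above can be sketched side by side. This is a minimal illustration, not a reference implementation: the function names (`weighted_sum`, `classify_binary`, `classify_multiclass`), the 0.5 threshold, and the one-weight-vector-per-class layout are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_sum(weights, bias, x):
    """Linear score w . x + b for one input vector x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify_binary(weights, bias, x, threshold=0.5):
    """Weighted sum -> logistic function -> compare with threshold."""
    return 1 if sigmoid(weighted_sum(weights, bias, x)) >= threshold else 0


def classify_multiclass(weight_rows, biases, x):
    """One weighted sum per class, then pick the class with the largest score."""
    scores = [weighted_sum(w, b, x) for w, b in zip(weight_rows, biases)]
    return max(range(len(scores)), key=lambda k: scores[k])
```

In the multiclass case the prediction depends only on which score is largest, so the scores themselves do not have to be probabilities for the classification step to work.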

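But the raw scores compared by the max function do not sum to 1, so they cannot be read as class probabilities. We need softmax, which normalizes the scores into a probability distribution over the classes.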
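As a rough sketch of what softmax does: each score z_j is mapped to exp(z_j) divided by the sum of exp(z_k) over all classes, so the outputs are positive and sum to 1. The function name `softmax` and the max-subtraction step are choices made for this example; subtracting the largest score leaves the result unchanged and only avoids overflow in `exp`.

```python
import math


def softmax(scores):
    """Turn a list of real-valued class scores into probabilities."""
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three positive values, largest for the first class
print(sum(probs))  # approximately 1.0
```

The class with the largest score still gets the largest probability, so taking the maximum over the softmax outputs gives the same prediction as taking it over the raw scores.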

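And the loss function is the cross-entropy loss:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_{ij} \log p_{ij}$$

Note that $p_{ij}$ is the output probability of the $i$-th sample being in the $j$-th class, and $y_{ij}$ is the indicator of the correct class label (1 if sample $i$ belongs to class $j$, 0 otherwise). Here $N$ is the number of samples and $K$ is the number of classes.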
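A small sketch of the cross-entropy computation, under some assumptions: predicted probabilities and one-hot label indicators are given as lists of rows (one row per sample, one entry per class), the loss is averaged over samples, and probabilities are clipped at a small `eps` to avoid `log(0)`. The name `cross_entropy`, the averaging, and the `eps` value are illustrative choices, not something fixed by the text above.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy over samples.

    probs[i][j]  : predicted probability that sample i is in class j
    labels[i][j] : 1 if class j is the correct class of sample i, else 0
    """
    total = 0.0
    for p_row, y_row in zip(probs, labels):
        # Only the entry of the correct class contributes, since y is 0 elsewhere.
        total -= sum(y * math.log(max(p, eps)) for p, y in zip(p_row, y_row))
    return total / len(probs)


probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [[1, 0, 0], [0, 1, 0]]
print(cross_entropy(probs, labels))  # small, since both predictions favor the correct class
```

Because the indicator is 1 only for the correct class, the loss for each sample reduces to the negative log of the probability assigned to its true class: confident correct predictions cost little, confident wrong ones cost a lot.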