Standard logistic regression is a binary classifier. Here we want to classify samples into multiple classes instead.

Binary classification:

  • Weighted sum → Logistic function → Compare with threshold → Classification.

Multiclass classification:

  • Weighted sum → Compare among peers (Max function) → Classification.
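
The two pipelines above can be sketched side by side. This is a minimal illustration in plain Python, not a reference implementation: the function names, the default threshold of 0.5, and the layout of one weight row per class are all assumptions made for the example.

```python
import math

def weighted_sum(weights, bias, x):
    """Linear score: dot product of weights and features, plus bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def classify_binary(weights, bias, x, threshold=0.5):
    """Weighted sum -> logistic function -> compare with threshold."""
    z = weighted_sum(weights, bias, x)
    # Logistic (sigmoid) function; fine for moderate z, but a very
    # negative z would overflow math.exp in this simple form.
    p = 1.0 / (1.0 + math.exp(-z))
    return 1 if p >= threshold else 0

def classify_multiclass(weight_rows, biases, x):
    """One weighted sum per class -> pick the class with the largest score."""
    scores = [weighted_sum(w, b, x) for w, b in zip(weight_rows, biases)]
    return max(range(len(scores)), key=lambda k: scores[k])
```

In the multiclass case each class gets its own weighted sum, and the prediction is simply the index of the largest one.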

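But the outputs of the max function don't sum to 1, so they cannot be read as class probabilities. We need softmax, which turns the weighted sums into positive values that sum to 1.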
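
As a rough sketch of what softmax does, the snippet below exponentiates each score and divides by the total. Subtracting the largest score first is a common numerical-stability trick added here as an assumption; it does not change the result. The name `softmax` and the sample scores are illustrative only.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    m = max(scores)  # shift by the max so math.exp never overflows
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted_class = max(range(len(probs)), key=lambda k: probs[k])
```

The largest score still gets the largest probability, so taking the most probable class gives the same answer as the plain max function.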

The loss function used with softmax is the cross-entropy loss.

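Note that, writing $p_{ij}$ for the output probability of the $i$-th sample being in the $j$-th class and $y_{ij}$ for the indicator of the correct class label (1 if sample $i$ belongs to class $j$, 0 otherwise), the cross-entropy loss in its usual form sums $-y_{ij} \log p_{ij}$ over all samples $i$ and classes $j$.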
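
A small sketch of the cross-entropy loss, under some assumptions: `probs[i][j]` holds the predicted probability of sample i being in class j, `labels[i][j]` is the 0/1 indicator of the correct class, and the loss is summed over samples (many implementations average instead). The `eps` clamp is an added safeguard against taking the log of 0, not part of the definition.

```python
import math

def cross_entropy(probs, labels, eps=1e-12):
    """Sum over samples i and classes j of -labels[i][j] * log(probs[i][j])."""
    total = 0.0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y:  # with 0/1 indicators only the correct class contributes
                total -= y * math.log(max(p, eps))
    return total

labels = [[1, 0, 0], [0, 1, 0]]
good = cross_entropy([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]], labels)
bad = cross_entropy([[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]], labels)
```

Here `good` comes out smaller than `bad`, since the loss penalises low probability on the correct class.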