Standard logistic regression is a binary classifier. Here we extend it to classify samples into more than two classes.
Binary classification:
- Weighted sum → Logistic function → Compare with threshold → Classification.

Multiclass classification:
- Weighted sum → Compare among peers (Max function) → Classification.
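
The two pipelines above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the toy weights, and the default threshold of 0.5 are assumptions made for the example, not part of the original notes.

```python
import numpy as np

def predict_binary(x, w, b, threshold=0.5):
    """Weighted sum -> logistic function -> compare with threshold."""
    z = x @ w + b                     # weighted sum: one score per sample
    p = 1.0 / (1.0 + np.exp(-z))      # logistic function squashes the score into (0, 1)
    return (p >= threshold).astype(int)

def predict_multiclass(x, W, b):
    """Weighted sum -> compare among peers (max) -> class index."""
    z = x @ W + b                     # weighted sums: one score per sample per class
    return np.argmax(z, axis=1)       # pick the class with the largest score

# Toy data (hypothetical): 2 samples, 2 features.
x = np.array([[1.0, 2.0],
              [-1.0, 0.5]])

# Binary case: a single weight vector.
w = np.array([1.0, 0.5])
binary_labels = predict_binary(x, w, 0.0)

# Multiclass case: one weight column per class (3 classes here).
W = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, 0.5]])
multi_labels = predict_multiclass(x, W, np.zeros(3))
```

In the binary case a single score is compared with a fixed threshold, while in the multiclass case the scores of the classes are compared with each other.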
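But the max function only picks a winner: its output is not a set of values that sum to 1, so it cannot be read as class probabilities. We need softmax, which turns the weighted sums $z_{i1}, \dots, z_{iK}$ of the $i$th sample into probabilities over the $K$ classes:

$$p_{ij} = \frac{e^{z_{ij}}}{\sum_{k=1}^{K} e^{z_{ik}}}$$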
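
As a rough illustration of the softmax step, the sketch below converts a row of scores into probabilities. The function name `softmax`, the example scores, and the max-subtraction trick (a common numerical-stability measure that does not change the result) are assumptions added for this example.

```python
import numpy as np

def softmax(z):
    """Map scores to probabilities that are positive and sum to 1 per row."""
    z = np.asarray(z, dtype=float)
    # Subtracting the row maximum leaves the result unchanged
    # but avoids overflow in exp() for large scores.
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical scores for one sample over three classes.
scores = np.array([[2.0, 1.0, 0.1]])
probs = softmax(scores)
```

The largest score still gets the largest probability, so taking the argmax of `probs` gives the same class as taking the argmax of `scores`.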
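The loss function is the cross-entropy loss:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_{ij} \log p_{ij}$$

Note that $p_{ij}$ is the output probability of the $i$th sample being in the $j$th class, and $y_{ij}$ is the indicator of the correct class label: it is 1 if the $i$th sample belongs to the $j$th class and 0 otherwise.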
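
A minimal sketch of the cross-entropy computation follows, assuming the labels are given as integer class indices and the loss is averaged over samples. The function name `cross_entropy`, the small `eps` guard against `log(0)`, and the example numbers are all hypothetical choices for illustration.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Average cross-entropy between predicted probabilities and true labels.

    probs:  (N, K) array, row i holds the predicted class probabilities of sample i.
    labels: (N,) array of integer class indices (the correct class per sample).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n, k = probs.shape
    # Build the indicator matrix: y[i, j] = 1 only for the correct class of sample i.
    y = np.zeros((n, k))
    y[np.arange(n), labels] = 1.0
    # Only the log-probability of the correct class contributes for each sample.
    return -np.sum(y * np.log(probs + eps)) / n

# Hypothetical predictions for 2 samples over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)
```

Because the indicator is zero for every wrong class, the loss shrinks toward 0 as the probability assigned to the correct class approaches 1.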