Linear Regression
Crucial components:
- Input vector: $\mathbf{x} \in \mathbb{R}^d$ (one entry per feature)
- Weight vector (one per feature): $\mathbf{w} \in \mathbb{R}^d$
- Note that the weight vector tells how important each feature is.
- And it outputs a predicted value $\hat{y} = \mathbf{w}^\top \mathbf{x}$ (a real number, not a class).
- Objective function: minimize the residual sum of squares (RSS), $\mathrm{RSS}(\mathbf{w}) = \sum_{n=1}^{N} (y_n - \mathbf{w}^\top \mathbf{x}_n)^2$
TODO: I should look through the derivation of the derivative and try it by hand.
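A sketch of that derivation in matrix form, to check the by-hand attempt against (the notation is mine: $X \in \mathbb{R}^{N \times d}$ stacks the inputs row-wise, $\mathbf{y}$ the targets):

$$
\begin{aligned}
\mathrm{RSS}(\mathbf{w}) &= (\mathbf{y} - X\mathbf{w})^\top(\mathbf{y} - X\mathbf{w}) = \mathbf{y}^\top\mathbf{y} - 2\mathbf{w}^\top X^\top\mathbf{y} + \mathbf{w}^\top X^\top X\mathbf{w} \\
\nabla_{\mathbf{w}}\,\mathrm{RSS} &= -2X^\top\mathbf{y} + 2X^\top X\mathbf{w} = -2X^\top(\mathbf{y} - X\mathbf{w}) \\
\nabla_{\mathbf{w}}\,\mathrm{RSS} = 0 &\;\Rightarrow\; X^\top X\,\hat{\mathbf{w}} = X^\top\mathbf{y} \;\Rightarrow\; \hat{\mathbf{w}} = (X^\top X)^{-1}X^\top\mathbf{y}
\end{aligned}
$$

Setting the gradient to zero gives the normal equations, i.e. the closed-form least-squares solution.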
Logistic Regression (Binary)
- We do the same as above, but we pass the output through a sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$, so $\hat{y} = \sigma(\mathbf{w}^\top \mathbf{x})$ is a probability.
- We set the decision threshold at 0.5: predict class 1 when $\hat{y} \ge 0.5$.
- Objective function: cross-entropy error function, $E(\mathbf{w}) = -\sum_{n} \left[ t_n \ln \hat{y}_n + (1 - t_n) \ln(1 - \hat{y}_n) \right]$
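A minimal NumPy sketch of this setup, one full-batch gradient step at a time; the function names (`sigmoid`, `logistic_step`), learning rate, and random data are my own illustration, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_step(w, X, y, lr=0.1):
    """One gradient-descent step on mean binary cross-entropy.

    X: (N, d) inputs, y: (N,) targets in {0, 1}, w: (d,) weights.
    """
    p = sigmoid(X @ w)               # predicted P(y=1 | x) per row
    grad = X.T @ (p - y) / len(y)    # gradient of the cross-entropy
    return w - lr * grad

# Tiny usage example on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)
for _ in range(200):
    w = logistic_step(w, X, y)
preds = (sigmoid(X @ w) >= 0.5).astype(float)  # threshold at 0.5
print("train accuracy:", (preds == y).mean())
```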
Logistic Regression (Multi-Class)
- Instead of the sigmoid function, we use softmax: $\mathrm{softmax}(\mathbf{z})_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$
- So our objective function is the multi-class cross entropy: $E = -\sum_{n} \sum_{k} t_{nk} \ln \hat{y}_{nk}$, where $t_{nk}$ is the one-hot target.
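A small NumPy sketch of softmax plus the cross entropy above, with the usual max-subtraction trick for numerical stability (the function names and example logits are mine):

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow."""
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def cross_entropy(Y_pred, T):
    """Mean multi-class cross-entropy; T is one-hot with shape (N, K)."""
    return -np.mean(np.sum(T * np.log(Y_pred + 1e-12), axis=1))

# Usage: 4 samples, 3 classes, made-up logits
logits = np.array([[ 2.0, 0.5, -1.0],
                   [ 0.1, 0.2,  0.3],
                   [-1.0, 3.0,  0.0],
                   [ 0.0, 0.0,  5.0]])
T = np.eye(3)[[0, 2, 1, 2]]   # one-hot targets
print(cross_entropy(softmax(logits), T))
```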
Gradient Descent
- Batch gradient descent: computes the gradient on the entire dataset for each update (stable but expensive per step).
- Stochastic gradient descent: uses a single data point per update (cheap but noisy).
- Mini-batch gradient descent: uses a small random subset per update, combining the stability of batch GD with the speed of SGD.
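A generic sketch that covers all three variants through `batch_size` (`batch_size = len(X)` recovers batch GD, `batch_size = 1` recovers SGD); the function and argument names are my own:

```python
import numpy as np

def minibatch_gd(w, X, y, grad_fn, lr=0.01, batch_size=32, epochs=10):
    """Mini-batch gradient descent.

    grad_fn(w, Xb, yb) should return the gradient on one batch.
    """
    N = len(X)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(N)              # reshuffle each epoch
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]
            w = w - lr * grad_fn(w, X[idx], y[idx])
    return w

# Usage with a least-squares gradient (illustrative names)
def rss_grad(w, Xb, yb):
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = minibatch_gd(np.zeros(3), X, y, rss_grad, lr=0.05)
print(w)   # should approach [1, -2, 0.5]
```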
Backpropagation
- Just go backwards from the last layer, applying the chain rule one layer at a time to get each gradient.
- Remember that the weights sit between layers: $W^{(l)}$ connects layer $l-1$ and layer $l$, so its gradient uses the activations of layer $l-1$ and the error at layer $l$.
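A minimal NumPy backprop sketch for a two-layer net (sigmoid hidden layer, softmax output); the architecture and all names are my own choice for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_two_layer(X, T, W1, W2):
    """Gradients of cross-entropy w.r.t. W1 and W2.

    X: (N, d) inputs, T: (N, K) one-hot targets.
    W1: (d, h) weights between input and hidden layer.
    W2: (h, K) weights between hidden and output layer.
    """
    # Forward pass
    H = sigmoid(X @ W1)                        # hidden activations
    Z = H @ W2                                 # output logits
    Z = Z - Z.max(axis=1, keepdims=True)
    Y = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)  # softmax

    # Backward pass: start at the output error, walk back one layer
    dZ = (Y - T) / len(X)            # softmax + cross-entropy gradient
    dW2 = H.T @ dZ                   # gradient for the last weights
    dH = dZ @ W2.T                   # error pushed back to hidden layer
    dW1 = X.T @ (dH * H * (1 - H))   # chain rule through the sigmoid
    return dW1, dW2

# Quick shape check with random data (sizes are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
T = np.eye(3)[[0, 1, 2, 0, 1]]
dW1, dW2 = backprop_two_layer(X, T, rng.normal(size=(4, 8)), rng.normal(size=(8, 3)))
print(dW1.shape, dW2.shape)   # (4, 8) (8, 3)
```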
Regularization
- Dropout: makes the model learn a sparse representation, and has an ensemble effect (each mini-batch effectively trains a different sub-network).
- L1/L2 regularization: discourages the weights from being too large.
- The penalty term is $\lambda \sum_i |w_i|^p$, where $\lambda$ is the penalty hyperparameter and $p = 2$ or $1$: $p = 2$ (L2) shrinks weights smoothly, while $p = 1$ (L1) pushes some weights exactly to zero (sparsity).
- Gradient clipping: we put a threshold on the gradient update; if its norm is above the threshold, we scale it down.
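A quick NumPy sketch of norm-based clipping and the L2 penalty gradient (the `max_norm` and `lam` values and the function names are arbitrary choices of mine):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Scale the gradient down if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def l2_regularized_grad(grad, w, lam=1e-3):
    """Add the gradient of the penalty lam * ||w||^2, i.e. 2 * lam * w."""
    return grad + 2 * lam * w

g = np.array([30.0, 40.0])              # norm 50, above the threshold
print(clip_gradient(g, max_norm=5.0))   # rescaled to norm 5: [3. 4.]
```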