Instead of fully trusting the training loss as a metric for model performance, we want a theoretically rigorous way to ensure learning happens.
Learning is defined as finding a function that maps inputs $x$ to outputs $y$. The function is $y = f(x; \theta)$, where $\theta$ is the model parameter.
Some important terms:
- Loss Function $L(y, f(x; \theta))$: measures how wrong your prediction is.
- The value $L(y, \hat{y})$ is the penalty of predicting $\hat{y} = f(x; \theta)$ instead of the true $y$.
- Risk Functional $R(\theta)$: measures the expected loss over the entire data distribution $P(x, y)$.
- Note that the value $R(\theta) = \mathbb{E}_{(x, y) \sim P}\left[L(y, f(x; \theta))\right]$ is the objective function to minimize.
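To make these definitions concrete, here is a minimal sketch. The linear model, squared loss, and toy distribution are all illustrative assumptions, not part of the theory: it estimates the risk functional by Monte Carlo sampling, which is only possible here because we invented $P(x, y)$ ourselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, theta):
    return theta * x                      # the model: y_hat = f(x; theta)

def loss(y, y_hat):
    return (y - y_hat) ** 2               # L(y, y_hat): penalty for the miss

def risk(theta, n=100_000):
    # Monte Carlo estimate of R(theta) = E_{(x,y)~P}[L(y, f(x; theta))].
    # Works only because this toy P(x, y) is known: y = 2x + small noise.
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(scale=0.1, size=n)
    return np.mean(loss(y, f(x, theta)))

print(risk(1.5), risk(2.0))               # theta = 2 achieves the lower risk
```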
So, we want to find the best function $f(x; \theta^*)$ that minimizes the expected loss:

$$\theta^* = \arg\min_{\theta} R(\theta)$$
Basically, we want to search across the entire hypothesis space $\Theta$. But we can't do this since we don't know the true distribution $P(x, y)$. So we have to approximate it with the empirical risk

$$R_{\text{emp}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i; \theta))$$
We will then minimize $R_{\text{emp}}(\theta)$. This is called empirical risk minimization (ERM).
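A minimal ERM sketch under the same toy assumptions (linear model, squared loss, and a grid-search optimizer, all illustrative choices): with only a finite sample, we minimize the empirical average instead of the true expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=50)                         # N = 50 training inputs
ys = 2.0 * xs + rng.normal(scale=0.1, size=50)   # toy targets, true theta = 2

def empirical_risk(theta):
    # R_emp(theta) = (1/N) * sum_i L(y_i, f(x_i; theta)) with squared loss
    return np.mean((ys - theta * xs) ** 2)

thetas = np.linspace(-5.0, 5.0, 1001)            # candidate parameters
theta_erm = thetas[np.argmin([empirical_risk(t) for t in thetas])]
print(theta_erm)                                 # close to the true value 2.0
```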
Furthermore, we have an upper bound on $R(\theta)$ from the Vapnik-Chervonenkis inequality: with probability at least $1 - \eta$,

$$R(\theta) \le R_{\text{emp}}(\theta) + \Phi\left(\frac{h}{N}\right)$$

where $\Phi$ is a monotonic function of the ratio of the VC dimension $h$ to the sample size $N$, and we can expand it to

$$\Phi\left(\frac{h}{N}\right) = \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}$$
So now, we need to minimize both the empirical risk $R_{\text{emp}}(\theta)$ and the confidence term $\Phi(h/N)$ to keep the true risk $R(\theta)$ small.
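As a sketch of how the confidence term behaves (the formula is the expansion above; `h = 10` and `eta = 0.05` are assumed values chosen for illustration):

```python
import numpy as np

# The VC confidence term Phi(h/N) in
#   R(theta) <= R_emp(theta) + Phi(h/N)   (with probability >= 1 - eta).
# h is the VC dimension, N the sample size; eta = 0.05 is an assumed choice.
def vc_confidence(h, N, eta=0.05):
    return np.sqrt((h * (np.log(2 * N / h) + 1) - np.log(eta / 4)) / N)

# The term shrinks as N grows and grows with h, so the bound rewards
# more data and penalizes more complex model classes.
for N in (100, 1_000, 10_000):
    print(N, round(vc_confidence(h=10, N=N), 3))
```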