Instead of fully trusting the training loss as a metric for model performance, we want a theoretically rigorous way to ensure that learning happens.
Learning is defined as finding a function $f$ that maps inputs $x$ to outputs $y$.
Some important terms:
- Loss Function $L(y, f(x))$: measures how wrong your prediction is.
  - The value $L(y, f(x))$ is the penalty of predicting $f(x)$ instead of $y$.
- Risk Functional $R(f)$: measures the expected loss over the entire data distribution $P(x, y)$.
  - Note that the value $R(f)$ is the objective function to minimize.
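To make the two terms above concrete, here is a minimal Python sketch. It assumes a squared loss and a toy data distribution (`y = 2x` plus Gaussian noise), neither of which comes from these notes; the helper names are hypothetical. Since the true distribution is normally unknown, the risk is approximated here by averaging the loss over a very large sample, and compared against the average loss on a small training set.

```python
import random

def squared_loss(y_true, y_pred):
    """Penalty for predicting y_pred when the true value is y_true (assumed loss)."""
    return (y_true - y_pred) ** 2

def average_loss(f, samples, loss=squared_loss):
    """Mean loss of predictor f over a list of (x, y) pairs."""
    return sum(loss(y, f(x)) for x, y in samples) / len(samples)

def sample_data(n, rng):
    """Toy distribution (an assumption): x uniform on [0, 1], y = 2x + N(0, 0.1^2)."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 2.0 * x + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data

def f(x):
    """Candidate predictor being evaluated."""
    return 2.0 * x

rng = random.Random(0)
train = sample_data(20, rng)        # small training set
large = sample_data(100_000, rng)   # large sample standing in for the full distribution

train_loss = average_loss(f, train)     # what we can actually measure
risk_estimate = average_loss(f, large)  # Monte Carlo approximation of the risk

print(f"training loss: {train_loss:.4f}")
print(f"estimated risk: {risk_estimate:.4f}")
```

With only 20 training points the training loss can land noticeably above or below the estimated risk, which is the reason for not trusting it on its own.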
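So, we want to find the best function $f^*$, the one with the lowest risk $R(f)$.
Basically, we want to search a set of candidate functions $\mathcal{F}$ for the minimizer of the risk.
The risk itself cannot be computed because the data distribution is unknown. We will then minimize the empirical risk $R_{\text{emp}}(f)$, the average loss over the $n$ training samples, instead.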
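Furthermore, we have the upper bound of the risk, which holds with high probability: $R(f) \le R_{\text{emp}}(f) + \Omega$,
where $R_{\text{emp}}(f)$ is the empirical risk (the average loss on the training set) and $\Omega$ is a confidence term that grows with the capacity of the function class and shrinks as the number of training samples grows.
So now, we need to minimize both $R_{\text{emp}}(f)$ and $\Omega$.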
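The trade-off in this last step can be sketched numerically. The code below assumes one commonly quoted form of the VC bound for 0-1 loss, in which the confidence term is $\sqrt{(h(\ln(2n/h) + 1) - \ln(\eta/4)) / n}$ with $h$ the VC dimension, $n$ the number of training samples, and $1 - \eta$ the confidence level; the notes may intend a different form. The `(h, empirical risk)` pairs are made-up illustrative numbers, not results.

```python
import math

def vc_confidence(h, n, eta=0.05):
    """Confidence term of one common VC bound (assumed form):
    sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def risk_bound(empirical_risk, h, n, eta=0.05):
    """Upper bound on the risk: empirical risk plus the confidence term."""
    return empirical_risk + vc_confidence(h, n, eta)

n = 1000  # hypothetical training set size

# Made-up (VC dimension, empirical risk) pairs for function classes of growing
# capacity: richer classes fit the training data better but pay a larger
# confidence term.
candidates = [(2, 0.30), (10, 0.12), (50, 0.05), (200, 0.02)]

bounds = {h: risk_bound(emp, h, n) for h, emp in candidates}
best_h = min(bounds, key=bounds.get)

for h, emp in candidates:
    print(f"h={h:>3}  empirical risk={emp:.2f}  bound={bounds[h]:.3f}")
print("smallest bound at h =", best_h)
```

Picking the class with the smallest bound, rather than the smallest empirical risk, is the idea usually called structural risk minimization.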