The basic formula:

$$f(x) = \sum_{i=0}^{d} w_i x_i = w^T x$$
The training data:
$$\{(x_1, y_1), \dots, (x_N, y_N)\}, \quad x_j \in \mathbb{R}^{d+1}, \; y_j \in \mathbb{R}$$
Objective function:
$$\min E = \frac{1}{N} \sum_{j=1}^{N} \left(f(x_j) - y_j\right)^2$$

Here, $E$ is the mean squared error.
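As a small sketch of the objective above (using NumPy; the toy data and variable names here are illustrative, not from the text), this computes $f(x) = w^T x$ row-wise and the mean squared error $E$:

```python
import numpy as np

# Toy data: N = 4 samples, each x_j in R^{d+1} with d = 2 features
# plus a leading bias coordinate x_0 = 1.  Values are made up.
X = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
])
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 1.0, 0.5])

def f(X, w):
    """f(x) = w^T x, applied to every row of X at once: X @ w."""
    return X @ w

def mse(X, y, w):
    """E = (1/N) * sum_j (f(x_j) - y_j)^2."""
    r = f(X, w) - y
    return (r @ r) / len(y)

print(mse(X, y, w))  # 0.5625 for this toy data
```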
We obtain the optimal weight $w^*$ by differentiating $E$ with respect to $w$ and setting the derivative to zero.
Let’s expand E first.
$$
\begin{aligned}
E &= \frac{1}{N}\,\lVert Xw - y \rVert^2 = \frac{1}{N}(Xw - y)^T(Xw - y) \\
  &= \frac{1}{N}\left(w^T X^T X w - \underbrace{w^T X^T y}_{\text{scalar}} - y^T X w + y^T y\right) \\
  &= \frac{1}{N}\left(w^T X^T X w - 2\,y^T X w + y^T y\right)
\end{aligned}
$$

(Since $w^T X^T y$ is a scalar, it equals its own transpose $y^T X w$, which lets us merge the two middle terms.)
Next, we apply standard results of matrix calculus, equations (4) and (1).
$$
\begin{aligned}
\frac{\partial E}{\partial w}
  &= \frac{1}{N}\Big(\underbrace{\left(X^T X + (X^T X)^T\right) w}_{\text{equation (4), } A = X^T X} - \underbrace{(2\,y^T X)^T}_{\text{equation (1)}}\Big) \\
  &= \frac{1}{N}\left(2\,X^T X w - 2\,X^T y\right) \qquad \text{since } X^T X \text{ is symmetric} \\
  &= \frac{2}{N}\,X^T (Xw - y)
\end{aligned}
$$
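A quick way to sanity-check the gradient $\frac{\partial E}{\partial w} = \frac{2}{N} X^T (Xw - y)$ is to compare it against central finite differences on random data (a sketch with NumPy; all names and the random data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d1 = 50, 4                       # N samples, d+1 coordinates each
X = rng.normal(size=(N, d1))
y = rng.normal(size=N)
w = rng.normal(size=d1)

def E(w):
    """Mean squared error E(w) = (1/N) ||Xw - y||^2."""
    r = X @ w - y
    return (r @ r) / N

# Analytic gradient from the derivation: (2/N) X^T (Xw - y)
grad_analytic = (2.0 / N) * X.T @ (X @ w - y)

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.zeros_like(w)
for i in range(d1):
    e = np.zeros_like(w)
    e[i] = eps
    grad_numeric[i] = (E(w + e) - E(w - e)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))
```

Because $E$ is quadratic in $w$, the central-difference estimate agrees with the analytic gradient up to floating-point rounding.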
We set $\frac{\partial E}{\partial w} = 0$:
$$
\begin{aligned}
X^T X w &= X^T y \\
w^* &= (X^T X)^{-1} X^T y
\end{aligned}
$$
Note that the last line applies only if $X^T X$ is invertible.
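In code, the closed-form solution can be sketched as follows (NumPy; the random data is illustrative). When $X^T X$ may be singular or ill-conditioned, `np.linalg.lstsq` is usually preferred over forming and inverting $X^T X$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))   # full column rank almost surely
y = rng.normal(size=100)

# Normal-equation solve: X^T X w = X^T y (requires X^T X invertible).
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver: avoids forming X^T X and handles the
# rank-deficient case as well.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_star, w_lstsq))
```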
To assess how well the model fits the data, we use the $R^2$ goodness-of-fit measure.
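The standard $R^2$ definition is $1 - \mathrm{SS}_{\mathrm{res}} / \mathrm{SS}_{\mathrm{tot}}$, where $\mathrm{SS}_{\mathrm{res}}$ is the sum of squared residuals and $\mathrm{SS}_{\mathrm{tot}}$ the total sum of squares about the mean. A minimal sketch (NumPy; the example values are made up):

```python
import numpy as np

def r_squared(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y, y_pred))  # 0.98 for these values
```

An $R^2$ of 1 means the model explains all of the variance in $y$; 0 means it does no better than predicting the mean.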