The basic formula:

$$
f(x) = \sum_{i=0}^{d} w_i x_i = w^T x
$$
The training data:
$$
\{(x_1, y_1), \ldots, (x_N, y_N)\}, \quad x_j \in \mathbb{R}^{d+1}, \; y_j \in \mathbb{R}
$$
Objective function:
$$
\min E = \frac{1}{N} \sum_{j=1}^{N} \left( f(x_j) - y_j \right)^2
$$
Here, $E$ is the mean squared error.
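To make the objective concrete, here is a minimal NumPy sketch of the model and its mean squared error. The function names `predict` and `mean_squared_error` and the toy data are made up for illustration, and the sketch assumes the usual convention that $x_0 = 1$, so that $w_0$ plays the role of the bias term.

```python
import numpy as np

def predict(X, w):
    """Evaluate f(x_j) = w^T x_j for every row x_j of X."""
    return X @ w

def mean_squared_error(X, y, w):
    """E = (1/N) * sum_j (f(x_j) - y_j)^2."""
    residuals = predict(X, w) - y
    return float(np.mean(residuals ** 2))

# Toy data (made up for illustration): N = 3 samples, d = 1 feature.
# The leading column of ones is the assumed x_0 = 1 entry.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

w = np.array([1.0, 2.0])  # y = 1 + 2x fits this toy data exactly
mse_exact = mean_squared_error(X, y, w)
mse_zero = mean_squared_error(X, y, np.zeros(2))
```

With these numbers, the exact weights give an error of zero, while the all-zero weight vector gives $(1 + 9 + 25)/3 = 35/3$.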
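We obtain the optimal weight $w$ by differentiating $E$ with respect to $w$ and setting the derivative to zero.
Let's expand $E$ first, writing $X$ for the $N \times (d+1)$ matrix whose rows are $x_j^T$ and $y$ for the vector of targets $y_j$.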
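$$
\begin{aligned}
E &= \frac{1}{N} \lVert Xw - y \rVert^2 \\
&= \frac{1}{N} (Xw - y)^T (Xw - y) \\
&= \frac{1}{N} \left( w^T X^T X w \underbrace{{} - w^T X^T y - y^T X w}_{\text{scalar}} + y^T y \right) \\
&= \frac{1}{N} \left( w^T X^T X w - 2 y^T X w + y^T y \right)
\end{aligned}
$$

The two cross terms are scalars and transposes of each other, so $w^T X^T y = y^T X w$ and they combine into $-2 y^T X w$.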
Next, we apply equations (4) and (1) from the Standard Results of Matrix Calculus.
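$$
\begin{aligned}
\frac{\partial E}{\partial w} &= \frac{1}{N} \Big( \underbrace{\left( X^T X + (X^T X)^T \right) w}_{\text{equation (4), } A = X^T X} - \underbrace{(2 y^T X)^T}_{\text{equation (1)}} \Big) \\
&= \frac{1}{N} \Big( \underbrace{2 X^T X w}_{\text{since } X^T X \text{ is symmetric}} - 2 X^T y \Big) \\
&= \frac{2}{N} X^T (Xw - y)
\end{aligned}
$$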
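As a sanity check on the gradient $\frac{2}{N} X^T (Xw - y)$, the sketch below compares it with a central finite-difference approximation on random data. The helper names (`mse`, `mse_gradient`, `numerical_gradient`), the random data, and the step size `eps` are illustrative assumptions, not part of the derivation.

```python
import numpy as np

def mse(X, y, w):
    """E = (1/N) * ||Xw - y||^2."""
    r = X @ w - y
    return float(r @ r) / len(y)

def mse_gradient(X, y, w):
    """Analytic gradient (2/N) * X^T (Xw - y)."""
    return (2.0 / len(y)) * X.T @ (X @ w - y)

def numerical_gradient(X, y, w, eps=1e-6):
    """Central finite differences, one coordinate of w at a time."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (mse(X, y, w + step) - mse(X, y, w - step)) / (2 * eps)
    return grad

# Random test problem (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

analytic = mse_gradient(X, y, w)
numeric = numerical_gradient(X, y, w)
max_abs_diff = float(np.max(np.abs(analytic - numeric)))
```

Because $E$ is quadratic in $w$, the central difference has no truncation error, so `max_abs_diff` should be limited only by floating-point rounding.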
We set $\frac{\partial E}{\partial w} = 0$.

$$
\begin{aligned}
X^T X w &= X^T y \\
w &= (X^T X)^{-1} X^T y
\end{aligned}
$$
Note that the last line holds only if $X^T X$ is invertible, which is the case exactly when the columns of $X$ are linearly independent.
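The closed-form solution can be sketched in NumPy as follows. The synthetic data, the noise level, and the names `w_true`, `w_hat`, and `w_lstsq` are assumptions made for illustration. The sketch solves the linear system $X^T X w = X^T y$ with `np.linalg.solve` rather than forming the inverse explicitly, which is generally the more numerically stable choice, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

# Synthetic data (made up for illustration): N samples, d features,
# plus a leading column of ones for the assumed x_0 = 1 bias entry.
rng = np.random.default_rng(0)
N, d = 50, 2
features = rng.normal(size=(N, d))
X = np.hstack([np.ones((N, 1)), features])
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + 0.01 * rng.normal(size=N)

# Solve the normal equations X^T X w = X^T y.
# This raises LinAlgError if X^T X is exactly singular.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least-squares solver, which works on X
# directly and also copes with a rank-deficient X.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the optimum the gradient (2/N) X^T (Xw - y) should vanish.
gradient_at_w_hat = (2.0 / N) * X.T @ (X @ w_hat - y)
```

When $X^T X$ is singular or badly conditioned, `np.linalg.lstsq` (or a pseudo-inverse) is the safer route, since the normal-equations solve either fails or amplifies rounding error.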
To assess how well the model fits the data, we use the $R^2$ goodness of fit.