The basic formula:

$$f(x) = \sum_{i=0}^{d} w_i x_i = w^T x$$
The training data:
$$\{(x_1, y_1), \dots, (x_N, y_N)\}, \quad x_j \in \mathbb{R}^{d+1}, \; y_j \in \mathbb{R}$$
Objective function:
$$\min E = \frac{1}{N} \sum_{j=1}^{N} \left(f(x_j) - y_j\right)^2$$

Here, $E$ is the mean squared error.
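As a small sketch of the objective above (using NumPy; the toy data and variable names here are illustrative, not from the text), this computes $f(x) = w^T x$ row-wise and the mean squared error $E$:

```python
import numpy as np

# Toy data: N = 4 samples, each x_j in R^{d+1} with d = 2 features
# plus a leading bias coordinate x_0 = 1.  Values are made up.
X = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
])
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 1.0, 0.5])

def f(X, w):
    """f(x) = w^T x, applied to every row of X at once: X @ w."""
    return X @ w

def mse(X, y, w):
    """E = (1/N) * sum_j (f(x_j) - y_j)^2."""
    r = f(X, w) - y
    return (r @ r) / len(y)

print(mse(X, y, w))  # 0.5625 for this toy data
```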
We obtain the optimal weight $w^*$ by differentiating $E$ with respect to $w$ and setting the derivative to zero.
Let’s expand E first.
$$
\begin{aligned}
E &= \frac{1}{N}\,\lVert Xw - y \rVert^2 = \frac{1}{N}(Xw - y)^T(Xw - y) \\
  &= \frac{1}{N}\left(w^T X^T X w - \underbrace{w^T X^T y}_{\text{scalar}} - y^T X w + y^T y\right) \\
  &= \frac{1}{N}\left(w^T X^T X w - 2\,y^T X w + y^T y\right)
\end{aligned}
$$

(Since $w^T X^T y$ is a scalar, it equals its own transpose $y^T X w$, which lets us merge the two middle terms.)
Next, we apply standard results of matrix calculus, equations (4) and (1).
$$
\begin{aligned}
\frac{\partial E}{\partial w}
  &= \frac{1}{N}\Big(\underbrace{\left(X^T X + (X^T X)^T\right) w}_{\text{equation (4), } A = X^T X} - \underbrace{(2\,y^T X)^T}_{\text{equation (1)}}\Big) \\
  &= \frac{1}{N}\left(2\,X^T X w - 2\,X^T y\right) \qquad \text{since } X^T X \text{ is symmetric} \\
  &= \frac{2}{N}\,X^T (Xw - y)
\end{aligned}
$$
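A quick way to sanity-check the gradient $\frac{\partial E}{\partial w} = \frac{2}{N} X^T (Xw - y)$ is to compare it against central finite differences on random data (a sketch with NumPy; all names and the random data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d1 = 50, 4                       # N samples, d+1 coordinates each
X = rng.normal(size=(N, d1))
y = rng.normal(size=N)
w = rng.normal(size=d1)

def E(w):
    """Mean squared error E(w) = (1/N) ||Xw - y||^2."""
    r = X @ w - y
    return (r @ r) / N

# Analytic gradient from the derivation: (2/N) X^T (Xw - y)
grad_analytic = (2.0 / N) * X.T @ (X @ w - y)

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.zeros_like(w)
for i in range(d1):
    e = np.zeros_like(w)
    e[i] = eps
    grad_numeric[i] = (E(w + e) - E(w - e)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))
```

Because $E$ is quadratic in $w$, the central-difference estimate agrees with the analytic gradient up to floating-point rounding.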
We set $\frac{\partial E}{\partial w} = 0$:
$$
\begin{aligned}
X^T X w &= X^T y \\
w^* &= (X^T X)^{-1} X^T y
\end{aligned}
$$
Note that the last line applies only if $X^T X$ is invertible.
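In code, the closed-form solution can be sketched as follows (NumPy; the random data is illustrative). When $X^T X$ may be singular or ill-conditioned, `np.linalg.lstsq` is usually preferred over forming and inverting $X^T X$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))   # full column rank almost surely
y = rng.normal(size=100)

# Normal-equation solve: X^T X w = X^T y (requires X^T X invertible).
w_star = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver: avoids forming X^T X and handles the
# rank-deficient case as well.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_star, w_lstsq))
```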
To assess how well the model fits the data, we use the $R^2$ goodness-of-fit measure.
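The standard $R^2$ definition is $1 - \mathrm{SS}_{\mathrm{res}} / \mathrm{SS}_{\mathrm{tot}}$, where $\mathrm{SS}_{\mathrm{res}}$ is the sum of squared residuals and $\mathrm{SS}_{\mathrm{tot}}$ the total sum of squares about the mean. A minimal sketch (NumPy; the example values are made up):

```python
import numpy as np

def r_squared(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y, y_pred))  # 0.98 for these values
```

An $R^2$ of 1 means the model explains all of the variance in $y$; 0 means it does no better than predicting the mean.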