Intuitively, we want to transform the samples into a different coordinate system in which the new axes are uncorrelated and most of the information (variance) is concentrated in just a few axes.
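Let’s say you have a random vector $\mathbf{x} \in \mathbb{R}^d$ with mean $\boldsymbol{\mu} = \mathbb{E}[\mathbf{x}]$ and covariance matrix

$$\Sigma = \mathbb{E}\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^\top\right].$$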
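The covariance matrix reflects how spread out the data is along each dimension (the diagonal entries are the variances) and how correlated each dimension is with the others (the off-diagonal entries are the covariances).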
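As a rough sketch of the sample version of this quantity, the snippet below centers each dimension and computes $\frac{1}{n-1} X_c^\top X_c$ with NumPy. The helper name `covariance_matrix` and the synthetic two-dimensional data are illustrative assumptions, not part of the original text.

```python
import numpy as np

def covariance_matrix(X: np.ndarray) -> np.ndarray:
    """Sample covariance of X with shape (n_samples, n_features)."""
    # Center each dimension by subtracting its mean.
    Xc = X - X.mean(axis=0)
    # Unbiased estimate: divide by n - 1.
    return Xc.T @ Xc / (X.shape[0] - 1)

# Hypothetical data: x2 is strongly correlated with x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + 0.5 * rng.normal(size=500)
X = np.column_stack([x1, x2])

C = covariance_matrix(X)
print(C)
# Should agree with NumPy's estimator (rowvar=False: columns are variables).
print(np.allclose(C, np.cov(X, rowvar=False)))
```

The large positive off-diagonal entry of `C` is what "correlated dimensions" looks like numerically.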
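We then perform an eigen-decomposition of the covariance matrix $\Sigma$:

$$\Sigma = U \Lambda U^\top,$$

where $U$ is an orthogonal matrix whose columns are the eigenvectors of $\Sigma$, and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$ holds the corresponding eigenvalues (one per dimension), sorted so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0$. Because $\Sigma$ is symmetric and positive semi-definite, such a decomposition always exists with real, non-negative eigenvalues.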
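A minimal sketch of this step with NumPy, assuming the same kind of synthetic two-dimensional data as before: `np.linalg.eigh` is designed for symmetric matrices and returns eigenvalues in ascending order, so they are reversed to get the descending order used above.

```python
import numpy as np

# Hypothetical correlated 2-D data.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + 0.5 * rng.normal(size=500)
X = np.column_stack([x1, x2])
C = np.cov(X, rowvar=False)

# eigh handles symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)

# Reorder so that the largest eigenvalue (and its eigenvector) comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]  # columns are the eigenvectors (U)

# Check the decomposition: C = U Lambda U^T.
reconstructed = eigvecs @ np.diag(eigvals) @ eigvecs.T
print(eigvals)
print(np.allclose(reconstructed, C))
```

Using `eigh` rather than the general `np.linalg.eig` guarantees real eigenvalues and orthonormal eigenvectors for a symmetric input.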
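Following Feature Extraction, we should keep only the first $k$ eigenvectors, i.e. those with the $k$ largest eigenvalues, since they capture most of the variance. Stacking them as the columns of a $d \times k$ matrix $U_k$, each sample is mapped to its new coordinates $\mathbf{z} = U_k^\top (\mathbf{x} - \boldsymbol{\mu})$, where $\boldsymbol{\mu}$ is the mean of $\mathbf{x}$.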
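Putting the steps together, here is a sketch of PCA that keeps the first `k` eigenvectors and projects the centered data onto them. The function name `pca`, its return values, and the synthetic 3-D data (which mostly varies along a single direction) are assumptions made for illustration.

```python
import numpy as np

def pca(X: np.ndarray, k: int):
    """Project X (n_samples, n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                       # d x k: the first k eigenvectors
    Z = Xc @ W                               # n x k: coordinates in the new system
    explained = eigvals[:k].sum() / eigvals.sum()  # fraction of variance kept
    return Z, W, explained

# Hypothetical 3-D data lying close to one line.
rng = np.random.default_rng(0)
t = rng.normal(size=1000)
X = np.column_stack([t, 2 * t, -t]) + 0.1 * rng.normal(size=(1000, 3))

Z, W, explained = pca(X, k=1)
print(Z.shape, round(explained, 3))

# Keeping all components, the new axes are uncorrelated: the covariance is diagonal.
Z_all, _, _ = pca(X, k=3)
C_new = np.cov(Z_all, rowvar=False)
print(np.allclose(C_new, np.diag(np.diag(C_new))))
```

In this toy example a single component retains nearly all of the variance, which is the "information concentrated in a few axes" property described at the start; the diagonal `C_new` shows the axes are uncorrelated.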