Intuitively, we want to transform the samples into a different coordinate system where the new axes are uncorrelated and most of the information is concentrated in just a few axes.

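Let’s say you have a random vector $\mathbf{x}$ with mean $\boldsymbol{\mu} = E[\mathbf{x}]$. The covariance matrix of the data is:

$$\Sigma = E\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^\top\right]$$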

The covariance matrix reflects how spread out the data is along each dimension (the diagonal entries) and how correlated one dimension is with another (the off-diagonal entries).
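To make the covariance matrix concrete, here is a minimal NumPy sketch that estimates it from samples. The toy data, the variable names (`X`, `mu`, `cov`), and the use of the unbiased sample estimate (dividing by `n - 1`) are assumptions made for illustration and are not taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 samples of a 2-D vector whose two
# dimensions are deliberately correlated.
a = rng.normal(size=500)
X = np.column_stack([a, 0.8 * a + 0.2 * rng.normal(size=500)])

# Center the samples, then average the outer products.
mu = X.mean(axis=0)
Xc = X - mu
cov = Xc.T @ Xc / (X.shape[0] - 1)  # unbiased sample covariance

# np.cov computes the same estimate when each column is a variable.
assert np.allclose(cov, np.cov(X, rowvar=False))

print(cov)
```

The diagonal of `cov` holds the variance of each dimension, and the off-diagonal entry is positive here because the second dimension was built from the first.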

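Next, we perform an eigen-decomposition of the covariance matrix $\Sigma$:

$$\Sigma \mathbf{v}_i = \lambda_i \mathbf{v}_i$$

where $\mathbf{v}_i$ is the $i$-th eigenvector and $\lambda_i$ is the corresponding eigenvalue. Next, we project the original data onto the new axes:

$$y_i = \mathbf{v}_i^\top (\mathbf{x} - \boldsymbol{\mu})$$

where $\boldsymbol{\mu}$ is the mean of $\mathbf{x}$.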

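Following Feature Extraction, we should keep only the first $k$ coefficients, those with the largest eigenvalues, to get a compressed representation:

$$\tilde{\mathbf{y}} = [y_1, y_2, \dots, y_k]^\top$$

where $y_i$ is the coefficient along the eigenvector with the $i$-th largest eigenvalue.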
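The steps above (covariance, eigen-decomposition, projection, truncation) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the toy data, the choice `k = 1`, centering the samples before projecting, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 1000 samples in 3-D that vary mostly along
# a single direction, plus a little isotropic noise.
t = rng.normal(size=(1000, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(1000, 3))

# Sample mean and covariance (each column is one dimension).
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so reorder to put the largest first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centered samples onto the eigenvectors (the new axes).
Y = (X - mu) @ eigvecs

# Keep only the first k coefficients as the compressed representation.
k = 1  # assumed value for this example
Y_k = Y[:, :k]

# Fraction of the total variance retained by the kept axes.
explained = eigvals[:k].sum() / eigvals.sum()
print(Y_k.shape, explained)
```

In the new coordinates the covariance of `Y` is diagonal, with the eigenvalues on the diagonal, which is what "uncorrelated axes" means in practice. The ratio `explained` gives one way to judge how many coefficients are worth keeping.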