Problem 1: Derivation of Fisher Linear Discriminant
Problem Setup
We are given two classes of data: Class 1 ($\mathcal{C}_1$) with $N_1$ samples and Class 2 ($\mathcal{C}_2$) with $N_2$ samples. Each data point lives in a $d$-dimensional space, $\mathbf{x}_n \in \mathbb{R}^d$. Our goal is to find a projection direction $\mathbf{w}$ that best separates the two classes when we project the data onto a one-dimensional space using:

$$y = \mathbf{w}^\top \mathbf{x}$$

This projection maps our $d$-dimensional data to a scalar value $y \in \mathbb{R}$.
Quantities in the Original Space
Before projecting, we define several important quantities in the original d-dimensional space.
The class means are:

$$\mathbf{m}_k = \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} \mathbf{x}_n, \qquad k = 1, 2$$

The within-class scatter matrices measure how spread out the data is within each class:

$$\mathbf{S}_k = \sum_{n \in \mathcal{C}_k} (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top$$

The total within-class scatter matrix combines both classes:

$$\mathbf{S}_W = \mathbf{S}_1 + \mathbf{S}_2$$

The between-class scatter matrix measures the separation between class means:

$$\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\top$$
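As a sanity check, these scatter matrices can be computed directly with NumPy. This is a sketch on made-up two-dimensional toy data (the sample sizes, means, and seed are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data: N1 = 50 samples for class 1, N2 = 40 for class 2 (made up)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(40, 2))

# Class means m_k = (1/N_k) * sum of x_n over class k
m1 = X1.mean(axis=0)
m2 = X2.mean(axis=0)

def scatter(X, m):
    """Within-class scatter S_k = sum_n (x_n - m)(x_n - m)^T."""
    D = X - m          # center the samples
    return D.T @ D     # sum of outer products, written as one matrix product

S1, S2 = scatter(X1, m1), scatter(X2, m2)
SW = S1 + S2                      # total within-class scatter
SB = np.outer(m2 - m1, m2 - m1)   # between-class scatter

# SB is an outer product of one vector with itself, hence rank 1
assert np.linalg.matrix_rank(SB) == 1
```

Note that `D.T @ D` on the centered data is exactly the sum of outer products $\sum_n (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top$, just written as a single matrix product.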
These are all taken from the lecture notes.
Quantities in the Projected Space
After projection, we need to express these quantities in terms of the one-dimensional projected values $y_n = \mathbf{w}^\top \mathbf{x}_n$. The projected class means are:

$$m_k = \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} y_n = \mathbf{w}^\top \mathbf{m}_k, \qquad k = 1, 2$$

For the projected within-class scatter, we compute:

$$s_k^2 = \sum_{n \in \mathcal{C}_k} (y_n - m_k)^2 = \sum_{n \in \mathcal{C}_k} \left( \mathbf{w}^\top \mathbf{x}_n - \mathbf{w}^\top \mathbf{m}_k \right)^2$$

Factoring out $\mathbf{w}$:

$$s_k^2 = \sum_{n \in \mathcal{C}_k} \mathbf{w}^\top (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top \mathbf{w}$$

Bringing $\mathbf{w}$ outside the summation:

$$s_k^2 = \mathbf{w}^\top \left[ \sum_{n \in \mathcal{C}_k} (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top \right] \mathbf{w} = \mathbf{w}^\top \mathbf{S}_k \mathbf{w}$$

The total projected within-class scatter is:

$$s_1^2 + s_2^2 = \mathbf{w}^\top (\mathbf{S}_1 + \mathbf{S}_2) \mathbf{w} = \mathbf{w}^\top \mathbf{S}_W \mathbf{w}$$

The projected between-class scatter is the squared distance between projected means:

$$(m_2 - m_1)^2 = \left( \mathbf{w}^\top \mathbf{m}_2 - \mathbf{w}^\top \mathbf{m}_1 \right)^2 = \mathbf{w}^\top \mathbf{S}_B \mathbf{w}$$
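These two identities say that the 1-D scatters, computed directly from the projected values, equal quadratic forms in the original space. A quick numerical check on made-up toy data, with an arbitrary direction $\mathbf{w}$:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 1.0, size=(30, 2))  # class 1 toy samples
X2 = rng.normal([2.0, 2.0], 1.0, size=(25, 2))  # class 2 toy samples
w = np.array([0.7, -0.3])                       # arbitrary projection direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
SB = np.outer(m2 - m1, m2 - m1)

# Project everything to 1-D: y_n = w^T x_n
y1, y2 = X1 @ w, X2 @ w
mu1, mu2 = y1.mean(), y2.mean()

# Scatters computed directly from the projected 1-D values...
s_within = ((y1 - mu1) ** 2).sum() + ((y2 - mu2) ** 2).sum()
s_between = (mu2 - mu1) ** 2

# ...match the quadratic forms w^T S_W w and w^T S_B w
assert np.isclose(s_within, w @ SW @ w)
assert np.isclose(s_between, w @ SB @ w)
```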
Fisher’s Criterion
Fisher proposed finding the projection direction $\mathbf{w}$ that maximizes the ratio of between-class variance to within-class variance:

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^\top \mathbf{S}_B \mathbf{w}}{\mathbf{w}^\top \mathbf{S}_W \mathbf{w}}$$
Intuitively, we want the projected class means to be far apart (large numerator) while each projected class stays tightly clustered (small denominator).
An important observation is that $J(\mathbf{w})$ is scale-invariant. If we replace $\mathbf{w}$ with $\alpha \mathbf{w}$ for any non-zero constant $\alpha$:

$$J(\alpha \mathbf{w}) = \frac{\alpha^2 \, \mathbf{w}^\top \mathbf{S}_B \mathbf{w}}{\alpha^2 \, \mathbf{w}^\top \mathbf{S}_W \mathbf{w}} = J(\mathbf{w})$$

Since only the direction of $\mathbf{w}$ matters, not its magnitude, we can fix the denominator to any positive constant $c$ and simply maximize the numerator. This transforms our problem into:

$$\max_{\mathbf{w}} \; \mathbf{w}^\top \mathbf{S}_B \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^\top \mathbf{S}_W \mathbf{w} = c$$
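The scale invariance is easy to verify numerically. A small sketch, again on made-up toy data with arbitrary rescaling factors:

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], 1.0, size=(40, 2))  # toy class 1
X2 = rng.normal([2.5, 0.5], 1.0, size=(35, 2))  # toy class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
SB = np.outer(m2 - m1, m2 - m1)

def J(w):
    """Fisher criterion: ratio of between- to within-class scatter."""
    return (w @ SB @ w) / (w @ SW @ w)

w = np.array([1.0, 2.0])
for alpha in (0.5, -3.0, 10.0):
    # Rescaling w leaves J unchanged: only the direction matters
    assert np.isclose(J(alpha * w), J(w))
```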
Lagrangian Optimization
We solve this constrained optimization problem using Lagrange multipliers. The Lagrangian is[^1]:

$$\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}^\top \mathbf{S}_B \mathbf{w} - \lambda \left( \mathbf{w}^\top \mathbf{S}_W \mathbf{w} - c \right)$$

Taking the derivative with respect to $\mathbf{w}$ and setting it to zero:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2 \mathbf{S}_B \mathbf{w} - 2 \lambda \mathbf{S}_W \mathbf{w} = 0$$

This simplifies to:

$$\mathbf{S}_B \mathbf{w} = \lambda \mathbf{S}_W \mathbf{w}$$

Assuming $\mathbf{S}_W$ is invertible, we multiply both sides by $\mathbf{S}_W^{-1}$:

$$\mathbf{S}_W^{-1} \mathbf{S}_B \mathbf{w} = \lambda \mathbf{w}$$

This is a generalized eigenvalue problem where $\mathbf{w}$ is an eigenvector of $\mathbf{S}_W^{-1} \mathbf{S}_B$ with eigenvalue $\lambda$.
The between-class scatter matrix has a special structure:

$$\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\top$$

This is an outer product, making $\mathbf{S}_B$ a rank-1 matrix. When we apply $\mathbf{S}_B$ to any vector $\mathbf{w}$:

$$\mathbf{S}_B \mathbf{w} = (\mathbf{m}_2 - \mathbf{m}_1) \underbrace{(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}}_{\text{scalar}}$$

The result is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$, scaled by the scalar $(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}$.

Substituting this into our eigenvalue equation and solving for $\mathbf{w}$:

$$(\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w} = \lambda \mathbf{S}_W \mathbf{w} \quad \Longrightarrow \quad \mathbf{w} = \frac{(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}}{\lambda} \, \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$$

This shows that $\mathbf{w}$ is proportional to $\mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$. Since we only care about the direction of $\mathbf{w}$, we can ignore the scalar constants $\lambda$ and $(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}$.

As such, the optimal projection direction is:

$$\mathbf{w}^* \propto \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$$
where:
- $\mathbf{S}_W$ is the total within-class scatter matrix.
- $\mathbf{m}_1, \mathbf{m}_2$ are the means of the two classes.
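Putting everything together, the closed-form direction can be cross-checked against the eigenvector formulation: both should give the same direction up to sign and scale. A sketch on made-up toy data:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal([0.0, 0.0], 1.0, size=(60, 2))  # toy class 1
X2 = rng.normal([3.0, 1.5], 1.0, size=(50, 2))  # toy class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
SB = np.outer(m2 - m1, m2 - m1)

# Closed-form Fisher direction: w* proportional to SW^{-1} (m2 - m1).
# np.linalg.solve avoids forming the explicit inverse.
w_star = np.linalg.solve(SW, m2 - m1)
w_star /= np.linalg.norm(w_star)

# The eigenvector of SW^{-1} SB with the largest eigenvalue solves the
# same problem; all other eigenvalues are zero because SB is rank 1.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
w_eig = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w_eig /= np.linalg.norm(w_eig)

# The two unit vectors agree up to sign
assert np.isclose(abs(w_star @ w_eig), 1.0)
```

Using `np.linalg.solve(SW, m2 - m1)` rather than `np.linalg.inv(SW) @ (m2 - m1)` is the numerically preferable way to apply $\mathbf{S}_W^{-1}$ in practice.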
Footnotes
[^1]: This is shamelessly taken from the lecture notes.