Problem 1: Derivation of Fisher Linear Discriminant
Problem Setup
We are given two classes of data: Class 1 ($\mathcal{C}_1$) with $N_1$ samples and Class 2 ($\mathcal{C}_2$) with $N_2$ samples. Each data point lives in a $d$-dimensional space, $\mathbf{x}_n \in \mathbb{R}^d$. Our goal is to find a projection direction $\mathbf{w}$ that best separates the two classes when we project the data onto a one-dimensional space using:

$$y = \mathbf{w}^\top \mathbf{x}$$

This projection maps our $d$-dimensional data to a scalar value $y \in \mathbb{R}$.
Quantities in the Original Space
Before projecting, we define several important quantities in the original d-dimensional space.
The class means are:

$$\mathbf{m}_k = \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} \mathbf{x}_n, \qquad k = 1, 2$$

The within-class scatter matrices measure how spread out the data is within each class:

$$\mathbf{S}_k = \sum_{n \in \mathcal{C}_k} (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top$$

The total within-class scatter matrix combines both classes:

$$\mathbf{S}_W = \mathbf{S}_1 + \mathbf{S}_2$$

The between-class scatter matrix measures the separation between class means:

$$\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\top$$
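As a sanity check, these scatter matrices can be computed directly with NumPy. This is a sketch on made-up two-dimensional toy data (the sample sizes, means, and seed are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data: N1 = 50 samples for class 1, N2 = 40 for class 2 (made up)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(40, 2))

# Class means m_k = (1/N_k) * sum of x_n over class k
m1 = X1.mean(axis=0)
m2 = X2.mean(axis=0)

def scatter(X, m):
    """Within-class scatter S_k = sum_n (x_n - m)(x_n - m)^T."""
    D = X - m          # center the samples
    return D.T @ D     # sum of outer products, written as one matrix product

S1, S2 = scatter(X1, m1), scatter(X2, m2)
SW = S1 + S2                      # total within-class scatter
SB = np.outer(m2 - m1, m2 - m1)   # between-class scatter

# SB is an outer product of one vector with itself, hence rank 1
assert np.linalg.matrix_rank(SB) == 1
```

Note that `D.T @ D` on the centered data is exactly the sum of outer products $\sum_n (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top$, just written as a single matrix product.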
These are all taken from the lecture notes.
Quantities in the Projected Space
After projection, we need to express these quantities in terms of the one-dimensional projected values $y_n = \mathbf{w}^\top \mathbf{x}_n$. The projected class means are:

$$m_k = \frac{1}{N_k} \sum_{n \in \mathcal{C}_k} y_n = \mathbf{w}^\top \mathbf{m}_k, \qquad k = 1, 2$$

For the projected within-class scatter, we compute:

$$s_k^2 = \sum_{n \in \mathcal{C}_k} (y_n - m_k)^2 = \sum_{n \in \mathcal{C}_k} \left( \mathbf{w}^\top \mathbf{x}_n - \mathbf{w}^\top \mathbf{m}_k \right)^2$$

Factoring out $\mathbf{w}$:

$$s_k^2 = \sum_{n \in \mathcal{C}_k} \mathbf{w}^\top (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top \mathbf{w}$$

Bringing $\mathbf{w}$ outside the summation:

$$s_k^2 = \mathbf{w}^\top \left[ \sum_{n \in \mathcal{C}_k} (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^\top \right] \mathbf{w} = \mathbf{w}^\top \mathbf{S}_k \mathbf{w}$$

The total projected within-class scatter is:

$$s_1^2 + s_2^2 = \mathbf{w}^\top (\mathbf{S}_1 + \mathbf{S}_2) \mathbf{w} = \mathbf{w}^\top \mathbf{S}_W \mathbf{w}$$

The projected between-class scatter is the squared distance between projected means:

$$(m_2 - m_1)^2 = \left( \mathbf{w}^\top \mathbf{m}_2 - \mathbf{w}^\top \mathbf{m}_1 \right)^2 = \mathbf{w}^\top \mathbf{S}_B \mathbf{w}$$
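These two identities say that the 1-D scatters, computed directly from the projected values, equal quadratic forms in the original space. A quick numerical check on made-up toy data, with an arbitrary direction $\mathbf{w}$:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 1.0, size=(30, 2))  # class 1 toy samples
X2 = rng.normal([2.0, 2.0], 1.0, size=(25, 2))  # class 2 toy samples
w = np.array([0.7, -0.3])                       # arbitrary projection direction

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
SB = np.outer(m2 - m1, m2 - m1)

# Project everything to 1-D: y_n = w^T x_n
y1, y2 = X1 @ w, X2 @ w
mu1, mu2 = y1.mean(), y2.mean()

# Scatters computed directly from the projected 1-D values...
s_within = ((y1 - mu1) ** 2).sum() + ((y2 - mu2) ** 2).sum()
s_between = (mu2 - mu1) ** 2

# ...match the quadratic forms w^T S_W w and w^T S_B w
assert np.isclose(s_within, w @ SW @ w)
assert np.isclose(s_between, w @ SB @ w)
```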
Fisher’s Criterion
Fisher proposed finding the projection direction $\mathbf{w}$ that maximizes the ratio of between-class variance to within-class variance:

$$J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^\top \mathbf{S}_B \mathbf{w}}{\mathbf{w}^\top \mathbf{S}_W \mathbf{w}}$$
Intuitively, we want the projected class means to be far apart (large numerator) while each projected class stays tightly clustered (small denominator).
An important observation is that $J(\mathbf{w})$ is scale-invariant. If we replace $\mathbf{w}$ with $\alpha \mathbf{w}$ for any non-zero constant $\alpha$:

$$J(\alpha \mathbf{w}) = \frac{\alpha^2 \, \mathbf{w}^\top \mathbf{S}_B \mathbf{w}}{\alpha^2 \, \mathbf{w}^\top \mathbf{S}_W \mathbf{w}} = J(\mathbf{w})$$

Since only the direction of $\mathbf{w}$ matters, not its magnitude, we can fix the denominator to any positive constant $c$ and simply maximize the numerator. This transforms our problem into:

$$\max_{\mathbf{w}} \; \mathbf{w}^\top \mathbf{S}_B \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^\top \mathbf{S}_W \mathbf{w} = c$$
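The scale invariance is easy to verify numerically. A small sketch, again on made-up toy data with arbitrary rescaling factors:

```python
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], 1.0, size=(40, 2))  # toy class 1
X2 = rng.normal([2.5, 0.5], 1.0, size=(35, 2))  # toy class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
SB = np.outer(m2 - m1, m2 - m1)

def J(w):
    """Fisher criterion: ratio of between- to within-class scatter."""
    return (w @ SB @ w) / (w @ SW @ w)

w = np.array([1.0, 2.0])
for alpha in (0.5, -3.0, 10.0):
    # Rescaling w leaves J unchanged: only the direction matters
    assert np.isclose(J(alpha * w), J(w))
```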
Lagrangian Optimization
We solve this constrained optimization problem using Lagrange multipliers. The Lagrangian is[^1]:

$$\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}^\top \mathbf{S}_B \mathbf{w} - \lambda \left( \mathbf{w}^\top \mathbf{S}_W \mathbf{w} - c \right)$$

Taking the derivative with respect to $\mathbf{w}$ and setting it to zero:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2 \mathbf{S}_B \mathbf{w} - 2 \lambda \mathbf{S}_W \mathbf{w} = 0$$

This simplifies to:

$$\mathbf{S}_B \mathbf{w} = \lambda \mathbf{S}_W \mathbf{w}$$

Assuming $\mathbf{S}_W$ is invertible, we multiply both sides by $\mathbf{S}_W^{-1}$:

$$\mathbf{S}_W^{-1} \mathbf{S}_B \mathbf{w} = \lambda \mathbf{w}$$

This is a generalized eigenvalue problem where $\mathbf{w}$ is an eigenvector of $\mathbf{S}_W^{-1} \mathbf{S}_B$ with eigenvalue $\lambda$.
The between-class scatter matrix has a special structure:

$$\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\top$$

This is an outer product, making $\mathbf{S}_B$ a rank-1 matrix. When we apply $\mathbf{S}_B$ to any vector $\mathbf{w}$:

$$\mathbf{S}_B \mathbf{w} = (\mathbf{m}_2 - \mathbf{m}_1) \underbrace{(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}}_{\text{scalar}}$$

The result is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$, scaled by the scalar $(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}$.

Substituting this into our eigenvalue equation and solving for $\mathbf{w}$:

$$(\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w} = \lambda \mathbf{S}_W \mathbf{w} \quad \Longrightarrow \quad \mathbf{w} = \frac{(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}}{\lambda} \, \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$$

This shows that $\mathbf{w}$ is proportional to $\mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$. Since we only care about the direction of $\mathbf{w}$, we can ignore the scalar constants $\lambda$ and $(\mathbf{m}_2 - \mathbf{m}_1)^\top \mathbf{w}$.

As such, the optimal projection direction is:

$$\mathbf{w}^* \propto \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$$
where:
- $\mathbf{S}_W$ is the total within-class scatter matrix.
- $\mathbf{m}_1, \mathbf{m}_2$ are the means of the two classes.
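Putting everything together, the closed-form direction can be cross-checked against the eigenvector formulation: both should give the same direction up to sign and scale. A sketch on made-up toy data:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal([0.0, 0.0], 1.0, size=(60, 2))  # toy class 1
X2 = rng.normal([3.0, 1.5], 1.0, size=(50, 2))  # toy class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
SB = np.outer(m2 - m1, m2 - m1)

# Closed-form Fisher direction: w* proportional to SW^{-1} (m2 - m1).
# np.linalg.solve avoids forming the explicit inverse.
w_star = np.linalg.solve(SW, m2 - m1)
w_star /= np.linalg.norm(w_star)

# The eigenvector of SW^{-1} SB with the largest eigenvalue solves the
# same problem; all other eigenvalues are zero because SB is rank 1.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
w_eig = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w_eig /= np.linalg.norm(w_eig)

# The two unit vectors agree up to sign
assert np.isclose(abs(w_star @ w_eig), 1.0)
```

Using `np.linalg.solve(SW, m2 - m1)` rather than `np.linalg.inv(SW) @ (m2 - m1)` is the numerically preferable way to apply $\mathbf{S}_W^{-1}$ in practice.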
Footnotes
[^1]: This is shamelessly taken from the lecture notes.