t-SNE

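t-SNE was developed because PCA, being a linear projection, cannot capture non-linear structure in data. The idea is to turn the pairwise similarities between the high-dimensional points into a probability distribution, and then find low-dimensional points whose similarity distribution matches it as closely as possible.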

Model the neighborhood of the high dimensional data as a distribution
- We use a Gaussian centered on each point $x_i$ here, not a t-distribution (the t in t-SNE refers to the distribution used for the low-dimensional data)
$p_{j \mid i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$
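- The value of $\sigma_i$ is not set by hand: for each point it is found by binary search so that the perplexity of the distribution $p_{\cdot \mid i}$ matches a perplexity chosen by the user
- The perplexity, $2^{H(P_i)}$ with $H$ the Shannon entropy in bits, can be read as the effective number of local neighbors to care about; $k$ in the formula is only the summation index over the other points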
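
The step above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names `conditional_p`, `perplexity` and `fit_sigmas`, the search bounds `1e-10` and `1e4`, and the fixed 50 bisection steps are all assumptions made for the example.

```python
import numpy as np

def conditional_p(X, sigmas):
    """Return the matrix whose row i holds p_{j|i}; the diagonal is zero."""
    # Squared Euclidean distances ||x_i - x_j||^2, shape (n, n).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / (2.0 * sigmas[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def perplexity(P):
    """Perplexity of each row: 2 ** (Shannon entropy in bits)."""
    safe = np.where(P > 0, P, 1.0)  # log2(1) = 0, so zero entries add nothing
    entropy = -np.sum(P * np.log2(safe), axis=1)
    return 2.0 ** entropy

def fit_sigmas(X, target_perplexity, n_steps=50):
    """Bisection per point; perplexity grows as sigma_i grows."""
    n = X.shape[0]
    lo = np.full(n, 1e-10)  # assumed lower bound for sigma_i
    hi = np.full(n, 1e4)    # assumed upper bound, fine for data of moderate scale
    for _ in range(n_steps):
        mid = (lo + hi) / 2.0
        too_high = perplexity(conditional_p(X, mid)) > target_perplexity
        hi = np.where(too_high, mid, hi)
        lo = np.where(too_high, lo, mid)
    return (lo + hi) / 2.0
```

Calling `fit_sigmas(X, target_perplexity=5.0)` and passing the result to `conditional_p` gives rows that each sum to 1 and each have perplexity close to 5. Points in dense regions end up with a smaller $\sigma_i$ than points in sparse regions.
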
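Model the neighborhood of the low dimensional data as a distribution

$q_{j \mid i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$

This Gaussian form is the one used by the original SNE. t-SNE replaces the kernel with a heavy-tailed Student t-distribution with one degree of freedom, $(1 + \|y_i - y_j\|^2)^{-1}$, which is where the t comes from.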
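
A small NumPy sketch of the low-dimensional similarities, under stated assumptions. `q_gaussian` follows the Gaussian formula written in this note. `q_student_t` goes beyond the note: it uses the Student t kernel with one degree of freedom and normalizes over all pairs, which is the form t-SNE uses. All function names are made up for the example.

```python
import numpy as np

def sq_distances(Y):
    """Pairwise squared Euclidean distances, shape (n, n)."""
    return np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)

def q_gaussian(Y):
    """Row-normalized q_{j|i} with kernel exp(-||y_i - y_j||^2)."""
    logits = -sq_distances(Y)
    np.fill_diagonal(logits, -np.inf)            # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)

def q_student_t(Y):
    """Jointly normalized q_{ij} with kernel 1 / (1 + ||y_i - y_j||^2)."""
    W = 1.0 / (1.0 + sq_distances(Y))
    np.fill_diagonal(W, 0.0)  # a point is not its own neighbor
    return W / W.sum()        # normalize over all pairs, not per row
```

The t kernel decays polynomially rather than exponentially, so a distant pair of map points keeps a noticeably larger similarity than it would under the Gaussian kernel.
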

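Define the cost function, which is the KL-divergence between the high-dimensional and low-dimensional distributions, summed over all points

$C = \sum_{i} \mathrm{KL}(P_i \,\|\, Q_i) = \sum_{i} \sum_{j} p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}}$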
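
As a rough sketch, the cost can be computed from two row-stochastic matrices like this. The name `kl_cost` and the `eps` floor are assumptions for the example; terms with $p_{j \mid i} = 0$ are skipped, following the usual convention $0 \log 0 = 0$.

```python
import numpy as np

def kl_cost(P, Q, eps=1e-12):
    """Sum over i and j of p_{j|i} * log(p_{j|i} / q_{j|i})."""
    mask = P > 0                       # entries with p = 0 contribute nothing
    q = np.maximum(Q[mask], eps)       # floor q to avoid division by zero
    return float(np.sum(P[mask] * np.log(P[mask] / q)))

# Made-up 3-point example with zero diagonals.
P = np.array([[0.0, 0.9, 0.1],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
Q = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
cost_value = kl_cost(P, Q)
```

The cost is zero when the two matrices agree and positive otherwise. Only the first row differs in this example, so it is the only row that contributes.
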

Run gradient descent on the cost to find the positions $y_i$ of the low dimensional data
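
The whole loop can be sketched with plain gradient descent on the Gaussian (SNE-style) cost written in this note, whose gradient is $\frac{\partial C}{\partial y_i} = 2 \sum_j (p_{j \mid i} - q_{j \mid i} + p_{i \mid j} - q_{i \mid j})(y_i - y_j)$. This is an illustration under assumptions, not the full t-SNE algorithm: the bandwidth is fixed instead of tuned per point, there is no momentum or early exaggeration, and the function names, learning rate, iteration count and toy data are all made up.

```python
import numpy as np

def sq_distances(Z):
    """Pairwise squared Euclidean distances, shape (n, n)."""
    return np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)

def softmax_neighbors(sq_dists, scale):
    """Row-wise softmax of -sq_dists / scale with the diagonal excluded."""
    logits = -sq_dists / scale
    np.fill_diagonal(logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)

def cost(P, Y):
    """KL cost between P and the Gaussian q_{j|i} of the map points Y."""
    Q = softmax_neighbors(sq_distances(Y), 1.0)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], 1e-300))))

def gradient(P, Y):
    """Gradient of the cost with respect to every y_i, shape (n, d)."""
    Q = softmax_neighbors(sq_distances(Y), 1.0)
    M = (P - Q) + (P - Q).T
    # sum_j M_ij (y_i - y_j) = rowsum(M)_i * y_i - (M @ Y)_i
    return 2.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)

def embed(P, n_components=2, n_iter=500, learning_rate=0.01, seed=0):
    """Plain gradient descent from a small random start."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(P.shape[0], n_components))
    for _ in range(n_iter):
        Y -= learning_rate * gradient(P, Y)
    return Y

# Toy data: two well-separated clusters in 10 dimensions (made-up example).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (15, 10)), rng.normal(6.0, 1.0, (15, 10))])
sigma = 2.0  # fixed bandwidth for brevity; not tuned per point
P = softmax_neighbors(sq_distances(X), 2.0 * sigma ** 2)
Y = embed(P)
```

Comparing `cost(P, Y)` with the cost at the random starting positions is a quick sanity check that the descent is making progress.
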
