Visualizing Data using t-SNE
An Intuitive Introduction
Simon Carbonnelle, Université Catholique de Louvain, ICTEAM
12th of May, 2016

Outline
- Visualization and Dimensionality Reduction
- Intuition behind t-SNE
- Visualizing representations

Visualization and Dimensionality Reduction

Visualization is key to understanding data easily.
Example: data of house areas in m² and prices in 1000s. Question: is the relation linear?

Dimensionality Reduction is a helpful tool for visualization.
- Dimensionality reduction algorithms map high-dimensional data to a lower dimension while preserving structure.
- They are used for visualization, for performance, and against the curse of dimensionality.
- A ton of algorithms exist.
- t-SNE is specialised for visualization... and has gained a lot of popularity.

Intuition behind t-SNE

Dimensionality Reduction techniques solve optimization problems.

$$X = \{x_1, x_2, \dots, x_n \in \mathbb{R}^h\} \;\rightarrow\; Y = \{y_1, y_2, \dots, y_n \in \mathbb{R}^l\}, \qquad \min_Y C(X, Y)$$

Three approaches for Dimensionality Reduction:
- Distance preservation
- Topology preservation
- Information preservation

t-SNE is distance-based but tends to preserve topology.

SNE computes pair-wise similarities.
SNE converts Euclidean distances to similarities that can be interpreted as probabilities:

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}, \qquad q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}, \qquad p_{i|i} = 0, \; q_{i|i} = 0$$

Hence the name Stochastic Neighbor Embedding.
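As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the conditional similarities $p_{j|i}$; the function name, shapes, and the fact that all $\sigma_i$ are passed in precomputed are assumptions for the sake of the example. The $q_{j|i}$ are obtained the same way in the low-dimensional space with a fixed bandwidth.

```python
import numpy as np

def conditional_p(X, sigmas):
    """X: (n, h) data matrix, sigmas: (n,) bandwidths. Returns (n, n) with P[i, j] = p_{j|i}."""
    # Squared Euclidean distances ||x_i - x_j||^2 for all pairs
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(P, 0.0)           # p_{i|i} = 0
    P /= P.sum(axis=1, keepdims=True)  # normalise over k != i, row by row
    return P
```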

Pair-wise similarities should stay the same: the embedding $Y$ is chosen so that $p_{j|i} \approx q_{j|i}$.

Kullback-Leibler Divergence measures the faithfulness with which $q_{j|i}$ models $p_{j|i}$.
- $P_i = \{p_{1|i}, p_{2|i}, \dots, p_{n|i}\}$ and $Q_i = \{q_{1|i}, q_{2|i}, \dots, q_{n|i}\}$ are the distributions on the neighbors of datapoint $i$.
- The Kullback-Leibler Divergence (KL) compares two distributions:
$$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
- KL divergence is asymmetric.
- KL divergence is always positive.
- We have our minimization problem: $\min_Y C(X, Y)$.
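A minimal sketch of this cost, assuming `P` and `Q` are the (n, n) arrays of conditional probabilities $p_{j|i}$ and $q_{j|i}$ with zero diagonals as in the earlier sketch; the function name and the small epsilon guard are illustrative choices.

```python
import numpy as np

def sne_cost(P, Q, eps=1e-12):
    """C = sum_i KL(P_i || Q_i) = sum_i sum_j p_{j|i} * log(p_{j|i} / q_{j|i})."""
    mask = P > 0  # terms with p_{j|i} = 0 contribute 0 to the sum
    return np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps)))
```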

Some remaining questions.

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}, \qquad q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$$

1. Why a radial basis function (exponential)? It makes the similarities focus on local geometry; this is why t-SNE can also be interpreted as topology-based.
2. Why probabilities? A small distance does not necessarily mean proximity on the manifold; probabilities are appropriate to model this uncertainty.
3. How do you choose $\sigma_i$?

The entropy of $P_i$ increases with $\sigma_i$.

Entropy: $H(P) = -\sum_i p_i \log_2 p_i$

Perplexity, a smooth measure of the number of neighbors.

Perplexity: $\mathrm{Perp}(P) = 2^{H(P)}$

Examples: an entropy of 1.055 gives a perplexity of 2.078; an entropy of 3.800 gives a perplexity of 13.929.
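In practice each $\sigma_i$ is typically found by a search so that $2^{H(P_i)}$ matches a user-chosen perplexity. The sketch below is one plausible way to do that with a simple bisection; the function name, the input `sq_dists_i` (squared distances from point $i$ to all other points), and the search bounds are assumptions, not the slides' implementation.

```python
import numpy as np

def sigma_for_perplexity(sq_dists_i, target_perplexity, iters=50):
    """Bisection on sigma_i so that the perplexity of P_i matches the target."""
    lo, hi = 1e-10, 1e10
    for _ in range(iters):
        sigma = (lo + hi) / 2.0
        p = np.exp(-sq_dists_i / (2.0 * sigma ** 2))
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p + 1e-12))
        if 2.0 ** entropy > target_perplexity:
            hi = sigma  # too many effective neighbors: shrink sigma
        else:
            lo = sigma  # too few effective neighbors: grow sigma
    return sigma
```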

From SNE to t-SNE.

SNE:
- Modelisation: $p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$, $\quad q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$
- Cost function: $C = \sum_i KL(P_i \| Q_i)$
- Derivatives: $\frac{dC}{dy_i} = 2 \sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)$

Symmetric SNE:
- Modelisation: $p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$, $\quad q_{ij} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_k - y_l\|^2)}$
- Cost function: $C = KL(P \| Q)$
- Derivatives: $\frac{dC}{dy_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$
- Faster computation

t-SNE:
- Modelisation: $p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$, $\quad q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$
- Cost function: $C = KL(P \| Q)$
- Derivatives: $\frac{dC}{dy_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \|y_i - y_j\|^2)^{-1}$
- Even faster computation
- Better behaviour
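To make the t-SNE column concrete, here is an illustrative sketch of the low-dimensional similarities $q_{ij}$ and the gradient above; the function name and the choice to return both quantities are assumptions, and `P` is the symmetric $p_{ij}$ matrix.

```python
import numpy as np

def tsne_gradient(P, Y):
    """P: (n, n) symmetric p_ij, Y: (n, l) current embedding. Returns (Q, dC/dY)."""
    diff = Y[:, None, :] - Y[None, :, :]            # pairwise differences y_i - y_j, shape (n, n, l)
    inv = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))  # Student-t kernel (1 + ||y_i - y_j||^2)^(-1)
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()                             # q_ij, normalised over all pairs
    grad = 4.0 * np.sum(((P - Q) * inv)[:, :, None] * diff, axis=1)
    return Q, grad
```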

The "Crowding Problem": there is much more space in high dimensions.

Mismatched tails can compensate for mismatched dimensionalities: the Student-t distribution has heavier tails.
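A quick numeric illustration (not on the slide): at a pairwise distance of 5 in the embedding, the Gaussian similarity $\exp(-25) \approx 1.4 \cdot 10^{-11}$ is essentially zero, while the Student-t similarity $(1 + 25)^{-1} \approx 0.038$ is still appreciable, so moderately dissimilar points do not all need to be crammed into the same small region of the map.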

Last but not least: Optimization.

$$\min_Y C(X, Y), \qquad C = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

- Non-convex.
- Gradient descent + momentum + adaptive learning rate:
$$Y^{(t)} = Y^{(t-1)} - \eta^{(t)} \frac{\delta C}{\delta Y} + \alpha(t)\left(Y^{(t-1)} - Y^{(t-2)}\right)$$
- Two tricks: Early Compression and Early Exaggeration.

Illustration: Colah's blog.
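A minimal sketch of such a loop, reusing the `tsne_gradient` helper from the earlier sketch; the learning rate, momentum value, exaggeration factor, and the 100-iteration exaggeration window are illustrative assumptions, not the slides' settings.

```python
import numpy as np

def tsne_optimize(P, n, l=2, n_iter=1000, eta=200.0, momentum=0.8, exaggeration=4.0):
    """Gradient descent with momentum and early exaggeration on the embedding Y."""
    Y = 1e-4 * np.random.randn(n, l)   # small random initial embedding
    Y_prev = Y.copy()
    for t in range(n_iter):
        P_eff = P * exaggeration if t < 100 else P        # early exaggeration phase
        _, grad = tsne_gradient(P_eff, Y)
        Y_new = Y - eta * grad + momentum * (Y - Y_prev)  # gradient step + momentum
        Y_prev, Y = Y, Y_new
        Y -= Y.mean(axis=0)                               # keep the embedding centered
    return Y
```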


Visualizing representations

Mapping raw data to distributed representations.
- Feature engineering is often laborious.
- The new tendency is to automatically learn adequate features or representations.
- Ultimate goal: enable AI to extract useful features from raw sensory data.

t-SNE can be used to make sense of the learned representations!
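For instance, a common workflow is to feed learned representations to an off-the-shelf t-SNE implementation and color the 2D map by class. The sketch below uses scikit-learn's `TSNE`; the arrays `features` and `labels` are placeholders standing in for real hidden-layer activations and class labels.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.randn(500, 128)    # placeholder for learned representations
labels = np.random.randint(0, 10, 500)  # placeholder class labels, used only for coloring

# Map the 128-dimensional representations to 2D and plot them
embedding = TSNE(n_components=2, perplexity=30.0).fit_transform(features)
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=5)
plt.show()
```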

Using t-SNE to explore a word embedding.
- The system outputs 1 if the central word is in the right context, 0 otherwise.
- The algorithm learns the representation and the classification simultaneously.

From Machine Learning to Machine Reasoning, L. Bottou (2011)

Goal: the representation captures syntactic and semantic similarity.

Using t-SNE to explore a word embedding. Illustration: http://colah.github.io/

Exploring a Wikipedia article embedding. Illustration: http://colah.github.io/

Exploring game state representations: Google DeepMind plays Atari games.

Playing Atari with Deep Reinforcement Learning, V. Mnih et al.

Goal: learning to play Space Invaders from score feedback and raw pixel values.

Exploring game state representations: Google DeepMind plays Atari games.
- A representation is learned with a convolutional neural network.
- From 84x84x4 = 28,224 pixel values to 512 neurons.
- The network predicts the expected score if a certain action is taken.

Human-level control through deep reinforcement learning, V. Mnih et al. (Nature, 2015)

Exploring game state representations: Google DeepMind plays Atari games.

Human-level control through deep reinforcement learning, V. Mnih et al. (Nature, 2015)

Using t-SNE to explore image representations: classifying dogs and cats.
- Each data point is an image of a dog or a cat.
- Red: cats, blue: dogs.

Using t-SNE to explore image representations: classifying dogs and cats.

Representation: a convolutional net trained for image classification (1000 classes).

Conclusion
- The t-SNE algorithm reduces dimensionality while preserving local similarity.
- The t-SNE algorithm has been built heuristically.
- t-SNE is commonly used to visualize representations.

