Nonlinear Independent Component Analysis: A Principled Framework for Unsupervised Deep Learning

Nonlinear independent component analysis: A principled framework for unsupervised deep learning

Aapo Hyvärinen
[Now:] Parietal Team, INRIA-Saclay, France
[Earlier:] Gatsby Unit, University College London, UK
[Always:] Dept of Computer Science, University of Helsinki, Finland
[Kind of:] CIFAR

Abstract

- Short critical introduction to deep learning
- Importance of Big Data
- Importance of unsupervised learning
- Disentanglement methods try to find independent factors
- In the linear case, independent component analysis (ICA) is successful; can we extend it to a nonlinear method?
- Problem: Nonlinear ICA is fundamentally ill-defined
- Solution 1: use temporal structure in time series, in a self-supervised fashion
- Solution 2: use an extra auxiliary variable in a VAE framework

Success of Artificial Intelligence

- Autonomous vehicles, machine translation, game playing, search engines, recommendation systems, etc.
- Most modern applications are based on deep learning

Neural networks

- Layers of "neurons" repeating linear transformations and simple nonlinearities:

    x_i^{(L+1)} = f\left( \sum_j w_{ij}^{(L)} x_j^{(L)} \right)     (1)

  where L is the layer index, with e.g. f(x) = max(0, x)
- Can approximate "any" nonlinear input-output mapping
- Learns by nonlinear regression (e.g. least squares)
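
A minimal NumPy sketch of Eq. (1), a forward pass through a few ReLU layers; the dimensions and random weights here are arbitrary toy choices, not anything from the talk:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), the nonlinearity in Eq. (1)
    return np.maximum(0.0, x)

def layer(x, W):
    # One layer of Eq. (1): x^(L+1) = f(W^(L) x^(L))
    return relu(W @ x)

# Toy forward pass through a 3-layer network with random weights
rng = np.random.default_rng(0)
x = rng.normal(size=5)                              # input vector
weights = [0.5 * rng.normal(size=(5, 5)) for _ in range(3)]
for W in weights:
    x = layer(x, W)
print(x)
```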

Deep learning

- Deep learning = learning in a neural network with many layers
- With enough data, can learn any input-output relationship: image to category / past to present / friends to political views
- The present boom was started by Krizhevsky, Sutskever, Hinton, 2012: superior recognition of objects in images

Characteristics of deep learning

- Nonlinearity: e.g. recognition of a cat is highly nonlinear
  - A linear model would use a single prototype, but locations, sizes, and viewpoints are highly variable
- Needs big data: e.g. millions of images from the Internet
  - Because general nonlinear functions have many parameters
- Needs big computers: Graphics Processing Units (GPUs)
  - An obvious consequence of the need for big data and nonlinearities
- Most theory is quite old: nonlinear (logistic) regression
  - But earlier we didn't have enough data and "compute"

Importance of unsupervised learning

- Success stories in deep learning need category labels
  - Is it a cat or a dog? Liked or not liked?
- Problem: labels may be
  - difficult to obtain
  - unrealistic in neural modelling
  - ambiguous
- Unsupervised learning:
  - we only observe a data vector x, no label or target y
  - e.g. photographs with no labels
  - a very difficult, largely unsolved problem

ICA as principled unsupervised learning

- Linear independent component analysis (ICA):

    x_i(t) = \sum_{j=1}^{n} a_{ij} s_j(t)   for all i = 1, ..., n     (2)

  - x_i(t) is the i-th observed signal at sample point t (possibly time)
  - a_{ij} are constant parameters describing the "mixing"
  - Assuming independent, non-Gaussian latent "sources" s_j
- ICA is identifiable, i.e. well-defined (Darmois-Skitovich 1950; Comon, 1994):
  - Observing only the x_i, we can recover both a_{ij} and s_j
  - I.e. the original sources can be recovered
  - As opposed to PCA or factor analysis
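
As a concrete toy illustration of the linear model (2) and its estimation, here is a hedged sketch using FastICA from scikit-learn; the sources, mixing matrix, and sample size are arbitrary choices, and the estimated sources come back only up to permutation and scaling:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
T = 5000
S = rng.laplace(size=(T, 2))          # independent, non-Gaussian sources s_j(t)
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])            # constant mixing matrix a_ij as in Eq. (2)
X = S @ A.T                           # observed signals x_i(t)

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)          # estimated sources (up to permutation/scale)
A_hat = ica.mixing_                   # estimated mixing matrix
```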

Unsupervised learning can have different goals

1) Accurate model of the data distribution?
   - E.g. Variational Autoencoders are good at this
2) Sampling points from the data distribution?
   - E.g. Generative Adversarial Networks are good at this
3) Useful features for supervised learning?
   - Many methods; "representation learning"
4) Reveal underlying structure in data, disentangle latent quantities?
   - Independent Component Analysis! (this talk)

- These goals are orthogonal, even contradictory!
- Probably no method can accomplish all of them (cf. Theis et al. 2015)
- In unsupervised learning research, one must specify the actual goal

Identifiability means ICA does blind source separation

[Figure: observed signals; principal components; independent components, which are the original sources]

Example of ICA: Brain source separation
(Hyvärinen, Ramkumar, Parkkonen, Hari, 2010)

Example of ICA: Image features
(Olshausen and Field, 1996; Bell and Sejnowski, 1997)

Features similar to wavelets, Gabor functions, simple cells.

Nonlinear ICA is an unsolved problem

- Extend ICA to the nonlinear case to get general disentanglement?
- Unfortunately, "basic" nonlinear ICA is not identifiable (Darmois, 1952; Hyvärinen & Pajunen, 1999):
- If we define the nonlinear ICA model simply as

    x_i(t) = f_i(s_1(t), ..., s_n(t))   for all i = 1, ..., n     (3)

  we cannot recover the original sources

[Figure: sources (s); mixtures (x); independent estimates]

Darmois construction

- Darmois (1952) showed the impossibility of nonlinear ICA:
- For any x_1, x_2, one can always construct y = g(x_1, x_2) independent of x_1 as

    g(\xi_1, \xi_2) = P(x_2 \le \xi_2 \mid x_1 = \xi_1)     (4)

- Independence alone is too weak for identifiability: we could take x_1 itself as an independent component, which is absurd
- Maximizing non-Gaussianity of the components is equally absurd: a scalar transform h(x_1) can give any distribution

[Figure: sources (s); mixtures (x); independent estimates]
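
A hedged numerical sketch of the construction (4): it approximates the conditional CDF of x_2 given x_1 by binning x_1 and rank-transforming x_2 within each bin (the toy data, bin count, and sample size are arbitrary). The resulting y is close to uniform and nearly independent of x_1 even though it is a deterministic function of the observations, which is why independence alone cannot identify the sources:

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
T = 20000
# Two statistically dependent observed variables (toy example)
x1 = rng.normal(size=T)
x2 = np.tanh(x1) + 0.5 * rng.normal(size=T)

# Empirical version of g(xi1, xi2) = P(x2 <= xi2 | x1 = xi1):
# bin x1, then use the within-bin rank of x2 as the conditional CDF value.
n_bins = 50
edges = np.quantile(x1, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.digitize(x1, edges) - 1, 0, n_bins - 1)
y = np.empty(T)
for b in range(n_bins):
    mask = bin_idx == b
    y[mask] = rankdata(x2[mask]) / mask.sum()

# y is (approximately) independent of x1
print("corr(y, x1) =", np.corrcoef(y, x1)[0, 1])
```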

Temporal structure helps in nonlinear ICA

- Two kinds of temporal structure can be used: temporal dependencies (Harmeling et al 2003) and nonstationarity (Hyvärinen and Morioka, NIPS2016)
- Now, identifiability of nonlinear ICA can be proven (Sprekeler et al, 2014; Hyvärinen and Morioka, NIPS2016 & AISTATS2017): we can find the original sources!

Trick: "Self-supervised" learning

- Supervised learning: we have
  - "input" x, e.g. images / brain signals
  - "output" y, e.g. content (cat or dog) / experimental condition
- Unsupervised learning: we have
  - only "input" x
- Self-supervised learning: we have
  - only "input" x
  - but we invent y somehow, e.g. by creating corrupted data, and use supervised algorithms
- Numerous examples in computer vision, e.g. inpainting (see the sketch after this list):
  - Remove part of a photograph, learn to predict the missing part
    (x is the original data with that part removed, y is the missing part)
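
A minimal sketch of that inpainting idea, assuming grayscale images stored as 2-D NumPy arrays (the patch size and the random "photograph" are arbitrary): it turns an unlabeled image into a supervised (input, target) pair that any regressor could then be trained on.

```python
import numpy as np

def make_inpainting_pair(image, patch=8, rng=None):
    """Self-supervised pair: input = image with a square patch removed,
    target = the removed patch (the 'invented' label y)."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    top = rng.integers(0, h - patch)
    left = rng.integers(0, w - patch)
    target = image[top:top + patch, left:left + patch].copy()
    corrupted = image.copy()
    corrupted[top:top + patch, left:left + patch] = 0.0
    return corrupted, target

img = np.random.default_rng(0).random((32, 32))     # stand-in for a photograph
x, y = make_inpainting_pair(img)
```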

Permutation-contrastive learning
(Hyvärinen and Morioka 2017)

- Observe an n-dimensional time series x(t)
- Take short time windows as new data: y(t) = (x(t), x(t-1))
- Create randomly time-permuted data: y*(t) = (x(t), x(t*)), with t* a random time point
- Train a neural network to discriminate y from y* (logistic regression on top of a feature extractor); see the sketch below
- Could this really do nonlinear ICA?

[Figure: real data vs. permuted data; feature extractor; logistic regression classifying real vs. permuted]
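
A hedged sketch of the PCL training setup; the toy AR(1) time series and the generic scikit-learn MLP discriminator are stand-ins (the actual PCL network has a specific architecture whose hidden units end up recovering the sources):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def pcl_pairs(x, rng):
    """Build the PCL classification problem from a (T, n) time series:
    real pairs y(t) = (x(t), x(t-1)) vs. permuted pairs y*(t) = (x(t), x(t*))."""
    T = len(x)
    real = np.hstack([x[1:], x[:-1]])            # y(t)  = (x(t), x(t-1))
    tstar = rng.integers(0, T, size=T - 1)
    perm = np.hstack([x[1:], x[tstar]])          # y*(t) = (x(t), x(t*))
    data = np.vstack([real, perm])
    labels = np.concatenate([np.ones(T - 1), np.zeros(T - 1)])
    return data, labels

rng = np.random.default_rng(0)
T, n = 2000, 2
x = np.zeros((T, n))                             # toy stationary AR(1) series
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + rng.laplace(size=n)

data, labels = pcl_pairs(x, rng)
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
clf.fit(data, labels)                            # real vs. permuted discrimination
```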

Theorem: PCL estimates nonlinear ICA with time dependencies

- Assume the data follows the nonlinear ICA model x(t) = f(s(t)) with
  - smooth, invertible nonlinear mixing f: R^n -> R^n
  - independent sources s_i(t)
  - temporally dependent (strongly enough), stationary
  - non-Gaussian (strongly enough)
- Then PCL demixes nonlinear ICA: the hidden units give the s_i(t)
- A constructive proof of identifiability
- For Gaussian sources, demixes up to a linear mixing

Illustration of demixing capability

- AR model with Laplacian innovations, n = 2:

    log p(s(t) | s(t-1)) = -|s(t) - \rho s(t-1)|   (up to an additive constant)

- Nonlinearity is an MLP. Mixing: leaky ReLUs; demixing: maxout

[Figure: sources (s); mixtures (x); estimates by kTDSEP (Harmeling et al 2003); estimates by our PCL]
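
A hedged sketch of generating data of this kind: Laplacian-innovation AR sources passed through a random leaky-ReLU MLP as the mixing. The value of rho, the layer sizes, and the random weights are arbitrary illustrative choices, not the exact setup of the experiment:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, rho = 5000, 2, 0.7

# AR sources with Laplacian innovations: s(t) = rho * s(t-1) + Laplace noise,
# i.e. log p(s(t) | s(t-1)) = -|s(t) - rho * s(t-1)| + const.
s = np.zeros((T, n))
for t in range(1, T):
    s[t] = rho * s[t - 1] + rng.laplace(size=n)

def leaky_relu(a, slope=0.2):
    return np.where(a > 0, a, slope * a)

# Random two-layer leaky-ReLU mixing x(t) = f(s(t)); with square weight
# matrices this is invertible with probability one.
W1, W2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
x = leaky_relu(s @ W1.T) @ W2.T
```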

Time-contrastive learning
(Hyvärinen and Morioka 2016)

- Observe an n-dimensional time series x(t)
- Divide x(t) into T segments (e.g. bins of equal size)
- Train an MLP to tell which segment a single data point comes from
  - The number of classes is T, labels given by the index of the segment
  - Multinomial logistic regression
- In the hidden layer h, the network should learn to represent the nonstationarity (= differences between segments)
- Nonlinear ICA for nonstationary data! (See the sketch below.)

[Figure: segments 1, ..., T of the time series; feature extractor; multinomial logistic regression over segment labels]
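
A hedged sketch of the TCL training signal on toy variance-nonstationary data; the segment count, the mixing, and the generic scikit-learn MLP (used as feature extractor plus multinomial logistic output) are all illustrative stand-ins:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, n_segments, seg_len = 2, 20, 200

# Toy nonstationary sources: the variance changes from segment to segment
scales = rng.uniform(0.2, 2.0, size=(n_segments, n))
s = np.vstack([rng.normal(scale=scales[k], size=(seg_len, n))
               for k in range(n_segments)])
x = np.tanh(s @ rng.normal(size=(n, n)))      # toy invertible nonlinear mixing

# TCL labels: each data point is labelled by the index of its segment
labels = np.repeat(np.arange(n_segments), seg_len)

# MLP with a multinomial logistic output trained to predict the segment;
# in TCL the last hidden layer is then taken as the learned representation.
clf = MLPClassifier(hidden_layer_sizes=(32, n), max_iter=500, random_state=0)
clf.fit(x, labels)
```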

Experiments on MEG

- Sources estimated from resting data (no stimulation)
- a) Validation by classifying another data set with four stimulation modalities: visual, auditory, tactile, rest
  - Trained a linear SVM on the estimated sources
  - Number of layers in the MLP ranging from 1 to 4
- b) Attempt to visualize the nonlinear processing

[Figure 3: Real MEG data. a) Classification accuracies (%) of linear SVMs newly trained with task-session data to predict stimulation labels, with feature extractors (TCL, DAE, kTDSEP, NSVICA) trained in advance, for 1 to 4 layers. b) Visualization of the nonlinear processing.]

Auxiliary variables: Alternative to temporal structure
(Arandjelovic & Zisserman, 2017; Hyvärinen et al, 2019)

Look at correlations of video (main data) and audio (auxiliary variable).

Deep Latent Variable Models and VAEs

- General framework with observed data vector x and latent z:

    p_\theta(x, z) = p_\theta(x | z) p_\theta(z),     p_\theta(x) = \int p_\theta(x, z) dz

  where \theta is a vector of parameters, e.g. in a neural network
- The posterior p(x | z) could model nonlinear mixing
- Variational autoencoders (VAE):
  - Model:
    - Define the prior so that z is white Gaussian (thus independent z_i)
    - Define the posterior so that x = f(z) + n
  - Estimation:
    - Approximate maximization of the likelihood
    - The approximation is the "variational lower bound"
- Is such a model identifiable?
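
For reference, the "variational lower bound" (ELBO) mentioned above is standard: introducing an approximate posterior q_\phi(z | x) (the encoder), the VAE maximizes

    \log p_\theta(x) \ge \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x | z)] - \mathrm{KL}(q_\phi(z | x) \,\|\, p_\theta(z))

instead of the exact log-likelihood.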

Identifiable VAE

- The original VAE is not identifiable:
  - Latent variables are usually white and Gaussian
  - Any orthogonal rotation is equivalent: z' = Uz has exactly the same distribution
- Our new iVAE (Khemakhem, Kingma, Hyvärinen, 2019):
  - Assume we also observe an auxiliary variable u, e.g. audio for video, segment label, history
  - A general framework, not just time structure
  - The z_i are conditionally independent given u
  - A variant of our nonlinear ICA, hence identifiable
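
Roughly, and paraphrasing the iVAE paper rather than the slide, the generative model conditions the prior on the auxiliary variable and makes it factorial given u:

    p_\theta(x, z | u) = p(x | z) \, p(z | u),     with   p(z | u) = \prod_i p(z_i | u)   and   x = f(z) + noise

It is this conditioning on u that breaks the rotational symmetry of the Gaussian prior and yields identifiability.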

Application to causal analysis

- Causal discovery: learning causal structure without interventions
- We can use nonlinear ICA to find general nonlinear causal relationships (Monti et al, UAI2019)
- Identifiability is absolutely necessary

    S1: X1 = f1(N1)
    S2: X2 = f2(X1, N2)

[Figure: causal graph with noise variables N1, N2 and functions f1, f2 generating X1 and X2]

Conclusion

- Conditions for ordinary deep learning:
  - Big data, big computers, class labels (outputs)
- If no class labels: unsupervised learning
- Independent component analysis can be made nonlinear
  - Special assumptions are needed for identifiability
- Self-supervised methods are easy to implement
- The connection to VAEs can be made: iVAE
- A principled framework for "disentanglement"
