Expectation-Maximization Algorithm and Applications

Expectation-Maximization Algorithm and Applications
Eugene Weinstein
Courant Institute of Mathematical Sciences
Nov 14th, 2006

List of Concepts
- Maximum-Likelihood Estimation (MLE)
- Expectation-Maximization (EM)
- Conditional Probability
- Mixture Modeling
- Gaussian Mixture Models (GMMs)
- String edit-distance
- Forward-backward algorithms

Overview
- Expectation-Maximization
- Mixture Model Training
- Learning String Edit-Distance

One-Slide MLE Review
- Say I give you a coin with P(heads) = θ
  - But I don't tell you the value of θ
- Now say I let you flip the coin n times
  - You get h heads and n-h tails
- What is the natural estimate of θ? This is θ̂ = h/n
- More formally, the likelihood of θ is governed by a binomial distribution:
  L(θ) = C(n, h) θ^h (1 - θ)^(n-h)
- Can prove θ̂ = h/n is the maximum-likelihood estimate of θ
  - Differentiate the log-likelihood with respect to θ, set equal to 0 (see the numerical check below)
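As an illustration of the claim above, the following sketch maximizes the binomial log-likelihood over a grid of θ values and checks that the maximizer matches h/n; the counts n and h are arbitrary values chosen only for the example.

```python
import numpy as np

# Example counts (arbitrary values chosen for illustration).
n, h = 100, 37

# Binomial log-likelihood of theta, up to the constant log C(n, h).
theta = np.linspace(1e-6, 1 - 1e-6, 100_000)
log_lik = h * np.log(theta) + (n - h) * np.log(1 - theta)

theta_hat = theta[np.argmax(log_lik)]
print(f"grid maximizer: {theta_hat:.4f}, closed form h/n: {h / n:.4f}")
```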

EM Motivation
- So, to solve any ML-type problem, we analytically maximize the likelihood function?
  - Seems to work for 1D Bernoulli (coin toss)
  - Also works for 1D Gaussian (find µ, σ²)
- Not quite
  - Distribution may not be well-behaved, or have too many parameters
  - Say your likelihood function is a mixture of 1000 1000-dimensional Gaussians (1M parameters)
  - Direct maximization is not feasible
- Solution: introduce hidden variables to
  - Simplify the likelihood function (more common)
  - Account for actual missing data

Hidden and Observed Variables
- Observed variables: directly measurable from the data, e.g.
  - The waveform values of a speech recording
  - Is it raining today?
  - Did the smoke alarm go off?
- Hidden variables: influence the data, but not trivial to measure
  - The phonemes that produce a given speech recording
  - P(rain today | rain yesterday)
  - Is the smoke alarm malfunctioning?

Expectation-Maximization
- Model dependent random variables:
  - Observed variable x
  - Unobserved (hidden) variable y that generates x
- Assume a probability distribution P(x, y | θ)
  - θ represents the set of all parameters of the distribution
- Repeat until convergence (see the skeleton below):
  - E-step: Compute the expectation Q(θ, θ') = E[log P(x, y | θ) | x, θ']
    (θ', θ: old, new distribution parameters)
  - M-step: Find θ that maximizes Q(θ, θ')
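A minimal sketch of the loop above, assuming hypothetical `e_step`, `m_step`, and `log_likelihood` callables supplied by the particular model; the function names and the convergence tolerance are illustrative choices, not anything prescribed by the slides.

```python
from typing import Any, Callable

def em(
    x: Any,
    theta: Any,
    e_step: Callable[[Any, Any], Any],            # expectations under P(y | x, theta')
    m_step: Callable[[Any, Any], Any],            # theta maximizing Q given those expectations
    log_likelihood: Callable[[Any, Any], float],
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> Any:
    """Generic EM loop: alternate E- and M-steps until the log-likelihood stops improving."""
    prev_ll = log_likelihood(x, theta)
    for _ in range(max_iter):
        stats = e_step(x, theta)     # E-step
        theta = m_step(x, stats)     # M-step
        ll = log_likelihood(x, theta)
        if ll - prev_ll < tol:       # the EM theorem guarantees ll never decreases
            break
        prev_ll = ll
    return theta
```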

Conditional Expectation Review
- Let X, Y be r.v.'s drawn from the distributions P(x) and P(y)
- Conditional distribution given by: P(y | x) = P(x, y) / P(x)
- Then E[Y | X] = Σ_y y P(y | X)
- For a function h(Y): E[h(Y) | X] = Σ_y h(y) P(y | X)
- Given a particular value of X (X = x): E[h(Y) | X = x] = Σ_y h(y) P(y | x)

Maximum Likelihood Problem
- Want to pick θ that maximizes the log-likelihood log P(x, y | θ) of the observed (x) and unobserved (y) variables, given
  - Observed variable x
  - Previous parameters θ'
- The conditional expectation of log P(x, y | θ) given x and θ' is
  Q(θ, θ') = E[log P(x, y | θ) | x, θ'] = Σ_y P(y | x, θ') log P(x, y | θ)

EM Derivation
- Lemma (special case of Jensen's inequality): Let p(x), q(x) be probability distributions. Then
  Σ_x p(x) log p(x) ≥ Σ_x p(x) log q(x)
- Proof: rewrite as Σ_x p(x) log (p(x) / q(x)) ≥ 0
- Interpretation: relative entropy is non-negative

EM Derivation
- EM Theorem: If Q(θ, θ') ≥ Q(θ', θ'), then P(x | θ) ≥ P(x | θ')
- Proof: by some algebra and the lemma,
  log P(x | θ) - log P(x | θ') ≥ Q(θ, θ') - Q(θ', θ')
- So, if this quantity is positive, so is log P(x | θ) - log P(x | θ')
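The "some algebra" above is the standard decomposition of the log-likelihood; the following reconstruction (not verbatim from the slides) averages the identity log P(x|θ) = log P(x, y|θ) - log P(y|x, θ) over P(y|x, θ') and then subtracts the same identity evaluated at θ = θ'.

```latex
\begin{align*}
\log P(x \mid \theta)
  &= \sum_y P(y \mid x, \theta')\,\log P(x, y \mid \theta)
   - \sum_y P(y \mid x, \theta')\,\log P(y \mid x, \theta)
   = Q(\theta, \theta') - \sum_y P(y \mid x, \theta')\,\log P(y \mid x, \theta), \\
\log P(x \mid \theta) - \log P(x \mid \theta')
  &= \bigl[\,Q(\theta, \theta') - Q(\theta', \theta')\,\bigr]
   + \sum_y P(y \mid x, \theta')\,\log \frac{P(y \mid x, \theta')}{P(y \mid x, \theta)}
  \;\ge\; Q(\theta, \theta') - Q(\theta', \theta').
\end{align*}
```

The final inequality holds because the remaining sum is a relative entropy, which the lemma shows is non-negative.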

EM Summary
- Repeat until convergence:
  - E-step: Compute the expectation Q(θ, θ') = E[log P(x, y | θ) | x, θ']
    (θ', θ: old, new distribution parameters)
  - M-step: Find θ that maximizes Q(θ, θ')
- EM Theorem: If Q(θ, θ') ≥ Q(θ', θ'), then P(x | θ) ≥ P(x | θ')
- Interpretation
  - As long as we can improve the expectation of the log-likelihood, EM improves our model of the observed variable x
  - Actually, it's not necessary to maximize the expectation, just need to make sure that it increases – this is called "Generalized EM"

EM Comments
- In practice, x is a series of data points
  - To calculate the expectation, can assume the points are i.i.d. and sum over all of them:
    Q(θ, θ') = Σ_i Σ_y P(y | x_i, θ') log P(x_i, y | θ)
- Problems with EM?
  - Local maxima
  - Need to bootstrap the training process (pick an initial θ)
- When is EM most useful?
  - When model distributions are easy to maximize (e.g., Gaussian mixture models)
- EM is a meta-algorithm; it needs to be adapted to the particular application

Overview
- Expectation-Maximization
- Mixture Model Training
- Learning String Distance

EM Applications: Mixture Models
- Gaussian/normal distribution: N(x; µ, σ²) = (1 / √(2πσ²)) exp(-(x - µ)² / (2σ²))
  - Parameters: mean µ and variance σ²
- In the multi-dimensional case, assume an isotropic Gaussian: same variance in all dimensions
- We can model arbitrary distributions with density mixtures

Density Mixtures
- Combine m elementary densities to model a complex data distribution:
  p(x | Θ) = Σ_{k=1..m} α_k p_k(x | θ_k), with mixture weights α_k ≥ 0, Σ_k α_k = 1
  - kth Gaussian parametrized by θ_k = (µ_k, σ_k)
- Log-likelihood function of the data x given Θ:
  L(Θ) = Σ_i log Σ_k α_k p_k(x_i | θ_k)
  - Log of sum – hard to optimize analytically! (illustrated below)
- Instead, introduce hidden variable y: y_i = k means x_i generated by Gaussian k
- EM formulation: maximize Q(Θ, Θ') = E[log P(x, y | Θ) | x, Θ']
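To make the "log of sum" concrete, here is a small sketch that evaluates the mixture log-likelihood of 1-D data; the weights, means, and standard deviations are made-up values, and `scipy.special.logsumexp` is used only to compute the inner sum stably.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

# Illustrative 1-D Gaussian mixture (made-up parameter values).
alphas = np.array([0.3, 0.7])
mus = np.array([-1.0, 2.0])
sigmas = np.array([0.5, 1.0])

def mixture_log_likelihood(x: np.ndarray) -> float:
    # log p(x_i) = log sum_k alpha_k N(x_i; mu_k, sigma_k^2), summed over points i
    log_components = norm.logpdf(x[:, None], loc=mus, scale=sigmas) + np.log(alphas)
    return logsumexp(log_components, axis=1).sum()

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1, 0.5, 300), rng.normal(2, 1.0, 700)])
print(mixture_log_likelihood(x))
```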

Gaussian Mixture Model EM
- Goal: maximize L(Θ) = Σ_i log Σ_k α_k N(x_i; µ_k, σ_k²)
- n (observed) data points: x_1, ..., x_n
- n (hidden) labels: y_1, ..., y_n, where y_i = k means x_i generated by Gaussian k
- Several pages of math later, we get (sketched below):
  - E-step: compute the likelihood of each label,
    P(y_i = k | x_i, Θ') = α_k N(x_i; µ_k, σ_k²) / Σ_j α_j N(x_i; µ_j, σ_j²)
  - M-step: update α_k, µ_k, σ_k for each Gaussian k = 1..m from the responsibility-weighted counts, means, and variances
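A compact sketch of the updates above for 1-D data; the initialization strategy, the fixed iteration count, and the helper name are illustrative choices rather than anything prescribed by the slides.

```python
import numpy as np
from scipy.stats import norm

def gmm_em(x: np.ndarray, m: int, n_iter: int = 100, seed: int = 0):
    """Fit a 1-D Gaussian mixture with m components by EM."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Bootstrap: random means drawn from the data, uniform weights, global spread.
    mus = rng.choice(x, size=m, replace=False)
    sigmas = np.full(m, x.std())
    alphas = np.full(m, 1.0 / m)

    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(y_i = k | x_i, theta')
        densities = alphas * norm.pdf(x[:, None], loc=mus, scale=sigmas)
        r = densities / densities.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted re-estimates of weights, means, variances
        counts = r.sum(axis=0)
        alphas = counts / n
        mus = (r * x[:, None]).sum(axis=0) / counts
        sigmas = np.sqrt((r * (x[:, None] - mus) ** 2).sum(axis=0) / counts)

    return alphas, mus, sigmas

# Toy data from two Gaussians (illustrative only).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 400), rng.normal(3, 1.0, 600)])
print(gmm_em(x, m=2))
```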

GMM-EM Discussion
- Summary: EM naturally applicable to training probabilistic models
- EM is a generic formulation, need to do some hairy math to get to implementation
- Problems with GMM-EM?
  - Local maxima
  - Need to bootstrap the training process (pick an initial θ)
- GMM-EM applicable to an enormous number of pattern recognition tasks: speech, vision, etc.
- Hours of fun with GMM-EM

Overview
- Expectation-Maximization
- Mixture Model Training
- Learning String Distance

String Edit-Distance
- Notation: operate on two strings x and y
- Edit-distance: transform one string into another using
  - Substitution: kitten → bitten, at some substitution cost
  - Insertion: cop → crop, at some insertion cost
  - Deletion: learn → earn, at some deletion cost
- Can compute efficiently recursively (see the sketch below)
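A standard dynamic-programming sketch of the recursive computation mentioned above, with all three operations given unit cost for illustration (the slides leave the cost values unspecified).

```python
def edit_distance(x: str, y: str) -> int:
    """Classic Levenshtein distance with unit costs via dynamic programming."""
    # d[i][j] = cost of transforming x[:i] into y[:j]
    d = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        d[i][0] = i                       # delete all of x[:i]
    for j in range(1, len(y) + 1):
        d[0][j] = j                       # insert all of y[:j]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + sub,    # substitution (or match)
            )
    return d[len(x)][len(y)]

print(edit_distance("kitten", "bitten"))  # 1
print(edit_distance("learn", "earn"))     # 1
```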

Stochastic String Edit-Distance
- Instead of setting costs, model the edit operation sequence as a random process
  - Edit operations selected according to a probability distribution δ
- For an edit operation sequence z_1 ... z_n, view string edit-distance as
  - memoryless (Markov): each edit operation is chosen independently of the previous ones
  - stochastic: the random process according to δ(·) is governed by a true probability distribution
  - transducer: the edit process can be represented as a weighted finite-state transducer (next slide)

Edit-Distance Transducer
- Arc label a:b/0 means input a, output b, and weight 0
- Assume

Two Distances
- Define the yield of an edit sequence ν(z_1...z_n #) as the set of all string pairs ⟨x, y⟩ such that z_1...z_n # turns x into y
- Viterbi edit-distance: negative log-likelihood of the most likely edit sequence
- Stochastic edit-distance: negative log-likelihood of all edit sequences from x to y

Evaluating Likelihood
- Viterbi: maximize the likelihood over edit sequences; Stochastic: sum the likelihoods of all edit sequences
- Both require calculation over all possible edit sequences – exponentially many possibilities (three edit operations at each step)
- However, the memoryless assumption allows us to compute the likelihood efficiently
  - Use the forward-backward method!

Forward
- Evaluation of forward probabilities: α_{i,j} is the likelihood of picking an edit sequence that generates the prefix pair (x_1...x_i, y_1...y_j)
- Memoryless assumption allows efficient recursive computation:
  α_{0,0} = 1
  α_{i,j} = δ(x_i, ε) α_{i-1,j} + δ(ε, y_j) α_{i,j-1} + δ(x_i, y_j) α_{i-1,j-1}

Backward
- Evaluation of backward probabilities: β_{i,j} is the likelihood of picking an edit sequence that generates the suffix pair (x_{i+1}...x_{|x|}, y_{j+1}...y_{|y|})
- Memoryless assumption allows efficient recursive computation (sketched below):
  β_{|x|,|y|} = δ(#)
  β_{i,j} = δ(x_{i+1}, ε) β_{i+1,j} + δ(ε, y_{j+1}) β_{i,j+1} + δ(x_{i+1}, y_{j+1}) β_{i+1,j+1}
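A sketch of both recursions for a concrete edit-operation distribution. Here `delta` is a dictionary mapping (input, output) pairs to probabilities, with `EPS` standing for the empty symbol ε and `END` for the termination symbol #; this layout is an illustrative choice, not fixed by the slides.

```python
import numpy as np

EPS, END = "", "#"   # empty symbol and termination marker (names are illustrative)

def forward(x: str, y: str, delta: dict) -> np.ndarray:
    """alpha[i, j] = likelihood of edit prefixes turning x[:i] into y[:j]."""
    alpha = np.zeros((len(x) + 1, len(y) + 1))
    alpha[0, 0] = 1.0
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i > 0:   # deletion of x[i-1]
                alpha[i, j] += delta.get((x[i - 1], EPS), 0.0) * alpha[i - 1, j]
            if j > 0:   # insertion of y[j-1]
                alpha[i, j] += delta.get((EPS, y[j - 1]), 0.0) * alpha[i, j - 1]
            if i > 0 and j > 0:   # substitution
                alpha[i, j] += delta.get((x[i - 1], y[j - 1]), 0.0) * alpha[i - 1, j - 1]
    return alpha

def backward(x: str, y: str, delta: dict) -> np.ndarray:
    """beta[i, j] = likelihood of edit suffixes turning x[i:] into y[j:] (incl. termination)."""
    beta = np.zeros((len(x) + 1, len(y) + 1))
    beta[len(x), len(y)] = delta.get(END, 0.0)
    for i in range(len(x), -1, -1):
        for j in range(len(y), -1, -1):
            if i < len(x):
                beta[i, j] += delta.get((x[i], EPS), 0.0) * beta[i + 1, j]
            if j < len(y):
                beta[i, j] += delta.get((EPS, y[j]), 0.0) * beta[i, j + 1]
            if i < len(x) and j < len(y):
                beta[i, j] += delta.get((x[i], y[j]), 0.0) * beta[i + 1, j + 1]
    return beta
```

In this sketch the total stochastic likelihood of a pair is `alpha[len(x), len(y)] * delta[END]`, which equals `beta[0, 0]`.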

EM Formulation
- Edit operations selected according to a probability distribution
- So, EM has to update δ based on occurrence counts of each operation (similar to the coin-tossing example)
- Idea: accumulate expected counts from the forward, backward variables
  - γ(z): expected count of edit operation z

EM Details
- γ(z): expected count of edit operation z
- e.g., for a substitution z = (a, b), γ(z) accumulates α_{i-1,j-1} δ(a, b) β_{i,j} / β_{0,0} over all positions (i, j) with x_i = a and y_j = b
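Continuing the sketch above (and reusing its hypothetical `forward`, `backward`, `delta`, `EPS`, and `END` definitions), the expected counts γ(z) can be accumulated from α and β and then normalized to give the re-estimated δ.

```python
from collections import defaultdict

def expected_counts(x: str, y: str, delta: dict) -> dict:
    """gamma[z]: expected number of times edit operation z is used when turning x into y."""
    alpha, beta = forward(x, y, delta), backward(x, y, delta)
    total = beta[0, 0]                      # likelihood of the whole pair
    gamma = defaultdict(float)
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i < len(x):                  # deletion (x[i], EPS)
                gamma[(x[i], EPS)] += alpha[i, j] * delta.get((x[i], EPS), 0.0) * beta[i + 1, j] / total
            if j < len(y):                  # insertion (EPS, y[j])
                gamma[(EPS, y[j])] += alpha[i, j] * delta.get((EPS, y[j]), 0.0) * beta[i, j + 1] / total
            if i < len(x) and j < len(y):   # substitution (x[i], y[j])
                gamma[(x[i], y[j])] += alpha[i, j] * delta.get((x[i], y[j]), 0.0) * beta[i + 1, j + 1] / total
    gamma[END] += 1.0                       # termination occurs exactly once per pair
    return gamma

def m_step(gamma: dict) -> dict:
    """Re-estimate delta by normalizing the expected counts to sum to one."""
    norm = sum(gamma.values())
    return {z: c / norm for z, c in gamma.items()}
```

Over a training corpus one would sum γ across all string pairs before normalizing; iterating this E-step and M-step is exactly the EM loop from the earlier slides.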

References
- A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1977, pp. 1-38.
- C. F. J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1), Mar 1983, pp. 95-103.
- F. Jelinek. Statistical Methods for Speech Recognition, 1997.
- M. Collins. The EM Algorithm, 1997.
- J. A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report TR-97-021, U.C. Berkeley, 1998.
- E. S. Ristad and P. N. Yianilos. Learning string edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(2), 1998, pp. 522-532.
- L. R. Rabiner. A tutorial on HMM and selected applications in speech recognition. In Proc. IEEE, 77(2), 1989, pp. 257-286.
- A. D'Souza. Using EM To Estimate A Probablity [sic] Density With A Mixture Of Gaussians.
- M. Mohri. Edit-Distance of Weighted Automata. In Proc. Implementation and Application of Automata (CIAA), 2002, pp. 1-23.
- J. Glass. Lecture Notes, MIT class 6.345: Automatic Speech Recognition, 2003.
- C. Tomasi. Estimating Gaussian Mixture Densities with EM – A Tutorial, 2004.
- Wikipedia
