Expectation-Maximization Algorithm and Applications

Expectation-Maximization Algorithm and Applications
Eugene Weinstein
Courant Institute of Mathematical Sciences
Nov 14th, 2006

List of Concepts
- Maximum-Likelihood Estimation (MLE)
- Expectation-Maximization (EM)
- Conditional Probability
- Mixture Modeling
- Gaussian Mixture Models (GMMs)
- String edit-distance
- Forward-backward algorithms

Overview
- Expectation-Maximization
- Mixture Model Training
- Learning String Edit-Distance

One-Slide MLE Review
- Say I give you a coin with P(heads) = θ
  - But I don't tell you the value of θ
- Now say I let you flip the coin n times
  - You get h heads and n-h tails
- What is the natural estimate of θ? This is θ̂ = h/n
- More formally, the likelihood of θ is governed by a binomial distribution:
  L(θ) = C(n, h) θ^h (1 - θ)^(n-h)
- Can prove θ̂ = h/n is the maximum-likelihood estimate of θ
  - Differentiate the log-likelihood with respect to θ, set equal to 0 (see the numerical check below)
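As an illustration of the claim above, the following sketch maximizes the binomial log-likelihood over a grid of θ values and checks that the maximizer matches h/n; the counts n and h are arbitrary values chosen only for the example.

```python
import numpy as np

# Example counts (arbitrary values chosen for illustration).
n, h = 100, 37

# Binomial log-likelihood of theta, up to the constant log C(n, h).
theta = np.linspace(1e-6, 1 - 1e-6, 100_000)
log_lik = h * np.log(theta) + (n - h) * np.log(1 - theta)

theta_hat = theta[np.argmax(log_lik)]
print(f"grid maximizer: {theta_hat:.4f}, closed form h/n: {h / n:.4f}")
```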

EM Motivation
- So, to solve any ML-type problem, we analytically maximize the likelihood function?
  - Seems to work for 1D Bernoulli (coin toss)
  - Also works for 1D Gaussian (find µ, σ²)
- Not quite
  - Distribution may not be well-behaved, or have too many parameters
  - Say your likelihood function is a mixture of 1000 1000-dimensional Gaussians (1M parameters)
  - Direct maximization is not feasible
- Solution: introduce hidden variables to
  - Simplify the likelihood function (more common)
  - Account for actual missing data

Hidden and Observed Variables
- Observed variables: directly measurable from the data, e.g.
  - The waveform values of a speech recording
  - Is it raining today?
  - Did the smoke alarm go off?
- Hidden variables: influence the data, but not trivial to measure
  - The phonemes that produce a given speech recording
  - P(rain today | rain yesterday)
  - Is the smoke alarm malfunctioning?

Expectation-Maximization
- Model dependent random variables:
  - Observed variable x
  - Unobserved (hidden) variable y that generates x
- Assume a probability distribution P(x, y | θ)
  - θ represents the set of all parameters of the distribution
- Repeat until convergence (see the skeleton below):
  - E-step: Compute the expectation Q(θ, θ') = E[log P(x, y | θ) | x, θ']
    (θ', θ: old, new distribution parameters)
  - M-step: Find θ that maximizes Q(θ, θ')
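A minimal sketch of the loop above, assuming hypothetical `e_step`, `m_step`, and `log_likelihood` callables supplied by the particular model; the function names and the convergence tolerance are illustrative choices, not anything prescribed by the slides.

```python
from typing import Any, Callable

def em(
    x: Any,
    theta: Any,
    e_step: Callable[[Any, Any], Any],            # expectations under P(y | x, theta')
    m_step: Callable[[Any, Any], Any],            # theta maximizing Q given those expectations
    log_likelihood: Callable[[Any, Any], float],
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> Any:
    """Generic EM loop: alternate E- and M-steps until the log-likelihood stops improving."""
    prev_ll = log_likelihood(x, theta)
    for _ in range(max_iter):
        stats = e_step(x, theta)     # E-step
        theta = m_step(x, stats)     # M-step
        ll = log_likelihood(x, theta)
        if ll - prev_ll < tol:       # the EM theorem guarantees ll never decreases
            break
        prev_ll = ll
    return theta
```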

Conditional Expectation Review
- Let X, Y be r.v.'s drawn from the distributions P(x) and P(y)
- Conditional distribution given by: P(y | x) = P(x, y) / P(x)
- Then E[Y | X] = Σ_y y P(y | X)
- For a function h(Y): E[h(Y) | X] = Σ_y h(y) P(y | X)
- Given a particular value of X (X = x): E[h(Y) | X = x] = Σ_y h(y) P(y | x)

Maximum Likelihood Problem
- Want to pick θ that maximizes the log-likelihood log P(x, y | θ) of the observed (x) and unobserved (y) variables, given
  - Observed variable x
  - Previous parameters θ'
- The conditional expectation of log P(x, y | θ) given x and θ' is
  Q(θ, θ') = E[log P(x, y | θ) | x, θ'] = Σ_y P(y | x, θ') log P(x, y | θ)

EM Derivation
- Lemma (special case of Jensen's inequality): Let p(x), q(x) be probability distributions. Then
  Σ_x p(x) log p(x) ≥ Σ_x p(x) log q(x)
- Proof: rewrite as Σ_x p(x) log (p(x) / q(x)) ≥ 0
- Interpretation: relative entropy is non-negative

EM Derivation
- EM Theorem: If Q(θ, θ') ≥ Q(θ', θ'), then P(x | θ) ≥ P(x | θ')
- Proof: by some algebra and the lemma,
  log P(x | θ) - log P(x | θ') ≥ Q(θ, θ') - Q(θ', θ')
- So, if this quantity is positive, so is log P(x | θ) - log P(x | θ')
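The "some algebra" above is the standard decomposition of the log-likelihood; the following reconstruction (not verbatim from the slides) averages the identity log P(x|θ) = log P(x, y|θ) - log P(y|x, θ) over P(y|x, θ') and then subtracts the same identity evaluated at θ = θ'.

```latex
\begin{align*}
\log P(x \mid \theta)
  &= \sum_y P(y \mid x, \theta')\,\log P(x, y \mid \theta)
   - \sum_y P(y \mid x, \theta')\,\log P(y \mid x, \theta)
   = Q(\theta, \theta') - \sum_y P(y \mid x, \theta')\,\log P(y \mid x, \theta), \\
\log P(x \mid \theta) - \log P(x \mid \theta')
  &= \bigl[\,Q(\theta, \theta') - Q(\theta', \theta')\,\bigr]
   + \sum_y P(y \mid x, \theta')\,\log \frac{P(y \mid x, \theta')}{P(y \mid x, \theta)}
  \;\ge\; Q(\theta, \theta') - Q(\theta', \theta').
\end{align*}
```

The final inequality holds because the remaining sum is a relative entropy, which the lemma shows is non-negative.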

EM Summary
- Repeat until convergence:
  - E-step: Compute the expectation Q(θ, θ') = E[log P(x, y | θ) | x, θ']
    (θ', θ: old, new distribution parameters)
  - M-step: Find θ that maximizes Q(θ, θ')
- EM Theorem: If Q(θ, θ') ≥ Q(θ', θ'), then P(x | θ) ≥ P(x | θ')
- Interpretation
  - As long as we can improve the expectation of the log-likelihood, EM improves our model of the observed variable x
  - Actually, it's not necessary to maximize the expectation, just need to make sure that it increases – this is called "Generalized EM"

EM Comments
- In practice, x is a series of data points
  - To calculate the expectation, can assume the points are i.i.d. and sum over all of them:
    Q(θ, θ') = Σ_i Σ_y P(y | x_i, θ') log P(x_i, y | θ)
- Problems with EM?
  - Local maxima
  - Need to bootstrap the training process (pick an initial θ)
- When is EM most useful?
  - When model distributions are easy to maximize (e.g., Gaussian mixture models)
- EM is a meta-algorithm; it needs to be adapted to the particular application

Overview
- Expectation-Maximization
- Mixture Model Training
- Learning String Distance

EM Applications: Mixture Models
- Gaussian/normal distribution: N(x; µ, σ²) = (1 / √(2πσ²)) exp(-(x - µ)² / (2σ²))
  - Parameters: mean µ and variance σ²
- In the multi-dimensional case, assume an isotropic Gaussian: same variance in all dimensions
- We can model arbitrary distributions with density mixtures

Density Mixtures
- Combine m elementary densities to model a complex data distribution:
  p(x | Θ) = Σ_{k=1..m} α_k p_k(x | θ_k), with mixture weights α_k ≥ 0, Σ_k α_k = 1
  - kth Gaussian parametrized by θ_k = (µ_k, σ_k)
- Log-likelihood function of the data x given Θ:
  L(Θ) = Σ_i log Σ_k α_k p_k(x_i | θ_k)
  - Log of sum – hard to optimize analytically! (illustrated below)
- Instead, introduce hidden variable y: y_i = k means x_i generated by Gaussian k
- EM formulation: maximize Q(Θ, Θ') = E[log P(x, y | Θ) | x, Θ']
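To make the "log of sum" concrete, here is a small sketch that evaluates the mixture log-likelihood of 1-D data; the weights, means, and standard deviations are made-up values, and `scipy.special.logsumexp` is used only to compute the inner sum stably.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

# Illustrative 1-D Gaussian mixture (made-up parameter values).
alphas = np.array([0.3, 0.7])
mus = np.array([-1.0, 2.0])
sigmas = np.array([0.5, 1.0])

def mixture_log_likelihood(x: np.ndarray) -> float:
    # log p(x_i) = log sum_k alpha_k N(x_i; mu_k, sigma_k^2), summed over points i
    log_components = norm.logpdf(x[:, None], loc=mus, scale=sigmas) + np.log(alphas)
    return logsumexp(log_components, axis=1).sum()

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1, 0.5, 300), rng.normal(2, 1.0, 700)])
print(mixture_log_likelihood(x))
```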

Gaussian Mixture Model EM
- Goal: maximize L(Θ) = Σ_i log Σ_k α_k N(x_i; µ_k, σ_k²)
- n (observed) data points: x_1, ..., x_n
- n (hidden) labels: y_1, ..., y_n, where y_i = k means x_i generated by Gaussian k
- Several pages of math later, we get (sketched below):
  - E-step: compute the likelihood of each label,
    P(y_i = k | x_i, Θ') = α_k N(x_i; µ_k, σ_k²) / Σ_j α_j N(x_i; µ_j, σ_j²)
  - M-step: update α_k, µ_k, σ_k for each Gaussian k = 1..m from the responsibility-weighted counts, means, and variances
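A compact sketch of the updates above for 1-D data; the initialization strategy, the fixed iteration count, and the helper name are illustrative choices rather than anything prescribed by the slides.

```python
import numpy as np
from scipy.stats import norm

def gmm_em(x: np.ndarray, m: int, n_iter: int = 100, seed: int = 0):
    """Fit a 1-D Gaussian mixture with m components by EM."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Bootstrap: random means drawn from the data, uniform weights, global spread.
    mus = rng.choice(x, size=m, replace=False)
    sigmas = np.full(m, x.std())
    alphas = np.full(m, 1.0 / m)

    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(y_i = k | x_i, theta')
        densities = alphas * norm.pdf(x[:, None], loc=mus, scale=sigmas)
        r = densities / densities.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted re-estimates of weights, means, variances
        counts = r.sum(axis=0)
        alphas = counts / n
        mus = (r * x[:, None]).sum(axis=0) / counts
        sigmas = np.sqrt((r * (x[:, None] - mus) ** 2).sum(axis=0) / counts)

    return alphas, mus, sigmas

# Toy data from two Gaussians (illustrative only).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 400), rng.normal(3, 1.0, 600)])
print(gmm_em(x, m=2))
```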

GMM-EM Discussion
- Summary: EM naturally applicable to training probabilistic models
- EM is a generic formulation, need to do some hairy math to get to implementation
- Problems with GMM-EM?
  - Local maxima
  - Need to bootstrap the training process (pick an initial θ)
- GMM-EM applicable to an enormous number of pattern recognition tasks: speech, vision, etc.
- Hours of fun with GMM-EM

Overview
- Expectation-Maximization
- Mixture Model Training
- Learning String Distance

String Edit-Distance
- Notation: operate on two strings x and y
- Edit-distance: transform one string into another using
  - Substitution: kitten → bitten, at some substitution cost
  - Insertion: cop → crop, at some insertion cost
  - Deletion: learn → earn, at some deletion cost
- Can compute efficiently recursively (see the sketch below)
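A standard dynamic-programming sketch of the recursive computation mentioned above, with all three operations given unit cost for illustration (the slides leave the cost values unspecified).

```python
def edit_distance(x: str, y: str) -> int:
    """Classic Levenshtein distance with unit costs via dynamic programming."""
    # d[i][j] = cost of transforming x[:i] into y[:j]
    d = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        d[i][0] = i                       # delete all of x[:i]
    for j in range(1, len(y) + 1):
        d[0][j] = j                       # insert all of y[:j]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + sub,    # substitution (or match)
            )
    return d[len(x)][len(y)]

print(edit_distance("kitten", "bitten"))  # 1
print(edit_distance("learn", "earn"))     # 1
```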

Stochastic String Edit-Distance
- Instead of setting costs, model the edit operation sequence as a random process
  - Edit operations selected according to a probability distribution δ
- For an edit operation sequence z_1 ... z_n, view string edit-distance as
  - memoryless (Markov): each edit operation is chosen independently of the previous ones
  - stochastic: the random process according to δ(·) is governed by a true probability distribution
  - transducer: the edit process can be represented as a weighted finite-state transducer (next slide)

Edit-Distance Transducer
- Arc label a:b/0 means input a, output b, and weight 0
- Assume

Two Distances
- Define the yield of an edit sequence ν(z_1...z_n #) as the set of all string pairs ⟨x, y⟩ such that z_1...z_n # turns x into y
- Viterbi edit-distance: negative log-likelihood of the most likely edit sequence
- Stochastic edit-distance: negative log-likelihood of all edit sequences from x to y

Evaluating Likelihood
- Viterbi: maximize the likelihood over edit sequences; Stochastic: sum the likelihoods of all edit sequences
- Both require calculation over all possible edit sequences – exponentially many possibilities (three edit operations at each step)
- However, the memoryless assumption allows us to compute the likelihood efficiently
  - Use the forward-backward method!

Forward
- Evaluation of forward probabilities: α_{i,j} is the likelihood of picking an edit sequence that generates the prefix pair (x_1...x_i, y_1...y_j)
- Memoryless assumption allows efficient recursive computation:
  α_{0,0} = 1
  α_{i,j} = δ(x_i, ε) α_{i-1,j} + δ(ε, y_j) α_{i,j-1} + δ(x_i, y_j) α_{i-1,j-1}

Backward
- Evaluation of backward probabilities: β_{i,j} is the likelihood of picking an edit sequence that generates the suffix pair (x_{i+1}...x_{|x|}, y_{j+1}...y_{|y|})
- Memoryless assumption allows efficient recursive computation (sketched below):
  β_{|x|,|y|} = δ(#)
  β_{i,j} = δ(x_{i+1}, ε) β_{i+1,j} + δ(ε, y_{j+1}) β_{i,j+1} + δ(x_{i+1}, y_{j+1}) β_{i+1,j+1}
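A sketch of both recursions for a concrete edit-operation distribution. Here `delta` is a dictionary mapping (input, output) pairs to probabilities, with `EPS` standing for the empty symbol ε and `END` for the termination symbol #; this layout is an illustrative choice, not fixed by the slides.

```python
import numpy as np

EPS, END = "", "#"   # empty symbol and termination marker (names are illustrative)

def forward(x: str, y: str, delta: dict) -> np.ndarray:
    """alpha[i, j] = likelihood of edit prefixes turning x[:i] into y[:j]."""
    alpha = np.zeros((len(x) + 1, len(y) + 1))
    alpha[0, 0] = 1.0
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i > 0:   # deletion of x[i-1]
                alpha[i, j] += delta.get((x[i - 1], EPS), 0.0) * alpha[i - 1, j]
            if j > 0:   # insertion of y[j-1]
                alpha[i, j] += delta.get((EPS, y[j - 1]), 0.0) * alpha[i, j - 1]
            if i > 0 and j > 0:   # substitution
                alpha[i, j] += delta.get((x[i - 1], y[j - 1]), 0.0) * alpha[i - 1, j - 1]
    return alpha

def backward(x: str, y: str, delta: dict) -> np.ndarray:
    """beta[i, j] = likelihood of edit suffixes turning x[i:] into y[j:] (incl. termination)."""
    beta = np.zeros((len(x) + 1, len(y) + 1))
    beta[len(x), len(y)] = delta.get(END, 0.0)
    for i in range(len(x), -1, -1):
        for j in range(len(y), -1, -1):
            if i < len(x):
                beta[i, j] += delta.get((x[i], EPS), 0.0) * beta[i + 1, j]
            if j < len(y):
                beta[i, j] += delta.get((EPS, y[j]), 0.0) * beta[i, j + 1]
            if i < len(x) and j < len(y):
                beta[i, j] += delta.get((x[i], y[j]), 0.0) * beta[i + 1, j + 1]
    return beta
```

In this sketch the total stochastic likelihood of a pair is `alpha[len(x), len(y)] * delta[END]`, which equals `beta[0, 0]`.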

EM Formulation
- Edit operations selected according to a probability distribution
- So, EM has to update δ based on occurrence counts of each operation (similar to the coin-tossing example)
- Idea: accumulate expected counts from the forward, backward variables
  - γ(z): expected count of edit operation z

EM Details
- γ(z): expected count of edit operation z
- e.g., for a substitution z = (a, b), γ(z) accumulates α_{i-1,j-1} δ(a, b) β_{i,j} / β_{0,0} over all positions (i, j) with x_i = a and y_j = b
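Continuing the sketch above (and reusing its hypothetical `forward`, `backward`, `delta`, `EPS`, and `END` definitions), the expected counts γ(z) can be accumulated from α and β and then normalized to give the re-estimated δ.

```python
from collections import defaultdict

def expected_counts(x: str, y: str, delta: dict) -> dict:
    """gamma[z]: expected number of times edit operation z is used when turning x into y."""
    alpha, beta = forward(x, y, delta), backward(x, y, delta)
    total = beta[0, 0]                      # likelihood of the whole pair
    gamma = defaultdict(float)
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            if i < len(x):                  # deletion (x[i], EPS)
                gamma[(x[i], EPS)] += alpha[i, j] * delta.get((x[i], EPS), 0.0) * beta[i + 1, j] / total
            if j < len(y):                  # insertion (EPS, y[j])
                gamma[(EPS, y[j])] += alpha[i, j] * delta.get((EPS, y[j]), 0.0) * beta[i, j + 1] / total
            if i < len(x) and j < len(y):   # substitution (x[i], y[j])
                gamma[(x[i], y[j])] += alpha[i, j] * delta.get((x[i], y[j]), 0.0) * beta[i + 1, j + 1] / total
    gamma[END] += 1.0                       # termination occurs exactly once per pair
    return gamma

def m_step(gamma: dict) -> dict:
    """Re-estimate delta by normalizing the expected counts to sum to one."""
    norm = sum(gamma.values())
    return {z: c / norm for z, c in gamma.items()}
```

Over a training corpus one would sum γ across all string pairs before normalizing; iterating this E-step and M-step is exactly the EM loop from the earlier slides.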

References
- A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1), 1977, pp. 1-38.
- C. F. J. Wu. On the convergence properties of the EM algorithm. The Annals of Statistics, 11(1), Mar 1983, pp. 95-103.
- F. Jelinek. Statistical Methods for Speech Recognition, 1997.
- M. Collins. The EM Algorithm, 1997.
- J. A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report TR-97-021, U.C. Berkeley, 1998.
- E. S. Ristad and P. N. Yianilos. Learning string edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(2), 1998, pp. 522-532.
- L. R. Rabiner. A tutorial on HMM and selected applications in speech recognition. In Proc. IEEE, 77(2), 1989, pp. 257-286.
- A. D'Souza. Using EM To Estimate A Probablity [sic] Density With A Mixture Of Gaussians.
- M. Mohri. Edit-Distance of Weighted Automata. In Proc. Implementation and Application of Automata (CIAA), 2002, pp. 1-23.
- J. Glass. Lecture Notes, MIT class 6.345: Automatic Speech Recognition, 2003.
- C. Tomasi. Estimating Gaussian Mixture Densities with EM – A Tutorial, 2004.
- Wikipedia
