8. Mixture Models And Expectation-Maximization

Computer Vision Group, Prof. Daniel Cremers
8. Mixture Models and Expectation-Maximization
(Machine Learning for Computer Vision, Dr. Rudolph Triebel, Computer Vision Group)

Motivation
- Often the introduction of latent (unobserved) random variables into a model helps to express complex (marginal) distributions.
- A very common example are mixture models, in particular Gaussian mixture models (GMMs).
- Mixture models can be used for clustering (unsupervised learning) and to express more complex probability distributions.
- As we will see, the parameters of mixture models can be estimated with maximum-likelihood techniques such as expectation-maximization.

K-means Clustering
- Given: a data set \{x_1, \dots, x_N\} and the number of clusters K.
- Goal: find cluster centers \{\mu_1, \dots, \mu_K\} so that
  J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2
  is minimal, where r_{nk} = 1 if x_n is assigned to \mu_k (and 0 otherwise).
- Idea: compute r_{nk} and \mu_k iteratively:
  - Start with some values for the cluster centers
  - Find optimal assignments r_{nk}
  - Update the cluster centers using these assignments
  - Repeat until assignments or centers don't change

K-means Clustering
Initialize the cluster means \{\mu_1, \dots, \mu_K\}.

K-means Clustering
Find optimal assignments:
r_{nk} = 1 if k = \arg\min_j \|x_n - \mu_j\|^2, and 0 otherwise.

K-means Clustering
Find new optimal means:
\frac{\partial J}{\partial \mu_k} = 2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) \overset{!}{=} 0
\quad\Rightarrow\quad
\mu_k = \frac{\sum_{n=1}^{N} r_{nk} \, x_n}{\sum_{n=1}^{N} r_{nk}}

K-means Clustering
Find new optimal assignments:
r_{nk} = 1 if k = \arg\min_j \|x_n - \mu_j\|^2, and 0 otherwise.

K-means Clustering
Iterate these steps until the means and assignments do not change any more (see the code sketch below).
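The full loop can be written down compactly. The following is a minimal NumPy sketch of the procedure described above (random initialization from data points, hard assignments, mean updates); function and variable names are illustrative, not from the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    """K-means sketch. X: (N, D) data matrix; returns means (K, D) and assignments (N,)."""
    rng = np.random.default_rng(rng)
    # Initialize the cluster means with K randomly chosen data points
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    assign = None
    for _ in range(n_iters):
        # Assignment step: r_nk = 1 for the closest center, 0 otherwise
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments did not change -> converged
        assign = new_assign
        # Update step: each mean becomes the average of its assigned points
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:  # guard against empty clusters
                mu[k] = members.mean(axis=0)
    return mu, assign
```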

2D Example
- Real data set
- Random initialization
- The magenta line is the "decision boundary"

The Cost Function
- Every step reduces the cost function J (each step minimizes J with respect to either the assignments or the means).
- Blue steps: update assignments
- Red steps: update means
- Convergence after 4 rounds

K-means for Segmentation
(figure)

K-Means: Additional Remarks
- K-means always converges, but the minimum it reaches is not guaranteed to be a global one.
- There is an online version of K-means: after each new point x_n, the nearest center \mu_k is updated as
  \mu_k^{\text{new}} = \mu_k^{\text{old}} + \eta_n (x_n - \mu_k^{\text{old}}),
  where \eta_n is a learning rate (see the code sketch below).
- The K-medoid variant replaces the squared Euclidean distance by a general dissimilarity measure \mathcal{V}:
  J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \mathcal{V}(x_n, \mu_k)
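As a small illustration of the online update above, here is a sketch of a single-point update; the learning rate eta and the function name are assumptions made for the example.

```python
import numpy as np

def online_kmeans_update(mu, x_n, eta=0.05):
    """mu: (K, D) current centers; x_n: (D,) newly arrived point; eta: learning rate."""
    k = np.argmin(np.linalg.norm(mu - x_n, axis=1))  # index of the nearest center
    mu[k] += eta * (x_n - mu[k])                     # mu_k_new = mu_k_old + eta * (x_n - mu_k_old)
    return mu
```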

Mixtures of Gaussians
- Assume that the data consists of K clusters and that the data within each cluster is Gaussian.
- For each data point x we introduce a K-dimensional binary random variable z so that
  p(x) = \sum_{k=1}^{K} \underbrace{p(z_k = 1)}_{=: \pi_k} \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
  where z_k \in \{0, 1\} and \sum_{k=1}^{K} z_k = 1 (see the evaluation sketch below).
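The mixture density above can be evaluated directly by summing the weighted component densities. A minimal sketch using SciPy (names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k); pis: (K,), mus: (K, D), Sigmas: (K, D, D)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))
```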

A Simple Example
- Mixture of three Gaussians with mixing coefficients
- Left: all three Gaussians as a contour plot
- Right: samples from the mixture model; the red component has the most samples

Parameter Estimation
- From a given set of training data \{x_1, \dots, x_N\} we want to find parameters (\pi_{1,\dots,K}, \mu_{1,\dots,K}, \Sigma_{1,\dots,K}) so that the likelihood is maximized (MLE):
  p(x_1, \dots, x_N \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
  or, applying the logarithm:
  \log p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
  (a numerically stable computation is sketched below).
- However: this is not as easy as maximum likelihood for single Gaussians!
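The log-likelihood above involves the log of a sum, which is conveniently computed with a log-sum-exp for numerical stability. A sketch (not code from the lecture):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """Returns sum_n log sum_k pi_k N(x_n | mu_k, Sigma_k) for X of shape (N, D)."""
    # log_probs[n, k] = log pi_k + log N(x_n | mu_k, Sigma_k)
    log_probs = np.stack(
        [np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=Sigma)
         for pi, mu, Sigma in zip(pis, mus, Sigmas)], axis=1)
    return logsumexp(log_probs, axis=1).sum()
```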

Problems with MLE for Gaussian Mixtures
- Assume that for one k the mean \mu_k lies exactly at a data point x_n.
- For simplicity, assume that \Sigma_k = \sigma_k^2 I. Then:
  \mathcal{N}(x_n \mid x_n, \sigma_k^2 I) = \frac{1}{(2\pi)^{D/2} \, \sigma_k^{D}}
- This means that the overall log-likelihood can be made arbitrarily large by letting \sigma_k \to 0 (overfitting; illustrated numerically below).
- Another problem is identifiability: the order of the Gaussians is not fixed, so there are K! equivalent solutions to the MLE problem.
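The singularity is easy to see numerically: the density of a Gaussian evaluated at its own mean grows without bound as the standard deviation shrinks. A tiny illustration (the values are just examples):

```python
from scipy.stats import multivariate_normal

# density of N(x | x, sigma^2 I) in 2D, i.e. 1 / (2 pi sigma^2), for shrinking sigma
for sigma in [1.0, 0.1, 0.01, 0.001]:
    print(sigma, multivariate_normal.pdf([0.0, 0.0], mean=[0.0, 0.0], cov=sigma ** 2))
```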

Overfitting with MLE for Gaussian Mixtures
- One Gaussian fits exactly to one data point.
- It has a very small variance, i.e. it contributes strongly to the overall likelihood.
- In standard MLE, there is no way to avoid this!

Expectation-Maximization
- EM is an elegant and powerful method for MLE problems with latent variables.
- Main idea: model parameters and latent variables are estimated iteratively, where we average over the latent variables (expectation).
- A typical example application of EM is the Gaussian mixture model (GMM), but EM has many other applications.
- First, we consider EM for GMMs.

Expectation-Maximization for GMM
- First, we define the responsibilities:
  \gamma(z_{nk}) := p(z_{nk} = 1 \mid x_n) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)},
  where z_{nk} \in \{0, 1\} and \sum_k z_{nk} = 1.
- Next, we take the derivative of the log-likelihood with respect to \mu_k and set it to zero:
  \frac{\partial}{\partial \mu_k} \log p(X \mid \pi, \mu, \Sigma) \overset{!}{=} 0
  and we obtain:
  \mu_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) \, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}
  (see the code sketch below).
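In code, the responsibilities and the resulting mean update look as follows; a sketch computed in log-space for stability (the helper names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def responsibilities(X, pis, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)."""
    log_probs = np.stack(
        [np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=Sigma)
         for pi, mu, Sigma in zip(pis, mus, Sigmas)], axis=1)
    return np.exp(log_probs - logsumexp(log_probs, axis=1, keepdims=True))

def update_means(X, gamma):
    """mu_k = sum_n gamma[n, k] x_n / sum_n gamma[n, k]."""
    return (gamma.T @ X) / gamma.sum(axis=0)[:, None]
```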

Expectation-Maximization for GMM
- We can do the same for the covariances:
  \frac{\partial}{\partial \Sigma_k} \log p(X \mid \pi, \mu, \Sigma) \overset{!}{=} 0
  and we obtain:
  \Sigma_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} \gamma(z_{nk})}
- Finally, we take the derivative with respect to the mixing coefficients \pi_k, subject to \sum_{k=1}^{K} \pi_k = 1:
  \frac{\partial}{\partial \pi_k} \log p(X \mid \pi, \mu, \Sigma) \overset{!}{=} 0
  and the result is:
  \pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma(z_{nk})

Algorithm Summary
1. Initialize the means \mu_k, covariance matrices \Sigma_k, and mixing coefficients \pi_k.
2. Compute the initial log-likelihood \log p(X \mid \pi, \mu, \Sigma).
3. E-step: compute the responsibilities
   \gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
4. M-step: update the parameters
   \mu_k^{\text{new}} = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) \, x_n}{\sum_{n=1}^{N} \gamma(z_{nk})}, \quad
   \Sigma_k^{\text{new}} = \frac{\sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T}{\sum_{n=1}^{N} \gamma(z_{nk})}, \quad
   \pi_k^{\text{new}} = \frac{1}{N} \sum_{n=1}^{N} \gamma(z_{nk})
5. Compute the log-likelihood; if not converged, go to step 3.
(A compact code sketch of this loop follows below.)
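Putting the steps together, the following is a compact sketch of the whole loop (plain NumPy/SciPy, with no safeguards against the covariance singularities discussed earlier; names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def em_gmm(X, K, n_iters=100, tol=1e-6, rng=None):
    rng = np.random.default_rng(rng)
    N, D = X.shape
    # 1. Initialization: random means, identity covariances, uniform mixing coefficients
    mus = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigmas = np.stack([np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # 3. E-step: responsibilities gamma[n, k]
        log_probs = np.stack(
            [np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=Sigma)
             for pi, mu, Sigma in zip(pis, mus, Sigmas)], axis=1)
        log_norm = logsumexp(log_probs, axis=1, keepdims=True)
        gamma = np.exp(log_probs - log_norm)
        # 4. M-step: update means, covariances, and mixing coefficients
        Nk = gamma.sum(axis=0)                      # effective number of points per component
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pis = Nk / N
        # 2./5. Log-likelihood and convergence check
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pis, mus, Sigmas, gamma
```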

The Same Example Again
(figure)

Observations
- Compared to K-means, points can now belong to several clusters (soft assignment).
- In addition to the cluster centers, a covariance is estimated by EM.
- Initialization is the same as used for K-means.
- The number of iterations needed for EM is much higher, and each cycle requires much more computation.
- Therefore: start with K-means and run EM on the result of K-means (covariances can be initialized to the sample covariances of the K-means clusters), as sketched below.
- EM only finds a local maximum of the likelihood!
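A sketch of this initialization strategy, assuming the kmeans helper from the earlier sketch; the function name and the requirement of at least two points per cluster are assumptions.

```python
import numpy as np

def init_from_kmeans(X, K, rng=None):
    """Seed EM with a K-means result: means, per-cluster sample covariances, cluster fractions."""
    mus, assign = kmeans(X, K, rng=rng)  # kmeans(): hypothetical helper from the earlier sketch
    pis = np.array([(assign == k).mean() for k in range(K)])
    Sigmas = np.stack([np.cov(X[assign == k], rowvar=False)  # needs >= 2 points per cluster
                       for k in range(K)])
    return pis, mus, Sigmas
```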

A More General View of EM
- Assume for a moment that we observe X and the binary latent variables Z. The likelihood is then:
  p(X, Z \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} p(z_n) \, p(x_n \mid z_n, \mu, \Sigma)
  where
  p(z_n) = \prod_{k=1}^{K} \pi_k^{z_{nk}}
  and
  p(x_n \mid z_n, \mu, \Sigma) = \prod_{k=1}^{K} \mathcal{N}(x_n \mid \mu_k, \Sigma_k)^{z_{nk}}
  (remember: z_{nk} \in \{0, 1\} and \sum_{k=1}^{K} z_{nk} = 1),
  which leads to the log-formulation:
  \log p(X, Z \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left( \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)

The Complete-Data Log-Likelihood
  \log p(X, Z \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left( \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)
- This is called the complete-data log-likelihood.
- Advantage: solving for the parameters (\pi_k, \mu_k, \Sigma_k) is much simpler, as the log is inside the sum!
- We could swap the sums and then, for every mixture component k, only look at the points that are associated with that component. This leads to simple closed-form solutions for the parameters (see the sketch below).
- However: the latent variables Z are not observed!
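With Z observed (as a one-hot matrix), the complete-data log-likelihood decouples over components, which is exactly what makes the closed-form solutions simple. A small sketch (names are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def complete_data_log_likelihood(X, Z, pis, mus, Sigmas):
    """Z[n, k] = 1 if x_n was generated by component k, else 0 (each row of Z sums to 1)."""
    total = 0.0
    for k, (pi, mu, Sigma) in enumerate(zip(pis, mus, Sigmas)):
        members = Z[:, k] == 1
        if members.any():
            total += members.sum() * np.log(pi)
            total += multivariate_normal.logpdf(X[members], mean=mu, cov=Sigma).sum()
    return total
```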

The Main Idea of EM
- Instead of maximizing the joint log-likelihood, we maximize its expectation under the latent-variable distribution:
  \mathbb{E}_Z[\log p(X, Z \mid \pi, \mu, \Sigma)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \mathbb{E}[z_{nk}] \left( \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)
  where the latent-variable distribution per point is:
  p(z_n \mid x_n, \theta) = \frac{p(x_n \mid z_n, \theta) \, p(z_n)}{p(x_n \mid \theta)} = \frac{\prod_{l=1}^{K} \left( \pi_l \, \mathcal{N}(x_n \mid \mu_l, \Sigma_l) \right)^{z_{nl}}}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

The Main Idea of EM
- The expected value of the latent variables is \mathbb{E}[z_{nk}] = \gamma(z_{nk}). Plugging in, we obtain:
  \mathbb{E}_Z[\log p(X, Z \mid \pi, \mu, \Sigma)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left( \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)
- We compute this iteratively:
  1. Initialize i = 0 and (\pi_k^i, \mu_k^i, \Sigma_k^i)
  2. Compute \mathbb{E}[z_{nk}] = \gamma(z_{nk})
  3. Find parameters (\pi_k^{i+1}, \mu_k^{i+1}, \Sigma_k^{i+1}) that maximize this expectation
  4. Increase i; if not converged, go to 2.

The Theory Behind EM
- We have seen that EM maximizes the expected complete-data log-likelihood, but actually we need to maximize the log-marginal
  \log p(X \mid \theta) = \log \sum_{Z} p(X, Z \mid \theta)
- It turns out that the log-marginal is maximized implicitly! For any distribution q(Z) we have the decomposition (verified numerically in the sketch below):
  \log p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p)
  where
  \mathcal{L}(q, \theta) = \sum_{Z} q(Z) \log \frac{p(X, Z \mid \theta)}{q(Z)}
  \quad\text{and}\quad
  \mathrm{KL}(q \,\|\, p) = - \sum_{Z} q(Z) \log \frac{p(Z \mid X, \theta)}{q(Z)}
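This decomposition can be checked numerically on a toy GMM with a q(Z) that factorizes over the data points; the example below is a sketch with arbitrary, illustrative parameters.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                           # tiny toy data set
pis = np.array([0.3, 0.7])
mus = np.array([[-1.0, 0.0], [1.0, 0.0]])
Sigmas = np.stack([np.eye(2), np.eye(2)])

# log(pi_k N(x_n | mu_k, Sigma_k)) for all n, k
log_joint = np.stack([np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=Sigma)
                      for pi, mu, Sigma in zip(pis, mus, Sigmas)], axis=1)
log_px = logsumexp(log_joint, axis=1)                 # log p(x_n | theta)
log_post = log_joint - log_px[:, None]                # log p(z_n = k | x_n, theta)

q = rng.dirichlet(np.ones(2), size=5)                 # an arbitrary q(z_n), one row per point
L = (q * (log_joint - np.log(q))).sum()               # lower bound L(q, theta)
KL = (q * (np.log(q) - log_post)).sum()               # KL(q || p(Z | X, theta)) >= 0
print(np.allclose(L + KL, log_px.sum()))              # True: log p(X) = L + KL
```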

Visualization
- The KL-divergence is positive or 0.
- Thus, the log-likelihood is at least as large as \mathcal{L}; in other words, \mathcal{L} is a lower bound of the log-likelihood.

What Happens in the E-Step?
- The log-likelihood is independent of q.
- Thus, \mathcal{L} is maximized iff the KL-divergence is minimal.
- This is the case iff q(Z) = p(Z \mid X, \theta).

What Happens in the M-Step?
- In the M-step we keep q fixed and find new parameters \theta:
  \mathcal{L}(q, \theta) = \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \log p(X, Z \mid \theta) - \sum_{Z} q(Z) \log q(Z)
- We maximize the first term; the second is independent of \theta.
- This implicitly makes the KL-divergence non-zero, so the log-likelihood increases even further!

Visualization in Parameter Space
- In the E-step we compute the concave lower bound for the given old parameters \theta^{\text{old}} (blue curve).
- In the M-step we maximize this lower bound and obtain the new parameters \theta^{\text{new}}.
- This is repeated (green curve) until convergence.

Variants of EM
- Instead of maximizing the log-likelihood, we can use EM to maximize a posterior when a prior is given (MAP instead of MLE); this reduces overfitting.
- In generalized EM, the M-step only increases the lower bound instead of maximizing it (useful if the standard M-step is intractable).
- Similarly, the E-step can be generalized in that the optimization with respect to q is not carried out completely.
- Furthermore, there are incremental versions of EM, where data points arrive sequentially and the parameters are updated after each data point.

Example 1: Learn a Sensor Model
- A radar range finder pointed at a metallic target returns three types of measurement:
  - the distance to the target,
  - the distance to the wall behind the target,
  - a completely random value.

Example 1: Learn a Sensor Model
- Which measurement corresponds to which model component? What are the different model parameters?
- Solution: expectation-maximization.

Example 2: Environment Classification
- From each image, the robot extracts features: points in an n-dimensional space.
- K-means only finds the cluster centers, not their extent and shape.
- The centers and covariances can be obtained with EM.

Example 3: Plane Fitting in 3D
- Has been done in this paper: given a set of 3D points, fit planes to the data.
- Idea: the model parameters are the normal vectors and the distances to the origin of a set of planes.
- Gaussian noise model: p(z \mid \theta) = \mathcal{N}(d(z, \theta) \mid 0, \sigma), with point-to-plane distance d(z, \theta) and noise variance \sigma.
- Introduce latent correspondence variables C_{ij} and maximize the expected log-likelihood \mathbb{E}[\log p(Z, C \mid \theta)].
- The maximization can be done in closed form (a rough sketch of such an EM loop follows below).
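The paper's exact formulation is not reproduced here, but a rough sketch of EM-based plane fitting under the stated noise model might look as follows; the parameterization (unit normal n, offset d), the initialization, the fixed number of planes, and the noise update are all assumptions made for this example.

```python
import numpy as np

def em_planes(X, n_planes, n_iters=50, sigma=0.1, rng=None):
    """X: (N, 3) points. Planes are (normal, offset) with signed distance n.x - d."""
    rng = np.random.default_rng(rng)
    N = len(X)
    # Crude initialization: random unit normals, offsets through randomly chosen points
    normals = rng.normal(size=(n_planes, 3))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    offsets = np.einsum('kd,kd->k', normals, X[rng.choice(N, n_planes)])
    for _ in range(n_iters):
        # E-step: soft correspondences from the Gaussian noise model on the distances
        dist = X @ normals.T - offsets                 # point-to-plane distances, (N, K)
        logw = -0.5 * (dist / sigma) ** 2
        logw -= logw.max(axis=1, keepdims=True)        # stabilize before exponentiating
        R = np.exp(logw)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: weighted least-squares plane per component
        for k in range(n_planes):
            w = R[:, k]
            c = (w[:, None] * X).sum(0) / w.sum()      # weighted centroid
            S = (w[:, None] * (X - c)).T @ (X - c)     # weighted scatter matrix
            normals[k] = np.linalg.eigh(S)[1][:, 0]    # eigenvector of smallest eigenvalue
            offsets[k] = normals[k] @ c
        # Update the noise scale (using the distances from the E-step)
        sigma = np.sqrt((R * dist ** 2).sum() / N)
    return normals, offsets, R
```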

Summary
- K-means is an iterative method for clustering.
- Mixture models can be formalized using latent (unobserved) variables.
- A very common example are Gaussian mixture models (GMMs).
- To estimate the parameters of a GMM we can use expectation-maximization (EM).
- In general, EM can be interpreted as maximizing a lower bound on the complete-data log-likelihood.
- EM is guaranteed to converge, but it may run into local maxima.
