Expectation Maximization - UConn Health


Expectation Maximization

Expectation-Maximization (E-M)

If the underlying governing pdf is known only in its general form, and there may or may not be missing data as well, we need E-M:
– To reconstruct the underlying pdf
– To find missing data based on the underlying pdf

Example: Missing Normally Distributed Data

We have 4 data points from a Gaussian distribution with unit variance. 2 data points are missing:
– 5
– 11
– x
– x

We need to infer the parameters of the density from which the observations were drawn. We are given one of the parameters in this case: variance = 1. We seek the most likely mean μ for the Gaussian density function; i.e., we seek μ_ML. Once we find the mean of the underlying Gaussian distribution (μ_ML), we can generate the missing data.

We can take advantage of properties of Gaussian densities to make life easier:
– The value (arg) of μ that yields the least squared error Σ_{i=1}^{n} (x_i - μ)² is simply the sample mean (1/n) Σ_{i=1}^{n} x_i

So, to maximize the likelihood estimate of the parameter μ_ML, it is necessary and sufficient to use the sample mean for μ:

μ_ML = arg min_μ Σ_{i=1}^{n} (x_i - μ)²
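This is the standard least-squares fact: differentiating the squared error with respect to μ and setting the derivative to zero recovers the sample mean.

```latex
\frac{d}{d\mu}\sum_{i=1}^{n}(x_i-\mu)^2
  = -2\sum_{i=1}^{n}(x_i-\mu) = 0
\quad\Longrightarrow\quad
\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n}x_i
```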

Strategy
– Guess the parameter μ
– Expectation: Find the expected data, given the parameter
– Maximization: Find the most likely parameter, given the data, by finding the sample mean
– Repeat and converge to a solution for μ_ML

Example: Missing Normally Distributed Data

The first step is to guess at a mean μ for the pdf. Guess μ = 0. The expectation of a normally distributed value is the mean (expectation step):
– 5
– 11
– x
– x

Now fill in the missing data using the expected mean (expectation step): 5, 11, 0, 0

Now re-estimate the mean by using the data (maximization step): (5 + 11 + 0 + 0)/4 = 4

Now fill in the missing data using the new expected mean (expectation step): 5, 11, 4, 4

Now re-estimate the mean by using the data (maximization step): (5 + 11 + 4 + 4)/4 = 6

Now fill in the missing data using the new expected mean (expectation step): 5, 11, 6, 6

Now re-estimate the mean by using the data (maximization step): (5 + 11 + 6 + 6)/4 = 7

Now fill in the missing data using the new expected mean (expectation step): 5, 11, 7, 7

Now re-estimate the mean by using the data (maximization step): (5 + 11 + 7 + 7)/4 = 7.5

μ_ML values for subsequent iterations: 7.75, 7.875, 7.9375, 7.96875, 7.984375, 7.992188, 7.996094, 7.998047, 7.999023, 7.999512, 7.999756, …

The iteration is guaranteed to converge to a local minimum of the squared error (equivalently, a local maximum of the likelihood); here it converges to μ_ML = 8.
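The loop above can be sketched in a few lines of Python (the function and parameter names are mine, not from the slides): fill in the missing points with the current mean, then re-estimate the mean as the sample mean.

```python
def em_missing_data(observed, n_missing, mu0=0.0, iters=20):
    """Alternate between filling in missing points with the current mean
    (expectation step) and re-estimating the mean as the sample mean
    (maximization step), as in the slides' 5/11/x/x example."""
    mu = mu0
    for _ in range(iters):
        data = list(observed) + [mu] * n_missing  # E-step: expected values
        mu = sum(data) / len(data)                # M-step: sample mean
    return mu

mu = em_missing_data([5, 11], 2)  # approaches the fixed point mu = 8
```

With observed points 5 and 11 and two missing points, each iteration computes mu' = (16 + 2·mu)/4 = 4 + mu/2, whose fixed point is 8, matching the slides' sequence 0, 4, 6, 7, 7.5, 7.75, ….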

Types of Problems for E-M

Parametric
– We need to find the parameters of the pdf

Non-Parametric (most common)
– We need to find the pdf itself

Parametric Problem Example

(After Mitchell, Machine Learning)

Parametric Example

There are 2 Gaussian processes, mixing. We don't see the processes, nor do we know the parameters. (In the example, for simplicity, we will assume unit variance for each, so only the means are the unknown parameters.) We have observations, but we don't know to which of the two specific processes each observation belongs.

EXAMPLE: GAUSSIAN MIXTURE

[Figure: observations drawn from Sources A and B; the parameters of Source A and Source B are unknown]

We need to infer the parameters of the densities from which the observations were drawn.

EM

Typically, we have data generated by multiple (say, WLOG, 2) Gaussian probability densities, A and B. The data are a1, a2, a3, …, b1, b2, …, but the difficulty is that we only observe the unlabeled data x1, x2, x3, …, where the random variable x could be either an a or a b. We need to figure out the parameters of the entire system, i.e., which parameters of A and B are most likely to have generated the data. If the system is Gaussian, we are looking for μ and σ² for each of the generating densities.

EM: The Problem Statement

As stated, we are given the observations x1, x2, x3, …, where the x are unlabeled data from A and B, and we are looking for the parameter(s) θ of the densities. The labels that associate each x with either an a or a b are the hidden data.

A Mystery

We don't know the labels. But… if we knew the parameters of each of the mother densities, we could calculate the probability that each x came from A and the probability that it came from B, and choose the more probable.

A Solution

We have only the data. How can we know what's what?

Answer: Assume there are labels for each data point x_i. These labels, z_{i,1}, z_{i,2}, …, z_{i,j}, are associated with all j mother densities (in this case, j = 2), each telling the probability that the jth density generated the data point.

Expectation Maximization

So we need a 2-step process:
1. Determine the expectations of the hidden variables, given the parameters: p(Data | Parameters)
2. Determine the best parameters, given the data: p(Parameters | Data)

This is the likelihood of the data, and we wish to maximize this likelihood.

EM: In Summary

We need to optimize the expression P(Obs, Hidden | θ), given only P(Obs | θ), by choosing the best θ.

Getting Started

While we don't know the initial values of the parameters, we can guess. If the guess is reasonable, this process can be shown to converge to a local maximum likelihood (Dempster, Laird, Rubin).

EXAMPLE

The Gaussian Mixture already presented. Let the data set consist of a set of points [(x_i, z_{1i}, z_{2i})]. Here, the observed data point is x and the hidden data are the z's. The z_{i,j} represent the probability that x_i came from the jth density.

(Example after Mitchell, Machine Learning)

Given 2 processes A, B (W.L.O.G.):

Expectation: E[z_A] = P(x generated by the process with mean μ_A) / P(x generated by either the process with mean μ_A or the process with mean μ_B)

Q: What is the probability of any x_i given process (μ_j, σ²)?

Answer for this particular model (Gaussian Mixture):

p(x_i | μ_j) = (1 / √(2πσ²)) e^(-(x_i - μ_j)² / (2σ²))

EXPECTATION Step

E[z_{i,j}] = p(x = x_i | μ_j) / Σ_{n=1}^{2} p(x = x_i | μ_n)

Notice we are normalizing.

Since it is a Gaussian distribution, we can define the probabilities concisely:

E[z_{i,j}] = e^(-(x_i - μ_j)² / (2σ²)) / Σ_{n=1}^{2} e^(-(x_i - μ_n)² / (2σ²))

The denominator effectively 'normalizes' the numerator, making it a probability measure.

MAXIMIZATION STEP

So, the Maximum Likelihood estimate of the jth parameter (jth mean) is the normalized sum of the value of each sample, weighted by the probability of each sample:

μ_j ← Σ_{i=1}^{n} E[z_{i,j}] x_i / Σ_{i=1}^{n} E[z_{i,j}]

(compare the unweighted sample mean (1/n) Σ_{i=1}^{n} x_i)

MAXIMIZATION STEP, explained

For the jth parameter (jth mean):

μ_j ← Σ_{i=1}^{n} E[z_{i,j}] x_i / Σ_{i=1}^{n} E[z_{i,j}]

– Numerator: each observed data point, weighted by its expectation under the jth parameter
– Denominator: the sum of the expectations
– The result is a weighted mean of the observed data
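The E-step and M-step above combine into a short sketch, assuming unit variance and two components; the function name and the test data are mine, not from the slides.

```python
import math

def em_gmm_2(xs, mu_init=(0.0, 1.0), sigma=1.0, iters=50):
    """EM for a mixture of two equal-variance Gaussians, estimating
    only the two means, as in the Mitchell-style example."""
    mu = list(mu_init)
    for _ in range(iters):
        # E-step: responsibilities E[z_ij] via normalized Gaussian weights
        resp = []
        for x in xs:
            w = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in mu]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: each mean becomes the responsibility-weighted sample mean
        for j in (0, 1):
            num = sum(r[j] * x for r, x in zip(resp, xs))
            den = sum(r[j] for r in resp)
            mu[j] = num / den
    return mu
```

For example, points drawn near 0 and near 10, started from the guess (0, 5), settle onto means near 0 and 10 within a few iterations.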

DEMO

http://bioinformatics.uchc.edu/Bioinformatics tools/EMdemo.asp

[Figure: histogram of a single sample of 1000 points drawn from the random number generator used in the demo]

Non-Parametric Example

(After Karp, R., University of Washington)

The Task

We observe blood types in a group of people:
– These types are A, B, AB, and O
– (The blood types determined by the Blood Bank are the phenotypes)
– The phenotypes are the observed data

The Task: Infer the frequencies (i.e., a discrete pdf) of the blood type alleles A, B, and O, using known principles of genetics, by means of the hidden data.

What's missing?

We need to know the genotypes which underlie the phenotypic expression:
– The possible genotypes are AA, AO, OA, AB, BB, BO, OB, OO
– These genotypes are the hidden data

Experiment: Observe Phenotypes

Determine the blood type of 30 people. Sample:
– Type A: 16
– Type B: 2
– Type AB: 1
– Type O: 11

Create a matrix for the allele probabilities

Start with a guess at the probabilities. Rows are alleles; the columns are the 'left' and 'right' allele probabilities.

Initial guess:
        left   right
p(A)     .4     .4
p(B)     .2     .2
p(O)     .4     .4

The goal is to refine the guess into a most likely estimate.

Now create a matrix for the hidden data (the genotypes), based on the first guess of the allele probabilities. Rows are phenotypes; columns are genotypes. Each row gives the probabilities of the genotypes determining that phenotype. Call this matrix H:

        AA    AO    OA    BB    BO    OB    AB    BA    OO
A      .16   .16   .16
B                        .04   .08   .08
AB                                         .08   .08
O                                                      .16

EXPECTATION

NORMALIZE each row so that the row entries represent a probability. Specifically, each entry in a row represents the probability that the entry's genotype (column heading) led to the entry's phenotype (row heading).

Matrix normalized (e.g., the A row: each .16 divided by the row sum .48):

        AA    AO    OA    BB    BO    OB    AB    BA    OO
A      .33   .33   .33
B                        .2    .4    .4
AB                                         .5    .5
O                                                       1

MAXIMIZATION

Now we wish to recover the probabilities of the alleles being drawn from the population. (Remember, this was the goal.) The probability matrix was set up as a 2-column matrix:
– Column 1 was the probability of recovering the 'left' allele from the population*
– Column 2 was the probability of recovering the 'right' allele from the population*

*In this particular example, the left and right allele probabilities are equal.

MAXIMIZATION

Now, ask the question: Which entries have contributed to a specific left allele, say, A? (We will ask and answer the same question for all alleles, both left and right.)

Answer: For the 'left' A, it is:
– row 1 (A phenotype), col 1 (genotype AA)
– row 1 (A phenotype), col 2 (genotype AO)
– row 3 (AB phenotype), col 7 (genotype AB)

[The normalized matrix again, with the entries contributing to the left A allele highlighted: row A, columns AA and AO; row AB, column AB]

MAXIMIZATION

So, compute the (unnormalized) weight of the left A allele as follows:
– Sum of the contributing entries in row 1 (A phenotype): .33 + .33 = .66
  (There are 16 individuals with phenotype A)
– Sum of the contributing entries in row 3 (AB phenotype): .5
  (There is one individual with phenotype AB)

MAXIMIZATION

P(A_left) = (.66 × 16) + (0.5 × 1) = 11.06

Do the same calculation for A_right, for B_left and B_right, and for O_left and O_right.

Continuing the calculation for all alleles:

      LEFT    RIGHT
A     11.06   11.06
B      1.7     1.7
O     16.75   16.75

…then normalizing:

      LEFT    RIGHT
A     0.375   0.375
B     0.058   0.058
O     0.568   0.568

MAXIMIZATION

We thus recover the 'new' allele probability matrix. This was our goal:

        left    right
p(A)    .375    .375
p(B)    .058    .058
p(O)    .568    .568
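The whole blood-type iteration can be sketched as follows (a hypothetical helper of my own, not the course demo code): one call performs the E-step (expected allele counts under the current genotype probabilities) and the M-step (renormalizing the counts into new frequencies). Starting from the slides' guess (.4, .2, .4) and the observed counts (16 A, 2 B, 1 AB, 11 O), the first step lands close to the slides' .375/.058/.568 (the slides round intermediate values).

```python
def allele_em_step(p, counts):
    """One EM step for ABO allele frequencies.

    p: current allele-frequency guess, e.g. {"A": .4, "B": .2, "O": .4}
    counts: observed phenotype counts, e.g. {"A": 16, "B": 2, "AB": 1, "O": 11}
    """
    pA, pB, pO = p["A"], p["B"], p["O"]
    nA, nB, nAB, nO = counts["A"], counts["B"], counts["AB"], counts["O"]
    # E-step: genotype probabilities within each ambiguous phenotype
    # (phenotype A can be AA, AO, or OA; phenotype B can be BB, BO, or OB)
    denA = pA * pA + 2 * pA * pO
    denB = pB * pB + 2 * pB * pO
    # Expected allele counts: AA contributes 2 A's, AO/OA contribute 1 each
    cntA = nA * (2 * pA * pA + 2 * pA * pO) / denA + nAB
    cntB = nB * (2 * pB * pB + 2 * pB * pO) / denB + nAB
    cntO = nA * (2 * pA * pO) / denA + nB * (2 * pB * pO) / denB + 2 * nO
    # M-step: normalize expected counts (total = 2 x number of people)
    total = cntA + cntB + cntO
    return {"A": cntA / total, "B": cntB / total, "O": cntO / total}

p = {"A": 0.4, "B": 0.2, "O": 0.4}
counts = {"A": 16, "B": 2, "AB": 1, "O": 11}
for _ in range(30):
    p = allele_em_step(p, counts)
```

After a few dozen iterations the frequencies stabilize, with O the most common allele and B the rarest, as the observed phenotype counts suggest.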

EM

Now, use the new Allele Probability Matrix and ITERATE! Quit when the estimates converge.

DEMO: .edu/Bioinformatics tools/EMDemo alleles.html

Many applications of EM, but we are interested in three:
– Learning transition probabilities in a Hidden Markov Model, given only the emissions (observations) as training data. There is an efficient (polynomial) special-case algorithm, the Baum-Welch Algorithm, exploiting the Markov structure of the HMM.
– Learning a most likely motif subsequence given many sequences of data, each of which contains a coding subsequence for the protein function of interest. The algorithm is called Multiple-Sequence Expectation-Maximization Motif Elicitation (MEME).
– Unsupervised cluster discovery
