Maximum Likelihood (ML), Expectation Maximization (EM)


Maximum Likelihood (ML), Expectation Maximization (EM)
Pieter Abbeel, UC Berkeley EECS
Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

Outline
- Maximum likelihood (ML)
- Priors, and maximum a posteriori (MAP)
- Cross-validation
- Expectation Maximization (EM)

Thumbtack
- Let θ = P(up), 1-θ = P(down)
- How to determine θ?
- Empirical estimate: 8 up, 2 down → θ ≈ 8/10 = 0.8

- http://web.me.com/todd6ton/Site/Classroom Blog/Entries/2009/10/7 A Thumbtack Experiment.html

Maximum Likelihood
- θ = P(up), 1-θ = P(down)
- Observe: 8 up, 2 down
- Likelihood of the observation sequence depends on θ: L(θ) = θ^8 (1-θ)^2
- Maximum likelihood finds θ_ML = argmax_θ L(θ) → extrema at θ = 0, θ = 1, θ = 0.8
- Inspection of each extremum yields θ_ML = 0.8

Maximum Likelihood
- More generally, consider a binary-valued random variable with θ = P(1), 1-θ = P(0); assume we observe n1 ones and n0 zeros
- Likelihood: L(θ) = θ^{n1} (1-θ)^{n0}
- Derivative: dL/dθ = θ^{n1-1} (1-θ)^{n0-1} (n1 (1-θ) - n0 θ)
- Hence the extrema are θ = 0, θ = 1, and θ = n1/(n0+n1)
- The maximum is θ_ML = n1/(n0+n1), i.e., the empirical fraction of ones
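
The closed-form result above can be sanity-checked numerically. The short Python sketch below is not from the slides; the data array and the use of SciPy's minimize_scalar are illustrative choices. It compares θ_ML = n1/(n0+n1) against a direct maximization of the log-likelihood.

```python
# Minimal sketch: ML for a binary random variable (thumbtack), closed form vs. numerical.
import numpy as np
from scipy.optimize import minimize_scalar

thumbtack_data = np.array([1] * 8 + [0] * 2)          # 8 up (1), 2 down (0) -- illustrative
n1 = thumbtack_data.sum()
n0 = len(thumbtack_data) - n1

# Closed-form ML estimate: empirical fraction of ones.
theta_closed_form = n1 / (n0 + n1)

# Numerical check: maximize the log-likelihood n1*log(theta) + n0*log(1-theta).
neg_log_lik = lambda th: -(n1 * np.log(th) + n0 * np.log(1 - th))
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")

print(theta_closed_form, res.x)                        # both ≈ 0.8
```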

Log-likelihood
- The function log(x) is a monotonically increasing function of x
- Hence, for any (positive-valued) function f: argmax_x f(x) = argmax_x log f(x)
- In practice it is often more convenient to optimize the log-likelihood rather than the likelihood itself
- Example: log L(θ) = n1 log θ + n0 log(1-θ)

Log-likelihood ↔ Likelihood
- Reconsider thumbtacks: 8 up, 2 down
- Likelihood L(θ) = θ^8 (1-θ)^2: not concave
- Log-likelihood log L(θ) = 8 log θ + 2 log(1-θ): concave
- Definition: a function f is concave if and only if f(λ x1 + (1-λ) x2) ≥ λ f(x1) + (1-λ) f(x2) for all x1, x2 and all λ ∈ [0, 1]
- Concave functions are generally easier to maximize than non-concave functions

Concavity and Convexity
- f is concave if and only if f(λ x1 + (1-λ) x2) ≥ λ f(x1) + (1-λ) f(x2): "easy" to maximize
- f is convex if and only if f(λ x1 + (1-λ) x2) ≤ λ f(x1) + (1-λ) f(x2): "easy" to minimize
[Figure: a concave and a convex function with the chord between (x1, f(x1)) and (x2, f(x2))]

ML for Multinomial
- Consider having received samples x(1), ..., x(m), each taking one of K values
- ML estimate: θ_k = (number of samples equal to k) / m, the empirical frequencies
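
As a rough illustration of the multinomial case (not from the slides; the sample array and category count are made up), the ML estimate is just the vector of empirical frequencies:

```python
# Minimal sketch: ML for a multinomial over integer categories 0..K-1.
import numpy as np

samples = np.array([0, 2, 1, 2, 2, 0, 1, 2])   # illustrative data
K = 3
counts = np.bincount(samples, minlength=K)

theta_ml = counts / counts.sum()               # empirical frequency of each category
print(theta_ml)
```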

ML for Fully Observed HMM
- Given samples of complete trajectories: states x_0, ..., x_T and observations z_0, ..., z_T
- Dynamics model: P(x_{t+1} | x_t)
- Observation model: P(z_t | x_t)
- → Independent ML problems for each conditional P(x_{t+1} | x_t = i) and each conditional P(z_t | x_t = i)
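
A minimal sketch of the counting-based ML solution for a fully observed HMM, assuming integer-coded states and observations; the variable names and the toy trajectory are illustrative, not from the slides:

```python
# Fully observed HMM: ML estimates are normalized transition and emission counts.
# (Assumes every state appears at least once so no row of counts is all zero.)
import numpy as np

states = np.array([0, 0, 1, 1, 0, 1, 1, 1])    # x_0 ... x_T (observed here)
obs    = np.array([1, 0, 1, 1, 0, 1, 1, 0])    # z_0 ... z_T
S, O = 2, 2

# Dynamics model P(x_{t+1} = j | x_t = i): normalized transition counts.
trans = np.zeros((S, S))
for i, j in zip(states[:-1], states[1:]):
    trans[i, j] += 1
trans /= trans.sum(axis=1, keepdims=True)

# Observation model P(z_t = o | x_t = i): normalized emission counts.
emit = np.zeros((S, O))
for i, o in zip(states, obs):
    emit[i, o] += 1
emit /= emit.sum(axis=1, keepdims=True)

print(trans, emit, sep="\n")
```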

ML for Exponential Distribution (figure source: Wikipedia)
[Figure: exponential densities p(x; λ) = λ e^{-λx} for several values of λ]
- Consider having received samples 3.1, 8.2, 1.7

ML for Exponential Distribution (figure source: Wikipedia)
- Consider having received samples x(1), ..., x(m)
- ML estimate: λ_ML = m / Σ_i x(i) = 1 / (sample mean)
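
A tiny sketch using the three samples quoted above; the closed form λ_ML = 1/(sample mean) is the standard ML result for the exponential density p(x; λ) = λ e^{-λx}:

```python
# ML estimate of the rate of an exponential distribution.
import numpy as np

x = np.array([3.1, 8.2, 1.7])
lam_ml = 1.0 / x.mean()        # = len(x) / x.sum()
print(lam_ml)                  # ≈ 0.23
```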

Uniform
- Consider having received samples x(1), ..., x(m)
- For a Uniform[a, b] model the likelihood is (1/(b-a))^m whenever all samples lie in [a, b] (and 0 otherwise), so a_ML = min_i x(i) and b_ML = max_i x(i)

ML for Gaussian
- Consider having received samples x(1), ..., x(m)
- ML estimates: μ_ML = (1/m) Σ_i x(i),  σ²_ML = (1/m) Σ_i (x(i) - μ_ML)²
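
A minimal sketch of the Gaussian ML estimates (sample mean and divide-by-m variance); the data values are made up:

```python
# ML for a univariate Gaussian: sample mean and biased (divide-by-m) variance.
import numpy as np

x = np.array([2.3, -0.4, 1.1, 0.7, 3.0])
mu_ml = x.mean()
sigma2_ml = ((x - mu_ml) ** 2).mean()    # note: divides by m, not m-1
print(mu_ml, sigma2_ml)
```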

ML for Conditional Gaussian
- Model: y ~ N(a x + b, σ²), i.e., y = a x + b + ε with ε ~ N(0, σ²)
- Equivalently: maximizing the likelihood over (a, b) corresponds to minimizing the sum of squared errors Σ_i (y(i) - a x(i) - b)², i.e., least squares
- More generally: y ~ N(θᵀ x, σ²) for a parameter vector θ and feature vector x

ML for Conditional Gaussian

ML for Conditional Multivariate Gaussian
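
The derivation on this slide did not survive transcription. As a stand-in, here is a least-squares sketch of ML for a conditional multivariate Gaussian y = W x + b + noise, on synthetic data; the model form, data, and all names (W_true, X, Y) are assumptions for illustration:

```python
# ML for a conditional multivariate Gaussian: least-squares fit of (W, b),
# plus the empirical (divide-by-m) covariance of the residuals.
import numpy as np

rng = np.random.default_rng(0)
m, dx, dy = 200, 3, 2
X = rng.normal(size=(m, dx))
W_true = rng.normal(size=(dy, dx))
b_true = np.array([0.5, -1.0])
Y = X @ W_true.T + b_true + 0.1 * rng.normal(size=(m, dy))

# Append a column of ones so the offset b is estimated jointly with W.
X1 = np.hstack([X, np.ones((m, 1))])
Wb, *_ = np.linalg.lstsq(X1, Y, rcond=None)     # shape (dx+1, dy)
W_ml, b_ml = Wb[:-1].T, Wb[-1]

residuals = Y - X1 @ Wb
Sigma_ml = residuals.T @ residuals / m          # ML noise covariance
print(W_ml, b_ml, Sigma_ml, sep="\n")
```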

Aside: Key Identities for Derivation on Previous Slide

ML Estimation in Fully Observed Linear Gaussian Bayes Filter Setting
- Consider the Linear Gaussian setting: x_{t+1} = A x_t + B u_t + w_t, z_t = C x_t + d + v_t, with Gaussian noise w_t, v_t
- Fully observed, i.e., the states x_t are given along with the observations z_t (and controls u_t)
- → Two separate ML estimation problems for conditional multivariate Gaussians:
  1: estimate the dynamics model (A, B and its noise covariance) from the tuples (x_t, u_t, x_{t+1})
  2: estimate the observation model (C, d and its noise covariance) from the pairs (x_t, z_t)

Priors --- Thumbtack
- Let θ = P(up), 1-θ = P(down)
- How to determine θ?
- ML estimate: 5 up, 0 down → θ_ML = 5/5 = 1
- Laplace estimate: add a fake count of 1 for each outcome → θ_Laplace = (5+1)/(5+2) = 6/7
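
In code, the Laplace-smoothed estimate for the 5-up/0-down data is one line (a trivial sketch):

```python
# Laplace estimate: add one fake count per outcome.
n_up, n_down = 5, 0
theta_ml = n_up / (n_up + n_down)                      # = 1.0
theta_laplace = (n_up + 1) / (n_up + n_down + 2)       # = 6/7 ≈ 0.857
print(theta_ml, theta_laplace)
```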

Priors --- Thumbtack
- Alternatively, consider θ to be a random variable
- Prior: P(θ) ∝ θ(1-θ)
- Measurements: P(x | θ)
- Posterior: P(θ | x) ∝ P(x | θ) P(θ)
- Maximum A Posteriori (MAP) estimation: find the θ that maximizes the posterior → θ_MAP = argmax_θ P(θ | x)
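
A worked reconstruction of the MAP computation, assuming the 5-up/0-down data from the previous slide and the prior P(θ) ∝ θ(1-θ):

```latex
% MAP for the thumbtack with prior P(\theta) \propto \theta(1-\theta)
\begin{align*}
P(\theta \mid x) &\propto P(x \mid \theta)\, P(\theta)
  \propto \theta^{n_1}(1-\theta)^{n_0}\,\theta(1-\theta)
  = \theta^{n_1+1}(1-\theta)^{n_0+1} \\
\theta_{\text{MAP}} &= \arg\max_\theta \,(n_1+1)\log\theta + (n_0+1)\log(1-\theta)
  = \frac{n_1+1}{n_1+n_0+2} \\
\text{e.g. } n_1 = 5,\; n_0 = 0: &\quad
  \theta_{\text{MAP}} = \tfrac{6}{7} \quad\text{(matches the Laplace estimate)}
\end{align*}
```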

Priors --- Beta Distribution (figure source: Wikipedia)
- Beta(α, β) density: P(θ) ∝ θ^{α-1} (1-θ)^{β-1}

Priors --- Dirichlet Distribution
- Generalizes the Beta distribution
- MAP estimate corresponds to adding fake counts n1, ..., nK

MAP for Mean of Univariate Gaussian
- Assume the variance is known. (Can be extended to also find the MAP estimate for the variance.)
- Prior: μ ~ N(μ0, σ0²)
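
For reference, the standard conjugate-prior result this slide works toward, assuming the prior μ ~ N(μ0, σ0²) and m observations x(1), ..., x(m) with known variance σ²:

```latex
% MAP for the mean of a univariate Gaussian (known variance, Gaussian prior)
\begin{align*}
\mu_{\text{MAP}}
  &= \arg\max_\mu \; \log \mathcal{N}(\mu;\mu_0,\sigma_0^2)
     + \sum_{i=1}^{m} \log \mathcal{N}\!\left(x^{(i)};\mu,\sigma^2\right) \\
  &= \frac{\mu_0/\sigma_0^2 + \sum_i x^{(i)}/\sigma^2}{1/\sigma_0^2 + m/\sigma^2}
\end{align*}
```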

MAP for Univariate Conditional Linear Gaussian
- Assume the variance is known. (Can be extended to also find the MAP estimate for the variance.)
- Prior: Gaussian prior on the regression coefficients [Interpret!]

MAP for Univariate Conditional Linear Gaussian: Example
[Figure: true model, samples, ML fit, and MAP fit]

Cross Validation
- Choice of prior will heavily influence quality of result
- Fine-tune choice of prior through cross-validation:
  1. Split data into "training" set and "validation" set
  2. For a range of priors:
     - Train: compute θ_MAP on the training set
     - Cross-validate: evaluate performance on the validation set by evaluating the likelihood of the validation data under the θ_MAP just found
  3. Choose the prior with the highest validation score
     - For this prior, compute θ_MAP on the (training + validation) set
- Typical training / validation splits:
  - 1-fold: 70/30, random split
  - 10-fold: partition into 10 sets; average performance over each set being the validation set with the other 9 as the training set
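
A small end-to-end sketch of this recipe for the thumbtack setting, sweeping a Beta pseudo-count strength α as the "prior"; the synthetic data, grid, and split are all illustrative assumptions, not from the slides:

```python
# Cross-validation over prior strength for the thumbtack / coin-flip model.
import numpy as np

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.7, size=30)          # synthetic flips
train, val = data[:21], data[21:]             # roughly 70/30 split

def theta_map(x, alpha):
    # MAP under a Beta(alpha, alpha) prior, i.e. alpha-1 fake counts per outcome.
    return (x.sum() + alpha - 1) / (len(x) + 2 * (alpha - 1))

def log_lik(x, theta):
    eps = 1e-12
    return (x * np.log(theta + eps) + (1 - x) * np.log(1 - theta + eps)).sum()

alpha_grid = [1.0, 2.0, 5.0, 10.0]
scores = {a: log_lik(val, theta_map(train, a)) for a in alpha_grid}
best_alpha = max(scores, key=scores.get)

# Refit on (training + validation) data with the selected prior.
theta_final = theta_map(data, best_alpha)
print(best_alpha, theta_final)
```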

Outline
- Maximum likelihood (ML)
- Priors, and maximum a posteriori (MAP)
- Cross-validation
- Expectation Maximization (EM)

Mixture of Gaussians
- Generally: p(z) = Σ_k θ_k N(z; μ_k, Σ_k)
- Example: a mixture of two Gaussians
- ML objective: given data z(1), ..., z(m), maximize Σ_i log Σ_k θ_k N(z(i); μ_k, Σ_k)
- Setting derivatives w.r.t. θ, μ, Σ equal to zero does not allow solving for their ML estimates in closed form
- We can evaluate the objective → we can in principle perform local optimization. In this lecture: the "EM" algorithm, which is typically used to efficiently optimize the objective (locally)

Expectation Maximization (EM)
- Example: a mixture of two Gaussians
- Model: each z(i) is drawn from one of two Gaussians, selected by an unobserved x(i)
- Goal: given data z(1), ..., z(m) (but no x(i) observed), find maximum likelihood estimates of μ1, μ2
- EM basic idea: if the x(i) were known → two easy-to-solve separate ML problems
- EM iterates over:
  - E-step: for i = 1, ..., m, fill in the missing data x(i) according to what is most likely given the current model
  - M-step: run ML for the completed data, which gives a new model

EM Derivation
- EM solves a Maximum Likelihood problem of the form: max_θ log p(z; θ) = max_θ log Σ_x p(x, z; θ)
  - θ: parameters of the probabilistic model we try to find
  - x: unobserved variables
  - z: observed variables
- Key tool: Jensen's Inequality

Jensen's Inequality
- Illustration: P(X = x1) = 1-λ, P(X = x2) = λ, so E[X] = (1-λ) x1 + λ x2
- For a concave function f: f(E[X]) ≥ E[f(X)]

EM Derivation (ctd)
- Jensen's Inequality: equality holds when the function being averaged is an affine function. This is achieved for q(x) = p(x | z; θ)
- EM Algorithm: iterate
  1. E-step: compute q(x) = p(x | z; θ^(t))
  2. M-step: compute θ^(t+1) = argmax_θ Σ_x q(x) log p(x, z; θ)
- M-step optimization can be done efficiently in most cases
- E-step is usually the more expensive step
- The E-step does not fill in the missing data x with hard values, but finds a distribution q(x)
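
Since the slide's equations did not survive transcription, here is the standard lower-bound derivation the bullets refer to, written in the same notation (θ parameters, x unobserved, z observed):

```latex
% EM lower bound via Jensen's inequality (log is concave)
\begin{align*}
\log p(z;\theta)
  &= \log \sum_{x} p(x, z; \theta)
   = \log \sum_{x} q(x)\,\frac{p(x, z; \theta)}{q(x)} \\
  &\ge \sum_{x} q(x) \log \frac{p(x, z; \theta)}{q(x)}
  \qquad \text{for any distribution } q(x),
\end{align*}
```
with equality when p(x, z; θ)/q(x) is constant in x, i.e. q(x) = p(x | z; θ).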

EM Derivation (ctd)
- The M-step objective is upper-bounded by the true objective
- The M-step objective is equal to the true objective at the current parameter estimate
- → Improvement in the true objective is at least as large as improvement in the M-step objective

EM 1-D Example --- 2 Iterations
- Estimate a 1-d mixture of two Gaussians with unit variance
- One parameter μ; μ1 = μ - 7.5, μ2 = μ + 7.5

EM for Mixture of Gaussians
- X ~ Multinomial distribution, P(X = k; θ) = θ_k
- Z | X = k ~ N(μ_k, Σ_k)
- Observed: z(1), z(2), ..., z(m)

EM for Mixture of Gaussians
- E-step: compute responsibilities w_k(i) = P(x(i) = k | z(i); θ, μ, Σ) = θ_k N(z(i); μ_k, Σ_k) / Σ_j θ_j N(z(i); μ_j, Σ_j)
- M-step: θ_k = (1/m) Σ_i w_k(i);  μ_k = Σ_i w_k(i) z(i) / Σ_i w_k(i);  Σ_k = Σ_i w_k(i) (z(i) - μ_k)(z(i) - μ_k)ᵀ / Σ_i w_k(i)
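
A compact sketch of these E-step/M-step updates for a 1-D mixture of K Gaussians (scalar variances rather than full covariances, synthetic data, and a fixed iteration count are all illustrative choices):

```python
# EM for a 1-D mixture of K Gaussians.
import numpy as np

rng = np.random.default_rng(2)
z = np.concatenate([rng.normal(-5, 1, 150), rng.normal(5, 1, 150)])
m, K = len(z), 2

# Initialization.
theta = np.full(K, 1.0 / K)                 # mixing weights
mu = rng.choice(z, size=K, replace=False)   # means
var = np.full(K, z.var())                   # variances

def gauss(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for it in range(50):
    # E-step: responsibilities w[i, k] = P(x_i = k | z_i; current parameters).
    w = theta * gauss(z[:, None], mu, var)
    w /= w.sum(axis=1, keepdims=True)

    # M-step: responsibility-weighted counts, means, variances.
    Nk = w.sum(axis=0)
    theta = Nk / m
    mu = (w * z[:, None]).sum(axis=0) / Nk
    var = (w * (z[:, None] - mu) ** 2).sum(axis=0) / Nk

    # Log-likelihood should increase every iteration.
    ll = np.log((theta * gauss(z[:, None], mu, var)).sum(axis=1)).sum()

print(theta, mu, var, ll)
```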

ML Objective HMM
- Given samples: observation sequences z (the states x are not observed)
- Dynamics model: P(x_{t+1} | x_t)
- Observation model: P(z_t | x_t)
- ML objective: maximize the log-likelihood of the observations, log Σ_x P(x, z), over the model parameters
- → No simple decomposition into independent ML problems for each P(x_{t+1} | x_t = i) and each P(z_t | x_t = i)
- → No closed-form solution found by setting derivatives equal to zero

EM for HMM --- M-step
- → The dynamics model P(x_{t+1} | x_t) and the observation model P(z_t | x_t) are computed from "soft" counts, i.e., expected counts under the E-step distribution

EM for HMM --- E-step
- No need to find the conditional distribution over the full joint state sequence
- Run the smoother to find the marginals P(x_t | z_{0:T}) and pairwise marginals P(x_t, x_{t+1} | z_{0:T})

ML Objective for Linear Gaussians
- Linear Gaussian setting: x_{t+1} = A x_t + B u_t + w_t, z_t = C x_t + d + v_t, with Gaussian noise w_t, v_t
- Given: the observations z_0, ..., z_T (and controls u_t); the states x_t are not observed
- ML objective: maximize the log-likelihood of the observations over the model parameters
- EM derivation: same as for the HMM

EM for Linear Gaussians --- E-Step
- Forward: Kalman filter pass
- Backward: smoother (backward) pass

EM for Linear Gaussians --- M-step
[Updates for A, B, C, d. TODO: Fill in once found/derived.]

EM for Linear Gaussians --- The Log-likelihood
- When running EM, it can be good to keep track of the log-likelihood score --- it is supposed to increase every iteration

EM for Extended Kalman Filter Setting
- As the linearization is only an approximation, when performing the updates we might end up with parameters that result in a lower (rather than higher) log-likelihood score
- → Solution: instead of updating the parameters to the newly estimated ones, interpolate between the previous parameters and the newly estimated ones. Perform a "line search" to find the setting that achieves the highest log-likelihood score
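
A sketch of this interpolation / line-search safeguard, assuming a user-supplied log_likelihood(params) function and parameters flattened into a NumPy vector (both are assumptions for illustration, not an interface from the slides):

```python
# Damped EM update: interpolate between old and new parameters and keep the
# interpolation point with the highest log-likelihood.
import numpy as np

def damped_em_update(params_old, params_new, log_likelihood, num_steps=11):
    """Line search over interpolations between the old and newly estimated parameters."""
    best_params, best_ll = params_old, log_likelihood(params_old)
    for alpha in np.linspace(0.0, 1.0, num_steps):
        candidate = (1.0 - alpha) * params_old + alpha * params_new
        ll = log_likelihood(candidate)
        if ll > best_ll:
            best_params, best_ll = candidate, ll
    return best_params, best_ll

# Usage (names hypothetical): params, ll = damped_em_update(params, m_step(params), log_likelihood)
```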
