Topic 14: Maximum Likelihood Estimation


Introduction to Statistical Methodology
November, 2009

As before, we begin with a sample X = (X_1, ..., X_n) of random variables chosen according to one of a family of probabilities P_\theta. In addition, f(x | \theta), x = (x_1, ..., x_n), will be used to denote the density function for the data when \theta is the true state of nature.

Definition 1. The likelihood function is the density function regarded as a function of \theta,

    L(\theta | x) = f(x | \theta), \quad \theta \in \Theta.    (1)

The maximum likelihood estimator (MLE) is

    \hat\theta(x) = \arg\max_\theta L(\theta | x).    (2)

Note that if \hat\theta(x) is a maximum likelihood estimator for \theta, then g(\hat\theta(x)) is a maximum likelihood estimator for g(\theta). For example, if \theta is a parameter for the variance and \hat\theta is the maximum likelihood estimator, then \sqrt{\hat\theta} is the maximum likelihood estimator for the standard deviation. This flexibility in the estimation criterion is not available in the case of unbiased estimators. Typically, maximizing the score function \ln L(\theta | x) will be easier.

1 Examples

Example 2 (Bernoulli trials). If the experiment consists of n Bernoulli trials with success probability \theta, then

    L(\theta | x) = \theta^{x_1}(1 - \theta)^{1 - x_1} \cdots \theta^{x_n}(1 - \theta)^{1 - x_n} = \theta^{x_1 + \cdots + x_n}(1 - \theta)^{n - (x_1 + \cdots + x_n)},

    \ln L(\theta | x) = \ln\theta \sum_{i=1}^n x_i + \ln(1 - \theta)\Big(n - \sum_{i=1}^n x_i\Big) = n\bar{x}\ln\theta + n(1 - \bar{x})\ln(1 - \theta),

    \frac{\partial}{\partial\theta} \ln L(\theta | x) = n\Big(\frac{\bar{x}}{\theta} - \frac{1 - \bar{x}}{1 - \theta}\Big).

This equals zero when \theta = \bar{x}. Check that this is a maximum. Thus,

    \hat\theta(x) = \bar{x}.

Example 3 (Normal data). Maximum likelihood estimation can be applied to a vector valued parameter. For a simple random sample of n normal random variables,

    L(\mu, \sigma^2 | x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(x_1 - \mu)^2}{2\sigma^2}\Big) \cdots \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{(x_n - \mu)^2}{2\sigma^2}\Big) = \frac{1}{\sqrt{(2\pi\sigma^2)^n}} \exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\Big).
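The Bernoulli calculation can be verified numerically. The following sketch (in Python rather than the R used elsewhere in these notes) maximizes the log likelihood over a grid and confirms that the maximizer is the sample mean; the data (8 successes in 20 trials) are taken from the figure below.

```python
# Numerical check (not from the notes): the Bernoulli MLE is the sample mean.
# Data: 8 successes in 20 trials, so x-bar = 0.4.
import math

def log_likelihood(theta, successes, n):
    """Bernoulli log likelihood: n*xbar*ln(theta) + n*(1-xbar)*ln(1-theta)."""
    return successes * math.log(theta) + (n - successes) * math.log(1 - theta)

successes, n = 8, 20
# Maximize over a fine grid of theta values in (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, successes, n))
print(theta_hat)  # 0.4, the sample mean
```

Because the log likelihood is strictly concave, the grid maximizer coincides with the analytic solution \bar{x}.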

[Figure 1: Likelihood function (top row) and its logarithm, the score function (bottom row), for Bernoulli trials. The left column is based on 20 trials having 8 and 11 successes. The right column is based on 40 trials having 16 and 22 successes. Notice that the maximum likelihood is approximately 10^{-6} for 20 trials and 10^{-12} for 40. Note also that the peaks are narrower for 40 trials than for 20.]
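The peak heights quoted in the caption can be checked directly: at the MLE \hat\theta = \bar{x}, the likelihood is \bar{x}^{n\bar{x}}(1-\bar{x})^{n(1-\bar{x})}. A quick check (sketched in Python; not part of the notes) for the 8-out-of-20 and 16-out-of-40 cases:

```python
# Check the orders of magnitude of the likelihood peaks in Figure 1.
# At theta-hat = 0.4, L(theta-hat) = 0.4^(successes) * 0.6^(failures).
L20 = 0.4**8 * 0.6**12   # 20 trials, 8 successes
L40 = 0.4**16 * 0.6**24  # 40 trials, 16 successes
print(L20)  # about 1.4e-06
print(L40)  # about 2.0e-12
```

Doubling the sample size roughly squares the likelihood at its peak, which is why the vertical scales in the two columns differ by six orders of magnitude.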

    \ln L(\mu, \sigma^2 | x) = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2.

    \frac{\partial}{\partial\mu} \ln L(\mu, \sigma^2 | x) = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = \frac{n}{\sigma^2}(\bar{x} - \mu).

Because the second partial derivative with respect to \mu is negative,

    \hat\mu(x) = \bar{x}

is the maximum likelihood estimator.

    \frac{\partial}{\partial\sigma^2} \ln L(\mu, \sigma^2 | x) = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n (x_i - \mu)^2 = -\frac{n}{2(\sigma^2)^2}\Big(\sigma^2 - \frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2\Big).

Recalling that \hat\mu(x) = \bar{x}, we obtain

    \hat\sigma^2(x) = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.

Note that the maximum likelihood estimator is a biased estimator.

Example 4 (Linear regression). Our data are n observations with one explanatory variable and one response variable. The model is that

    y_i = \alpha + \beta x_i + \epsilon_i,

where the \epsilon_i are independent mean 0 normal random variables. The (unknown) variance is \sigma^2. The likelihood function is

    L(\alpha, \beta, \sigma^2 | y, x) = \frac{1}{\sqrt{(2\pi\sigma^2)^n}} \exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2\Big),

    \ln L(\alpha, \beta, \sigma^2 | y, x) = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2.

Thus, the maximum likelihood estimators \hat\alpha and \hat\beta are also the least squares estimators. The predicted value for the response variable is

    \hat{y}_i = \hat\alpha + \hat\beta x_i.

The maximum likelihood estimator for \sigma^2 is

    \hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2.

The unbiased estimator is

    \hat\sigma^2_U = \frac{1}{n-2}\sum_{i=1}^n (y_i - \hat{y}_i)^2.

For the measurements on the lengths in centimeters of the femur and humerus for the five specimens of Archeopteryx, we have the following R output for linear regression.

> femur <- c(38,56,59,64,74)
> humerus <- c(41,63,70,72,84)
> summary(lm(humerus ~ femur))

Call:

lm(formula = humerus ~ femur)

Residuals:
      1       2       3       4       5
-0.8226 -0.3668  3.0425 -0.9420 -0.9110

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.65959    4.45896  -0.821 0.471944
femur        1.19690    0.07509  15.941 0.000537 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.982 on 3 degrees of freedom
Multiple R-squared: 0.9883, Adjusted R-squared: 0.9844
F-statistic: 254.1 on 1 and 3 DF, p-value: 0.0005368

The residual standard error of 1.982 centimeters is obtained by summing the squares of the 5 residuals, dividing by 3 = 5 - 2, and taking a square root.

Example 5 (Uniform random variables). If our data X = (X_1, ..., X_n) are a simple random sample drawn from a uniformly distributed random variable whose maximum value \theta is unknown, then each random variable has density

    f(x | \theta) = \begin{cases} 1/\theta & \text{if } 0 \le x \le \theta, \\ 0 & \text{otherwise.} \end{cases}

Therefore, the likelihood is

    L(\theta | x) = \begin{cases} 1/\theta^n & \text{if } 0 \le x_i \le \theta \text{ for all } i, \\ 0 & \text{otherwise.} \end{cases}

Consequently, to maximize L(\theta | x), we should minimize the value of \theta^n in the first alternative for the likelihood. This is achieved by taking

    \hat\theta(x) = \max_{1 \le i \le n} x_i.

However,

    \hat\theta(X) = \max_{1 \le i \le n} X_i < \theta,

and the maximum likelihood estimator is biased. For 0 \le x \le \theta, the distribution function of X_{(n)} = \max_{1 \le i \le n} X_i is

    F_{(n)}(x) = P\{\max_{1 \le i \le n} X_i \le x\} = P\{X_1 \le x\}^n = (x/\theta)^n.

Thus, the density is

    f_{(n)}(x) = \frac{n x^{n-1}}{\theta^n}.

The mean is

    E_\theta X_{(n)} = \frac{n\theta}{n+1},

and thus

    d(X) = \frac{n+1}{n} X_{(n)}

is an unbiased estimator of \theta.
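The regression estimates in the R output above follow from the usual closed-form least squares formulas. A self-contained sketch (in Python rather than R) reproduces the slope, the intercept, and the residual standard error for the Archeopteryx data:

```python
# Reproduce the least squares fit for the Archeopteryx data (not from
# the notes, which use R's lm for this computation).
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
n = len(femur)
xbar = sum(femur) / n
ybar = sum(humerus) / n
# Slope = S_xy / S_xx, intercept = ybar - slope * xbar.
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(femur, humerus))
sxx = sum((x - xbar) ** 2 for x in femur)
beta_hat = sxy / sxx                  # slope: about 1.1969
alpha_hat = ybar - beta_hat * xbar    # intercept: about -3.6596
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(femur, humerus)]
# Residual standard error: sqrt(SSR / (n - 2)), matching R's 1.982.
ssr = sum(r ** 2 for r in residuals)
rse = (ssr / (n - 2)) ** 0.5
print(beta_hat, alpha_hat, rse)
```

The printed values agree with the Estimate column and the residual standard error reported by R.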

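For the uniform example, the bias of the MLE X_{(n)} and the unbiasedness of d(X) can be checked by simulation. The following sketch (a Python illustration, not part of the notes) uses \theta = 1 and n = 10:

```python
# Simulation check (not from the notes): max(X_i) underestimates theta,
# while d(X) = (n+1)/n * max(X_i) is unbiased.
import random

random.seed(1)
theta, n, reps = 1.0, 10, 200000
total_max = 0.0
for _ in range(reps):
    total_max += max(random.uniform(0, theta) for _ in range(n))
mean_max = total_max / reps
print(mean_max)                # near n*theta/(n+1) = 0.909, below theta
print((n + 1) / n * mean_max)  # near theta = 1.0
```

The simulated mean of the maximum sits near n\theta/(n+1), as the density calculation above predicts, and the corrected estimator recovers \theta.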
2 Asymptotic Properties

Much of the attraction of maximum likelihood estimators is based on their properties for large sample sizes.

1. Consistency. If \theta_0 is the state of nature, then

    L(\theta_0 | X) > L(\theta | X)

if and only if

    \frac{1}{n}\sum_{i=1}^n \ln\frac{f(X_i | \theta_0)}{f(X_i | \theta)} > 0.

By the strong law of large numbers, this sum converges to

    E_{\theta_0}\Big[\ln\frac{f(X_1 | \theta_0)}{f(X_1 | \theta)}\Big],

which is greater than 0. From this, we obtain

    \hat\theta(X) \to \theta_0 \quad \text{as } n \to \infty.

We call this property of the estimator consistency.

2. Asymptotic normality and efficiency. Under some assumptions that are meant to ensure some regularity, a central limit theorem holds. Here we have

    \sqrt{n}(\hat\theta(X) - \theta_0)

converges in distribution as n \to \infty to a normal random variable with mean 0 and variance 1/I(\theta_0), the Fisher information for one observation. Thus,

    Var_{\theta_0}(\hat\theta(X)) \approx \frac{1}{n I(\theta_0)},

the lowest possible under the Cramér-Rao lower bound. This property is called asymptotic efficiency.

3. Properties of the log-likelihood surface. For large sample sizes, the variance of an MLE of a single unknown parameter is approximately the reciprocal of the Fisher information

    I(\theta) = -E\Big[\frac{\partial^2}{\partial\theta^2}\ln L(\theta | X)\Big].

Thus, the estimate of the variance given data x is

    \hat\sigma^2 = -1 \Big/ \frac{\partial^2}{\partial\theta^2}\ln L(\hat\theta | x),

the negative reciprocal of the second derivative, also known as the curvature, of the log-likelihood function evaluated at the MLE. If the curvature is small, then the likelihood surface is flat around its maximum value (the MLE). If the curvature is large and thus the variance is small, the likelihood is strongly curved at the maximum.

For a multidimensional parameter space \theta = (\theta_1, \theta_2, ..., \theta_n), the Fisher information I(\theta) is a matrix whose ij-th entry is

    I(\theta)_{ij} = E_\theta\Big[\frac{\partial}{\partial\theta_i}\ln L(\theta | X)\,\frac{\partial}{\partial\theta_j}\ln L(\theta | X)\Big] = -E_\theta\Big[\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\ln L(\theta | X)\Big].
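The curvature recipe in item 3 can be tried on the Bernoulli example, where the exact answer is known: the observed information at \hat\theta = \bar{x} is n/(\hat\theta(1-\hat\theta)), so the variance estimate is \hat\theta(1-\hat\theta)/n. A sketch (in Python, not from the notes) approximates the second derivative by a central finite difference:

```python
# Variance estimate from the curvature of the log likelihood, for
# Bernoulli data with 8 successes in 20 trials.
# Exact answer: theta_hat * (1 - theta_hat) / n = 0.4 * 0.6 / 20 = 0.012.
import math

successes, n = 8, 20
theta_hat = successes / n  # the MLE

def log_lik(t):
    return successes * math.log(t) + (n - successes) * math.log(1 - t)

# Second derivative at the MLE via a central finite difference.
h = 1e-5
curvature = (log_lik(theta_hat + h) - 2 * log_lik(theta_hat)
             + log_lik(theta_hat - h)) / h**2
var_hat = -1.0 / curvature  # negative reciprocal of the curvature
print(var_hat)  # about 0.012
```

The numerical negative reciprocal of the curvature matches the exact large-sample variance \hat\theta(1-\hat\theta)/n.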

Example 6. To obtain the maximum likelihood estimate for the gamma family of random variables, write

    L(\alpha, \beta | x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x_1^{\alpha-1} e^{-\beta x_1} \cdots \frac{\beta^\alpha}{\Gamma(\alpha)} x_n^{\alpha-1} e^{-\beta x_n},

    \ln L(\alpha, \beta | x) = n(\alpha\ln\beta - \ln\Gamma(\alpha)) + (\alpha - 1)\sum_{i=1}^n \ln x_i - \beta\sum_{i=1}^n x_i.

To determine the parameters that maximize the likelihood, solve the equations

    \frac{\partial}{\partial\alpha}\ln L(\hat\alpha, \hat\beta | x) = n\Big(\ln\hat\beta - \frac{d}{d\alpha}\ln\Gamma(\hat\alpha)\Big) + \sum_{i=1}^n \ln x_i = 0

and

    \frac{\partial}{\partial\beta}\ln L(\hat\alpha, \hat\beta | x) = n\frac{\hat\alpha}{\hat\beta} - \sum_{i=1}^n x_i = 0,

so that

    \bar{x} = \frac{\hat\alpha}{\hat\beta}, \qquad \overline{\ln x} = \frac{d}{d\alpha}\ln\Gamma(\hat\alpha) - \ln\hat\beta.

To compute the Fisher information matrix, note that

    I(\alpha, \beta)_{11} = -\frac{\partial^2}{\partial\alpha^2}\ln L(\alpha, \beta | x) = n\frac{d^2}{d\alpha^2}\ln\Gamma(\alpha),

    I(\alpha, \beta)_{22} = -\frac{\partial^2}{\partial\beta^2}\ln L(\alpha, \beta | x) = n\frac{\alpha}{\beta^2},

    I(\alpha, \beta)_{12} = -\frac{\partial^2}{\partial\alpha\,\partial\beta}\ln L(\alpha, \beta | x) = -\frac{n}{\beta}.

This gives the Fisher information matrix

    I(\alpha, \beta) = n\begin{pmatrix} \frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) & -\frac{1}{\beta} \\ -\frac{1}{\beta} & \frac{\alpha}{\beta^2} \end{pmatrix}.

The inverse is

    I(\alpha, \beta)^{-1} = \frac{1}{n\big(\alpha\frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) - 1\big)}\begin{pmatrix} \alpha & \beta \\ \beta & \beta^2\frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) \end{pmatrix}.

For the example of the distribution of fitness effects, \alpha = 0.23, \beta = 5.35, and n = 100, and, using \frac{d^2}{d\alpha^2}\ln\Gamma(0.23) = 20.128,

    I(0.23, 5.35)^{-1} = \frac{1}{100(0.23 \cdot 20.128 - 1)}\begin{pmatrix} 0.23 & 5.35 \\ 5.35 & 5.35^2 \cdot 20.128 \end{pmatrix} \approx \begin{pmatrix} 0.000634 & 0.01474 \\ 0.01474 & 1.587 \end{pmatrix},

    Var_{(0.23, 5.35)}(\hat\alpha) \approx 0.000634, \qquad Var_{(0.23, 5.35)}(\hat\beta) \approx 1.587.

Compare this to the empirical values of 0.0662 and 2.046 for the method of moments.
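The numerical evaluation of the inverse Fisher information can be sketched in pure Python (an illustration, not part of the notes): the trigamma value \frac{d^2}{d\alpha^2}\ln\Gamma(\alpha) is the series \sum_{k\ge 0} 1/(\alpha+k)^2, which can be summed directly.

```python
# Evaluate the inverse Fisher information entries for the gamma example
# with alpha = 0.23, beta = 5.35, n = 100 (sketch; not from the notes).
alpha, beta, n = 0.23, 5.35, 100

def trigamma(x, terms=100_000):
    """d^2/dx^2 ln Gamma(x) = sum_{k>=0} 1/(x+k)^2, with a midpoint tail."""
    s = sum(1.0 / (x + k) ** 2 for k in range(terms))
    return s + 1.0 / (x + terms - 0.5)  # tail correction

t = trigamma(alpha)               # about 20.128
denom = n * (alpha * t - 1.0)     # determinant factor n*(alpha*t - 1)
var_alpha = alpha / denom         # (1,1) entry of the inverse
var_beta = beta ** 2 * t / denom  # (2,2) entry of the inverse
print(t, var_alpha, var_beta)
```

This reproduces the approximate variances for \hat\alpha and \hat\beta quoted above.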
