Lecture 4. Sufficient Statistics. Introduction to Estimation


1 Sufficient statistics

Let $f(x|\theta)$ with $\theta \in \Theta$ be some parametric family. Let $X = (X_1, \dots, X_n)$ be a random sample from distribution $f(x|\theta)$. Suppose we would like to learn the parameter value $\theta$ from our sample. The concept of a sufficient statistic allows us to separate the information contained in $X$ into two parts. One part contains all the valuable information as long as we are concerned with parameter $\theta$, while the other part contains pure noise in the sense that it has no valuable information. Thus, we can ignore the latter part.

Definition 1. Statistic $T(X)$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T(X)$ does not depend on $\theta$.

Let $T(X)$ be a sufficient statistic. Consider the pair $(X, T(X))$. Obviously, $(X, T(X))$ contains the same information about $\theta$ as $X$ alone, since $T(X)$ is a function of $X$. But if we know $T(X)$, then $X$ itself has no value for us, since its conditional distribution given $T(X)$ is independent of $\theta$: by observing $X$ (in addition to $T(X)$), we cannot say whether one particular value of parameter $\theta$ is more likely than another. Therefore, once we know $T(X)$, we can discard $X$ completely.

Example. Let $X = (X_1, \dots, X_n)$ be a random sample from $N(\mu, \sigma^2)$. Suppose that $\sigma^2$ is known, so the only parameter is $\mu$ ($\theta = \mu$). We have already seen that $T(X) = \bar{X}_n \sim N(\mu, \sigma^2/n)$. Let us calculate the conditional distribution of $X$ given $T(X) = t$. First, note that
$$\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x}_n + \bar{x}_n - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x}_n)^2 + 2\sum_{i=1}^n (x_i - \bar{x}_n)(\bar{x}_n - \mu) + n(\bar{x}_n - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x}_n)^2 + n(\bar{x}_n - \mu)^2.$$

Therefore
$$f_{X|T(X)}(x \mid T(X) = T(x)) = \frac{f_X(x)}{f_{T(X)}(T(x))} = \frac{\exp\{-\sum_{i=1}^n (x_i - \mu)^2/(2\sigma^2)\}/((2\pi)^{n/2}\sigma^n)}{\exp\{-n(\bar{x}_n - \mu)^2/(2\sigma^2)\}/((2\pi)^{1/2}\sigma/n^{1/2})} = \frac{\exp\{-\sum_{i=1}^n (x_i - \bar{x}_n)^2/(2\sigma^2)\}}{(2\pi)^{(n-1)/2}\sigma^{n-1}/n^{1/2}},$$
which is independent of $\mu$. We conclude that $T(X) = \bar{X}_n$ is a sufficient statistic for our parametric family. Note, however, that $\bar{X}_n$ is not sufficient if $\sigma^2$ is not known.

2 Factorization Theorem

The Factorization Theorem gives a general approach for how to find a sufficient statistic:

Theorem 2 (Factorization Theorem). Let $f(x|\theta)$ be the pdf of $X$. Then $T(X)$ is a sufficient statistic if and only if there exist functions $g(t|\theta)$ and $h(x)$ such that $f(x|\theta) = g(T(x)|\theta)h(x)$.

Proof. Let $l(t|\theta)$ be the pdf of $T(X)$. Suppose $T(X)$ is a sufficient statistic. Then $f_{X|T(X)}(x \mid T(X) = T(x)) = f_X(x|\theta)/l(T(x)|\theta)$ does not depend on $\theta$. Denote it by $h(x)$. Then $f(x|\theta) = l(T(x)|\theta)h(x)$. Denoting $l$ by $g$ yields the result in one direction.

In the other direction we will give a sloppy proof. Denote $A(x) = \{y : T(y) = T(x)\}$. Then
$$l(T(x)|\theta) = \int_{A(x)} f(y|\theta)\,dy = \int_{A(x)} g(T(y)|\theta)h(y)\,dy = g(T(x)|\theta)\int_{A(x)} h(y)\,dy.$$
So
$$f_{X|T(X)}(x \mid T(X) = T(x)) = \frac{f(x|\theta)}{l(T(x)|\theta)} = \frac{g(T(x)|\theta)h(x)}{g(T(x)|\theta)\int_{A(x)} h(y)\,dy} = \frac{h(x)}{\int_{A(x)} h(y)\,dy},$$
which is independent of $\theta$. We conclude that $T(X)$ is a sufficient statistic.
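As a quick numerical illustration of the calculation above (added here, not part of the original notes), the following Python sketch evaluates $\log f_X(x|\mu) - \log f_{\bar X_n}(\bar x_n|\mu)$ for the known-$\sigma^2$ normal model at several values of $\mu$; by the sufficiency of $\bar X_n$ this difference should not change with $\mu$. It assumes NumPy and SciPy are available, and the function name `conditional_log_density` is ad hoc.

```python
import numpy as np
from scipy.stats import norm

def conditional_log_density(x, mu, sigma=1.0):
    """log f_X(x | mu) - log f_{Xbar}(xbar | mu) for the N(mu, sigma^2) model
    with sigma known; sufficiency of Xbar_n says this does not depend on mu."""
    n = len(x)
    log_fx = norm.logpdf(x, loc=mu, scale=sigma).sum()                # joint log-density of the sample
    log_ft = norm.logpdf(x.mean(), loc=mu, scale=sigma / np.sqrt(n))  # log-density of Xbar_n
    return log_fx - log_ft

x = np.array([0.3, -1.2, 0.8, 2.1, 0.5])
for mu in [-5.0, 0.0, 3.0]:
    print(mu, conditional_log_density(x, mu))   # same value for every mu
```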

Example. Let us show how to use the Factorization Theorem in practice. Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown, i.e. $\theta = (\mu, \sigma^2)$. Then
$$f(x|\theta) = \exp\Big\{-\sum_{i=1}^n (x_i - \mu)^2/(2\sigma^2)\Big\}/((2\pi)^{n/2}\sigma^n) = \exp\Big\{-\Big[\sum_{i=1}^n x_i^2 - 2\mu\sum_{i=1}^n x_i + n\mu^2\Big]/(2\sigma^2)\Big\}/((2\pi)^{n/2}\sigma^n).$$
Thus, $T(X) = (\sum_{i=1}^n X_i^2, \sum_{i=1}^n X_i)$ is a sufficient statistic (here $h(x) = 1$ and $g$ is the whole thing). Note that in this example we actually have a pair of sufficient statistics. In addition, as we have seen before,
$$f(x|\theta) = \exp\Big\{-\Big[\sum_{i=1}^n (x_i - \bar{x}_n)^2 + n(\bar{x}_n - \mu)^2\Big]/(2\sigma^2)\Big\}/((2\pi)^{n/2}\sigma^n) = \exp\big\{-\big[(n-1)s_n^2 + n(\bar{x}_n - \mu)^2\big]/(2\sigma^2)\big\}/((2\pi)^{n/2}\sigma^n).$$
Thus, $T(X) = (\bar{X}_n, s_n^2)$ is another sufficient statistic. Yet another sufficient statistic is $T(X) = (X_1, \dots, X_n)$. Note that $\bar{X}_n$ is not sufficient in this example.

Example. A less trivial example: let $X_1, \dots, X_n$ be a random sample from $U[\theta, 1+\theta]$. Then $f(x|\theta) = 1$ if $\theta \le \min_i x_i \le \max_i x_i \le 1+\theta$ and $0$ otherwise. In other words, $f(x|\theta) = I\{\theta \le x_{(1)}\} I\{1+\theta \ge x_{(n)}\}$. So $T(X) = (X_{(1)}, X_{(n)})$ is sufficient.

3 Minimal Sufficient Statistics

Could we reduce the sufficient statistic $T(X) = (X_{(1)}, X_{(n)})$ in the previous example even more? Suppose we have two statistics, say, $T(X)$ and $T^*(X)$. We say that $T^*$ is not bigger than $T$ if there exists some function $r$ such that $T^*(X) = r(T(X))$. In other words, we can calculate $T^*(X)$ whenever we know $T(X)$. In this case, when $T^*$ changes its value, statistic $T$ must change its value as well. In this sense $T^*$ does not give less of an information reduction than $T$.

Definition 3. A sufficient statistic $T^*(X)$ is called minimal if for any sufficient statistic $T(X)$ there exists some function $r$ such that $T^*(X) = r(T(X))$.

Thus, in some sense, the minimal sufficient statistic gives us the greatest data reduction without a loss of information about parameters. The following theorem gives a characterization of minimal sufficient statistics:

Theorem 4. Let $f(x|\theta)$ be the pdf of $X$ and $T(X)$ be such that, for any $x, y$, the statement $\{f(x|\theta)/f(y|\theta)$ does not depend on $\theta\}$ is equivalent to the statement $\{T(x) = T(y)\}$. Then $T(X)$ is minimal sufficient.

We will leave this statement unproven here.

Example. Let us now go back to the example with $X_1, \dots, X_n \sim U[\theta, 1+\theta]$. The ratio $f(x|\theta)/f(y|\theta)$ is independent of $\theta$ if and only if $x_{(1)} = y_{(1)}$ and $x_{(n)} = y_{(n)}$, which is the case if and only if $T(x) = T(y)$. Therefore $T(X) = (X_{(1)}, X_{(n)})$ is minimal sufficient.
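To see concretely that $T(X) = (\sum_{i=1}^n X_i^2, \sum_{i=1}^n X_i)$ carries all the information about $\theta = (\mu, \sigma^2)$, note that the factorization with $h(x) = 1$ implies that two samples sharing the same value of $T$ have identical likelihoods at every $\theta$. The short sketch below is an illustration added here, not from the notes; the helper name `normal_loglik` is ad hoc, and the second sample is constructed to match the first one's sum and sum of squares.

```python
import numpy as np

def normal_loglik(x, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

x = np.array([1.0, 2.0, 3.0])
# A different sample with the same sufficient statistic (sum = 6, sum of squares = 14):
y = np.array([2 + 1 / np.sqrt(3), 2 + 1 / np.sqrt(3), 2 - 2 / np.sqrt(3)])
print(x.sum(), (x ** 2).sum(), y.sum(), (y ** 2).sum())

# Because h(x) = 1, the likelihoods agree at every theta = (mu, sigma^2):
for mu, s2 in [(0.0, 1.0), (2.0, 0.5), (-1.0, 4.0)]:
    print(normal_loglik(x, mu, s2), normal_loglik(y, mu, s2))
```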

Example. Let $X_1, \dots, X_n$ be a random sample from the Cauchy distribution with parameter $\theta$, i.e. the distribution with the pdf $f(x|\theta) = 1/(\pi(1 + (x-\theta)^2))$. Then $f(x_1, \dots, x_n|\theta) = 1/(\pi^n \prod_{i=1}^n (1 + (x_i-\theta)^2))$. By the theorem above, $T(X) = (X_{(1)}, \dots, X_{(n)})$ is minimal sufficient.

4 Estimators. Properties of estimators.

An estimator is a function of the data (statistic). If we have a parametric family with parameter $\theta$, then an estimator of $\theta$ is usually denoted by $\hat{\theta}$.

Example. For example, if $X_1, \dots, X_n$ is a random sample from some distribution with mean $\mu$ and variance $\sigma^2$, then the sample average $\hat{\mu} = \bar{X}_n$ is an estimator of the population mean, and the sample variance $\hat{\sigma}^2 = s^2 = \sum_{i=1}^n (X_i - \bar{X}_n)^2/(n-1)$ is an estimator of the population variance.

4.1 Unbiasedness

Let $X$ be our data. Let $\hat{\theta} = T(X)$ be an estimator, where $T$ is some function. We say that $\hat{\theta}$ is unbiased for $\theta$ if $E_\theta[T(X)] = \theta$ for all possible values of $\theta$, where $E_\theta$ denotes the expectation when $\theta$ is the true parameter value. The bias of $\hat{\theta}$ is defined by $\mathrm{Bias}(\hat{\theta}) = E_\theta[\hat{\theta}] - \theta$.

Thus, the concept of unbiasedness means that we are on average correct. For example, if $X$ is a random sample $X_1, \dots, X_n$ from some distribution with mean $\mu$ and variance $\sigma^2$, then, as we have already seen, $E[\hat{\mu}] = \mu$ and $E[s^2] = \sigma^2$. Thus, the sample average and the sample variance are unbiased estimators of the population mean and the population variance, respectively.

4.2 Efficiency: MSE

Another concept that evaluates the performance of estimators is the MSE (Mean Squared Error). By definition, $\mathrm{MSE}(\hat{\theta}) = E_\theta[(\hat{\theta} - \theta)^2]$. The theorem below gives a useful decomposition of the MSE:

Theorem 5. $\mathrm{MSE}(\hat{\theta}) = \mathrm{Bias}^2(\hat{\theta}) + V(\hat{\theta})$.

Proof.
$$E[(\hat{\theta} - \theta)^2] = E[(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta)^2] = E\big[(\hat{\theta} - E[\hat{\theta}])^2 + (E[\hat{\theta}] - \theta)^2 + 2(\hat{\theta} - E[\hat{\theta}])(E[\hat{\theta}] - \theta)\big] = V(\hat{\theta}) + \mathrm{Bias}^2(\hat{\theta}) + 2E[\hat{\theta} - E[\hat{\theta}]](E[\hat{\theta}] - \theta) = V(\hat{\theta}) + \mathrm{Bias}^2(\hat{\theta}).$$

Estimators with smaller MSE are considered to be better, or more efficient.
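The decomposition in Theorem 5 is easy to verify by simulation. The sketch below is added for illustration and is not from the notes; the function name is ad hoc. It estimates the MSE, squared bias, and variance of two estimators of $\sigma^2$, the unbiased $s_n^2$ and the biased version that divides by $n$, and checks that MSE $\approx$ Bias$^2$ + Var for both.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_decomposition(estimator, sigma2=4.0, n=10, reps=200_000):
    """Monte-Carlo check of MSE(theta_hat) = Bias^2 + Var for an estimator of sigma^2."""
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    est = estimator(samples)
    mse = np.mean((est - sigma2) ** 2)
    bias = est.mean() - sigma2
    return mse, bias ** 2 + est.var()

s2_unbiased = lambda x: x.var(axis=1, ddof=1)   # divides by n - 1
s2_biased = lambda x: x.var(axis=1, ddof=0)     # divides by n

print(mse_decomposition(s2_unbiased))   # the two numbers agree up to simulation noise
print(mse_decomposition(s2_biased))
```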

4.3 Connection between efficiency and sufficient statistics

Let $X = (X_1, \dots, X_n)$ be a random sample from distribution $f_\theta$. Let $\hat{\theta} = \delta(X)$ be an estimator of $\theta$. Let $T(X)$ be a sufficient statistic for $\theta$. As we have seen already, the MSE provides one way to compare the quality of different estimators. In particular, estimators with smaller MSE are said to be more efficient. On the other hand, once we know $T(X)$, we can discard $X$. How do these concepts relate to each other? The theorem below shows that for any estimator $\hat{\theta} = \delta(X)$, there is another estimator which depends on the data $X$ only through $T(X)$ and is at least as efficient as $\hat{\theta}$:

Theorem 6 (Rao-Blackwell). In the setting above, define $\varphi(T) = E[\delta(X)|T]$. Then $\hat{\theta}_2 = \varphi(T(X))$ is an estimator for $\theta$ and $\mathrm{MSE}(\hat{\theta}_2) \le \mathrm{MSE}(\hat{\theta})$. In addition, if $\hat{\theta}$ is unbiased, then $\hat{\theta}_2$ is unbiased as well.

Proof. To show that $\hat{\theta}_2$ is an estimator, we have to check that it does not depend on $\theta$. Indeed, since $T$ is sufficient for $\theta$, the conditional distribution of $X$ given $T$ is independent of $\theta$. So the conditional distribution of $\delta(X)$ given $T$ is independent of $\theta$ as well. In particular, the conditional expectation $E[\delta(X)|T]$ depends only on the data. Thus, $\varphi(T(X))$ does not depend on $\theta$ and $\hat{\theta}_2$ is an estimator. Next,
$$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \hat{\theta}_2 + \hat{\theta}_2 - \theta)^2] = E[(\hat{\theta} - \hat{\theta}_2)^2] + 2E[(\hat{\theta} - \hat{\theta}_2)(\hat{\theta}_2 - \theta)] + E[(\hat{\theta}_2 - \theta)^2] = E[(\hat{\theta} - \hat{\theta}_2)^2] + \mathrm{MSE}(\hat{\theta}_2) \ge \mathrm{MSE}(\hat{\theta}_2),$$
where in the last line we used
$$E[(\hat{\theta} - \hat{\theta}_2)(\hat{\theta}_2 - \theta)] = E\big[(\delta(X) - \varphi(T(X)))(\varphi(T(X)) - \theta)\big] = E\big[E[(\delta(X) - \varphi(T(X)))(\varphi(T(X)) - \theta) \mid T]\big] = E\big[(\varphi(T(X)) - \theta)\,E[\delta(X) - \varphi(T(X)) \mid T]\big] = E\big[(\varphi(T(X)) - \theta)\cdot(E[\delta(X)|T] - \varphi(T(X)))\big] = 0,$$
since $E[\delta(X)|T] = \varphi(T(X))$. To show the last result, we have $E[\varphi(T(X))] = E[E[\delta(X)|T]] = E[\delta(X)] = \theta$ by the law of iterated expectations.

Example. Let $X_1, \dots, X_n$ be a random sample from Binomial$(p, k)$, i.e. $P\{X_j = m\} = (k!/(m!(k-m)!))\, p^m (1-p)^{k-m}$ for any integer $m \ge 0$. Suppose our parameter of interest is the probability of one success, i.e. $\theta = P\{X_j = 1\} = kp(1-p)^{k-1}$. One possible estimator is $\hat{\theta} = \sum_{i=1}^n I(X_i = 1)/n$.

This estimator is unbiased, i.e. $E[\hat{\theta}] = \theta$. Let us find a sufficient statistic. The joint density of the data is
$$f(x_1, \dots, x_n) = \prod_{i=1}^n \frac{k!}{x_i!(k-x_i)!}\, p^{x_i}(1-p)^{k-x_i} = \Big[\prod_{i=1}^n \frac{k!}{x_i!(k-x_i)!}\Big]\, p^{\sum_{i=1}^n x_i}(1-p)^{nk - \sum_{i=1}^n x_i}.$$
Thus, $T = \sum_{i=1}^n X_i$ is sufficient. In fact, it is minimal sufficient.

Using the Rao-Blackwell theorem, we can improve $\hat{\theta}$ by considering its conditional expectation given $T$. Let $\varphi = E[\hat{\theta}|T]$ denote this estimator. Then, for any nonnegative integer $t$,
$$\varphi(t) = E\Big[\sum_{i=1}^n I(X_i = 1)/n \,\Big|\, \sum_{i=1}^n X_i = t\Big] = \sum_{i=1}^n P\Big\{X_i = 1 \,\Big|\, \sum_{j=1}^n X_j = t\Big\}\Big/n = P\Big\{X_1 = 1 \,\Big|\, \sum_{j=1}^n X_j = t\Big\}$$
$$= \frac{nP\{X_1 = 1, \sum_{j=1}^n X_j = t\}}{nP\{\sum_{j=1}^n X_j = t\}} = \frac{nP\{X_1 = 1, \sum_{j=2}^n X_j = t-1\}}{nP\{\sum_{j=1}^n X_j = t\}} = \frac{nP\{X_1 = 1\}P\{\sum_{j=2}^n X_j = t-1\}}{nP\{\sum_{j=1}^n X_j = t\}}$$
$$= \frac{kp(1-p)^{k-1} \cdot \frac{(k(n-1))!}{(t-1)!(k(n-1)-(t-1))!}\, p^{t-1}(1-p)^{k(n-1)-(t-1)}}{\frac{(kn)!}{t!(kn-t)!}\, p^t(1-p)^{kn-t}} = \frac{k(k(n-1))!/((t-1)!(k(n-1)-(t-1))!)}{(kn)!/(t!(kn-t)!)} = \frac{k(k(n-1))!(kn-t)!\,t}{(kn)!(kn-k+1-t)!},$$
where we used the fact that $X_1$ is independent of $(X_2, \dots, X_n)$, $\sum_{i=1}^n X_i \sim$ Binomial$(kn, p)$, and $\sum_{i=2}^n X_i \sim$ Binomial$(k(n-1), p)$. So our new estimator is
$$\hat{\theta}_2 = \varphi\Big(\sum_{i=1}^n X_i\Big) = \frac{k(k(n-1))!\,(kn - \sum_{i=1}^n X_i)!\,(\sum_{i=1}^n X_i)}{(kn)!\,(kn-k+1-\sum_{i=1}^n X_i)!}.$$
By the theorem above, it is unbiased and at least as efficient as $\hat{\theta}$. The procedure we just applied is sometimes informally referred to as Rao-Blackwellization.

Note on implementation. One may say this is "too complicated". We have derived $\varphi(t) = E(\hat{\theta} \mid \sum_{i=1}^n X_i = t)$ analytically in order to calculate a new estimate $\hat{\theta}_2 = \varphi(T) = \varphi(\sum_{i=1}^n X_i)$, a function that is evaluated at one point (the realized value of $T$), but in real life you may just do this with Monte-Carlo simulations. Note that we do not need to calculate the whole $\varphi(t)$ function; we need only $\varphi(T)$. Note also that the result does not depend on $p$, so we are free to choose any $p$ (choosing $p$ close to $T/(kn)$ will give faster calculations). The procedure is as follows.
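For a concrete check of the Rao-Blackwell improvement, the sketch below (an added illustration, not from the notes) writes $\varphi(t)$ with binomial coefficients, which is algebraically the same as the factorial expression above, and compares the simulated MSEs of $\hat{\theta}$ and $\hat{\theta}_2$; the values of $p$, $k$, and $n$ are arbitrary choices.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)

def phi(t, n, k):
    """Rao-Blackwellized estimator evaluated at t = sum of the X_i; equals
    k * t * (k(n-1))! (kn-t)! / ((kn)! (kn-k+1-t)!) from the derivation above."""
    if t == 0:
        return 0.0
    return k * comb(k * (n - 1), t - 1) / comb(k * n, t)

def compare_mse(p=0.3, k=4, n=20, reps=100_000):
    theta = k * p * (1 - p) ** (k - 1)                 # P{X_j = 1}
    x = rng.binomial(k, p, size=(reps, n))
    theta_hat = (x == 1).mean(axis=1)                  # original estimator
    theta_hat2 = np.array([phi(t, n, k) for t in x.sum(axis=1)])
    return np.mean((theta_hat - theta) ** 2), np.mean((theta_hat2 - theta) ** 2)

print(compare_mse())   # the second MSE should not exceed the first, up to simulation noise
```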

- Choose some $p \in (0, 1)$.
- For $b = 1, \dots, B$ repeat the following:
  - Draw $X_1^b, \dots, X_n^b$ as independent variables from Binomial$(p, k)$; if $\sum_{i=1}^n X_i^b \ne T$, discard this sample. Repeat drawing samples until you get $\sum_{i=1}^n X_i^b = T$.
  - Calculate $Y_b = \frac{1}{n}\sum_{i=1}^n I(X_i^b = 1)$.
- The new estimator is $\hat{\theta}_2 = \frac{1}{B}\sum_{b=1}^B Y_b$.

The accuracy is better for a larger number of simulations $B$.
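A direct implementation of this simulation procedure might look as follows. This is a sketch under the setup above, not part of the original notes; the function name `rao_blackwell_mc` is ad hoc, and it defaults to $p = T/(kn)$ as suggested in the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def rao_blackwell_mc(T, n, k, p=None, B=2000):
    """Approximate phi(T) = E[theta_hat | sum X_i = T] by rejection sampling:
    keep only simulated samples whose sum matches the observed statistic T."""
    if p is None:
        p = T / (k * n)                     # any p in (0, 1) works; this choice speeds up acceptance
    draws = []
    while len(draws) < B:
        x = rng.binomial(k, p, size=n)      # candidate sample X_1^b, ..., X_n^b
        if x.sum() == T:                    # discard the sample unless sum X_i^b = T
            draws.append(np.mean(x == 1))   # Y_b = (1/n) sum I(X_i^b = 1)
    return float(np.mean(draws))            # theta_hat_2 = (1/B) sum Y_b

# Hypothetical usage: observed T = 25 from n = 20 observations with k = 4 trials each
print(rao_blackwell_mc(T=25, n=20, k=4))
```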

MIT OpenCourseWare
https://ocw.mit.edu

14.381 Statistical Method in Economics
Fall 2018

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms
