Lecture 15: Multivariate Normal Distributions


Normal distributions with singular covariance matrices

Consider an $n$-dimensional $X \sim N(\mu, \Sigma)$ with a positive definite $\Sigma$ and a fixed $k \times n$ matrix $A$ that is not of rank $k$ (so $k$ may be larger than $n$). The mgf of $Y = AX$ is still equal to
$$M_Y(t) = e^{(A\mu)'t + t'(A\Sigma A')t/2}, \quad t \in \mathbb{R}^k.$$
But what is the distribution corresponding to this mgf?

Lemma.
For any $n \times n$ non-negative definite matrix $\Sigma$ and $\mu \in \mathbb{R}^n$, $e^{\mu't + t'\Sigma t/2}$ defined for all $t \in \mathbb{R}^n$ is the mgf of an $n$-dimensional random vector $X$.

Proof.
From the theory of linear algebra, a non-negative definite matrix $\Sigma$ of rank $r \le n$ satisfies
$$\Sigma = T'\begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix}T = C'\Lambda C, \qquad T = \begin{pmatrix} C \\ D \end{pmatrix},$$
where $\Lambda$ is an $r \times r$ diagonal matrix all of whose diagonal elements are positive, $0$ denotes a matrix of 0's of an appropriate order, $C$ is an $r \times n$ matrix of rank $r$, $T$ is an $n \times n$ matrix satisfying $TT' = T'T = I_n$ (the identity matrix of order $n$), $CC' = I_r$, $DC' = 0$, $DD' = I_{n-r}$, and $C'C + D'D = I_n$.
Let $Y$ be an $r$-dimensional random vector $\sim N(C\mu, \Lambda)$ and define
$$X = T'\begin{pmatrix} Y \\ D\mu \end{pmatrix} = C'Y + D'D\mu.$$
Since $Y \sim N(C\mu, \Lambda)$, its mgf is $M_Y(s) = e^{(C\mu)'s + s'\Lambda s/2}$, $s \in \mathbb{R}^r$, and the mgf of $X$ is
$$M_X(t) = e^{(D'D\mu)'t}M_Y(Ct) = e^{(D'D\mu)'t}e^{(C\mu)'(Ct) + (Ct)'\Lambda(Ct)/2} = e^{\mu'(D'D + C'C)t + t'C'\Lambda Ct/2} = e^{\mu't + t'\Sigma t/2}, \quad t \in \mathbb{R}^n.$$
This completes the proof.

Definition.
For any fixed $n \times n$ non-negative definite matrix $\Sigma$ and $\mu \in \mathbb{R}^n$, the distribution of an $n$-dimensional random vector with mgf $e^{\mu't + t'\Sigma t/2}$ is called the normal distribution and denoted by $N(\mu, \Sigma)$.
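The proof is constructive, so it doubles as a simulation recipe for $N(\mu, \Sigma)$ when $\Sigma$ is singular. The following is a minimal numerical sketch (not from the lecture; it assumes numpy, and the matrix $\Sigma$ and vector $\mu$ are toy choices of ours): we obtain $C$, $D$, $\Lambda$ from an eigendecomposition of a rank-deficient $\Sigma$, form $X = C'Y + D'D\mu$, and check the sample mean and covariance.

    import numpy as np

    rng = np.random.default_rng(0)

    # A singular (rank 2) non-negative definite 3x3 covariance matrix.
    B = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
    Sigma = B @ B.T                      # rank 2, so N(mu, Sigma) has no pdf on R^3
    mu = np.array([1.0, 2.0, 3.0])

    # Eigendecomposition: Sigma = T' diag(Lambda, 0) T with T = [C; D] (rows).
    w, V = np.linalg.eigh(Sigma)         # ascending eigenvalues
    keep = w > 1e-10
    Lam = w[keep]                        # the r positive eigenvalues (diagonal of Lambda)
    C = V[:, keep].T                     # r x n, rows are eigenvectors: CC' = I_r
    D = V[:, ~keep].T                    # (n-r) x n: DC' = 0, DD' = I_{n-r}

    # Simulate X = C'Y + D'D mu with Y ~ N(C mu, Lambda), Lambda diagonal.
    m = 200_000
    Y = C @ mu + np.sqrt(Lam) * rng.standard_normal((m, len(Lam)))
    X = Y @ C + mu @ D.T @ D             # each row is one draw of X

    print(np.round(X.mean(axis=0), 2))   # approximately mu
    print(np.round(np.cov(X.T), 2))      # approximately Sigma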

If $\Sigma$ is positive definite, then this definition is the same as the previous definition using the pdf.

If $X \sim N(\mu, \Sigma)$ and $Y = AX + b$, then $Y \sim N(A\mu + b, A\Sigma A')$, regardless of whether $A\Sigma A'$ is singular or not (a numerical sketch follows this list of properties).

If $X$ is multivariate normal, then any sub-vector of $X$ is also normally distributed.

If $n$-dimensional $X \sim N(\mu, \Sigma)$ and the rank of $\Sigma$ is $r < n$, there exists an $r \times n$ matrix $C$ of rank $r$ such that $Y = CX \sim N(C\mu, C\Sigma C')$, where $C\Sigma C'$ is a diagonal matrix whose diagonal elements are all positive; hence $Y$ has an $r$-dimensional normal pdf and the components of $Y$ are independent.

If $n$-dimensional $X \sim N(\mu, \Sigma)$ and the rank of $\Sigma$ is $r \le n$, then, from the previous discussion, $X = C'Y + D'D\mu$, where $Y = CX \sim N(C\mu, C\Sigma C')$, and
$$E(X) = C'E(Y) + D'D\mu = (C'C + D'D)\mu = \mu,$$
$$\text{Var}(X) = C'\text{Var}(Y)C = C'C\Sigma C'C = \Sigma.$$
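Here is a quick sketch of the linear-transformation property (our own illustration, assuming numpy and scipy; $A$, $b$, and the parameters are arbitrary choices): we simulate $X \sim N(\mu, \Sigma)$, form $Y = AX + b$, and compare against the predicted mean, variance, and normality.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    mu = np.array([0.0, 1.0, -1.0])
    L = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, -0.3, 0.7]])
    Sigma = L @ L.T
    X = mu + rng.standard_normal((100_000, 3)) @ L.T   # X ~ N(mu, Sigma)

    A = np.array([[1.0, 1.0, 1.0]])    # collapses X to one dimension
    b = np.array([2.0])
    Y = X @ A.T + b                    # Y = AX + b, here univariate

    # Theory: Y ~ N(A mu + b, A Sigma A')
    print((A @ mu + b).item(), Y.mean())
    print((A @ Sigma @ A.T).item(), Y.var())
    print(stats.normaltest(Y[:, 0]).pvalue)  # a large p-value is consistent with normality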

Thus, $\mu$ and $\Sigma$ in $N(\mu, \Sigma)$ are still the mean vector and covariance matrix. Furthermore, any two components of $X \sim N(\mu, \Sigma)$ are independent iff they are uncorrelated.
This can be shown as follows. Suppose that $X_1$ and $X_2$ are the first two components of $X$ and $\text{Cov}(X_1, X_2) = 0$, i.e., the $(1,2)$th and $(2,1)$th elements of $\Sigma$ are 0. Let $\mu_1$ and $\mu_2$ be the first two components of $\mu$, let $\sigma_1^2$ and $\sigma_2^2$ be the first and second diagonal elements of $\Sigma$, and let $t = (t_1, t_2, 0, \ldots, 0)$, $t_1 \in \mathbb{R}$, $t_2 \in \mathbb{R}$. Then the mgf of $(X_1, X_2)$ is
$$M_{(X_1,X_2)}(t_1, t_2) = e^{\mu't + t'\Sigma t/2} = e^{\mu_1 t_1 + \sigma_1^2 t_1^2/2}\, e^{\mu_2 t_2 + \sigma_2^2 t_2^2/2}, \quad t_1 \in \mathbb{R},\ t_2 \in \mathbb{R}.$$
By Theorem M4, $X_1$ and $X_2$ are independent.

Theorem.
An $n$-dimensional random vector $X \sim N(\mu, \Sigma)$ (regardless of whether $\Sigma$ is singular or not) iff for any $n$-dimensional constant vector $c$, $c'X \sim N(c'\mu, c'\Sigma c)$.

Proof.
We treat a degenerate $X = c$ as $N(c, 0)$.
If $X \sim N(\mu, \Sigma)$, then $M_X(t) = e^{\mu't + t'\Sigma t/2}$. For any $c \in \mathbb{R}^n$, by the properties of mgf, the mgf of $c'X$ is
$$M_{c'X}(t) = M_X(ct) = e^{\mu'(ct) + (ct)'\Sigma(ct)/2} = e^{(c'\mu)t + (c'\Sigma c)t^2/2}, \quad t \in \mathbb{R},$$
which is the mgf of $N(c'\mu, c'\Sigma c)$. By uniqueness, $c'X \sim N(c'\mu, c'\Sigma c)$.
If $c'X \sim N(c'\mu, c'\Sigma c)$ for any $c \in \mathbb{R}^n$, then $t'X \sim N(t'\mu, t'\Sigma t)$ for any $t \in \mathbb{R}^n$ and
$$M_{t'X}(s) = e^{(t'\mu)s + (t'\Sigma t)s^2/2}, \quad s \in \mathbb{R}.$$
Letting $s = 1$, we obtain
$$M_{t'X}(1) = e^{t'\mu + t'\Sigma t/2} = E(e^{t'X}) = M_X(t), \quad t \in \mathbb{R}^n.$$
By uniqueness, $X \sim N(\mu, \Sigma)$.
The condition "for any $c \in \mathbb{R}^n$" is important.
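To see the projection characterization at work numerically, here is a small sketch (our own, assuming numpy and scipy; the parameters are arbitrary): several randomly chosen directions $c$ give standardized projections consistent with $N(0, 1)$, as the theorem predicts.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
    X = rng.multivariate_normal(mu, Sigma, size=50_000)

    for _ in range(4):
        c = rng.standard_normal(2)
        proj = X @ c                                # c'X for every draw
        # Theory: c'X ~ N(c'mu, c' Sigma c); standardize and test
        z = (proj - c @ mu) / np.sqrt(c @ Sigma @ c)
        print(stats.kstest(z, "norm").pvalue)       # large p-values expected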

The uniform distribution on $[a, b] \times [c, d]$

We have shown that the two marginal distributions are uniform distributions on the intervals $[a, b]$ and $[c, d]$. For non-zero constants $\xi$ and $\zeta$, is the distribution of $\xi X + \zeta Y$ a uniform distribution on some interval?
If $(e^{bt} - e^{at})/t$ is defined to be $b - a$ when $t = 0$ for any constants $a < b$, then
$$M_{X,Y}(t, s) = \int_a^b \int_c^d \frac{e^{tx + sy}}{(b-a)(d-c)}\,dx\,dy = \frac{(e^{bt} - e^{at})(e^{ds} - e^{cs})}{(b-a)(d-c)ts}, \quad s, t \in \mathbb{R},$$
and
$$M_{\xi X + \zeta Y}(t) = E\big(e^{t(\xi X + \zeta Y)}\big) = \frac{(e^{b\xi t} - e^{a\xi t})(e^{d\zeta t} - e^{c\zeta t})}{(b-a)(d-c)\xi\zeta t^2}, \quad t \in \mathbb{R}.$$
This is not an mgf of a uniform distribution on an interval $[r, h]$, which is of the form $(e^{ht} - e^{rt})/[t(h - r)]$ for $t \in \mathbb{R}$.
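A short simulation makes the conclusion concrete (our own toy check, assuming numpy; the constants are our choices): with $a = c = 0$, $b = d = 1$, and $\xi = \zeta = 1$, the density of $\xi X + \zeta Y$ is triangular rather than flat, so the histogram rises and then falls.

    import numpy as np

    rng = np.random.default_rng(4)

    a, b, c, d = 0.0, 1.0, 0.0, 1.0
    xi, zeta = 1.0, 1.0
    X = rng.uniform(a, b, 1_000_000)
    Y = rng.uniform(c, d, 1_000_000)
    S = xi * X + zeta * Y            # supported on [0, 2]; density is triangular here

    counts, edges = np.histogram(S, bins=8, range=(0.0, 2.0), density=True)
    print(np.round(counts, 2))       # rises then falls: not a flat (uniform) density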

We have shown that if $X \sim N(\mu, \Sigma)$, then any linear function $AX + b$ is normally distributed. The following result concerns the independence of linear functions of a normally distributed random vector.

Theorem N1.
Let $X$ be an $n$-dimensional random vector $\sim N(\mu, \Sigma)$, let $A$ be a fixed $k \times n$ matrix, and let $B$ be a fixed $l \times n$ matrix. Then $AX$ and $BX$ are independent iff $A\Sigma B' = 0$.

Proof.
Let
$$Y = \begin{pmatrix} AX \\ BX \end{pmatrix} = \begin{pmatrix} A \\ B \end{pmatrix}X.$$
From the properties of the multivariate normal distribution, we know that $Y$ is multivariate normal with covariance matrix
$$\begin{pmatrix} A \\ B \end{pmatrix}\Sigma\,\big(A' \ \ B'\big) = \begin{pmatrix} A\Sigma A' & A\Sigma B' \\ B\Sigma A' & B\Sigma B' \end{pmatrix}.$$

Hence, $AX$ and $BX$ are uncorrelated iff $A\Sigma B' = 0$, and thus the "only if" part follows since independence implies no correlation.
The proof of the "if" part is the same as the proof that two uncorrelated components of $X$ are independent: we can show that if $A\Sigma B' = 0$, then the mgf of $(AX, BX)$ is a product of an mgf on $\mathbb{R}^k$ and another mgf on $\mathbb{R}^l$, and then apply Theorem M4.

Theorem N2.
If $(X, Y)$ is a random vector $\sim N(\mu, \Sigma)$ with
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
and if $\Sigma$ is positive definite, then
$$Y \mid X \sim N\big(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X - \mu_1),\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\big).$$
It follows from the properties of normal distributions that
$$E(Y \mid X) = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X - \mu_1), \qquad \text{Var}(Y \mid X) = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.$$
While the conditional mean depends on $X$, the conditional covariance matrix does not.
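A numerical sketch of Theorem N1 (our own example, assuming numpy; $A$, $B$, and $\Sigma$ are toy choices): we pick $A$ and $B$ with $A\Sigma B' = 0$ and check empirically that $AX$ and $BX$ are uncorrelated, and that nonlinear functions of them are uncorrelated as well, consistent with independence.

    import numpy as np

    rng = np.random.default_rng(5)

    Sigma = np.diag([1.0, 2.0, 3.0])
    A = np.array([[1.0, 1.0, 0.0]])
    B = np.array([[2.0, -1.0, 0.0]])             # chosen so that A Sigma B' = 0
    print(A @ Sigma @ B.T)                        # [[0.]]

    X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
    U, V = X @ A.T, X @ B.T
    print(np.corrcoef(U[:, 0], V[:, 0])[0, 1])    # approximately 0
    # Beyond correlation: nonlinear functions are also uncorrelated
    print(np.corrcoef(U[:, 0] ** 2, V[:, 0] ** 2)[0, 1])   # approximately 0 as well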

Proof.
Consider the transformation $U = AX + Y$ with a fixed matrix $A$ chosen so that $U$ and $X$ are independent. From Theorem N1, we need $U$ and $X$ to be uncorrelated. Since
$$\text{Cov}(X, U) = \text{Cov}(X, AX + Y) = \text{Cov}(X, AX) + \text{Cov}(X, Y) = \text{Cov}(X, X)A' + \Sigma_{12} = \Sigma_{11}A' + \Sigma_{12},$$
we choose $A = -\Sigma_{21}\Sigma_{11}^{-1}$.
Consider the transformation
$$\begin{pmatrix} V \\ U \end{pmatrix} = \begin{pmatrix} X \\ AX + Y \end{pmatrix} = \begin{pmatrix} I & 0 \\ -\Sigma_{21}\Sigma_{11}^{-1} & I \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix},$$
which is a one-to-one map between $(U, V)$ and $(X, Y)$. Let $f_{(X,Y)}$ be the pdf of $(X, Y)$, $f_{(U,V)}$ be the pdf of $(U, V)$, $f_U$ be the pdf of $U$, and $f_V$ be the pdf of $V$.
By the transformation formula and the independence of $U$ and $V = X$,
$$f_{(X,Y)}(x, y) = f_{(U,V)}(u, v) = f_U(u)f_V(v) = f_U(y - \Sigma_{21}\Sigma_{11}^{-1}x)f_X(x).$$
Then the pdf of $Y \mid X$ is
$$f_{Y\mid X}(y \mid x) = \frac{f_{(X,Y)}(x, y)}{f_X(x)} = \frac{f_U(y - \Sigma_{21}\Sigma_{11}^{-1}x)f_X(x)}{f_X(x)} = f_U(y - \Sigma_{21}\Sigma_{11}^{-1}x).$$
Since $U = -\Sigma_{21}\Sigma_{11}^{-1}X + Y$, $U$ is normally distributed, with
$$E(U) = -\Sigma_{21}\Sigma_{11}^{-1}E(X) + E(Y) = -\Sigma_{21}\Sigma_{11}^{-1}\mu_1 + \mu_2$$
and
$$\text{Var}(U) = \text{Var}(AX + Y) = \text{Var}(AX) + \text{Var}(Y) + 2\text{Cov}(AX, Y) = A\text{Var}(X)A' + \Sigma_{22} + 2A\text{Cov}(X, Y)$$
$$= \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{11}\Sigma_{11}^{-1}\Sigma_{12} + \Sigma_{22} - 2\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.$$
Hence, $f_U$ is the pdf of $N(\mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1,\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$.
Given $X = x$, $\Sigma_{21}\Sigma_{11}^{-1}x$ is a constant and, hence, $f_U(y - \Sigma_{21}\Sigma_{11}^{-1}x)$ is the pdf of $N(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x - \mu_1),\ \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$, considered as a function of $y$.
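The conditional formulas are easy to check by brute force (our own sketch, assuming numpy; the bivariate parameters and the point $x$ are arbitrary): condition simulated draws on $X$ falling near $x$ and compare the conditional sample mean and variance with $\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x - \mu_1)$ and $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.

    import numpy as np

    rng = np.random.default_rng(6)

    # Partitioned (X, Y): X is the first coordinate, Y the second
    Sigma = np.array([[1.0, 0.6],
                      [0.6, 2.0]])
    mu = np.array([0.0, 1.0])
    S11, S12, S21, S22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]

    x = 0.8
    cond_mean = mu[1] + S21 / S11 * (x - mu[0])   # mu2 + S21 S11^{-1} (x - mu1)
    cond_var = S22 - S21 / S11 * S12              # S22 - S21 S11^{-1} S12

    Z = rng.multivariate_normal(mu, Sigma, size=2_000_000)
    near = np.abs(Z[:, 0] - x) < 0.01             # crude conditioning on X close to x
    print(cond_mean, Z[near, 1].mean())
    print(cond_var, Z[near, 1].var())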

Quadratic forms

For a random vector $X$ and a fixed symmetric matrix $A$, $X'AX$ is called a quadratic function or quadratic form of $X$. We now study the distribution of quadratic forms when $X$ is multivariate normal.

Theorem N3.
Let $X \sim N(\mu, I_n)$ and $A$ be a fixed $n \times n$ symmetric matrix. A necessary and sufficient condition for $X'AX$ to be chi-square distributed is $A^2 = A$, in which case the degrees of freedom of the chi-square distribution is the rank of $A$ and the noncentrality parameter is $\mu'A\mu$.

Proof.
Sufficiency. If $A^2 = A$, then $A$ is a projection matrix and there exists an $n \times n$ matrix $T$ such that $T'T = TT' = I_n$ and
$$A = T'\begin{pmatrix} I_k & 0 \\ 0 & 0 \end{pmatrix}T = C'C,$$
where $k$ is the rank of $A$ and $C$ is the matrix formed by the first $k$ rows of $T$.

Then $X'AX = (CX)'(CX)$ is simply the sum of the squares of the components of $CX$, the first $k$ components of $TX$. Since $TX \sim N(T\mu, TI_nT') = N(T\mu, I_n)$, by definition $X'AX$ has the chi-square distribution with degrees of freedom $k$ and noncentrality parameter $(C\mu)'(C\mu) = \mu'C'C\mu = \mu'A\mu$.
Necessity. Suppose that $X'AX$ is chi-square with degrees of freedom $m$ and noncentrality parameter $\delta \ge 0$. Then $A$ must be nonnegative definite and there exists an $n \times n$ matrix $T$ such that $T'T = TT' = I_n$ and
$$A = T'\begin{pmatrix} \Lambda & 0 \\ 0 & 0 \end{pmatrix}T,$$
where $\Lambda$ is a $k \times k$ diagonal matrix containing the $k$ non-zero eigenvalues $0 < \lambda_1 \le \cdots \le \lambda_k$.
We still have $TX \sim N(T\mu, I_n)$. Let $Y_1, \ldots, Y_k$ be the first $k$ components of $TX$. Then the $Y_i^2$'s are independent, and $Y_i^2$ is chi-square with degrees of freedom 1 and noncentrality parameter $\mu_i^2$, where $\mu_i$ is the $i$th component of $T\mu$, and
$$X'AX = \sum_{i=1}^k \lambda_i Y_i^2.$$
Using the mgf formula for noncentral chi-square distributions, the mgf's of the left and right hand sides are respectively given by the left and right hand sides of
$$\frac{e^{\delta t/(1-2t)}}{(1-2t)^{m/2}} = \prod_{i=1}^k \frac{e^{\lambda_i\mu_i^2 t/(1-2\lambda_i t)}}{(1-2\lambda_i t)^{1/2}}, \quad t < 1/2.$$
Suppose that $\lambda_k > 1$. When $t \uparrow (2\lambda_k)^{-1}$, the right hand side of the above equation diverges to $\infty$, whereas the left hand side goes to $e^{\delta(2\lambda_k)^{-1}/(1-\lambda_k^{-1})}/(1-\lambda_k^{-1})^{m/2}$, which is a contradiction. Hence $\lambda_k \le 1$, so that $\lambda_i \le 1$ for all $i$.
Suppose that $\lambda_k = \cdots = \lambda_{l+1} = 1 > \lambda_l \ge \cdots \ge \lambda_1 > 0$ for a positive integer $l \le k$, which implies
$$\frac{e^{\delta t/(1-2t)}}{(1-2t)^{(m-k+l)/2}} = \prod_{i=1}^l \frac{e^{\lambda_i\mu_i^2 t/(1-2\lambda_i t)}}{(1-2\lambda_i t)^{1/2}}, \quad t < 1/2.$$

When $t \uparrow 1/2$, the left hand side of the above equation diverges to $\infty$, whereas the right hand side converges to
$$\prod_{i=1}^l \frac{e^{\lambda_i\mu_i^2/[2(1-\lambda_i)]}}{(1-\lambda_i)^{1/2}},$$
which is a contradiction. Therefore, we must have $\lambda_1 = \cdots = \lambda_k = 1$, i.e., $A$ is a projection matrix.

Theorem N4 (Cochran's theorem).
Suppose that $X$ is an $n$-dimensional random vector $\sim N(\mu, I_n)$ and
$$X'X = X'A_1X + \cdots + X'A_kX,$$
where $I_n$ is the $n \times n$ identity matrix and $A_i$ is an $n \times n$ symmetric matrix with rank $n_i$, $i = 1, \ldots, k$. A necessary and sufficient condition for
(i) $X'A_iX$ to have the noncentral chi-square distribution with degrees of freedom $n_i$ and noncentrality parameter $\delta_i$, $i = 1, \ldots, k$, and
(ii) the $X'A_iX$'s to be independent,
is $n = n_1 + \cdots + n_k$, in which case $\delta_i = \mu'A_i\mu$ and $\delta_1 + \cdots + \delta_k = \mu'\mu$.
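Theorem N3 is easy to check numerically (our own sketch, assuming numpy and scipy; the projection $A$ and mean $\mu$ are arbitrary choices): simulate $X'AX$ for a rank-2 projection $A$ and compare against the noncentral chi-square with 2 degrees of freedom and noncentrality $\mu'A\mu$.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)

    n = 4
    # Projection onto a random 2-dimensional subspace: A^2 = A, rank 2
    Q, _ = np.linalg.qr(rng.standard_normal((n, 2)))
    A = Q @ Q.T
    assert np.allclose(A @ A, A)

    mu = np.array([1.0, 0.5, -0.5, 2.0])
    delta = mu @ A @ mu                      # noncentrality parameter mu'A mu

    X = mu + rng.standard_normal((200_000, n))
    q = np.einsum("ij,jk,ik->i", X, A, X)    # X'AX for each draw

    # Compare with the noncentral chi-square(df = 2, nc = delta) distribution
    print(stats.kstest(q, stats.ncx2(df=2, nc=delta).cdf).pvalue)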

Proof.
Suppose that (i)-(ii) hold. Then $X'X$ has the chi-square distribution with degrees of freedom $n_1 + \cdots + n_k$ and noncentrality parameter $\delta_1 + \cdots + \delta_k$. By definition, $X'X$ has the noncentral chi-square distribution with degrees of freedom $n$ and noncentrality parameter $\mu'\mu$. Then we must have $n = n_1 + \cdots + n_k$ and $\delta_1 + \cdots + \delta_k = \mu'\mu$.
Suppose now that $n = n_1 + \cdots + n_k$. From the theory of linear algebra, for each $i$ there exist $c_{ij} \in \mathbb{R}^n$, $j = 1, \ldots, n_i$, such that
$$X'A_iX = \pm(c_{i1}'X)^2 \pm \cdots \pm (c_{in_i}'X)^2.$$
Let $C$ be the $n \times n$ matrix whose columns are $c_{11}, \ldots, c_{1n_1}, \ldots, c_{k1}, \ldots, c_{kn_k}$. Then
$$X'X = X'C\Delta C'X$$
with $\Delta$ an $n \times n$ diagonal matrix whose diagonal elements are $\pm 1$. This implies $C\Delta C' = I_n$, and thus $C$ is of full rank and $\Delta = C^{-1}(C')^{-1}$, which is positive definite.

This shows $\Delta = I_n$, which implies $C'C = CC' = I_n$ and
$$X'A_iX = \sum_{j=n_1+\cdots+n_{i-1}+1}^{n_1+\cdots+n_{i-1}+n_i} Y_j^2,$$
where $Y_j$ is the $j$th component of $Y = C'X \sim N(C'\mu, I_n)$. Hence the $Y_j$'s are independent and $Y_j \sim N(\lambda_j, 1)$, where $\lambda_j$ is the $j$th component of $C'\mu$.
This shows that $X'A_iX$, $i = 1, \ldots, k$, are independent, and $X'A_iX$ has the chi-square distribution with degrees of freedom $n_i$ and noncentrality parameter $\delta_i = \lambda_{n_1+\cdots+n_{i-1}+1}^2 + \cdots + \lambda_{n_1+\cdots+n_{i-1}+n_i}^2$.
Setting $X = \mu$ and $Y = C'X = C'\mu$ in these identities, we obtain $\delta_i = \mu'A_i\mu$ and $\delta_1 + \cdots + \delta_k = \mu'CC'\mu = \mu'\mu$.
This completes the proof.

Theorem N5.
Let $X$ be an $n$-dimensional random vector $\sim N(\mu, I_n)$ and let $A_1$ and $A_2$ be $n \times n$ projection matrices. Then a necessary and sufficient condition for $X'A_1X$ and $X'A_2X$ to be independent is $A_1A_2 = 0$.
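A classic special case of Cochran's theorem (our own illustration, assuming numpy) takes $A_1 = \frac{1}{n}J$ and $A_2 = I_n - \frac{1}{n}J$, where $J$ is the matrix of all ones: this splits $X'X$ into $n\bar{X}^2$ and $\sum_i (X_i - \bar{X})^2$, which the theorem says are independent noncentral chi-squares. The sketch checks the independence and the predicted means (df plus noncentrality).

    import numpy as np

    rng = np.random.default_rng(8)

    n = 5
    J = np.ones((n, n)) / n          # rank 1 projection: A1
    A1, A2 = J, np.eye(n) - J        # ranks 1 and n-1, with A1 + A2 = I_n

    mu = np.full(n, 2.0)
    X = mu + rng.standard_normal((300_000, n))
    q1 = np.einsum("ij,jk,ik->i", X, A1, X)   # = n * Xbar^2
    q2 = np.einsum("ij,jk,ik->i", X, A2, X)   # = sum (X_i - Xbar)^2

    print(np.corrcoef(q1, q2)[0, 1])              # approximately 0
    print(q1.mean(), 1 + mu @ A1 @ mu)            # E = df1 + delta1
    print(q2.mean(), (n - 1) + mu @ A2 @ mu)      # E = df2 + delta2 (delta2 = 0 here)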

Proof.
If $A_1A_2 = 0$, then also $A_2A_1 = (A_1A_2)' = 0$ since $A_1$ and $A_2$ are symmetric, and
$$(I_n - A_1 - A_2)^2 = I_n - 2A_1 - 2A_2 + A_1^2 + A_2^2 + A_1A_2 + A_2A_1 = I_n - A_1 - A_2,$$
i.e., $I_n - A_1 - A_2$ is a projection matrix with rank $\text{trace}(I_n - A_1 - A_2) = n - r_1 - r_2$, where $r_i = \text{trace}(A_i)$ is the rank of $A_i$, $i = 1, 2$.
By Cochran's theorem and
$$X'X = X'A_1X + X'A_2X + X'(I_n - A_1 - A_2)X,$$
$X'A_1X$ and $X'A_2X$ are independent. This proves the sufficiency.
Assume that $X'A_1X$ and $X'A_2X$ are independent. Since $X'A_iX$ has the noncentral chi-square distribution with degrees of freedom $r_i$, the rank of $A_i$, and noncentrality parameter $\delta_i = \mu'A_i\mu$, $X'(A_1 + A_2)X$ has the noncentral chi-square distribution with degrees of freedom $r_1 + r_2$ and noncentrality parameter $\delta_1 + \delta_2$.

Consequently, $A_1 + A_2$ is a projection matrix, i.e.,
$$(A_1 + A_2)^2 = A_1 + A_2,$$
which implies $A_1A_2 + A_2A_1 = 0$. Since $A_1^2 = A_1$, we obtain that
$$0 = A_1(A_1A_2 + A_2A_1) = A_1A_2 + A_1A_2A_1$$
and
$$0 = A_1(A_1A_2 + A_2A_1)A_1 = 2A_1A_2A_1,$$
which imply $A_1A_2 = 0$. This proves the necessity.
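Finally, a sketch of Theorem N5 (our own construction, assuming numpy): projections onto two orthogonal random subspaces satisfy $A_1A_2 = 0$, and the corresponding quadratic forms are empirically independent.

    import numpy as np

    rng = np.random.default_rng(9)

    n = 5
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthonormal basis
    A1 = Q[:, :2] @ Q[:, :2].T       # projection of rank 2
    A2 = Q[:, 2:4] @ Q[:, 2:4].T     # rank-2 projection onto an orthogonal subspace
    print(np.abs(A1 @ A2).max())     # A1 A2 = 0 up to round-off

    mu = rng.standard_normal(n)
    X = mu + rng.standard_normal((300_000, n))
    q1 = np.einsum("ij,jk,ik->i", X, A1, X)
    q2 = np.einsum("ij,jk,ik->i", X, A2, X)
    # (q1, q2) should behave like independent noncentral chi-squares
    print(np.corrcoef(q1, q2)[0, 1])             # approximately 0
    print(np.corrcoef(q1 ** 2, q2 ** 2)[0, 1])   # approximately 0 as well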
