3y ago

82 Views

3 Downloads

676.45 KB

72 Pages

Transcription

The Matrix Cookbook[ http://matrixcookbook.com ]Kaare Brandt PetersenMichael Syskind PedersenVersion: November 15, 20121

IntroductionWhat is this? These pages are a collection of facts (identities, approximations, inequalities, relations, .) about matrices and matters relating to them.It is collected in this form for the convenience of anyone who wants a quickdesktop reference .Disclaimer: The identities, approximations and relations presented here wereobviously not invented but collected, borrowed and copied from a large amountof sources. These sources include similar but shorter notes found on the internetand appendices in books - see the references for a full list.Errors: Very likely there are errors, typos, and mistakes for which we apologize and would be grateful to receive corrections at cookbook@2302.dk.Its ongoing: The project of keeping a large repository of relations involvingmatrices is naturally ongoing and the version will be apparent from the date inthe header.Suggestions: Your suggestion for additional content or elaboration of sometopics is most welcome acookbook@2302.dk.Keywords: Matrix algebra, matrix relations, matrix identities, derivative ofdeterminant, derivative of inverse matrix, differentiate a matrix.Acknowledgements: We would like to thank the following for contributionsand suggestions: Bill Baxter, Brian Templeton, Christian Rishøj, ChristianSchröppel, Dan Boley, Douglas L. Theobald, Esben Hoegh-Rasmussen, EvripidisKarseras, Georg Martius, Glynne Casteel, Jan Larsen, Jun Bin Gao, JürgenStruckmeier, Kamil Dedecius, Karim T. Abou-Moustafa, Korbinian Strimmer,Lars Christiansen, Lars Kai Hansen, Leland Wilkinson, Liguo He, Loic Thibaut,Markus Froeb, Michael Hubatka, Miguel Barão, Ole Winther, Pavel Sakov,Stephan Hattinger, Troels Pedersen, Vasile Sima, Vincent Rabaud, ZhaoshuiHe. We would also like thank The Oticon Foundation for funding our PhDstudies.Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 2

CONTENTSCONTENTSContents1 Basics1.1 Trace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.2 Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.3 The Special Case 2x2 . . . . . . . . . . . . . . . . . . . . . . . . .2 Derivatives2.1 Derivatives2.2 Derivatives2.3 Derivatives2.4 Derivatives2.5 Derivatives2.6 Derivatives2.7 Derivatives2.8 Derivativesofofofofofofofofa Determinant . . . . . . . . . . . .an Inverse . . . . . . . . . . . . . . .Eigenvalues . . . . . . . . . . . . . .Matrices, Vectors and Scalar FormsTraces . . . . . . . . . . . . . . . . .vector norms . . . . . . . . . . . . .matrix norms . . . . . . . . . . . . .Structured Matrices . . . . . . . . .3 Inverses3.1 Basic . . . . . . . . . . .3.2 Exact Relations . . . . .3.3 Implication on Inverses .3.4 Approximations . . . . .3.5 Generalized Inverse . . .3.6 Pseudo Inverse . . . . .6667.889101012141414.171718202021214 Complex Matrices244.1 Complex Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . 244.2 Higher order and non-linear derivatives . . . . . . . . . . . . . . . 264.3 Inverse of complex sum . . . . . . . . . . . . . . . . . . . . . . . 275 Solutions and Decompositions5.1 Solutions to linear equations .5.2 Eigenvalues and Eigenvectors5.3 Singular Value Decomposition5.4 Triangular Decomposition . .5.5 LU decomposition . . . . . .5.6 LDM decomposition . . . . .5.7 LDL decompositions . . . . .28283031323233336 Statistics and Probability346.1 Definition of Moments . . . . . . . . . . . . . . . . . . . . . . . . 346.2 Expectation of Linear Combinations . . . . . . . . . . . . . . . . 356.3 Weighted Scalar Variable . . . . . . . . . . . . . . . . . . . . . . 367 Multivariate Distributions7.1 Cauchy . . . . . . . . . .7.2 Dirichlet . . . . . . . . . .7.3 Normal . . . . . . . . . .7.4 Normal-Inverse Gamma .7.5 Gaussian . . . . . . . . . .7.6 Multinomial . . . . . . . .37373737373737Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 3

CONTENTS7.77.87.9CONTENTSStudent’s t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Wishart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Wishart, Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . .8 Gaussians8.1 Basics . . . . . . . .8.2 Moments . . . . . .8.3 Miscellaneous . . . .8.4 Mixture of Gaussians373839.40404244449 Special Matrices9.1 Block matrices . . . . . . . . . . . . . . . .9.2 Discrete Fourier Transform Matrix, The . .9.3 Hermitian Matrices and skew-Hermitian . .9.4 Idempotent Matrices . . . . . . . . . . . . .9.5 Orthogonal matrices . . . . . . . . . . . . .9.6 Positive Definite and Semi-definite Matrices9.7 Singleentry Matrix, The . . . . . . . . . . .9.8 Symmetric, Skew-symmetric/Antisymmetric9.9 Toeplitz Matrices . . . . . . . . . . . . . . .9.10 Transition matrices . . . . . . . . . . . . . .9.11 Units, Permutation and Shift . . . . . . . .9.12 Vandermonde Matrices . . . . . . . . . . . .4646474849495052545455565710 Functions and Operators10.1 Functions and Series . . . . .10.2 Kronecker and Vec Operator10.3 Vector Norms . . . . . . . . .10.4 Matrix Norms . . . . . . . . .10.5 Rank . . . . . . . . . . . . . .10.6 Integral Involving Dirac Delta10.7 Miscellaneous . . . . . . . . .5858596161626263. . . . . . . . . . . . . . . . . . . . . . . . . .Functions. . . . . .A One-dimensional Results64A.1 Gaussian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64A.2 One Dimensional Mixture of Gaussians . . . . . . . . . . . . . . . 65B Proofs and Details66B.1 Misc Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 4

CONTENTSCONTENTSNotation and NomenclatureAAijAiAijAnA 1A A1/2(A)ijAij[A]ijaaiaia z z Z z z ZMatrixMatrix indexed for some purposeMatrix indexed for some purposeMatrix indexed for some purposeMatrix indexed for some purpose orThe n.th power of a square matrixThe inverse matrix of the matrix AThe pseudo inverse matrix of the matrix A (see Sec. 3.6)The square root of a matrix (if unique), not elementwiseThe (i, j).th entry of the matrix AThe (i, j).th entry of the matrix AThe ij-submatrix, i.e. A with i.th row and j.th column deletedVector (column-vector)Vector indexed for some purposeThe i.th element of the vector aScalarReal part of a scalarReal part of a vectorReal part of a matrixImaginary part of a scalarImaginary part of a vectorImaginary part of a matrixdet(A)Tr(A)diag(A)eig(A)vec(A)sup A ATA TA AHDeterminant of ATrace of the matrix ADiagonal matrix of the matrix A, i.e. (diag(A))ij δij AijEigenvalues of the matrix AThe vector-version of the matrix A (see Sec. 10.2.2)Supremum of a setMatrix norm (subscript if any denotes what norm)Transposed matrixThe inverse of the transposed and vice versa, A T (A 1 )T (AT ) 1 .Complex conjugated matrixTransposed and complex conjugated matrix (Hermitian)A BA BHadamard (elementwise) productKronecker product0IJijΣΛThe null matrix. Zero in all entries.The identity matrixThe single-entry matrix, 1 at (i, j) and zero elsewhereA positive definite matrixA diagonal matrixPetersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 5

11Basics(AB) 1(ABC.) 1T 1(A )1.2B 1 A 1 .C 1(1) 1B 1A 1 T)(2) (AT TA B(4)(AB)T BT AT(5)(ABC.)T .CT BT AT(6)(A B)1.1BASICS(3)TH 1 1 H(A )(A B)H (A )AH BH(AB)H BH AHHH(ABC.) Tr(A) PTr(A) (7)(8)(9)H.C B AH(10)TracePiAii(11)i λi ,Tλi eig(A)(12)Tr(A) Tr(A )(13)Tr(AB) Tr(BA)(14)Tr(A B) Tr(A) Tr(B)(15)Tr(ABC) Tr(BCA) Tr(CAB)(16)aT a Tr(aaT )(17)DeterminantLet A be an n n matrix.det(A) Qi λiλi eig(A)det(cA) cn det(A),det(AT ) if A Rn ndet(A)(18)(19)(20)det(AB) det(A) det(B)(21)det(A 1 ) 1/ det(A)(22)det(An ) det(A)n(23)Tdet(I uv ) T1 u v(24)det(I A) 1 det(A) Tr(A)(25)11det(I A) 1 det(A) Tr(A) Tr(A)2 Tr(A2 )22(26)For n 2:For n 3:Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 6

1.3The Special Case 2x21BASICSFor n 4:det(I A) 1 det(A) Tr(A) 121 Tr(A)2 Tr(A2 )21113 Tr(A) Tr(A)Tr(A2 ) Tr(A3 )623(27)For small ε, the following approximation holds11det(I εA) 1 det(A) εTr(A) ε2 Tr(A)2 ε2 Tr(A2 )221.3(28)The Special Case 2x2Consider the matrix A A A11A21A12A22 Determinant and tracedet(A) A11 A22 A12 A21(29)Tr(A) A11 A22(30)Eigenvaluesλ2 λ · Tr(A) det(A) 0λ1 Tr(A) pTr(A)2 4 det(A)2λ1 λ2 Tr(A)λ2 Tr(A) pTr(A)2 4 det(A)2λ1 λ2 det(A)Eigenvectors v1 A12λ1 A11InverseA 1 1 det(A) v2 A22 A21A12λ2 A11 A12A11 (31)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 7

22DERIVATIVESDerivativesThis section is covering differentiation of a number of expressions with respect toa matrix X. Note that it is always assumed that X has no special structure, i.e.that the elements of X are independent (e.g. not symmetric, Toeplitz, positivedefinite). See section 2.8 for differentiation of structured matrices. The basicassumptions can be written in a formula as Xkl δik δlj Xijthat is for e.g. vector forms, x xi y i y x y i(32) x yi x y ij xi yjThe following rules are general and very useful when deriving the differential ofan expression ([19]): A (αX) (X Y) (Tr(X)) (XY) (X Y) (X Y) (X 1 ) (det(X)) (det(X)) (ln(det(X))) XT XH2.12.1.1 0α X X YTr( X)( X)Y X( Y)( X) Y X ( Y)( X) Y X ( Y) X 1 ( X)X 1Tr(adj(X) X)det(X)Tr(X 1 X)Tr(X 1 X)( X)T( X)H(A is a 43)(44)(45)Derivatives of a DeterminantGeneral form det(Y) xX det(X)Xjk Xik 1 Ydet(Y)Tr Y x(46) δij det(X)(47)k 2 det(Y) x2" "det(Y) Tr Y Y 1 x# x Y Y Tr Y 1Tr Y 1 x x # 1 Y 1 YY Tr Y x x(48)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 8

2.2Derivatives of an Inverse2.1.22DERIVATIVESLinear forms det(X) XX det(X)Xjk Xik det(X)(X 1 )T δij det(X)(49)(50)k det(AXB) X2.1.3 det(AXB)(X 1 )T det(AXB)(XT ) 1(51)Square formsIf X is square and invertible, then det(XT AX) 2 det(XT AX)X T X(52)If X is not square but A is symmetric, then det(XT AX) 2 det(XT AX)AX(XT AX) 1 X(53)If X is not square and A is not symmetric, then det(XT AX) det(XT AX)(AX(XT AX) 1 AT X(XT AT X) 1 ) X2.1.4(54)Other nonlinear formsSome special cases are (See [9, 7]) ln det(XT X) X ln det(XT X) X ln det(X) X det(Xk ) X2.2 2(X )T 2XT (X 1 )T (XT ) 1 k det(Xk )X T(55)(56)(57)(58)Derivatives of an InverseFrom [27] we have the basic identity Y 1 Y 1 Y 1Y x x(59)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 9

2.3Derivatives of Eigenvalues2DERIVATIVESfrom which it follows (X 1 )kl Xij (X 1 )ki (X 1 )jl(60) aT X 1 b X T abT X T(61) X det(X 1 ) det(X 1 )(X 1 )T(62) X Tr(AX 1 B) (X 1 BAX 1 )T(63) X Tr((X A) 1 ) ((X A) 1 (X A) 1 )T(64) XFrom [32] we have the following result: Let A be an n n invertible squarematrix, W be the inverse of A, and J(A) is an n n -variate and differentiablefunction with respect to A, then the partial differentials of J with respect to Aand W satisfy J T J A TA A W2.3Derivatives of Eigenvalues Xeig(X) Tr(X) I(65) X X Y eig(X) det(X) det(X)X T(66) X XIf A is real and symmetric, λi and vi are distinct eigenvalues and eigenvectorsof A (see (276)) with viT vi 1, then [33] λi vi2.42.4.1 viT (A)vi (67) (λi I A) (A)vi(68)Derivatives of Matrices, Vectors and Scalar FormsFirst Order xT a x aT Xb X aT XT b X aT Xa X X Xij (XA)ij Xmn (XT A)ij Xmn aT x x abT(70) baT(71) aT XT a X Jij δim (A)nj (Jmn A)ij(74) δin (A)mj (Jnm A)ij(75) a (69)aaT(72)(73)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 10

2.4Derivatives of Matrices, Vectors and Scalar Forms2.4.22DERIVATIVESSecond Order XXkl Xmn Xij 2Xklmn bT XT Xc X (Bx b)T C(Dx d) x (XT BX)kl Xij (XT BX) XijXkl(76)kl X(bcT cbT )(77) BT C(Dx d) DT CT (Bx b)(78) δlj (XT B)ki δkj (BX)il(79) XT BJij Jji BX(Jij )kl δik δjl (80)See Sec 9.7 for useful properties of the Single-entry matrix Jij xT Bx x bT XT DXc X (Xb c)T D(Xb c) X (B BT )x DT XbcT DXcbT (D DT )(Xb c)bT(81)(82)(83)Assume W is symmetric, then (x As)T W(x As) 2AT W(x As) s (x s)T W(x s) 2W(x s) x (x s)T W(x s) 2W(x s) s (x As)T W(x As) 2W(x As) x (x As)T W(x As) 2W(x As)sT A(84)(85)(86)(87)(88)As a case with complex values the following holds (a xH b)2 x 2b(a xH b) (89)This formula is also known from the LMS algorithm [14]2.4.3Higher-order and non-linearn 1X (Xn )kl (Xr Jij Xn 1 r )kl Xijr 0(90)For proof of the above, see B.1.3.n 1X T na X b (Xr )T abT (Xn 1 r )T Xr 0(91)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 11

2.5Derivatives of Traces2DERIVATIVESn 1Xh T n T na (X ) X b XXn 1 r abT (Xn )T Xrr 0 (Xr )T Xn abT (Xn 1 r )Ti(92)See B.1.3 for a proof.Assume s and r are functions of x, i.e. s s(x), r r(x), and that A is aconstant, then T T r s TAr AT s(93)s Ar x x x (Ax)T (Ax) x (Bx)T (Bx) 2.4.4 xT AT Ax x xT BT BxAT AxxT AT AxBT Bx2 T 2x BBx(xT BT Bx)2(94)(95)Gradient and HessianUsing the above we have for the gradient and the Hessianf f x f x 2f x xT2.5 xT Ax bT x(96) (A AT )x b(97) A AT(98)Derivatives of TracesAssume F (X) to be a differentiable function of each of the elements of X. Itthen holds that Tr(F (X)) f (X)T Xwhere f (·) is the scalar derivative of F (·).2.5.1First Order Tr(X) I X Tr(XA) AT X Tr(AXB) AT BT X Tr(AXT B) BA X Tr(XT A) A X Tr(AXT ) A X Tr(A X) Tr(A)I X(99)(100)(101)(102)(103)(104)(105)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 12

2.5Derivatives of Traces2.5.22DERIVATIVESSecond Order Tr(X2 ) 2XT X Tr(X2 B) (XB BX)T X Tr(XT BX) X Tr(BXXT ) X Tr(XXT B) X Tr(XBXT ) X Tr(BXT X) X Tr(XT XB) X Tr(AXBX) X Tr(XT X) X Tr(BT XT CXB) X Tr XT BXC X Tr(AXBXT C) Xhi Tr (AXB C)(AXB C)T X Tr(X X) X(106)(107) BX BT X(108) BX BT X(109) BX BT X(110) XBT XB(111) XBT XB(112) XBT XB(113) AT XT BT BT XT AT(114) Tr(XXT ) X(115) CT XBBT CXBBT 2X BXC BT XCT(116)(117) AT CT XBT CAXB(118) 2AT (AXB C)BT(119) Tr(X)Tr(X) 2Tr(X)I(120) XSee [7].2.5.3Higher Order Tr(Xk ) X k(Xk 1 )Tk 1X Tr(AXk ) (Xr AXk r 1 )T Xr 0 T T T CXXT CXBBT X Tr B X CXX CXB(121)(122) CT XBBT XT CT X CXBBT XT CX CT XXT CT XBBT(123)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 13

2.6Derivatives of vector norms2DERIVATIVES2.5.4Other Tr(AX 1 B) (X 1 BAX 1 )T X T AT BT X T XAssume B and C to be symmetric, thenhi Tr (XT CX) 1 A Xhi Tr (XT CX) 1 (XT BX) X(124) (CX(XT CX) 1 )(A AT )(XT CX) 1 (125) 2CX(XT CX) 1 XT BX(XT CX) 1 2BX(XT CX) 1hi Tr (A XT CX) 1 (XT BX) X(126) 2CX(A XT CX) 1 XT BX(A XT CX) 1 2BX(A XT CX) 1(127)See [7]. Tr(sin(X)) X2.62.6.12.7 cos(X)T(128)Derivatives of vector normsTwo-norm x a x a 2 x x a 2(129) x aI(x a)(x a)T x kx ak2kx ak2kx ak32(130) xT x 2 x 22 2x x x(131)Derivatives of matrix normsFor more on matrix norms, see Sec. 10.4.2.7.1Frobenius norm X 2F Tr(XXH ) 2X(132) X XSee (248). Note that this is also a special case of the result in equation 119.2.8Derivatives of Structured MatricesAssume that the matrix A has some structure, i.e. symmetric, toeplitz, etc.In that case the derivatives of the previous section does not apply in general.Instead, consider the following general rule for differentiating a scalar functionf (A)" # TX f Akldf f A Tr(133)dAij Akl Aij A AijklPetersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 14

2.8Derivatives of Structured Matrices2DERIVATIVESThe matrix differentiated with respect to itself is in this document referred toas the structure matrix of A and is defined simply by A Sij Aij(134)If A has no special structure we have simply Sij Jij , that is, the structurematrix is simply the single-entry matrix. Many structures have a representationin singleentry matrices, see Sec. 9.7.6 for more examples of structure matrices.2.8.1The Chain RuleSometimes the objective is to find the derivative of a matrix which is a functionof another matrix. Let U f (X), the goal is to find the derivative of thefunction g(U) with respect to X: g(f (X)) g(U) X X(135)Then the Chain Rule can then be written the following way:MN g(U) X X g(U) ukl g(U) X xij ukl xij(136)k 1 l 1Using matrix notation, this can be written as:h g(U) U i g(U))T Tr (. Xij U Xij2.8.2(137)SymmetricIf A is symmetric, then Sij Jij Jji Jij Jij and therefore T f f fdf diagdA A A A(138)That is, e.g., ([5]): Tr(AX) X det(X) X ln det(X) X2.8.3 A AT (A I), see (142)(139) det(X)(2X 1 (X 1 I))(140) 2X 1 (X 1 I)(141)DiagonalIf X is diagonal, then ([19]): Tr(AX) X A I(142)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 15

2.8Derivatives of Structured Matrices2.8.42DERIVATIVESToeplitzLike symmetric matrices and diagonal matrices also Toeplitz matrices has aspecial structure which should be taken into account when the derivative withrespect to a matrix with Toeplitz structure. Tr(AT) T Tr(TA) T(143)Tr(A)Tr([AT ]n1 )Tr([AT ]1n ))Tr(A)Tr([[AT ]1n ]n 1,2 ). .Tr([[AT ]1n ]2,n 1 ).A1n.···.···.Tr([[AT ]1n ]2,n 1 ).Tr([AT ]1n )) An1Tr([[AT ]1n ]n 1,2 )Tr([AT ]n1 )Tr(A) α(A)As it can be seen, the derivative α(A) also has a Toeplitz structure. Each valuein the diagonal is the sum of all the diagonal valued in A, the values in thediagonals next to the main diagonal equal the sum of the diagonal next to themain diagonal in AT . This result is only valid for the unconstrained Toeplitzmatrix. If the Toeplitz matrix also is symmetric, the same derivative yields Tr(TA) Tr(AT) α(A) α(A)T α(A) I T T(144)Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 16

333.13.1.1INVERSESInversesBasicDefinitionThe inverse A 1 of a matrix A Cn n is defined such thatAA 1 A 1 A I,(145)where I is the n n identity matrix. If A 1 exists, A is said to be nonsingular.Otherwise, A is said to be singular (see e.g. [12]).3.1.2Cofactors and AdjointThe submatrix of a matrix A, denoted by [A]ij is a (n 1) (n 1) matrixobtained by deleting the ith row and the jth column of A. The (i, j) cofactorof a matrix is defined ascof(A, i, j) ( 1)i j det([A]ij ),The matrix of cofactors can be created from the cofactors cof(A, 1, 1)···cof(A, 1, n) .cof(A) .cof(A,i,j). cof(A, n, 1)···cof(A, n, n)(146)(147)The adjoint matrix is the transpose of the cofactor matrixadj(A) (cof(A))T ,3.1.3(148)DeterminantThe determinant of a matrix A Cn n is defined as (see [12])det(A) nX( 1)j 1 A1j det ([A]1j )j 1nXA1j cof(A, 1, j).(149)(150)j 13.1.4ConstructionThe inverse matrix can be constructed, using the adjoint matrix, byA 1 1· adj(A)det(A)(151)For the case of 2 2 matrices, see section 1.3.Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012, Page 17

3.2Exact Relations3.1.53INVERSESCondition numberThe condition number of a matrix c(A)

CONTENTS CONTENTS Notation and Nomenclature A Matrix A ij Matrix indexed for some purpose A i Matrix indexed for some purpose Aij Matrix indexed for some purpose An Matrix indexed for some purpose or The n.th power of a square matrix A 1 The inverse matrix of the matrix A A The pseudo inverse matrix of the matrix A (see Sec. 3.6) A1 2 The sq

Related Documents: