Matrix Calculus - CCRMA


Appendix D

Matrix Calculus

> From too much study, and from extreme passion, cometh madnesse.
> — Isaac Newton [205, §5]

D.1 Gradient, Directional derivative, Taylor series

D.1.1 Gradients

The gradient of a differentiable real function $f(x) : \mathbb{R}^K \to \mathbb{R}$ with respect to its vector argument is defined uniquely in terms of partial derivatives

\[
\nabla f(x) \triangleq \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \dfrac{\partial f(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_K} \end{bmatrix} \in \mathbb{R}^K
\] (2053)

while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian;

\[
\nabla^2 f(x) \triangleq \begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_K} \\
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_K} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_K \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_K \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_K^2}
\end{bmatrix} \in \mathbb{S}^K
\] (2054)

interpreted

\[
\frac{\partial}{\partial x_2}\!\left(\frac{\partial f(x)}{\partial x_1}\right) = \frac{\partial}{\partial x_1}\!\left(\frac{\partial f(x)}{\partial x_2}\right) = \frac{\partial^2 f(x)}{\partial x_1 \partial x_2}
\] (2055)

Dattorro, Convex Optimization & Euclidean Distance Geometry, Mεβoo, 2005, v2020.02.29.
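Definitions (2053) and (2054) lend themselves to a numerical sanity check by centered finite differences. A minimal sketch (not from the text; helper names are mine) for the quadratic $f(x) = x^T\!A\,x$, whose gradient is $(A + A^T)x$ and whose Hessian is $A + A^T$:

```python
import numpy as np

def num_gradient(f, x, h=1e-6):
    """Centered-difference estimate of the gradient (2053)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hessian(f, x, h=1e-4):
    """Centered-difference estimate of the Hessian (2054)."""
    K = x.size
    H = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            ei = np.zeros(K); ei[i] = h
            ej = np.zeros(K); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
f = lambda x: x @ A @ x

grad_err = np.abs(num_gradient(f, x) - (A + A.T) @ x).max()
hess_err = np.abs(num_hessian(f, x) - (A + A.T)).max()
```

For a quadratic the centered differences are exact up to roundoff, so both errors are tiny.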

The gradient of vector-valued function $v(x) : \mathbb{R} \to \mathbb{R}^N$ on real domain is a row vector

\[
\nabla v(x) \triangleq \begin{bmatrix} \dfrac{\partial v_1(x)}{\partial x} & \dfrac{\partial v_2(x)}{\partial x} & \cdots & \dfrac{\partial v_N(x)}{\partial x} \end{bmatrix} \in \mathbb{R}^N
\] (2056)

while the second-order gradient is

\[
\nabla^2 v(x) \triangleq \begin{bmatrix} \dfrac{\partial^2 v_1(x)}{\partial x^2} & \dfrac{\partial^2 v_2(x)}{\partial x^2} & \cdots & \dfrac{\partial^2 v_N(x)}{\partial x^2} \end{bmatrix} \in \mathbb{R}^N
\] (2057)

The gradient of vector-valued function $h(x) : \mathbb{R}^K \to \mathbb{R}^N$ on vector domain is

\[
\nabla h(x) \triangleq \begin{bmatrix}
\dfrac{\partial h_1(x)}{\partial x_1} & \dfrac{\partial h_2(x)}{\partial x_1} & \cdots & \dfrac{\partial h_N(x)}{\partial x_1} \\
\dfrac{\partial h_1(x)}{\partial x_2} & \dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \dfrac{\partial h_N(x)}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial h_1(x)}{\partial x_K} & \dfrac{\partial h_2(x)}{\partial x_K} & \cdots & \dfrac{\partial h_N(x)}{\partial x_K}
\end{bmatrix}
= \big[\,\nabla h_1(x) \ \ \nabla h_2(x) \ \cdots \ \nabla h_N(x)\,\big] \in \mathbb{R}^{K\times N}
\] (2058)

while the second-order gradient has a three-dimensional written representation dubbed cubix;^{D.1}

\[
\nabla^2 h(x) \triangleq \begin{bmatrix}
\nabla\dfrac{\partial h_1(x)}{\partial x_1} & \cdots & \nabla\dfrac{\partial h_N(x)}{\partial x_1} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial h_1(x)}{\partial x_K} & \cdots & \nabla\dfrac{\partial h_N(x)}{\partial x_K}
\end{bmatrix}
= \big[\,\nabla^2 h_1(x) \ \ \nabla^2 h_2(x) \ \cdots \ \nabla^2 h_N(x)\,\big] \in \mathbb{R}^{K\times N\times K}
\] (2059)

where the gradient of each real entry is with respect to vector $x$ as in (2053).

The gradient of real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ on matrix domain is

\[
\nabla g(X) \triangleq \begin{bmatrix}
\dfrac{\partial g(X)}{\partial X_{11}} & \dfrac{\partial g(X)}{\partial X_{12}} & \cdots & \dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g(X)}{\partial X_{K1}} & \dfrac{\partial g(X)}{\partial X_{K2}} & \cdots & \dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L}
= \big[\,\nabla_{X(:,1)}\, g(X) \ \ \nabla_{X(:,2)}\, g(X) \ \cdots \ \nabla_{X(:,L)}\, g(X)\,\big] \in \mathbb{R}^{K\times 1\times L}
\] (2060)

where gradient $\nabla_{X(:,i)}$ is with respect to the $i$th column of $X$. The strange appearance of (2060) in $\mathbb{R}^{K\times 1\times L}$ is meant to suggest a third dimension perpendicular to the page (not a diagonal matrix).

^{D.1} The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater meaning mother.

The second-order gradient has representation

\[
\nabla^2 g(X) \triangleq \begin{bmatrix}
\nabla\dfrac{\partial g(X)}{\partial X_{11}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial g(X)}{\partial X_{K1}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L}
= \big[\,\nabla\nabla_{X(:,1)}\, g(X) \ \cdots \ \nabla\nabla_{X(:,L)}\, g(X)\,\big] \in \mathbb{R}^{K\times 1\times L\times K\times L}
\] (2061)

where the gradient $\nabla$ is with respect to matrix $X$.

The gradient of vector-valued function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^N$ on matrix domain is a cubix

\[
\nabla g(X) \triangleq \begin{bmatrix}
\nabla_{X(:,1)}\, g_1(X) & \nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla_{X(:,1)}\, g_N(X) \\
\vdots & \vdots & & \vdots \\
\nabla_{X(:,L)}\, g_1(X) & \nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla_{X(:,L)}\, g_N(X)
\end{bmatrix}
= \big[\,\nabla g_1(X) \ \ \nabla g_2(X) \ \cdots \ \nabla g_N(X)\,\big] \in \mathbb{R}^{K\times N\times L}
\] (2062)

while the second-order gradient has a five-dimensional representation;

\[
\nabla^2 g(X) \triangleq \big[\,\nabla^2 g_1(X) \ \ \nabla^2 g_2(X) \ \cdots \ \nabla^2 g_N(X)\,\big] \in \mathbb{R}^{K\times N\times L\times K\times L}
\] (2063)

The gradient of matrix-valued function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ on matrix domain has a four-dimensional representation called quartix (fourth-order tensor)

\[
\nabla g(X) \triangleq \begin{bmatrix}
\nabla g_{11}(X) & \nabla g_{12}(X) & \cdots & \nabla g_{1N}(X) \\
\nabla g_{21}(X) & \nabla g_{22}(X) & \cdots & \nabla g_{2N}(X) \\
\vdots & \vdots & & \vdots \\
\nabla g_{M1}(X) & \nabla g_{M2}(X) & \cdots & \nabla g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L}
\] (2064)

while the second-order gradient has a six-dimensional representation

\[
\nabla^2 g(X) \triangleq \begin{bmatrix}
\nabla^2 g_{11}(X) & \cdots & \nabla^2 g_{1N}(X) \\
\vdots & & \vdots \\
\nabla^2 g_{M1}(X) & \cdots & \nabla^2 g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L\times K\times L}
\] (2065)

and so on.

D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$

\[
\nabla_X \big(f(X)^T g(X)\big) = \nabla_X(f)\, g + \nabla_X(g)\, f
\] (2066)

while [65, §8.3] [420]

\[
\nabla_X \operatorname{tr}\!\big(f(X)^T g(X)\big) = \nabla_X \Big[ \operatorname{tr}\!\big(f(X)^T g(Z)\big) + \operatorname{tr}\!\big(g(X)\, f(Z)^T\big) \Big]\Big|_{Z \leftarrow X}
\] (2067)

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.
Suppose $f(X) : \mathbb{R}^{2\times2} \to \mathbb{R}^2 = X^T a$ and $g(X) : \mathbb{R}^{2\times2} \to \mathbb{R}^2 = Xb$. We wish to find

\[
\nabla_X \big(f(X)^T g(X)\big) = \nabla_X\, a^T X^2 b
\] (2068)

using the product rule. Formula (2066) calls for

\[
\nabla_X\, a^T X^2 b = \nabla_X(X^T a)\, Xb + \nabla_X(Xb)\, X^T a
\] (2069)

Consider the first of the two terms:

\[
\nabla_X(f)\, g = \nabla_X(X^T a)\, Xb = \big[\,\nabla(X^T a)_1 \ \ \nabla(X^T a)_2\,\big]\, Xb
\] (2070)

The gradient of $X^T a$ forms a cubix in $\mathbb{R}^{2\times2\times2}$; a.k.a third-order tensor:

\[
\nabla_X(X^T a)\, Xb = \begin{bmatrix}
\frac{\partial (X^Ta)_1}{\partial X_{11}} & \frac{\partial (X^Ta)_2}{\partial X_{11}} \\
\frac{\partial (X^Ta)_1}{\partial X_{12}} & \frac{\partial (X^Ta)_2}{\partial X_{12}} \\
\frac{\partial (X^Ta)_1}{\partial X_{21}} & \frac{\partial (X^Ta)_2}{\partial X_{21}} \\
\frac{\partial (X^Ta)_1}{\partial X_{22}} & \frac{\partial (X^Ta)_2}{\partial X_{22}}
\end{bmatrix}
\begin{bmatrix} (Xb)_1 \\ (Xb)_2 \end{bmatrix} \in \mathbb{R}^{2\times1\times2}
\] (2071)

Because the gradient of the product (2068) requires total change with respect to change in each entry of matrix $X$, the $Xb$ vector must make an inner product with each vector in that second dimension of the cubix indicated by dotted line segments;

\[
\nabla_X(X^T a)\, Xb = \begin{bmatrix} a_1 & 0 \\ 0 & a_1 \\ a_2 & 0 \\ 0 & a_2 \end{bmatrix}
\begin{bmatrix} b_1 X_{11} + b_2 X_{12} \\ b_1 X_{21} + b_2 X_{22} \end{bmatrix}
= \begin{bmatrix} a_1(b_1 X_{11} + b_2 X_{12}) & a_1(b_1 X_{21} + b_2 X_{22}) \\ a_2(b_1 X_{11} + b_2 X_{12}) & a_2(b_1 X_{21} + b_2 X_{22}) \end{bmatrix}
= ab^T X^T \in \mathbb{R}^{2\times2}
\] (2072)

where the cubix appears as a complete $2\times2\times2$ matrix. In like manner for the second term $\nabla_X(g)\, f$

\[
\nabla_X(Xb)\, X^T a = \begin{bmatrix} b_1 & 0 \\ b_2 & 0 \\ 0 & b_1 \\ 0 & b_2 \end{bmatrix}
\begin{bmatrix} X_{11}a_1 + X_{21}a_2 \\ X_{12}a_1 + X_{22}a_2 \end{bmatrix}
= X^T ab^T \in \mathbb{R}^{2\times2}
\] (2073)

The solution

\[
\nabla_X\, a^T X^2 b = ab^T X^T + X^T ab^T
\] (2074)

can be found from Table D.2.1 or verified using (2067).

D.1.2.1 Kronecker product

A partial remedy for venturing into hyperdimensional matrix representations, such as the cubix or quartix, is to first vectorize matrices as in (39). This device gives rise to the Kronecker product of matrices; a.k.a tensor product (kron() in Matlab). Although its definition sees reversal in the literature, [434, §2.1] the Kronecker product is not commutative ($B \otimes A \neq A \otimes B$). We adopt the definition: for $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times q}$

\[
B \otimes A \triangleq \begin{bmatrix}
B_{11}A & B_{12}A & \cdots & B_{1q}A \\
B_{21}A & B_{22}A & \cdots & B_{2q}A \\
\vdots & \vdots & & \vdots \\
B_{p1}A & B_{p2}A & \cdots & B_{pq}A
\end{bmatrix} \in \mathbb{R}^{pm\times qn}
\] (2075)

for which $A \otimes 1 = 1 \otimes A = A$ (real unity acts like Identity).

One advantage to vectorization is existence of the traditional two-dimensional matrix representation (second-order tensor) for the second-order gradient of a real function with respect to a vectorized matrix. From §A.1.1 no.36 (§D.2.1) for square $A, B \in \mathbb{R}^{n\times n}$, for example [220, §5.2] [15, §3]

\[
\nabla^2_{\operatorname{vec} X}\, \operatorname{tr}(AXBX^T) = \nabla^2_{\operatorname{vec} X}\, \operatorname{vec}(X)^T (B^T\!\otimes A)\operatorname{vec} X = B\otimes A^T + B^T\!\otimes A \in \mathbb{R}^{n^2\times n^2}
\] (2076)

To disadvantage is a large new but known set of algebraic rules (§A.1.1) and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.

Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix $S \in \mathbb{R}^{M\times N}$, for example, by respective entries $w_i$ from the main diagonal in

\[
W = \begin{bmatrix} w_1 & & 0 \\ & \ddots & \\ 0 & & w_N \end{bmatrix} \in \mathbb{S}^N
\] (2077)

A conventional means for accomplishing column weighting is to multiply $S$ by diagonal matrix $W$ on the right side:

\[
SW = S \begin{bmatrix} w_1 & & 0 \\ & \ddots & \\ 0 & & w_N \end{bmatrix}
= \big[\, S(:,1)\,w_1 \ \cdots \ S(:,N)\,w_N \,\big] \in \mathbb{R}^{M\times N}
\] (2078)

To reverse product order such that diagonal matrix $W$ instead appears to the left of $S$: for $I \in \mathbb{S}^M$ (Law)

\[
SW = (\delta(W)^T \otimes I)
\begin{bmatrix}
S(:,1) & 0 & \cdots & 0 \\
0 & S(:,2) & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & S(:,N)
\end{bmatrix} \in \mathbb{R}^{M\times N}
\] (2079)
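Identity (2076) can be verified numerically with column-stacking `vec` (as in (39)); a sketch (not from the text), which also checks that the transpose of $B^T\!\otimes A$ is $B\otimes A^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A, B, X = (rng.standard_normal((n, n)) for _ in range(3))
vec = lambda M: M.reshape(-1, order='F')   # column-stacking vec

# tr(A X B X^T) = vec(X)^T (B^T kron A) vec(X), since vec(AXB) = (B^T kron A) vec X
lhs = np.trace(A @ X @ B @ X.T)
rhs = vec(X) @ np.kron(B.T, A) @ vec(X)
quad_err = abs(lhs - rhs)

# second-order gradient (2076): for vec(X)^T M vec(X) the Hessian is M + M^T
M = np.kron(B.T, A)
hess = M + M.T
kron_err = np.abs(hess - (np.kron(B, A.T) + np.kron(B.T, A))).max()
```

The quadratic-form identity holds to roundoff, and `kron_err` is exactly zero because $(B^T\!\otimes A)^T = B\otimes A^T$ entrywise.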

To instead weight the rows of $S$ via diagonal matrix $W \in \mathbb{S}^M$, for $I \in \mathbb{S}^N$:

\[
WS = \begin{bmatrix}
S(1,:) & 0 & \cdots & 0 \\
0 & S(2,:) & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & S(M,:)
\end{bmatrix} (\delta(W) \otimes I) \in \mathbb{R}^{M\times N}
\] (2080)

D.1.2.2 Hadamard product

For any matrices of like size, $S, Y \in \mathbb{R}^{M\times N}$, Hadamard's product $\circ$ denotes simple multiplication of corresponding entries (.* in Matlab). It is possible to convert Hadamard product into a standard product of matrices:

\[
S \circ Y = \big[\, \delta(Y(:,1)) \ \cdots \ \delta(Y(:,N)) \,\big]
\begin{bmatrix}
S(:,1) & 0 & \cdots & 0 \\
0 & S(:,2) & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & S(:,N)
\end{bmatrix} \in \mathbb{R}^{M\times N}
\] (2081)

In the special case that $S = s$ and $Y = y$ are vectors in $\mathbb{R}^M$

\[
s \circ y = \delta(s)\, y
\] (2082)

\[
s^T \circ y = y s^T, \qquad s \circ y^T = s y^T
\] (2083)

D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$ [462, §15.7]

\[
\nabla_X\, g\big(f(X)^T\big) = \nabla_X f^T\, \nabla_f\, g
\] (2084)

\[
\nabla^2_X\, g\big(f(X)^T\big) = \nabla_X \big(\nabla_X f^T\, \nabla_f\, g\big) = \nabla^2_X f\, \nabla_f\, g + \nabla_X f^T\, \nabla^2_f\, g\, \nabla_X f
\] (2085)

D.1.3.1 Two arguments

\[
\nabla_X\, g\big(f(X)^T, h(X)^T\big) = \nabla_X f^T\, \nabla_f\, g + \nabla_X h^T\, \nabla_h\, g
\] (2086)

D.1.3.1.1 Example. Chain rule for two arguments. [51, §1.1]

\[
g\big(f(x)^T, h(x)^T\big) = \big(f(x) + h(x)\big)^T A \big(f(x) + h(x)\big)
\] (2087)

\[
f(x) = \begin{bmatrix} x_1 \\ \varepsilon x_2 \end{bmatrix}, \qquad h(x) = \begin{bmatrix} \varepsilon x_1 \\ x_2 \end{bmatrix}
\] (2088)

\[
\nabla_x\, g\big(f(x)^T, h(x)^T\big) = \begin{bmatrix} 1 & 0 \\ 0 & \varepsilon \end{bmatrix}(A + A^T)(f + h) + \begin{bmatrix} \varepsilon & 0 \\ 0 & 1 \end{bmatrix}(A + A^T)(f + h)
\] (2089)

\[
\nabla_x\, g\big(f(x)^T, h(x)^T\big) = \begin{bmatrix} 1+\varepsilon & 0 \\ 0 & 1+\varepsilon \end{bmatrix}(A + A^T)\left(\begin{bmatrix} x_1 \\ \varepsilon x_2 \end{bmatrix} + \begin{bmatrix} \varepsilon x_1 \\ x_2 \end{bmatrix}\right)
\] (2090)

\[
\lim_{\varepsilon \to 0} \nabla_x\, g\big(f(x)^T, h(x)^T\big) = (A + A^T)\,x
\] (2091)

from Table D.2.1. ∎

These foregoing formulae remain correct when gradient produces hyperdimensional representation:
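Conversion (2081) and the vector special case (2082) are easy to exercise numerically; a sketch under my own variable names (not from the text), building the block-diagonal of the columns of $S$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 3
S = rng.standard_normal((M, N))
Y = rng.standard_normal((M, N))

# block-diagonal matrix of the columns of S, as in (2081)
blk = np.zeros((M * N, N))
for j in range(N):
    blk[j*M:(j+1)*M, j] = S[:, j]

# row of diagonal matrices built from the columns of Y
D = np.hstack([np.diag(Y[:, j]) for j in range(N)])   # M x MN

had_err = np.abs(D @ blk - S * Y).max()               # standard product vs Hadamard

# vector special case (2082): s o y = delta(s) y
s, y = rng.standard_normal(M), rng.standard_normal(M)
vec_err = np.abs(np.diag(s) @ y - s * y).max()
```

Column $j$ of `D @ blk` is $\delta(Y(:,j))\,S(:,j)$, which is exactly the $j$th column of the Hadamard product.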

D.1.4 First directional derivative

Assume that a differentiable function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ has continuous first- and second-order gradients $\nabla g$ and $\nabla^2 g$ over dom $g$, which is an open set. We seek simple expressions for the first and second directional derivatives in direction $Y \in \mathbb{R}^{K\times L}$: respectively, $\overrightarrow{dg}^{Y} \in \mathbb{R}^{M\times N}$ and $\overrightarrow{dg^2}^{Y} \in \mathbb{R}^{M\times N}$.

Assuming that the limit exists, we may state the partial derivative of the $mn$th entry of $g$ with respect to the $kl$th entry of $X$;

\[
\frac{\partial g_{mn}(X)}{\partial X_{kl}} = \lim_{t\to 0} \frac{g_{mn}(X + t\, e_k e_l^T) - g_{mn}(X)}{t} \in \mathbb{R}
\] (2092)

where $e_k$ is the $k$th standard basis vector in $\mathbb{R}^K$ while $e_l$ is the $l$th standard basis vector in $\mathbb{R}^L$. The total number of partial derivatives equals $KLMN$ while the gradient is defined in their terms; the $mn$th entry of the gradient is

\[
\nabla g_{mn}(X) = \begin{bmatrix}
\dfrac{\partial g_{mn}(X)}{\partial X_{11}} & \cdots & \dfrac{\partial g_{mn}(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial g_{mn}(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L}
\] (2093)

while the gradient is a quartix

\[
\nabla g(X) = \begin{bmatrix}
\nabla g_{11}(X) & \cdots & \nabla g_{1N}(X) \\
\vdots & & \vdots \\
\nabla g_{M1}(X) & \cdots & \nabla g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L}
\] (2094)

By simply rotating our perspective of a four-dimensional representation of gradient matrix, we find one of three useful transpositions of this quartix (connoted $T_1$):

\[
\nabla g(X)^{T_1} = \begin{bmatrix}
\dfrac{\partial g(X)}{\partial X_{11}} & \cdots & \dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial g(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times M\times N}
\] (2095)

When a limit for $t \in \mathbb{R}$ exists, it is easy to show by substitution of variables in (2092)

\[
\frac{\partial g_{mn}(X)}{\partial X_{kl}}\, Y_{kl} = \lim_{t\to 0} \frac{g_{mn}(X + t\, Y_{kl}\, e_k e_l^T) - g_{mn}(X)}{t} \in \mathbb{R}
\] (2096)

which may be interpreted as the change in $g_{mn}$ at $X$ when the change in $X_{kl}$ is equal to $Y_{kl}$, the $kl$th entry of any $Y \in \mathbb{R}^{K\times L}$. Because the total change in $g_{mn}(X)$ due to $Y$ is the sum of change with respect to each and every $X_{kl}$, the $mn$th entry of the directional derivative is the corresponding total differential [462, §15.8]

\[
dg_{mn}(X)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X)}{\partial X_{kl}}\, Y_{kl} = \operatorname{tr}\!\big(\nabla g_{mn}(X)^T\, Y\big)
\] (2097)

\[
= \sum_{k,l}\ \lim_{t\to 0} \frac{g_{mn}(X + t\,Y_{kl}\, e_k e_l^T) - g_{mn}(X)}{t}
\] (2098)

\[
= \lim_{t\to 0} \frac{g_{mn}(X + t\,Y) - g_{mn}(X)}{t}
\] (2099)

\[
= \frac{d}{dt}\bigg|_{t=0}\, g_{mn}(X + t\,Y)
\] (2100)

where $t \in \mathbb{R}$. Assuming finite $Y$, equation (2099) is called the Gâteaux differential [50, App.A.5] [265, §D.2.1] [474, §5.28] whose existence is implied by existence of the Fréchet differential (the sum in (2097)). [337, §7.2] Each may be understood as the change in $g_{mn}$ at $X$ when the change in $X$ is equal in magnitude and direction to $Y$.^{D.2} Hence the directional derivative,

\[
\overrightarrow{dg}^{Y}\!(X) \triangleq \begin{bmatrix}
dg_{11}(X) & \cdots & dg_{1N}(X) \\
\vdots & & \vdots \\
dg_{M1}(X) & \cdots & dg_{MN}(X)
\end{bmatrix}\Bigg|_{dX\to Y}
= \begin{bmatrix}
\operatorname{tr}\!\big(\nabla g_{11}(X)^T Y\big) & \cdots & \operatorname{tr}\!\big(\nabla g_{1N}(X)^T Y\big) \\
\vdots & & \vdots \\
\operatorname{tr}\!\big(\nabla g_{M1}(X)^T Y\big) & \cdots & \operatorname{tr}\!\big(\nabla g_{MN}(X)^T Y\big)
\end{bmatrix}
= \begin{bmatrix}
\sum_{k,l} \frac{\partial g_{11}(X)}{\partial X_{kl}} Y_{kl} & \cdots & \sum_{k,l} \frac{\partial g_{1N}(X)}{\partial X_{kl}} Y_{kl} \\
\vdots & & \vdots \\
\sum_{k,l} \frac{\partial g_{M1}(X)}{\partial X_{kl}} Y_{kl} & \cdots & \sum_{k,l} \frac{\partial g_{MN}(X)}{\partial X_{kl}} Y_{kl}
\end{bmatrix} \in \mathbb{R}^{M\times N}
\] (2101)

from which it follows

\[
\overrightarrow{dg}^{Y}\!(X) = \sum_{k,l} \frac{\partial g(X)}{\partial X_{kl}}\, Y_{kl}
\] (2102)

Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$

\[
g(X + t\,Y) = g(X) + t\, \overrightarrow{dg}^{Y}\!(X) + O(t^2)
\] (2103)

which is the first-order multidimensional Taylor series expansion about $X$. [462, §18.4] [203, §2.3.4] Differentiation with respect to $t$ and subsequent $t$-zeroing isolates the second term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in (2100). So the directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ in any direction $Y \in \mathbb{R}^{K\times L}$ evaluated at $X \in \operatorname{dom} g$ becomes

\[
\overrightarrow{dg}^{Y}\!(X) = \frac{d}{dt}\bigg|_{t=0}\, g(X + t\,Y) \in \mathbb{R}^{M\times N}
\] (2104)

^{D.2} Although $Y$ is a matrix, we may regard it as a vector in $\mathbb{R}^{KL}$.
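Equivalence of the trace form (2097) with the derivative form (2104) can be checked numerically; a sketch (not from the text) for the real function $g(X) = \operatorname{tr}(AXBX^T)$, whose gradient $A^T\!XB^T + AXB$ appears in Table D.2.3:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
A, B, X, Y = (rng.standard_normal((K, K)) for _ in range(4))

g = lambda Z: np.trace(A @ Z @ B @ Z.T)
grad = A.T @ X @ B.T + A @ X @ B          # gradient of tr(AXBX^T), Table D.2.3

dd_trace = np.trace(grad.T @ Y)           # directional derivative via (2097)
t = 1e-6
dd_limit = (g(X + t*Y) - g(X - t*Y)) / (2*t)   # via (2104), centered difference
dd_err = abs(dd_trace - dd_limit)
```

Since $g$ is quadratic in $X$, the centered difference agrees with the trace expression up to roundoff.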

Figure 216: Strictly convex quadratic bowl in $\mathbb{R}^2\!\times\mathbb{R}$; $f(x) = x^T x : \mathbb{R}^2 \to \mathbb{R}$ versus $x$ on some open disc in $\mathbb{R}^2$. Plane slice $\partial\mathcal{H}$ is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector $y$. Slope of tangent line $T$ at point $(\alpha, f(\alpha))$ is value of directional derivative $\nabla_x f(\alpha)^T y$ (2129) at $\alpha$ in slice direction $y$. Negative gradient $-\nabla_x f(x) \in \mathbb{R}^2$ is direction of steepest descent. [74, §9.4.1] [462, §15.6] [203] [519] When vector $\upsilon \in \mathbb{R}^3$ entry $\upsilon_3$ is half the directional derivative in gradient direction at $\alpha$, and when $\begin{bmatrix}\upsilon_1\\\upsilon_2\end{bmatrix} = \nabla_x f(\alpha)$, then $-\upsilon$ points directly toward the bowl bottom.

[371, §2.1, §5.4.5] [43, §6.3.1] which is simplest. In case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$

\[
\overrightarrow{dg}^{Y}\!(X) = \operatorname{tr}\!\big(\nabla g(X)^T Y\big)
\] (2126)

In case $g(X) : \mathbb{R}^K \to \mathbb{R}$

\[
\overrightarrow{dg}^{Y}\!(X) = \nabla g(X)^T Y
\] (2129)

Unlike gradient, directional derivative does not expand dimension; directional derivative (2104) retains the dimensions of $g$. The derivative with respect to $t$ makes the directional derivative resemble ordinary calculus (§D.2); e.g, when $g(X)$ is linear, $\overrightarrow{dg}^{Y}\!(X) = g(Y)$. [337, §7.2]

D.1.4.1 Interpretation of directional derivative

In the case of any differentiable real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, the directional derivative of $g(X)$ at $X$ in any direction $Y$ yields the slope of $g$ along the line $\{X + t\,Y \mid t \in \mathbb{R}\}$ through its domain evaluated at $t = 0$. For higher-dimensional functions, by (2101), this slope interpretation can be applied to each entry of the directional derivative.

Figure 216, for example, shows a plane slice of a real convex bowl-shaped function $f(x)$ along a line $\{\alpha + t\,y \mid t \in \mathbb{R}\}$ through its domain. The slice reveals a one-dimensional real function of $t$; $f(\alpha + t\,y)$. The directional derivative at $x = \alpha$ in direction $y$ is the slope of $f(\alpha + t\,y)$ with respect to $t$ at $t = 0$. In the case of a real function having vector argument $h(X) : \mathbb{R}^K \to \mathbb{R}$, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (2129) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.6)
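The slope interpretation can be seen numerically for the bowl of Figure 216, $f(x) = x^T x$; a sketch with illustrative values of my choosing (not from the text):

```python
import numpy as np

f = lambda x: x @ x                       # the bowl of Figure 216

alpha = np.array([0.8, -0.3])             # slice base point (illustrative)
y = np.array([1.0, 2.0])
y = y / np.linalg.norm(y)                 # slice direction, normalized

grad = 2 * alpha                          # gradient of x^T x
slope_formula = grad @ y                  # directional derivative (2129)

t = 1e-6
slope_limit = (f(alpha + t*y) - f(alpha - t*y)) / (2*t)   # slope of f(alpha + t y) at t = 0
slope_err = abs(slope_formula - slope_limit)
```

With `y = grad / np.linalg.norm(grad)` the same computation returns the gradient magnitude, as the text states.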

Directional derivative generalizes our one-dimensional notion of derivative to a multidimensional domain. When direction $Y$ coincides with a member of the standard Cartesian basis $e_k e_l^T$ (63), then a single partial derivative $\partial g(X)/\partial X_{kl}$ is obtained from directional derivative (2102); such is each entry of gradient $\nabla g(X)$ in equalities (2126) and (2129), for example.

D.1.4.1.1 Theorem. Directional derivative optimality condition. [337, §7.4]
Suppose $f(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ is minimized on convex set $\mathcal{C} \subseteq \mathbb{R}^{K\times L}$ by $X^\star$, and the directional derivative of $f$ exists there. Then for all $X \in \mathcal{C}$

\[
\overrightarrow{df}^{X - X^\star}\!(X^\star) \geq 0
\] (2105)
⋄

D.1.4.1.2 Example. Simple bowl.
Bowl function (Figure 216)

\[
f(x) : \mathbb{R}^K \to \mathbb{R} \triangleq (x - a)^T (x - a) - b
\] (2106)

has function offset $-b \in \mathbb{R}$, axis of revolution at $x = a$, and positive definite Hessian (2054) everywhere in its domain (an open hyperdisc in $\mathbb{R}^K$); id est, strictly convex quadratic $f(x)$ has unique global minimum equal to $-b$ at $x = a$. A vector $-\upsilon$ based anywhere in $\operatorname{dom} f \times \mathbb{R}$ pointing toward the unique bowl-bottom is specified:

\[
\upsilon \propto \begin{bmatrix} x - a \\ f(x) + b \end{bmatrix} \in \mathbb{R}^K\!\times\mathbb{R}
\] (2107)

Such a vector is

\[
\upsilon = \begin{bmatrix} \nabla_x f(x) \\[2pt] \tfrac{1}{2}\, \overrightarrow{df}^{\,\nabla_x f(x)}\!(x) \end{bmatrix}
\] (2108)

since the gradient is

\[
\nabla_x f(x) = 2(x - a)
\] (2109)

and the directional derivative in direction of the gradient is (2129)

\[
\overrightarrow{df}^{\,\nabla_x f(x)}\!(x) = \nabla_x f(x)^T \nabla_x f(x) = 4(x - a)^T(x - a) = 4\big(f(x) + b\big)
\] (2110)
∎

D.1.5 Second directional derivative

By similar argument, it so happens: the second directional derivative is equally simple. Given $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ on open domain,

\[
\nabla\, \frac{\partial g_{mn}(X)}{\partial X_{kl}} = \frac{\partial\, \nabla g_{mn}(X)}{\partial X_{kl}} = \begin{bmatrix}
\dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{11}} & \cdots & \dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{K1}} & \cdots & \dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L}
\] (2111)

\[
\nabla^2 g_{mn}(X) = \begin{bmatrix}
\nabla\dfrac{\partial g_{mn}(X)}{\partial X_{11}} & \cdots & \nabla\dfrac{\partial g_{mn}(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial g_{mn}(X)}{\partial X_{K1}} & \cdots & \nabla\dfrac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix}
= \begin{bmatrix}
\dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{11}} & \cdots & \dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L}
\] (2112)

Rotating our perspective, we get several views of the second-order gradient:

\[
\nabla^2 g(X) = \begin{bmatrix}
\nabla^2 g_{11}(X) & \cdots & \nabla^2 g_{1N}(X) \\
\vdots & & \vdots \\
\nabla^2 g_{M1}(X) & \cdots & \nabla^2 g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L\times K\times L}
\] (2113)

\[
\nabla^2 g(X)^{T_1} = \begin{bmatrix}
\dfrac{\partial\, \nabla g(X)}{\partial X_{11}} & \cdots & \dfrac{\partial\, \nabla g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial\, \nabla g(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial\, \nabla g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times M\times N\times K\times L}
\] (2114)

\[
\nabla^2 g(X)^{T_2} = \begin{bmatrix}
\nabla\dfrac{\partial g(X)}{\partial X_{11}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial g(X)}{\partial X_{K1}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L\times M\times N}
\] (2115)

Assuming the limits to exist, we may state the partial derivative of the $mn$th entry of $g$ with respect to the $kl$th and $ij$th entries of $X$;

\[
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}} = \frac{\partial}{\partial X_{ij}}\!\left(\frac{\partial g_{mn}(X)}{\partial X_{kl}}\right)
= \lim_{\tau,\,t\to 0} \frac{\big(g_{mn}(X + t\,e_k e_l^T + \tau\,e_i e_j^T) - g_{mn}(X + t\,e_k e_l^T)\big) - \big(g_{mn}(X + \tau\,e_i e_j^T) - g_{mn}(X)\big)}{\tau\, t}
\] (2116)

Differentiating (2096) and then scaling by $Y_{ij}$

\[
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl}\, Y_{ij}
= \lim_{\tau,\,t\to 0} \frac{\big(g_{mn}(X + t\,Y_{kl}\, e_k e_l^T + \tau\,Y_{ij}\, e_i e_j^T) - g_{mn}(X + t\,Y_{kl}\, e_k e_l^T)\big) - \big(g_{mn}(X + \tau\,Y_{ij}\, e_i e_j^T) - g_{mn}(X)\big)}{\tau\, t}
\] (2117)

which can be proved by substitution of variables in (2116). The $mn$th second-order total differential due to any $Y \in \mathbb{R}^{K\times L}$ is

\[
d^2 g_{mn}(X)\big|_{dX\to Y} = \sum_{i,j}\sum_{k,l} \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl}\, Y_{ij}
= \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g_{mn}(X)^T Y\big)^T\, Y\Big)
\] (2118)

\[
= \sum_{i,j}\ \lim_{t\to 0}\, \frac{1}{t}\!\left(\frac{\partial g_{mn}(X + t\,Y)}{\partial X_{ij}} - \frac{\partial g_{mn}(X)}{\partial X_{ij}}\right) Y_{ij}
\] (2119)

\[
= \lim_{t\to 0} \frac{g_{mn}(X + 2t\,Y) - 2\,g_{mn}(X + t\,Y) + g_{mn}(X)}{t^2}
\] (2120)

\[
= \frac{d^2}{dt^2}\bigg|_{t=0}\, g_{mn}(X + t\,Y)
\] (2121)

Hence the second directional derivative,

\[
\overrightarrow{dg^2}^{Y}\!(X) \triangleq \begin{bmatrix}
d^2 g_{11}(X) & \cdots & d^2 g_{1N}(X) \\
\vdots & & \vdots \\
d^2 g_{M1}(X) & \cdots & d^2 g_{MN}(X)
\end{bmatrix}\Bigg|_{dX\to Y}
= \begin{bmatrix}
\operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{11}(X)^T Y\big)^T Y\Big) & \cdots & \operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{1N}(X)^T Y\big)^T Y\Big) \\
\vdots & & \vdots \\
\operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{M1}(X)^T Y\big)^T Y\Big) & \cdots & \operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{MN}(X)^T Y\big)^T Y\Big)
\end{bmatrix}
= \begin{bmatrix}
\sum_{i,j}\sum_{k,l} \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij}
\end{bmatrix}_{mn} \in \mathbb{R}^{M\times N}
\] (2122)

from which it follows

\[
\overrightarrow{dg^2}^{Y}\!(X) = \sum_{i,j}\sum_{k,l} \frac{\partial^2 g(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl}\, Y_{ij}
= \sum_{i,j} \frac{\partial}{\partial X_{ij}}\, \overrightarrow{dg}^{Y}\!(X)\, Y_{ij}
\] (2123)

Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$

\[
g(X + t\,Y) = g(X) + t\, \overrightarrow{dg}^{Y}\!(X) + \frac{1}{2!}\, t^2\, \overrightarrow{dg^2}^{Y}\!(X) + O(t^3)
\] (2124)

which is the second-order multidimensional Taylor series expansion about $X$. [462, §18.4] [203, §2.3.4] Differentiating twice with respect to $t$ and subsequent $t$-zeroing isolates the third term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in (2121). So the second directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ becomes [371, §2.1, §5.4.5] [43, §6.3.1]

\[
\overrightarrow{dg^2}^{Y}\!(X) = \frac{d^2}{dt^2}\bigg|_{t=0}\, g(X + t\,Y) \in \mathbb{R}^{M\times N}
\] (2125)

which is again simplest. (confer (2104)) Directional derivative retains the dimensions of $g$.
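The second-difference form (2120)/(2125) can be checked numerically; a sketch (not from the text) for $g(X) = \operatorname{tr}(AXBX^T)$, which is quadratic in $X$, so its second directional derivative is the constant $2\operatorname{tr}(AYBY^T)$ and the centered second difference is exact for any step:

```python
import numpy as np

rng = np.random.default_rng(5)
K = 3
A, B, X, Y = (rng.standard_normal((K, K)) for _ in range(4))
g = lambda Z: np.trace(A @ Z @ B @ Z.T)

t = 0.1
d2_limit = (g(X + t*Y) - 2*g(X) + g(X - t*Y)) / t**2   # second difference, as in (2120)
d2_exact = 2 * np.trace(A @ Y @ B @ Y.T)               # coefficient of t^2 in g(X + tY)
d2_err = abs(d2_limit - d2_exact)
```

Because $g(X + tY)$ is exactly quadratic in $t$, the two values agree to roundoff even at this large step.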

D.1.6 directional derivative expressions

In the case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, all its directional derivatives are in $\mathbb{R}$:

\[
\overrightarrow{dg}^{Y}\!(X) = \operatorname{tr}\!\big(\nabla g(X)^T Y\big)
\] (2126)

\[
\overrightarrow{dg^2}^{Y}\!(X) = \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g(X)^T Y\big)^T Y\Big) = \operatorname{tr}\!\Big(\nabla_X\, \overrightarrow{dg}^{Y}\!(X)^T\, Y\Big)
\] (2127)

\[
\overrightarrow{dg^3}^{Y}\!(X) = \operatorname{tr}\!\bigg(\nabla_X \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g(X)^T Y\big)^T Y\Big)^T\, Y\bigg) = \operatorname{tr}\!\Big(\nabla_X\, \overrightarrow{dg^2}^{Y}\!(X)^T\, Y\Big)
\] (2128)

In the case $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument, they further simplify:

\[
\overrightarrow{dg}^{Y}\!(X) = \nabla g(X)^T Y
\] (2129)

\[
\overrightarrow{dg^2}^{Y}\!(X) = Y^T \nabla^2 g(X)\, Y
\] (2130)

\[
\overrightarrow{dg^3}^{Y}\!(X) = \nabla_X\!\big(Y^T \nabla^2 g(X)\, Y\big)^T\, Y
\] (2131)

and so on.

D.1.7 higher-order multidimensional Taylor series

Series expansions of the differentiable matrix-valued function $g(X)$, of matrix argument, were given earlier in (2103) and (2124). Assume that $g(X)$ has continuous first-, second-, and third-order gradients over the open set $\operatorname{dom} g$. Then, for $X \in \operatorname{dom} g$ and any $Y \in \mathbb{R}^{K\times L}$, the Taylor series is expressed on some open interval of $\mu \in \mathbb{R}$

\[
g(X + \mu Y) = g(X) + \mu\, \overrightarrow{dg}^{Y}\!(X) + \frac{1}{2!}\mu^2\, \overrightarrow{dg^2}^{Y}\!(X) + \frac{1}{3!}\mu^3\, \overrightarrow{dg^3}^{Y}\!(X) + O(\mu^4)
\] (2132)

or on some open interval of $\|Y\|_2$

\[
g(Y) = g(X) + \overrightarrow{dg}^{Y-X}\!(X) + \frac{1}{2!}\, \overrightarrow{dg^2}^{Y-X}\!(X) + \frac{1}{3!}\, \overrightarrow{dg^3}^{Y-X}\!(X) + O(\|Y\|^4)
\] (2133)

which are third-order expansions about $X$. The mean value theorem from calculus is what insures finite order of the series. [462] [51, §1.1] [50, App.A.5] [265, §0.4] These somewhat unbelievable formulae^{D.3} imply that a function can be determined over the whole of its domain by knowing its value and all its directional derivatives at a single point $X$.

D.1.7.0.1 Example. Inverse-matrix function.
Say $g(Y) = Y^{-1}$. From the table on page 616,

\[
\overrightarrow{dg}^{Y}\!(X) = \frac{d}{dt}\bigg|_{t=0}\, g(X + t\,Y) = -X^{-1}\, Y X^{-1}
\] (2134)

\[
\overrightarrow{dg^2}^{Y}\!(X) = \frac{d^2}{dt^2}\bigg|_{t=0}\, g(X + t\,Y) = 2X^{-1}\, Y X^{-1}\, Y X^{-1}
\] (2135)

^{D.3} e.g, real continuous and differentiable function of real variable $f(x) = e^{-1/x^2}$ has no Taylor series expansion about $x = 0$, of any practical use, because each derivative equals 0 there.

\[
\overrightarrow{dg^3}^{Y}\!(X) = \frac{d^3}{dt^3}\bigg|_{t=0}\, g(X + t\,Y) = -6X^{-1}\, Y X^{-1}\, Y X^{-1}\, Y X^{-1}
\] (2136)

Let's find the Taylor series expansion of $g$ about $X = I$: Since $g(I) = I$, for $\|Y\|_2 < 1$ ($\mu = 1$ in (2132))

\[
g(I + Y) = (I + Y)^{-1} = I - Y + Y^2 - Y^3 + \ldots
\] (2137)

If $Y$ is small, $(I + Y)^{-1} \approx I - Y$.^{D.4} Now we find the Taylor series expansion about $X$:

\[
g(X + Y) = (X + Y)^{-1} = X^{-1} - X^{-1}\, Y X^{-1} + \frac{1}{2!}\, 2X^{-1}\, Y X^{-1}\, Y X^{-1} - \ldots
\] (2138)

If $Y$ is small, $(X + Y)^{-1} \approx X^{-1} - X^{-1}\, Y X^{-1}$. ∎

D.1.7.0.2 Exercise. log det. (confer [74, p.644])
Find the first three terms of a Taylor series expansion for $\log\det Y$. Specify an open interval over which the expansion holds in vicinity of $X$. ⧫

D.1.8 Correspondence of gradient to derivative

From the foregoing expressions for directional derivative, we derive a relationship between gradient with respect to matrix $X$ and derivative with respect to real variable $t$:

D.1.8.1 first-order

Removing evaluation at $t = 0$ from (2104),^{D.5} we find an expression for the directional derivative of $g(X)$ in direction $Y$ evaluated anywhere along a line $\{X + t\,Y \mid t \in \mathbb{R}\}$ intersecting $\operatorname{dom} g$

\[
\overrightarrow{dg}^{Y}\!(X + t\,Y) = \frac{d}{dt}\, g(X + t\,Y)
\] (2139)

In the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$, from (2097) and (2100) we find

\[
\operatorname{tr}\!\big(\nabla_X\, g_{mn}(X + t\,Y)^T\, Y\big) = \frac{d}{dt}\, g_{mn}(X + t\,Y)
\] (2140)

which is valid at $t = 0$, of course, when $X \in \operatorname{dom} g$. In the important case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, from (2126) we have simply

\[
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = \frac{d}{dt}\, g(X + t\,Y)
\] (2141)

When $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument,

\[
\nabla_X\, g(X + t\,Y)^T\, Y = \frac{d}{dt}\, g(X + t\,Y)
\] (2142)

^{D.4} Had we instead set $g(Y) = (I + Y)^{-1}$, then the equivalent expansion would have been about $X = 0$.
^{D.5} Justified by replacing $X$ with $X + t\,Y$ in (2097)-(2099); beginning,
\[
dg_{mn}(X + t\,Y)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X + t\,Y)}{\partial X_{kl}}\, Y_{kl}
\]
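The expansions (2137) and (2138) can be checked numerically; a sketch with illustrative magnitudes of my choosing (not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # well-conditioned base point
Y = 1e-3 * rng.standard_normal((n, n))              # small perturbation

Xi = np.linalg.inv(X)
exact = np.linalg.inv(X + Y)
order1 = Xi - Xi @ Y @ Xi                           # first-order expansion (2138)
order2 = order1 + Xi @ Y @ Xi @ Y @ Xi              # add the second-order term

err1 = np.abs(exact - order1).max()
err2 = np.abs(exact - order2).max()
```

The residual shrinks by roughly a factor of $\|Y\|$ per added term, consistent with the series.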

D.1.8.1.1 Example. Gradient.
$g(X) = w^T X^T X w$, $X \in \mathbb{R}^{K\times L}$, $w \in \mathbb{R}^L$. Using the tables in §D.2,

\[
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = \operatorname{tr}\!\big(2ww^T (X^T + t\,Y^T)\, Y\big)
\] (2143)

\[
= 2w^T (X^T Y + t\,Y^T Y)\, w
\] (2144)

Applying equivalence (2141),

\[
\frac{d}{dt}\, g(X + t\,Y) = \frac{d}{dt}\, w^T (X + t\,Y)^T (X + t\,Y)\, w
\] (2145)

\[
= w^T \big(X^T Y + Y^T X + 2t\, Y^T Y\big)\, w
\] (2146)

\[
= 2w^T (X^T Y + t\,Y^T Y)\, w
\] (2147)

which is the same as (2144). Hence, the equivalence is demonstrated.

It is easy to extract $\nabla g(X)$ from (2147) knowing only (2141):

\[
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = 2w^T (X^T Y + t\,Y^T Y)\, w = 2\operatorname{tr}\!\big(ww^T(X^T + t\,Y^T)\, Y\big)
\qquad\Rightarrow\qquad
\operatorname{tr}\!\big(\nabla_X\, g(X)^T\, Y\big) = 2\operatorname{tr}\!\big(ww^T X^T Y\big)
\ \Leftrightarrow\
\nabla_X\, g(X) = 2Xww^T
\] (2148)
∎

D.1.8.2 second-order

Likewise removing the evaluation at $t = 0$ from (2125),

\[
\overrightarrow{dg^2}^{Y}\!(X + t\,Y) = \frac{d^2}{dt^2}\, g(X + t\,Y)
\] (2149)

we can find a similar relationship between second-order gradient and second derivative: In the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ from (2118) and (2121),

\[
\operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla_X\, g_{mn}(X + t\,Y)^T\, Y\big)^T\, Y\Big) = \frac{d^2}{dt^2}\, g_{mn}(X + t\,Y)
\] (2150)

In the case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ we have, of course,

\[
\operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big)^T\, Y\Big) = \frac{d^2}{dt^2}\, g(X + t\,Y)
\] (2151)

From (2130), the simpler case, where real function $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument,

\[
Y^T\, \nabla^2_X\, g(X + t\,Y)\, Y = \frac{d^2}{dt^2}\, g(X + t\,Y)
\] (2152)

D.1.8.2.1 Example. Second-order gradient.
We want to find $\nabla^2 g(X) \in \mathbb{R}^{K\times K\times K\times K}$ given real function $g(X) = \log\det X$ having domain $\operatorname{int} \mathbb{S}^K_+$. From the tables in §D.2,

\[
h(X) \triangleq \nabla g(X) = X^{-1} \in \operatorname{int} \mathbb{S}^K_+
\] (2153)

so $\nabla^2 g(X) = \nabla h(X)$. By (2140) and (2103), for $Y \in \mathbb{S}^K$

\[
\operatorname{tr}\!\big(\nabla h_{mn}(X)^T\, Y\big) = \frac{d}{dt}\bigg|_{t=0}\, h_{mn}(X + t\,Y)
\] (2154)

\[
= \left(\frac{d}{dt}\bigg|_{t=0}\, h(X + t\,Y)\right)_{\!mn}
\] (2155)

\[
= \left(\frac{d}{dt}\bigg|_{t=0}\, (X + t\,Y)^{-1}\right)_{\!mn}
\] (2156)

\[
= -\big(X^{-1}\, Y X^{-1}\big)_{mn}
\] (2157)

Setting $Y$ to a member of $\{e_k e_l^T \in \mathbb{R}^{K\times K} \mid k,l = 1\ldots K\}$, and employing a property (41) of the trace function we find

\[
\nabla^2 g(X)_{mnkl} = \operatorname{tr}\!\big(\nabla h_{mn}(X)^T e_k e_l^T\big) = \nabla h_{mn}(X)_{kl} = -\big(X^{-1} e_k e_l^T X^{-1}\big)_{mn}
\] (2158)

\[
\nabla^2 g(X)_{kl} = \nabla h(X)_{kl} = -\big(X^{-1} e_k e_l^T X^{-1}\big) \in \mathbb{R}^{K\times K}
\] (2159)
∎

From all these first- and second-order expressions, we may generate new ones by evaluating both sides at arbitrary $t$ (in some open interval) but only after differentiation.

D.2 Tables of gradients and derivatives

- Results may be validated numerically via Richardson extrapolation. [332, §5.4] [146] When algebraically proving results for symmetric matrices, it is critical to take gradients ignoring symmetry and to then substitute symmetric entries afterward. [220] [78]
- $i, j, k, \ell, K, L, m, n, M, N$ are integers, unless otherwise noted; $a, b \in \mathbb{R}^n$, $x, y \in \mathbb{R}^k$, $A, B \in \mathbb{R}^{m\times n}$, $X, Y \in \mathbb{R}^{K\times L}$, $t, \mu \in \mathbb{R}$.
- $x^\mu$ means $\delta(\delta(x)^\mu)$ for $\mu \in \mathbb{R}$; id est, entrywise vector exponentiation. $\delta$ is the main-diagonal linear operator (1681). $x^0 \triangleq \mathbf{1}$, $X^0 \triangleq I$ if square.
- $\dfrac{d}{dx} \triangleq \begin{bmatrix} \frac{d}{dx_1} \\ \vdots \\ \frac{d}{dx_k} \end{bmatrix}$; $\overrightarrow{dg}^{y}(x)$, $\overrightarrow{dg^2}^{y}(x)$ (directional derivatives §D.1), $\log x$, $e^x$, $|x|$, $x/y$ (Hadamard quotient), $\operatorname{sgn} x$, $\sqrt{x}$ (entrywise square root), etcetera, are maps $f : \mathbb{R}^k \to \mathbb{R}^k$ that maintain dimension; e.g, (§A.1.1)

\[
\frac{d}{dx}\, x^{-1} \triangleq \nabla_x\, \mathbf{1}^T \delta(x)^{-1} \mathbf{1}
\] (2160)

- For $A$ a scalar or square matrix, we have the Taylor series [98, §3.6]

\[
e^A \triangleq \sum_{k=0}^{\infty} \frac{1}{k!}\, A^k
\] (2161)

Further, [440, §5.4]

\[
e^A \succ 0 \quad \forall\, A \in \mathbb{S}^m
\] (2162)

- For all square $A$ and integer $k$

\[
\det^k\! A = \det A^k
\] (2163)
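As the preamble notes, these results can be validated numerically; a sketch (not from the text, and using plain centered differences rather than Richardson extrapolation) checking the log det example above, $\nabla \log\det X = X^{-1}$ on symmetric positive definite $X$ and the derivative (2157):

```python
import numpy as np

rng = np.random.default_rng(7)
K = 4
R = rng.standard_normal((K, K))
X = R @ R.T + K * np.eye(K)                 # a point in int S^K_+
Y = rng.standard_normal((K, K)); Y = (Y + Y.T) / 2   # symmetric direction

g = lambda Z: np.log(np.linalg.det(Z))
Xi = np.linalg.inv(X)

# (2153): gradient of log det X is X^{-1}, so tr(grad^T Y) = tr(X^{-1} Y)
t = 1e-6
dd_limit = (g(X + t*Y) - g(X - t*Y)) / (2*t)
grad_err = abs(np.trace(Xi @ Y) - dd_limit)

# (2157): d/dt (X + tY)^{-1} at t = 0 equals -X^{-1} Y X^{-1}
dinv_limit = (np.linalg.inv(X + t*Y) - np.linalg.inv(X - t*Y)) / (2*t)
hess_err = np.abs(dinv_limit + Xi @ Y @ Xi).max()
```

Richardson extrapolation would combine several step sizes for higher accuracy; a single centered difference already agrees to several digits here.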

D.2.1 algebraic

$\nabla_x\, x = \nabla_x\, x^T = I \in \mathbb{R}^{k\times k}$ &emsp; $\nabla_X\, X = \nabla_X\, X^T \triangleq \mathcal{I} \in \mathbb{R}^{K\times L\times K\times L}$ (Identity)

$\nabla_x\, \mathbf{1}^T x = \nabla_x\, x^T \mathbf{1} = \mathbf{1} \in \mathbb{R}^k$ &emsp; $\nabla_X\, \mathbf{1}^T X \mathbf{1} = \nabla_X\, \mathbf{1}^T X^T \mathbf{1} = \mathbf{1}\mathbf{1}^T \in \mathbb{R}^{K\times L}$

$\nabla_x\, (Ax - b) = A^T$ &emsp; $\nabla_x\, (x^T A - b^T) = A$

$\nabla_x\, (Ax - b)^T(Ax - b) = 2A^T(Ax - b)$ &emsp; $\nabla^2_x\, (Ax - b)^T(Ax - b) = 2A^T\!A$

$\nabla_x \sqrt{(Ax - b)^T(Ax - b)} = \nabla_x \|Ax - b\|_2 = A^T(Ax - b)/\|Ax - b\|_2$

$\nabla_x\, z^T|Ax - b| = A^T \delta(z)\operatorname{sgn}(Ax - b)$, &ensp; $z_i \neq 0 \Rightarrow (Ax - b)_i \neq 0$

$\nabla_x\, \mathbf{1}^T|Ax - b| = \nabla_x \|Ax - b\|_1 = A^T \operatorname{sgn}(Ax - b)$

$\nabla_x\, \mathbf{1}^T f(|Ax - b|) = A^T \delta\!\left(\dfrac{df(y)}{dy}\right)\operatorname{sgn}(Ax - b)$, &ensp; $y = |Ax - b|$

$\nabla_x\, \big(x^T\!A\,x + 2x^T\!B\,y + y^T\!C\,y\big) = (A + A^T)\,x + 2B\,y$

$\nabla_x\, (x + y)^T A\,(x + y) = (A + A^T)(x + y)$ &emsp; $\nabla^2_x\, \big(x^T\!A\,x + 2x^T\!B\,y + y^T\!C\,y\big) = A + A^T$

$\nabla_X\, a^T\!Xb = \nabla_X\, b^T\!X^T a = ab^T$

$\nabla_X\, a^T\!X^2 b = X^T ab^T + ab^T X^T$ &emsp; $\nabla_X\, a^T\!X^{-1} b = -X^{-T} ab^T X^{-T}$

$\nabla_x\, a^T x^T\! x\, b = 2x\, a^T b$ &emsp; confer (2095)(2159): $\nabla_X\, (X^{-1})_{kl} = \dfrac{\partial X^{-1}}{\partial X_{kl}} = -X^{-1} e_k e_l^T X^{-1}$

$\nabla_x\, a^T x x^T b = (ab^T + ba^T)\,x$ &emsp; $\nabla_X\, a^T\!XX^T b = (ab^T + ba^T)\,X$

$\nabla_x\, a^T x^T\! x\, a = 2x\, a^T a$ &emsp; $\nabla_X\, a^T\!X^T\!Xb = X(ab^T + ba^T)$ &emsp; $\nabla_X\, a^T\!X^T\!Xa = 2Xaa^T$

$\nabla_x\, a^T x x^T a = 2aa^T x$ &emsp; $\nabla_X\, a^T\!XX^T a = 2aa^T X$

$\nabla_x\, a^T y\, x^T b = b\, a^T y$ &emsp; $\nabla_X\, a^T\!Y X^T b = ba^T\!Y$

$\nabla_x\, a^T y^T\! x\, b = y\, b^T a$ &emsp; $\nabla_X\, a^T\!Y^T\!Xb = Y ab^T$

$\nabla_x\, a^T x\, y^T b = a\, b^T y$ &emsp; $\nabla_X\, a^T\!XY^T b = ab^T\!Y$

$\nabla_x\, a^T x^T\! y\, b = y\, a^T b$ &emsp; $\nabla_X\, a^T\!X^T\!Yb = Y ba^T$

algebraic continued

$\dfrac{d}{dt}(X + t\,Y) = Y$

$\dfrac{d}{dt}\, B^T(X + t\,Y)^{-1}A = -B^T(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A$

$\dfrac{d}{dt}\, B^T(X + t\,Y)^{-T}A = -B^T(X + t\,Y)^{-T}\, Y^T\, (X + t\,Y)^{-T} A$

$\dfrac{d}{dt}\, B^T(X + t\,Y)^{\mu}A = \ldots$, &ensp; $-1 \leq \mu \leq 1$, &ensp; $X, Y \in \mathbb{S}^M_+$

$\dfrac{d^2}{dt^2}\, B^T(X + t\,Y)^{-1}A = 2\,B^T(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A$

$\dfrac{d^3}{dt^3}\, B^T(X + t\,Y)^{-1}A = -6\,B^T(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A$

$\dfrac{d}{dt}\, (X + t\,Y)^T\!A\,(X + t\,Y) = Y^T\!AX + X^T\!AY + 2\,t\,Y^T\!AY$

$\dfrac{d^2}{dt^2}\, (X + t\,Y)^T\!A\,(X + t\,Y) = 2\,Y^T\!AY$

$\dfrac{d}{dt}\, \big((X + t\,Y)^T\!A\,(X + t\,Y)\big)^{-1} = -\big((X + t\,Y)^T\!A\,(X + t\,Y)\big)^{-1}\, (Y^T\!AX + X^T\!AY + 2\,t\,Y^T\!AY)\, \big((X + t\,Y)^T\!A\,(X + t\,Y)\big)^{-1}$

$\dfrac{d}{dt}\, \big((X + t\,Y)A(X + t\,Y)\big) = YAX + XAY + 2\,t\,YAY$

$\dfrac{d^2}{dt^2}\, \big((X + t\,Y)A(X + t\,Y)\big) = 2\,YAY$

D.2.2 trace Kronecker

$\nabla_{\operatorname{vec} X}\, \operatorname{tr}(AXBX^T) = \nabla_{\operatorname{vec} X}\, \operatorname{vec}(X)^T (B^T\!\otimes A)\operatorname{vec} X = (B\otimes A^T + B^T\!\otimes A)\operatorname{vec} X$

$\nabla^2_{\operatorname{vec} X}\, \operatorname{tr}(AXBX^T) = \nabla^2_{\operatorname{vec} X}\, \operatorname{vec}(X)^T (B^T\!\otimes A)\operatorname{vec} X = B\otimes A^T + B^T\!\otimes A$ &ensp; (2076)
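The first- and second-derivative entries for $B^T(X + tY)^{-1}A$ above can be spot-checked by centered differences at a point $t_0$ away from zero, as §D.1.8 suggests; a sketch (not from the text, with illustrative magnitudes of my choosing):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 4
A, B, Y = (rng.standard_normal((n, n)) for _ in range(3))
X = rng.standard_normal((n, n)) + 2 * n * np.eye(n)   # keep X + t Y safely invertible

h = lambda t: B.T @ np.linalg.inv(X + t*Y) @ A
t0, dt = 0.2, 1e-4
M = np.linalg.inv(X + t0*Y)

# table entry: d/dt B^T (X+tY)^{-1} A = -B^T M Y M A
d1_exact = -B.T @ M @ Y @ M @ A
d1_limit = (h(t0 + dt) - h(t0 - dt)) / (2*dt)
d1_err = np.abs(d1_exact - d1_limit).max()

# table entry: d^2/dt^2 B^T (X+tY)^{-1} A = 2 B^T M Y M Y M A
d2_exact = 2 * B.T @ M @ Y @ M @ Y @ M @ A
d2_limit = (h(t0 + dt) - 2*h(t0) + h(t0 - dt)) / dt**2
d2_err = np.abs(d2_exact - d2_limit).max()
```

Evaluating at arbitrary $t_0$ rather than $t = 0$ exercises the remark at the end of §D.1.8 that these identities hold along the whole line.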

D.2.3 trace

$\nabla_x\, \mu x = \mu I$ &emsp; $\nabla_X \operatorname{tr}(\mu X) = \mu\, \nabla_X \operatorname{tr} X = \mu I$

$\dfrac{d}{dx}\, x^{-1} = \nabla_x\, \mathbf{1}^T \delta(x)^{-1} \mathbf{1} = -x^{-2}$

$\nabla_x\, \mathbf{1}^T \delta(x)^{-1} y = -\delta(x)^{-2}\, y$ &emsp; $\nabla_X \operatorname{tr} X^{-1} = -X^{-2T}$ &emsp; $\nabla_X \operatorname{tr}(X^{-1}Y) = \nabla_X \operatorname{tr}(YX^{-1}) = -X^{-T}\, Y^T X^{-T}$

$\dfrac{d}{dx}\, x^{\mu} = \mu x^{\mu-1}$ &emsp; $\nabla_X \operatorname{tr} X^{\mu} = \mu X^{\mu-1}$, &ensp; $X \in \mathbb{S}^M$ &emsp; $\nabla_X \operatorname{tr} X^{j} = jX^{(j-1)T}$

$\nabla_x\, (b - a^T x)^{-1} = (b - a^T x)^{-2}\, a$ &emsp; $\nabla_x\, (b - a^T x)^{\mu} = -\mu (b - a^T x)^{\mu-1}\, a$

$\nabla_x\, x^T y = \nabla_x\, y^T x = y$ &emsp; $\nabla_x\, x^T x = 2x$ &emsp; $\nabla_X \operatorname{tr}\big((B - AX)^{-1}\big) = \big((B - AX)^{-2} A\big)^T$

$\nabla_X \operatorname{tr}(X^T Y) = \nabla_X \operatorname{tr}(Y X^T) = \nabla_X \operatorname{tr}(Y^T X) = \nabla_X \operatorname{tr}(X Y^T) = Y$

$\nabla_X \operatorname{tr}(X^T X) = \nabla_X \operatorname{tr}(X X^T) = 2X$

$\nabla_X \operatorname{tr}(AXBX^T) = \nabla_X \operatorname{tr}(XBX^T\!A) = A^T\!XB^T + AXB$

$\nabla_X \operatorname{tr}(AXBX) = \nabla_X \operatorname{tr}(XBXA) = A^T\!X^T\!B^T + B^T\!X^T\!A^T$

$\nabla_X \operatorname{tr}(AXAXAXAX) = 4(AXAXAXA)^T$ &emsp; $\nabla_X \operatorname{tr}(AXAXAX) = 3(AXAXA)^T$ &emsp; $\nabla_X \operatorname{tr}(AXAX) = 2(AXA)^T$ &emsp; $\nabla_X \operatorname{tr}(AX) = A^T$

$\nabla_X \operatorname{tr}(XAXAXAXA) = \nabla_X \operatorname{tr}(XAXAXA)$, etc., likewise

$\nabla_X \operatorname{tr}(Y X^k) = \nabla_X \operatorname{tr}(X^k Y) = \displaystyle\sum_{i=0}^{k-1} \big(X^i\, Y X^{k-1-i}\big)^T$

$\nabla_X \operatorname{tr}(X^T Y Y^T X X^T Y Y^T X) = 4\,Y Y^T X X^T Y Y^T X$

$\nabla_X \operatorname{tr}(X Y Y^T X^T X Y Y^T X^T) = 4\,X Y Y^T X^T X Y Y^T$

$\nabla_X \operatorname{tr}(Y^T X X^T Y) = \nabla_X \operatorname{tr}(X^T Y Y^T X) = 2\,Y Y^T X$

$\nabla_X \operatorname{tr}(Y$
