Matrix Calculus - CCRMA


Appendix D

Matrix Calculus

> From too much study, and from extreme passion, cometh madnesse.
> — Isaac Newton [205, §5]

D.1 Gradient, Directional derivative, Taylor series

D.1.1 Gradients

The gradient of a differentiable real function $f(x) : \mathbb{R}^K \to \mathbb{R}$ with respect to its vector argument is defined uniquely in terms of partial derivatives

\[
\nabla f(x) \triangleq \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \dfrac{\partial f(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_K} \end{bmatrix} \in \mathbb{R}^K
\] (2053)

while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian;

\[
\nabla^2 f(x) \triangleq \begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_K} \\
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_K} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_K \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_K \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_K^2}
\end{bmatrix} \in \mathbb{S}^K
\] (2054)

interpreted

\[
\frac{\partial}{\partial x_2}\!\left(\frac{\partial f(x)}{\partial x_1}\right) = \frac{\partial}{\partial x_1}\!\left(\frac{\partial f(x)}{\partial x_2}\right) = \frac{\partial^2 f(x)}{\partial x_1 \partial x_2}
\] (2055)

Dattorro, Convex Optimization & Euclidean Distance Geometry, Mεβoo, 2005, v2020.02.29.
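Definitions (2053) and (2054) lend themselves to a numerical sanity check by centered finite differences. A minimal sketch (not from the text; helper names are mine) for the quadratic $f(x) = x^T\!A\,x$, whose gradient is $(A + A^T)x$ and whose Hessian is $A + A^T$:

```python
import numpy as np

def num_gradient(f, x, h=1e-6):
    """Centered-difference estimate of the gradient (2053)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hessian(f, x, h=1e-4):
    """Centered-difference estimate of the Hessian (2054)."""
    K = x.size
    H = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            ei = np.zeros(K); ei[i] = h
            ej = np.zeros(K); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)
f = lambda x: x @ A @ x

grad_err = np.abs(num_gradient(f, x) - (A + A.T) @ x).max()
hess_err = np.abs(num_hessian(f, x) - (A + A.T)).max()
```

For a quadratic the centered differences are exact up to roundoff, so both errors are tiny.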

The gradient of vector-valued function $v(x) : \mathbb{R} \to \mathbb{R}^N$ on real domain is a row vector

\[
\nabla v(x) \triangleq \begin{bmatrix} \dfrac{\partial v_1(x)}{\partial x} & \dfrac{\partial v_2(x)}{\partial x} & \cdots & \dfrac{\partial v_N(x)}{\partial x} \end{bmatrix} \in \mathbb{R}^N
\] (2056)

while the second-order gradient is

\[
\nabla^2 v(x) \triangleq \begin{bmatrix} \dfrac{\partial^2 v_1(x)}{\partial x^2} & \dfrac{\partial^2 v_2(x)}{\partial x^2} & \cdots & \dfrac{\partial^2 v_N(x)}{\partial x^2} \end{bmatrix} \in \mathbb{R}^N
\] (2057)

The gradient of vector-valued function $h(x) : \mathbb{R}^K \to \mathbb{R}^N$ on vector domain is

\[
\nabla h(x) \triangleq \begin{bmatrix}
\dfrac{\partial h_1(x)}{\partial x_1} & \dfrac{\partial h_2(x)}{\partial x_1} & \cdots & \dfrac{\partial h_N(x)}{\partial x_1} \\
\dfrac{\partial h_1(x)}{\partial x_2} & \dfrac{\partial h_2(x)}{\partial x_2} & \cdots & \dfrac{\partial h_N(x)}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial h_1(x)}{\partial x_K} & \dfrac{\partial h_2(x)}{\partial x_K} & \cdots & \dfrac{\partial h_N(x)}{\partial x_K}
\end{bmatrix}
= \big[\,\nabla h_1(x) \ \ \nabla h_2(x) \ \cdots \ \nabla h_N(x)\,\big] \in \mathbb{R}^{K\times N}
\] (2058)

while the second-order gradient has a three-dimensional written representation dubbed cubix;^{D.1}

\[
\nabla^2 h(x) \triangleq \begin{bmatrix}
\nabla\dfrac{\partial h_1(x)}{\partial x_1} & \cdots & \nabla\dfrac{\partial h_N(x)}{\partial x_1} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial h_1(x)}{\partial x_K} & \cdots & \nabla\dfrac{\partial h_N(x)}{\partial x_K}
\end{bmatrix}
= \big[\,\nabla^2 h_1(x) \ \ \nabla^2 h_2(x) \ \cdots \ \nabla^2 h_N(x)\,\big] \in \mathbb{R}^{K\times N\times K}
\] (2059)

where the gradient of each real entry is with respect to vector $x$ as in (2053).

The gradient of real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ on matrix domain is

\[
\nabla g(X) \triangleq \begin{bmatrix}
\dfrac{\partial g(X)}{\partial X_{11}} & \dfrac{\partial g(X)}{\partial X_{12}} & \cdots & \dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g(X)}{\partial X_{K1}} & \dfrac{\partial g(X)}{\partial X_{K2}} & \cdots & \dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L}
= \big[\,\nabla_{X(:,1)}\, g(X) \ \ \nabla_{X(:,2)}\, g(X) \ \cdots \ \nabla_{X(:,L)}\, g(X)\,\big] \in \mathbb{R}^{K\times 1\times L}
\] (2060)

where gradient $\nabla_{X(:,i)}$ is with respect to the $i$th column of $X$. The strange appearance of (2060) in $\mathbb{R}^{K\times 1\times L}$ is meant to suggest a third dimension perpendicular to the page (not a diagonal matrix).

^{D.1} The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater meaning mother.

The second-order gradient has representation

\[
\nabla^2 g(X) \triangleq \begin{bmatrix}
\nabla\dfrac{\partial g(X)}{\partial X_{11}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial g(X)}{\partial X_{K1}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L}
= \big[\,\nabla\nabla_{X(:,1)}\, g(X) \ \cdots \ \nabla\nabla_{X(:,L)}\, g(X)\,\big] \in \mathbb{R}^{K\times 1\times L\times K\times L}
\] (2061)

where the gradient $\nabla$ is with respect to matrix $X$.

The gradient of vector-valued function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^N$ on matrix domain is a cubix

\[
\nabla g(X) \triangleq \begin{bmatrix}
\nabla_{X(:,1)}\, g_1(X) & \nabla_{X(:,1)}\, g_2(X) & \cdots & \nabla_{X(:,1)}\, g_N(X) \\
\vdots & \vdots & & \vdots \\
\nabla_{X(:,L)}\, g_1(X) & \nabla_{X(:,L)}\, g_2(X) & \cdots & \nabla_{X(:,L)}\, g_N(X)
\end{bmatrix}
= \big[\,\nabla g_1(X) \ \ \nabla g_2(X) \ \cdots \ \nabla g_N(X)\,\big] \in \mathbb{R}^{K\times N\times L}
\] (2062)

while the second-order gradient has a five-dimensional representation;

\[
\nabla^2 g(X) \triangleq \big[\,\nabla^2 g_1(X) \ \ \nabla^2 g_2(X) \ \cdots \ \nabla^2 g_N(X)\,\big] \in \mathbb{R}^{K\times N\times L\times K\times L}
\] (2063)

The gradient of matrix-valued function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ on matrix domain has a four-dimensional representation called quartix (fourth-order tensor)

\[
\nabla g(X) \triangleq \begin{bmatrix}
\nabla g_{11}(X) & \nabla g_{12}(X) & \cdots & \nabla g_{1N}(X) \\
\nabla g_{21}(X) & \nabla g_{22}(X) & \cdots & \nabla g_{2N}(X) \\
\vdots & \vdots & & \vdots \\
\nabla g_{M1}(X) & \nabla g_{M2}(X) & \cdots & \nabla g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L}
\] (2064)

while the second-order gradient has a six-dimensional representation

\[
\nabla^2 g(X) \triangleq \begin{bmatrix}
\nabla^2 g_{11}(X) & \cdots & \nabla^2 g_{1N}(X) \\
\vdots & & \vdots \\
\nabla^2 g_{M1}(X) & \cdots & \nabla^2 g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L\times K\times L}
\] (2065)

and so on.

D.1.2 Product rules for matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$

\[
\nabla_X \big(f(X)^T g(X)\big) = \nabla_X(f)\, g + \nabla_X(g)\, f
\] (2066)

while [65, §8.3] [420]

\[
\nabla_X \operatorname{tr}\!\big(f(X)^T g(X)\big) = \nabla_X \Big[ \operatorname{tr}\!\big(f(X)^T g(Z)\big) + \operatorname{tr}\!\big(g(X)\, f(Z)^T\big) \Big]\Big|_{Z \leftarrow X}
\] (2067)

These expressions implicitly apply as well to scalar-, vector-, or matrix-valued functions of scalar, vector, or matrix arguments.

D.1.2.0.1 Example. Cubix.
Suppose $f(X) : \mathbb{R}^{2\times2} \to \mathbb{R}^2 = X^T a$ and $g(X) : \mathbb{R}^{2\times2} \to \mathbb{R}^2 = Xb$. We wish to find

\[
\nabla_X \big(f(X)^T g(X)\big) = \nabla_X\, a^T X^2 b
\] (2068)

using the product rule. Formula (2066) calls for

\[
\nabla_X\, a^T X^2 b = \nabla_X(X^T a)\, Xb + \nabla_X(Xb)\, X^T a
\] (2069)

Consider the first of the two terms:

\[
\nabla_X(f)\, g = \nabla_X(X^T a)\, Xb = \big[\,\nabla(X^T a)_1 \ \ \nabla(X^T a)_2\,\big]\, Xb
\] (2070)

The gradient of $X^T a$ forms a cubix in $\mathbb{R}^{2\times2\times2}$; a.k.a third-order tensor:

\[
\nabla_X(X^T a)\, Xb = \begin{bmatrix}
\frac{\partial (X^Ta)_1}{\partial X_{11}} & \frac{\partial (X^Ta)_2}{\partial X_{11}} \\
\frac{\partial (X^Ta)_1}{\partial X_{12}} & \frac{\partial (X^Ta)_2}{\partial X_{12}} \\
\frac{\partial (X^Ta)_1}{\partial X_{21}} & \frac{\partial (X^Ta)_2}{\partial X_{21}} \\
\frac{\partial (X^Ta)_1}{\partial X_{22}} & \frac{\partial (X^Ta)_2}{\partial X_{22}}
\end{bmatrix}
\begin{bmatrix} (Xb)_1 \\ (Xb)_2 \end{bmatrix} \in \mathbb{R}^{2\times1\times2}
\] (2071)

Because the gradient of the product (2068) requires total change with respect to change in each entry of matrix $X$, the $Xb$ vector must make an inner product with each vector in that second dimension of the cubix indicated by dotted line segments;

\[
\nabla_X(X^T a)\, Xb = \begin{bmatrix} a_1 & 0 \\ 0 & a_1 \\ a_2 & 0 \\ 0 & a_2 \end{bmatrix}
\begin{bmatrix} b_1 X_{11} + b_2 X_{12} \\ b_1 X_{21} + b_2 X_{22} \end{bmatrix}
= \begin{bmatrix} a_1(b_1 X_{11} + b_2 X_{12}) & a_1(b_1 X_{21} + b_2 X_{22}) \\ a_2(b_1 X_{11} + b_2 X_{12}) & a_2(b_1 X_{21} + b_2 X_{22}) \end{bmatrix}
= ab^T X^T \in \mathbb{R}^{2\times2}
\] (2072)

where the cubix appears as a complete $2\times2\times2$ matrix. In like manner for the second term $\nabla_X(g)\, f$

\[
\nabla_X(Xb)\, X^T a = \begin{bmatrix} b_1 & 0 \\ b_2 & 0 \\ 0 & b_1 \\ 0 & b_2 \end{bmatrix}
\begin{bmatrix} X_{11}a_1 + X_{21}a_2 \\ X_{12}a_1 + X_{22}a_2 \end{bmatrix}
= X^T ab^T \in \mathbb{R}^{2\times2}
\] (2073)

The solution

\[
\nabla_X\, a^T X^2 b = ab^T X^T + X^T ab^T
\] (2074)

can be found from Table D.2.1 or verified using (2067).

D.1.2.1 Kronecker product

A partial remedy for venturing into hyperdimensional matrix representations, such as the cubix or quartix, is to first vectorize matrices as in (39). This device gives rise to the Kronecker product of matrices; a.k.a tensor product (kron() in Matlab). Although its definition sees reversal in the literature, [434, §2.1] the Kronecker product is not commutative ($B \otimes A \neq A \otimes B$). We adopt the definition: for $A \in \mathbb{R}^{m\times n}$ and $B \in \mathbb{R}^{p\times q}$

\[
B \otimes A \triangleq \begin{bmatrix}
B_{11}A & B_{12}A & \cdots & B_{1q}A \\
B_{21}A & B_{22}A & \cdots & B_{2q}A \\
\vdots & \vdots & & \vdots \\
B_{p1}A & B_{p2}A & \cdots & B_{pq}A
\end{bmatrix} \in \mathbb{R}^{pm\times qn}
\] (2075)

for which $A \otimes 1 = 1 \otimes A = A$ (real unity acts like Identity).

One advantage to vectorization is existence of the traditional two-dimensional matrix representation (second-order tensor) for the second-order gradient of a real function with respect to a vectorized matrix. From §A.1.1 no.36 (§D.2.1) for square $A, B \in \mathbb{R}^{n\times n}$, for example [220, §5.2] [15, §3]

\[
\nabla^2_{\operatorname{vec} X}\, \operatorname{tr}(AXBX^T) = \nabla^2_{\operatorname{vec} X}\, \operatorname{vec}(X)^T (B^T\!\otimes A)\operatorname{vec} X = B\otimes A^T + B^T\!\otimes A \in \mathbb{R}^{n^2\times n^2}
\] (2076)

To disadvantage is a large new but known set of algebraic rules (§A.1.1) and the fact that its mere use does not generally guarantee two-dimensional matrix representation of gradients.

Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix $S \in \mathbb{R}^{M\times N}$, for example, by respective entries $w_i$ from the main diagonal in

\[
W = \begin{bmatrix} w_1 & & 0 \\ & \ddots & \\ 0 & & w_N \end{bmatrix} \in \mathbb{S}^N
\] (2077)

A conventional means for accomplishing column weighting is to multiply $S$ by diagonal matrix $W$ on the right side:

\[
SW = S \begin{bmatrix} w_1 & & 0 \\ & \ddots & \\ 0 & & w_N \end{bmatrix}
= \big[\, S(:,1)\,w_1 \ \cdots \ S(:,N)\,w_N \,\big] \in \mathbb{R}^{M\times N}
\] (2078)

To reverse product order such that diagonal matrix $W$ instead appears to the left of $S$: for $I \in \mathbb{S}^M$ (Law)

\[
SW = (\delta(W)^T \otimes I)
\begin{bmatrix}
S(:,1) & 0 & \cdots & 0 \\
0 & S(:,2) & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & S(:,N)
\end{bmatrix} \in \mathbb{R}^{M\times N}
\] (2079)
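Identity (2076) can be verified numerically with column-stacking `vec` (as in (39)); a sketch (not from the text), which also checks that the transpose of $B^T\!\otimes A$ is $B\otimes A^T$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A, B, X = (rng.standard_normal((n, n)) for _ in range(3))
vec = lambda M: M.reshape(-1, order='F')   # column-stacking vec

# tr(A X B X^T) = vec(X)^T (B^T kron A) vec(X), since vec(AXB) = (B^T kron A) vec X
lhs = np.trace(A @ X @ B @ X.T)
rhs = vec(X) @ np.kron(B.T, A) @ vec(X)
quad_err = abs(lhs - rhs)

# second-order gradient (2076): for vec(X)^T M vec(X) the Hessian is M + M^T
M = np.kron(B.T, A)
hess = M + M.T
kron_err = np.abs(hess - (np.kron(B, A.T) + np.kron(B.T, A))).max()
```

The quadratic-form identity holds to roundoff, and `kron_err` is exactly zero because $(B^T\!\otimes A)^T = B\otimes A^T$ entrywise.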

To instead weight the rows of $S$ via diagonal matrix $W \in \mathbb{S}^M$, for $I \in \mathbb{S}^N$:

\[
WS = \begin{bmatrix}
S(1,:) & 0 & \cdots & 0 \\
0 & S(2,:) & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & S(M,:)
\end{bmatrix} (\delta(W) \otimes I) \in \mathbb{R}^{M\times N}
\] (2080)

D.1.2.2 Hadamard product

For any matrices of like size, $S, Y \in \mathbb{R}^{M\times N}$, Hadamard's product $\circ$ denotes simple multiplication of corresponding entries (.* in Matlab). It is possible to convert Hadamard product into a standard product of matrices:

\[
S \circ Y = \big[\, \delta(Y(:,1)) \ \cdots \ \delta(Y(:,N)) \,\big]
\begin{bmatrix}
S(:,1) & 0 & \cdots & 0 \\
0 & S(:,2) & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & S(:,N)
\end{bmatrix} \in \mathbb{R}^{M\times N}
\] (2081)

In the special case that $S = s$ and $Y = y$ are vectors in $\mathbb{R}^M$

\[
s \circ y = \delta(s)\, y
\] (2082)

\[
s^T \circ y = y s^T, \qquad s \circ y^T = s y^T
\] (2083)

D.1.3 Chain rules for composite matrix-functions

Given dimensionally compatible matrix-valued functions of matrix variable $f(X)$ and $g(X)$ [462, §15.7]

\[
\nabla_X\, g\big(f(X)^T\big) = \nabla_X f^T\, \nabla_f\, g
\] (2084)

\[
\nabla^2_X\, g\big(f(X)^T\big) = \nabla_X \big(\nabla_X f^T\, \nabla_f\, g\big) = \nabla^2_X f\, \nabla_f\, g + \nabla_X f^T\, \nabla^2_f\, g\, \nabla_X f
\] (2085)

D.1.3.1 Two arguments

\[
\nabla_X\, g\big(f(X)^T, h(X)^T\big) = \nabla_X f^T\, \nabla_f\, g + \nabla_X h^T\, \nabla_h\, g
\] (2086)

D.1.3.1.1 Example. Chain rule for two arguments. [51, §1.1]

\[
g\big(f(x)^T, h(x)^T\big) = \big(f(x) + h(x)\big)^T A \big(f(x) + h(x)\big)
\] (2087)

\[
f(x) = \begin{bmatrix} x_1 \\ \varepsilon x_2 \end{bmatrix}, \qquad h(x) = \begin{bmatrix} \varepsilon x_1 \\ x_2 \end{bmatrix}
\] (2088)

\[
\nabla_x\, g\big(f(x)^T, h(x)^T\big) = \begin{bmatrix} 1 & 0 \\ 0 & \varepsilon \end{bmatrix}(A + A^T)(f + h) + \begin{bmatrix} \varepsilon & 0 \\ 0 & 1 \end{bmatrix}(A + A^T)(f + h)
\] (2089)

\[
\nabla_x\, g\big(f(x)^T, h(x)^T\big) = \begin{bmatrix} 1+\varepsilon & 0 \\ 0 & 1+\varepsilon \end{bmatrix}(A + A^T)\left(\begin{bmatrix} x_1 \\ \varepsilon x_2 \end{bmatrix} + \begin{bmatrix} \varepsilon x_1 \\ x_2 \end{bmatrix}\right)
\] (2090)

\[
\lim_{\varepsilon \to 0} \nabla_x\, g\big(f(x)^T, h(x)^T\big) = (A + A^T)\,x
\] (2091)

from Table D.2.1. ∎

These foregoing formulae remain correct when gradient produces hyperdimensional representation:
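Conversion (2081) and the vector special case (2082) are easy to exercise numerically; a sketch under my own variable names (not from the text), building the block-diagonal of the columns of $S$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 4, 3
S = rng.standard_normal((M, N))
Y = rng.standard_normal((M, N))

# block-diagonal matrix of the columns of S, as in (2081)
blk = np.zeros((M * N, N))
for j in range(N):
    blk[j*M:(j+1)*M, j] = S[:, j]

# row of diagonal matrices built from the columns of Y
D = np.hstack([np.diag(Y[:, j]) for j in range(N)])   # M x MN

had_err = np.abs(D @ blk - S * Y).max()               # standard product vs Hadamard

# vector special case (2082): s o y = delta(s) y
s, y = rng.standard_normal(M), rng.standard_normal(M)
vec_err = np.abs(np.diag(s) @ y - s * y).max()
```

Column $j$ of `D @ blk` is $\delta(Y(:,j))\,S(:,j)$, which is exactly the $j$th column of the Hadamard product.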

D.1.4 First directional derivative

Assume that a differentiable function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ has continuous first- and second-order gradients $\nabla g$ and $\nabla^2 g$ over dom $g$, which is an open set. We seek simple expressions for the first and second directional derivatives in direction $Y \in \mathbb{R}^{K\times L}$: respectively, $\overrightarrow{dg}^{Y} \in \mathbb{R}^{M\times N}$ and $\overrightarrow{dg^2}^{Y} \in \mathbb{R}^{M\times N}$.

Assuming that the limit exists, we may state the partial derivative of the $mn$th entry of $g$ with respect to the $kl$th entry of $X$;

\[
\frac{\partial g_{mn}(X)}{\partial X_{kl}} = \lim_{t\to 0} \frac{g_{mn}(X + t\, e_k e_l^T) - g_{mn}(X)}{t} \in \mathbb{R}
\] (2092)

where $e_k$ is the $k$th standard basis vector in $\mathbb{R}^K$ while $e_l$ is the $l$th standard basis vector in $\mathbb{R}^L$. The total number of partial derivatives equals $KLMN$ while the gradient is defined in their terms; the $mn$th entry of the gradient is

\[
\nabla g_{mn}(X) = \begin{bmatrix}
\dfrac{\partial g_{mn}(X)}{\partial X_{11}} & \cdots & \dfrac{\partial g_{mn}(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial g_{mn}(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L}
\] (2093)

while the gradient is a quartix

\[
\nabla g(X) = \begin{bmatrix}
\nabla g_{11}(X) & \cdots & \nabla g_{1N}(X) \\
\vdots & & \vdots \\
\nabla g_{M1}(X) & \cdots & \nabla g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L}
\] (2094)

By simply rotating our perspective of a four-dimensional representation of gradient matrix, we find one of three useful transpositions of this quartix (connoted $T_1$):

\[
\nabla g(X)^{T_1} = \begin{bmatrix}
\dfrac{\partial g(X)}{\partial X_{11}} & \cdots & \dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial g(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times M\times N}
\] (2095)

When a limit for $t \in \mathbb{R}$ exists, it is easy to show by substitution of variables in (2092)

\[
\frac{\partial g_{mn}(X)}{\partial X_{kl}}\, Y_{kl} = \lim_{t\to 0} \frac{g_{mn}(X + t\, Y_{kl}\, e_k e_l^T) - g_{mn}(X)}{t} \in \mathbb{R}
\] (2096)

which may be interpreted as the change in $g_{mn}$ at $X$ when the change in $X_{kl}$ is equal to $Y_{kl}$, the $kl$th entry of any $Y \in \mathbb{R}^{K\times L}$. Because the total change in $g_{mn}(X)$ due to $Y$ is the sum of change with respect to each and every $X_{kl}$, the $mn$th entry of the directional derivative is the corresponding total differential [462, §15.8]

\[
dg_{mn}(X)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X)}{\partial X_{kl}}\, Y_{kl} = \operatorname{tr}\!\big(\nabla g_{mn}(X)^T\, Y\big)
\] (2097)

\[
= \sum_{k,l}\ \lim_{t\to 0} \frac{g_{mn}(X + t\,Y_{kl}\, e_k e_l^T) - g_{mn}(X)}{t}
\] (2098)

\[
= \lim_{t\to 0} \frac{g_{mn}(X + t\,Y) - g_{mn}(X)}{t}
\] (2099)

\[
= \frac{d}{dt}\bigg|_{t=0}\, g_{mn}(X + t\,Y)
\] (2100)

where $t \in \mathbb{R}$. Assuming finite $Y$, equation (2099) is called the Gâteaux differential [50, App.A.5] [265, §D.2.1] [474, §5.28] whose existence is implied by existence of the Fréchet differential (the sum in (2097)). [337, §7.2] Each may be understood as the change in $g_{mn}$ at $X$ when the change in $X$ is equal in magnitude and direction to $Y$.^{D.2} Hence the directional derivative,

\[
\overrightarrow{dg}^{Y}\!(X) \triangleq \begin{bmatrix}
dg_{11}(X) & \cdots & dg_{1N}(X) \\
\vdots & & \vdots \\
dg_{M1}(X) & \cdots & dg_{MN}(X)
\end{bmatrix}\Bigg|_{dX\to Y}
= \begin{bmatrix}
\operatorname{tr}\!\big(\nabla g_{11}(X)^T Y\big) & \cdots & \operatorname{tr}\!\big(\nabla g_{1N}(X)^T Y\big) \\
\vdots & & \vdots \\
\operatorname{tr}\!\big(\nabla g_{M1}(X)^T Y\big) & \cdots & \operatorname{tr}\!\big(\nabla g_{MN}(X)^T Y\big)
\end{bmatrix}
= \begin{bmatrix}
\sum_{k,l} \frac{\partial g_{11}(X)}{\partial X_{kl}} Y_{kl} & \cdots & \sum_{k,l} \frac{\partial g_{1N}(X)}{\partial X_{kl}} Y_{kl} \\
\vdots & & \vdots \\
\sum_{k,l} \frac{\partial g_{M1}(X)}{\partial X_{kl}} Y_{kl} & \cdots & \sum_{k,l} \frac{\partial g_{MN}(X)}{\partial X_{kl}} Y_{kl}
\end{bmatrix} \in \mathbb{R}^{M\times N}
\] (2101)

from which it follows

\[
\overrightarrow{dg}^{Y}\!(X) = \sum_{k,l} \frac{\partial g(X)}{\partial X_{kl}}\, Y_{kl}
\] (2102)

Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$

\[
g(X + t\,Y) = g(X) + t\, \overrightarrow{dg}^{Y}\!(X) + O(t^2)
\] (2103)

which is the first-order multidimensional Taylor series expansion about $X$. [462, §18.4] [203, §2.3.4] Differentiation with respect to $t$ and subsequent $t$-zeroing isolates the second term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in (2100). So the directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ in any direction $Y \in \mathbb{R}^{K\times L}$ evaluated at $X \in \operatorname{dom} g$ becomes

\[
\overrightarrow{dg}^{Y}\!(X) = \frac{d}{dt}\bigg|_{t=0}\, g(X + t\,Y) \in \mathbb{R}^{M\times N}
\] (2104)

^{D.2} Although $Y$ is a matrix, we may regard it as a vector in $\mathbb{R}^{KL}$.
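Equivalence of the trace form (2097) with the derivative form (2104) can be checked numerically; a sketch (not from the text) for the real function $g(X) = \operatorname{tr}(AXBX^T)$, whose gradient $A^T\!XB^T + AXB$ appears in Table D.2.3:

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
A, B, X, Y = (rng.standard_normal((K, K)) for _ in range(4))

g = lambda Z: np.trace(A @ Z @ B @ Z.T)
grad = A.T @ X @ B.T + A @ X @ B          # gradient of tr(AXBX^T), Table D.2.3

dd_trace = np.trace(grad.T @ Y)           # directional derivative via (2097)
t = 1e-6
dd_limit = (g(X + t*Y) - g(X - t*Y)) / (2*t)   # via (2104), centered difference
dd_err = abs(dd_trace - dd_limit)
```

Since $g$ is quadratic in $X$, the centered difference agrees with the trace expression up to roundoff.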

Figure 216: Strictly convex quadratic bowl in $\mathbb{R}^2\!\times\mathbb{R}$; $f(x) = x^T x : \mathbb{R}^2 \to \mathbb{R}$ versus $x$ on some open disc in $\mathbb{R}^2$. Plane slice $\partial\mathcal{H}$ is perpendicular to function domain. Slice intersection with domain connotes bidirectional vector $y$. Slope of tangent line $T$ at point $(\alpha, f(\alpha))$ is value of directional derivative $\nabla_x f(\alpha)^T y$ (2129) at $\alpha$ in slice direction $y$. Negative gradient $-\nabla_x f(x) \in \mathbb{R}^2$ is direction of steepest descent. [74, §9.4.1] [462, §15.6] [203] [519] When vector $\upsilon \in \mathbb{R}^3$ entry $\upsilon_3$ is half the directional derivative in gradient direction at $\alpha$, and when $\begin{bmatrix}\upsilon_1\\\upsilon_2\end{bmatrix} = \nabla_x f(\alpha)$, then $-\upsilon$ points directly toward the bowl bottom.

[371, §2.1, §5.4.5] [43, §6.3.1] which is simplest. In case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$

\[
\overrightarrow{dg}^{Y}\!(X) = \operatorname{tr}\!\big(\nabla g(X)^T Y\big)
\] (2126)

In case $g(X) : \mathbb{R}^K \to \mathbb{R}$

\[
\overrightarrow{dg}^{Y}\!(X) = \nabla g(X)^T Y
\] (2129)

Unlike gradient, directional derivative does not expand dimension; directional derivative (2104) retains the dimensions of $g$. The derivative with respect to $t$ makes the directional derivative resemble ordinary calculus (§D.2); e.g, when $g(X)$ is linear, $\overrightarrow{dg}^{Y}\!(X) = g(Y)$. [337, §7.2]

D.1.4.1 Interpretation of directional derivative

In the case of any differentiable real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, the directional derivative of $g(X)$ at $X$ in any direction $Y$ yields the slope of $g$ along the line $\{X + t\,Y \mid t \in \mathbb{R}\}$ through its domain evaluated at $t = 0$. For higher-dimensional functions, by (2101), this slope interpretation can be applied to each entry of the directional derivative.

Figure 216, for example, shows a plane slice of a real convex bowl-shaped function $f(x)$ along a line $\{\alpha + t\,y \mid t \in \mathbb{R}\}$ through its domain. The slice reveals a one-dimensional real function of $t$; $f(\alpha + t\,y)$. The directional derivative at $x = \alpha$ in direction $y$ is the slope of $f(\alpha + t\,y)$ with respect to $t$ at $t = 0$. In the case of a real function having vector argument $h(X) : \mathbb{R}^K \to \mathbb{R}$, its directional derivative in the normalized direction of its gradient is the gradient magnitude. (2129) For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.6)
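The slope interpretation can be seen numerically for the bowl of Figure 216, $f(x) = x^T x$; a sketch with illustrative values of my choosing (not from the text):

```python
import numpy as np

f = lambda x: x @ x                       # the bowl of Figure 216

alpha = np.array([0.8, -0.3])             # slice base point (illustrative)
y = np.array([1.0, 2.0])
y = y / np.linalg.norm(y)                 # slice direction, normalized

grad = 2 * alpha                          # gradient of x^T x
slope_formula = grad @ y                  # directional derivative (2129)

t = 1e-6
slope_limit = (f(alpha + t*y) - f(alpha - t*y)) / (2*t)   # slope of f(alpha + t y) at t = 0
slope_err = abs(slope_formula - slope_limit)
```

With `y = grad / np.linalg.norm(grad)` the same computation returns the gradient magnitude, as the text states.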

Directional derivative generalizes our one-dimensional notion of derivative to a multidimensional domain. When direction $Y$ coincides with a member of the standard Cartesian basis $e_k e_l^T$ (63), then a single partial derivative $\partial g(X)/\partial X_{kl}$ is obtained from directional derivative (2102); such is each entry of gradient $\nabla g(X)$ in equalities (2126) and (2129), for example.

D.1.4.1.1 Theorem. Directional derivative optimality condition. [337, §7.4]
Suppose $f(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ is minimized on convex set $\mathcal{C} \subseteq \mathbb{R}^{K\times L}$ by $X^\star$, and the directional derivative of $f$ exists there. Then for all $X \in \mathcal{C}$

\[
\overrightarrow{df}^{X - X^\star}\!(X^\star) \geq 0
\] (2105)
⋄

D.1.4.1.2 Example. Simple bowl.
Bowl function (Figure 216)

\[
f(x) : \mathbb{R}^K \to \mathbb{R} \triangleq (x - a)^T (x - a) - b
\] (2106)

has function offset $-b \in \mathbb{R}$, axis of revolution at $x = a$, and positive definite Hessian (2054) everywhere in its domain (an open hyperdisc in $\mathbb{R}^K$); id est, strictly convex quadratic $f(x)$ has unique global minimum equal to $-b$ at $x = a$. A vector $-\upsilon$ based anywhere in $\operatorname{dom} f \times \mathbb{R}$ pointing toward the unique bowl-bottom is specified:

\[
\upsilon \propto \begin{bmatrix} x - a \\ f(x) + b \end{bmatrix} \in \mathbb{R}^K\!\times\mathbb{R}
\] (2107)

Such a vector is

\[
\upsilon = \begin{bmatrix} \nabla_x f(x) \\[2pt] \tfrac{1}{2}\, \overrightarrow{df}^{\,\nabla_x f(x)}\!(x) \end{bmatrix}
\] (2108)

since the gradient is

\[
\nabla_x f(x) = 2(x - a)
\] (2109)

and the directional derivative in direction of the gradient is (2129)

\[
\overrightarrow{df}^{\,\nabla_x f(x)}\!(x) = \nabla_x f(x)^T \nabla_x f(x) = 4(x - a)^T(x - a) = 4\big(f(x) + b\big)
\] (2110)
∎

D.1.5 Second directional derivative

By similar argument, it so happens: the second directional derivative is equally simple. Given $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ on open domain,

\[
\nabla\, \frac{\partial g_{mn}(X)}{\partial X_{kl}} = \frac{\partial\, \nabla g_{mn}(X)}{\partial X_{kl}} = \begin{bmatrix}
\dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{11}} & \cdots & \dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{K1}} & \cdots & \dfrac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L}
\] (2111)

\[
\nabla^2 g_{mn}(X) = \begin{bmatrix}
\nabla\dfrac{\partial g_{mn}(X)}{\partial X_{11}} & \cdots & \nabla\dfrac{\partial g_{mn}(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial g_{mn}(X)}{\partial X_{K1}} & \cdots & \nabla\dfrac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix}
= \begin{bmatrix}
\dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{11}} & \cdots & \dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial\, \nabla g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L}
\] (2112)

Rotating our perspective, we get several views of the second-order gradient:

\[
\nabla^2 g(X) = \begin{bmatrix}
\nabla^2 g_{11}(X) & \cdots & \nabla^2 g_{1N}(X) \\
\vdots & & \vdots \\
\nabla^2 g_{M1}(X) & \cdots & \nabla^2 g_{MN}(X)
\end{bmatrix} \in \mathbb{R}^{M\times N\times K\times L\times K\times L}
\] (2113)

\[
\nabla^2 g(X)^{T_1} = \begin{bmatrix}
\dfrac{\partial\, \nabla g(X)}{\partial X_{11}} & \cdots & \dfrac{\partial\, \nabla g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\dfrac{\partial\, \nabla g(X)}{\partial X_{K1}} & \cdots & \dfrac{\partial\, \nabla g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times M\times N\times K\times L}
\] (2114)

\[
\nabla^2 g(X)^{T_2} = \begin{bmatrix}
\nabla\dfrac{\partial g(X)}{\partial X_{11}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{1L}} \\
\vdots & & \vdots \\
\nabla\dfrac{\partial g(X)}{\partial X_{K1}} & \cdots & \nabla\dfrac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L\times M\times N}
\] (2115)

Assuming the limits to exist, we may state the partial derivative of the $mn$th entry of $g$ with respect to the $kl$th and $ij$th entries of $X$;

\[
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}} = \frac{\partial}{\partial X_{ij}}\!\left(\frac{\partial g_{mn}(X)}{\partial X_{kl}}\right)
= \lim_{\tau,\,t\to 0} \frac{\big(g_{mn}(X + t\,e_k e_l^T + \tau\,e_i e_j^T) - g_{mn}(X + t\,e_k e_l^T)\big) - \big(g_{mn}(X + \tau\,e_i e_j^T) - g_{mn}(X)\big)}{\tau\, t}
\] (2116)

Differentiating (2096) and then scaling by $Y_{ij}$

\[
\frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl}\, Y_{ij}
= \lim_{\tau,\,t\to 0} \frac{\big(g_{mn}(X + t\,Y_{kl}\, e_k e_l^T + \tau\,Y_{ij}\, e_i e_j^T) - g_{mn}(X + t\,Y_{kl}\, e_k e_l^T)\big) - \big(g_{mn}(X + \tau\,Y_{ij}\, e_i e_j^T) - g_{mn}(X)\big)}{\tau\, t}
\] (2117)

which can be proved by substitution of variables in (2116). The $mn$th second-order total differential due to any $Y \in \mathbb{R}^{K\times L}$ is

\[
d^2 g_{mn}(X)\big|_{dX\to Y} = \sum_{i,j}\sum_{k,l} \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl}\, Y_{ij}
= \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g_{mn}(X)^T Y\big)^T\, Y\Big)
\] (2118)

\[
= \sum_{i,j}\ \lim_{t\to 0}\, \frac{1}{t}\!\left(\frac{\partial g_{mn}(X + t\,Y)}{\partial X_{ij}} - \frac{\partial g_{mn}(X)}{\partial X_{ij}}\right) Y_{ij}
\] (2119)

\[
= \lim_{t\to 0} \frac{g_{mn}(X + 2t\,Y) - 2\,g_{mn}(X + t\,Y) + g_{mn}(X)}{t^2}
\] (2120)

\[
= \frac{d^2}{dt^2}\bigg|_{t=0}\, g_{mn}(X + t\,Y)
\] (2121)

Hence the second directional derivative,

\[
\overrightarrow{dg^2}^{Y}\!(X) \triangleq \begin{bmatrix}
d^2 g_{11}(X) & \cdots & d^2 g_{1N}(X) \\
\vdots & & \vdots \\
d^2 g_{M1}(X) & \cdots & d^2 g_{MN}(X)
\end{bmatrix}\Bigg|_{dX\to Y}
= \begin{bmatrix}
\operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{11}(X)^T Y\big)^T Y\Big) & \cdots & \operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{1N}(X)^T Y\big)^T Y\Big) \\
\vdots & & \vdots \\
\operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{M1}(X)^T Y\big)^T Y\Big) & \cdots & \operatorname{tr}\!\Big(\nabla_X\operatorname{tr}\!\big(\nabla g_{MN}(X)^T Y\big)^T Y\Big)
\end{bmatrix}
= \begin{bmatrix}
\sum_{i,j}\sum_{k,l} \frac{\partial^2 g_{mn}(X)}{\partial X_{kl}\,\partial X_{ij}} Y_{kl} Y_{ij}
\end{bmatrix}_{mn} \in \mathbb{R}^{M\times N}
\] (2122)

from which it follows

\[
\overrightarrow{dg^2}^{Y}\!(X) = \sum_{i,j}\sum_{k,l} \frac{\partial^2 g(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl}\, Y_{ij}
= \sum_{i,j} \frac{\partial}{\partial X_{ij}}\, \overrightarrow{dg}^{Y}\!(X)\, Y_{ij}
\] (2123)

Yet for all $X \in \operatorname{dom} g$, any $Y \in \mathbb{R}^{K\times L}$, and some open interval of $t \in \mathbb{R}$

\[
g(X + t\,Y) = g(X) + t\, \overrightarrow{dg}^{Y}\!(X) + \frac{1}{2!}\, t^2\, \overrightarrow{dg^2}^{Y}\!(X) + O(t^3)
\] (2124)

which is the second-order multidimensional Taylor series expansion about $X$. [462, §18.4] [203, §2.3.4] Differentiating twice with respect to $t$ and subsequent $t$-zeroing isolates the third term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in (2121). So the second directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ becomes [371, §2.1, §5.4.5] [43, §6.3.1]

\[
\overrightarrow{dg^2}^{Y}\!(X) = \frac{d^2}{dt^2}\bigg|_{t=0}\, g(X + t\,Y) \in \mathbb{R}^{M\times N}
\] (2125)

which is again simplest. (confer (2104)) Directional derivative retains the dimensions of $g$.
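The second-difference form (2120)/(2125) can be checked numerically; a sketch (not from the text) for $g(X) = \operatorname{tr}(AXBX^T)$, which is quadratic in $X$, so its second directional derivative is the constant $2\operatorname{tr}(AYBY^T)$ and the centered second difference is exact for any step:

```python
import numpy as np

rng = np.random.default_rng(5)
K = 3
A, B, X, Y = (rng.standard_normal((K, K)) for _ in range(4))
g = lambda Z: np.trace(A @ Z @ B @ Z.T)

t = 0.1
d2_limit = (g(X + t*Y) - 2*g(X) + g(X - t*Y)) / t**2   # second difference, as in (2120)
d2_exact = 2 * np.trace(A @ Y @ B @ Y.T)               # coefficient of t^2 in g(X + tY)
d2_err = abs(d2_limit - d2_exact)
```

Because $g(X + tY)$ is exactly quadratic in $t$, the two values agree to roundoff even at this large step.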

D.1.6 directional derivative expressions

In the case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, all its directional derivatives are in $\mathbb{R}$:

\[
\overrightarrow{dg}^{Y}\!(X) = \operatorname{tr}\!\big(\nabla g(X)^T Y\big)
\] (2126)

\[
\overrightarrow{dg^2}^{Y}\!(X) = \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g(X)^T Y\big)^T Y\Big) = \operatorname{tr}\!\Big(\nabla_X\, \overrightarrow{dg}^{Y}\!(X)^T\, Y\Big)
\] (2127)

\[
\overrightarrow{dg^3}^{Y}\!(X) = \operatorname{tr}\!\bigg(\nabla_X \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g(X)^T Y\big)^T Y\Big)^T\, Y\bigg) = \operatorname{tr}\!\Big(\nabla_X\, \overrightarrow{dg^2}^{Y}\!(X)^T\, Y\Big)
\] (2128)

In the case $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument, they further simplify:

\[
\overrightarrow{dg}^{Y}\!(X) = \nabla g(X)^T Y
\] (2129)

\[
\overrightarrow{dg^2}^{Y}\!(X) = Y^T \nabla^2 g(X)\, Y
\] (2130)

\[
\overrightarrow{dg^3}^{Y}\!(X) = \nabla_X\!\big(Y^T \nabla^2 g(X)\, Y\big)^T\, Y
\] (2131)

and so on.

D.1.7 higher-order multidimensional Taylor series

Series expansions of the differentiable matrix-valued function $g(X)$, of matrix argument, were given earlier in (2103) and (2124). Assume that $g(X)$ has continuous first-, second-, and third-order gradients over the open set $\operatorname{dom} g$. Then, for $X \in \operatorname{dom} g$ and any $Y \in \mathbb{R}^{K\times L}$, the Taylor series is expressed on some open interval of $\mu \in \mathbb{R}$

\[
g(X + \mu Y) = g(X) + \mu\, \overrightarrow{dg}^{Y}\!(X) + \frac{1}{2!}\mu^2\, \overrightarrow{dg^2}^{Y}\!(X) + \frac{1}{3!}\mu^3\, \overrightarrow{dg^3}^{Y}\!(X) + O(\mu^4)
\] (2132)

or on some open interval of $\|Y\|_2$

\[
g(Y) = g(X) + \overrightarrow{dg}^{Y-X}\!(X) + \frac{1}{2!}\, \overrightarrow{dg^2}^{Y-X}\!(X) + \frac{1}{3!}\, \overrightarrow{dg^3}^{Y-X}\!(X) + O(\|Y\|^4)
\] (2133)

which are third-order expansions about $X$. The mean value theorem from calculus is what insures finite order of the series. [462] [51, §1.1] [50, App.A.5] [265, §0.4] These somewhat unbelievable formulae^{D.3} imply that a function can be determined over the whole of its domain by knowing its value and all its directional derivatives at a single point $X$.

D.1.7.0.1 Example. Inverse-matrix function.
Say $g(Y) = Y^{-1}$. From the table on page 616,

\[
\overrightarrow{dg}^{Y}\!(X) = \frac{d}{dt}\bigg|_{t=0}\, g(X + t\,Y) = -X^{-1}\, Y X^{-1}
\] (2134)

\[
\overrightarrow{dg^2}^{Y}\!(X) = \frac{d^2}{dt^2}\bigg|_{t=0}\, g(X + t\,Y) = 2X^{-1}\, Y X^{-1}\, Y X^{-1}
\] (2135)

^{D.3} e.g, real continuous and differentiable function of real variable $f(x) = e^{-1/x^2}$ has no Taylor series expansion about $x = 0$, of any practical use, because each derivative equals 0 there.

\[
\overrightarrow{dg^3}^{Y}\!(X) = \frac{d^3}{dt^3}\bigg|_{t=0}\, g(X + t\,Y) = -6X^{-1}\, Y X^{-1}\, Y X^{-1}\, Y X^{-1}
\] (2136)

Let's find the Taylor series expansion of $g$ about $X = I$: Since $g(I) = I$, for $\|Y\|_2 < 1$ ($\mu = 1$ in (2132))

\[
g(I + Y) = (I + Y)^{-1} = I - Y + Y^2 - Y^3 + \ldots
\] (2137)

If $Y$ is small, $(I + Y)^{-1} \approx I - Y$.^{D.4} Now we find the Taylor series expansion about $X$:

\[
g(X + Y) = (X + Y)^{-1} = X^{-1} - X^{-1}\, Y X^{-1} + \frac{1}{2!}\, 2X^{-1}\, Y X^{-1}\, Y X^{-1} - \ldots
\] (2138)

If $Y$ is small, $(X + Y)^{-1} \approx X^{-1} - X^{-1}\, Y X^{-1}$. ∎

D.1.7.0.2 Exercise. log det. (confer [74, p.644])
Find the first three terms of a Taylor series expansion for $\log\det Y$. Specify an open interval over which the expansion holds in vicinity of $X$. ⧫

D.1.8 Correspondence of gradient to derivative

From the foregoing expressions for directional derivative, we derive a relationship between gradient with respect to matrix $X$ and derivative with respect to real variable $t$:

D.1.8.1 first-order

Removing evaluation at $t = 0$ from (2104),^{D.5} we find an expression for the directional derivative of $g(X)$ in direction $Y$ evaluated anywhere along a line $\{X + t\,Y \mid t \in \mathbb{R}\}$ intersecting $\operatorname{dom} g$

\[
\overrightarrow{dg}^{Y}\!(X + t\,Y) = \frac{d}{dt}\, g(X + t\,Y)
\] (2139)

In the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$, from (2097) and (2100) we find

\[
\operatorname{tr}\!\big(\nabla_X\, g_{mn}(X + t\,Y)^T\, Y\big) = \frac{d}{dt}\, g_{mn}(X + t\,Y)
\] (2140)

which is valid at $t = 0$, of course, when $X \in \operatorname{dom} g$. In the important case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, from (2126) we have simply

\[
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = \frac{d}{dt}\, g(X + t\,Y)
\] (2141)

When $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument,

\[
\nabla_X\, g(X + t\,Y)^T\, Y = \frac{d}{dt}\, g(X + t\,Y)
\] (2142)

^{D.4} Had we instead set $g(Y) = (I + Y)^{-1}$, then the equivalent expansion would have been about $X = 0$.
^{D.5} Justified by replacing $X$ with $X + t\,Y$ in (2097)-(2099); beginning,
\[
dg_{mn}(X + t\,Y)\big|_{dX\to Y} = \sum_{k,l} \frac{\partial g_{mn}(X + t\,Y)}{\partial X_{kl}}\, Y_{kl}
\]
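The expansions (2137) and (2138) can be checked numerically; a sketch with illustrative magnitudes of my choosing (not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
X = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # well-conditioned base point
Y = 1e-3 * rng.standard_normal((n, n))              # small perturbation

Xi = np.linalg.inv(X)
exact = np.linalg.inv(X + Y)
order1 = Xi - Xi @ Y @ Xi                           # first-order expansion (2138)
order2 = order1 + Xi @ Y @ Xi @ Y @ Xi              # add the second-order term

err1 = np.abs(exact - order1).max()
err2 = np.abs(exact - order2).max()
```

The residual shrinks by roughly a factor of $\|Y\|$ per added term, consistent with the series.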

D.1.8.1.1 Example. Gradient.
$g(X) = w^T X^T X w$, $X \in \mathbb{R}^{K\times L}$, $w \in \mathbb{R}^L$. Using the tables in §D.2,

\[
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = \operatorname{tr}\!\big(2ww^T (X^T + t\,Y^T)\, Y\big)
\] (2143)

\[
= 2w^T (X^T Y + t\,Y^T Y)\, w
\] (2144)

Applying equivalence (2141),

\[
\frac{d}{dt}\, g(X + t\,Y) = \frac{d}{dt}\, w^T (X + t\,Y)^T (X + t\,Y)\, w
\] (2145)

\[
= w^T \big(X^T Y + Y^T X + 2t\, Y^T Y\big)\, w
\] (2146)

\[
= 2w^T (X^T Y + t\,Y^T Y)\, w
\] (2147)

which is the same as (2144). Hence, the equivalence is demonstrated.

It is easy to extract $\nabla g(X)$ from (2147) knowing only (2141):

\[
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = 2w^T (X^T Y + t\,Y^T Y)\, w = 2\operatorname{tr}\!\big(ww^T(X^T + t\,Y^T)\, Y\big)
\qquad\Rightarrow\qquad
\operatorname{tr}\!\big(\nabla_X\, g(X)^T\, Y\big) = 2\operatorname{tr}\!\big(ww^T X^T Y\big)
\ \Leftrightarrow\
\nabla_X\, g(X) = 2Xww^T
\] (2148)
∎

D.1.8.2 second-order

Likewise removing the evaluation at $t = 0$ from (2125),

\[
\overrightarrow{dg^2}^{Y}\!(X + t\,Y) = \frac{d^2}{dt^2}\, g(X + t\,Y)
\] (2149)

we can find a similar relationship between second-order gradient and second derivative: In the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ from (2118) and (2121),

\[
\operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla_X\, g_{mn}(X + t\,Y)^T\, Y\big)^T\, Y\Big) = \frac{d^2}{dt^2}\, g_{mn}(X + t\,Y)
\] (2150)

In the case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ we have, of course,

\[
\operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big)^T\, Y\Big) = \frac{d^2}{dt^2}\, g(X + t\,Y)
\] (2151)

From (2130), the simpler case, where real function $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument,

\[
Y^T\, \nabla^2_X\, g(X + t\,Y)\, Y = \frac{d^2}{dt^2}\, g(X + t\,Y)
\] (2152)

D.1.8.2.1 Example. Second-order gradient.
We want to find $\nabla^2 g(X) \in \mathbb{R}^{K\times K\times K\times K}$ given real function $g(X) = \log\det X$ having domain $\operatorname{int} \mathbb{S}^K_+$. From the tables in §D.2,

\[
h(X) \triangleq \nabla g(X) = X^{-1} \in \operatorname{int} \mathbb{S}^K_+
\] (2153)

so $\nabla^2 g(X) = \nabla h(X)$. By (2140) and (2103), for $Y \in \mathbb{S}^K$

\[
\operatorname{tr}\!\big(\nabla h_{mn}(X)^T\, Y\big) = \frac{d}{dt}\bigg|_{t=0}\, h_{mn}(X + t\,Y)
\] (2154)

\[
= \left(\frac{d}{dt}\bigg|_{t=0}\, h(X + t\,Y)\right)_{\!mn}
\] (2155)

\[
= \left(\frac{d}{dt}\bigg|_{t=0}\, (X + t\,Y)^{-1}\right)_{\!mn}
\] (2156)

\[
= -\big(X^{-1}\, Y X^{-1}\big)_{mn}
\] (2157)

Setting $Y$ to a member of $\{e_k e_l^T \in \mathbb{R}^{K\times K} \mid k,l = 1\ldots K\}$, and employing a property (41) of the trace function we find

\[
\nabla^2 g(X)_{mnkl} = \operatorname{tr}\!\big(\nabla h_{mn}(X)^T e_k e_l^T\big) = \nabla h_{mn}(X)_{kl} = -\big(X^{-1} e_k e_l^T X^{-1}\big)_{mn}
\] (2158)

\[
\nabla^2 g(X)_{kl} = \nabla h(X)_{kl} = -\big(X^{-1} e_k e_l^T X^{-1}\big) \in \mathbb{R}^{K\times K}
\] (2159)
∎

From all these first- and second-order expressions, we may generate new ones by evaluating both sides at arbitrary $t$ (in some open interval) but only after differentiation.

D.2 Tables of gradients and derivatives

- Results may be validated numerically via Richardson extrapolation. [332, §5.4] [146] When algebraically proving results for symmetric matrices, it is critical to take gradients ignoring symmetry and to then substitute symmetric entries afterward. [220] [78]
- $i, j, k, \ell, K, L, m, n, M, N$ are integers, unless otherwise noted; $a, b \in \mathbb{R}^n$, $x, y \in \mathbb{R}^k$, $A, B \in \mathbb{R}^{m\times n}$, $X, Y \in \mathbb{R}^{K\times L}$, $t, \mu \in \mathbb{R}$.
- $x^\mu$ means $\delta(\delta(x)^\mu)$ for $\mu \in \mathbb{R}$; id est, entrywise vector exponentiation. $\delta$ is the main-diagonal linear operator (1681). $x^0 \triangleq \mathbf{1}$, $X^0 \triangleq I$ if square.
- $\dfrac{d}{dx} \triangleq \begin{bmatrix} \frac{d}{dx_1} \\ \vdots \\ \frac{d}{dx_k} \end{bmatrix}$; $\overrightarrow{dg}^{y}(x)$, $\overrightarrow{dg^2}^{y}(x)$ (directional derivatives §D.1), $\log x$, $e^x$, $|x|$, $x/y$ (Hadamard quotient), $\operatorname{sgn} x$, $\sqrt{x}$ (entrywise square root), etcetera, are maps $f : \mathbb{R}^k \to \mathbb{R}^k$ that maintain dimension; e.g, (§A.1.1)

\[
\frac{d}{dx}\, x^{-1} \triangleq \nabla_x\, \mathbf{1}^T \delta(x)^{-1} \mathbf{1}
\] (2160)

- For $A$ a scalar or square matrix, we have the Taylor series [98, §3.6]

\[
e^A \triangleq \sum_{k=0}^{\infty} \frac{1}{k!}\, A^k
\] (2161)

Further, [440, §5.4]

\[
e^A \succ 0 \quad \forall\, A \in \mathbb{S}^m
\] (2162)

- For all square $A$ and integer $k$

\[
\det^k\! A = \det A^k
\] (2163)
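As the preamble notes, these results can be validated numerically; a sketch (not from the text, and using plain centered differences rather than Richardson extrapolation) checking the log det example above, $\nabla \log\det X = X^{-1}$ on symmetric positive definite $X$ and the derivative (2157):

```python
import numpy as np

rng = np.random.default_rng(7)
K = 4
R = rng.standard_normal((K, K))
X = R @ R.T + K * np.eye(K)                 # a point in int S^K_+
Y = rng.standard_normal((K, K)); Y = (Y + Y.T) / 2   # symmetric direction

g = lambda Z: np.log(np.linalg.det(Z))
Xi = np.linalg.inv(X)

# (2153): gradient of log det X is X^{-1}, so tr(grad^T Y) = tr(X^{-1} Y)
t = 1e-6
dd_limit = (g(X + t*Y) - g(X - t*Y)) / (2*t)
grad_err = abs(np.trace(Xi @ Y) - dd_limit)

# (2157): d/dt (X + tY)^{-1} at t = 0 equals -X^{-1} Y X^{-1}
dinv_limit = (np.linalg.inv(X + t*Y) - np.linalg.inv(X - t*Y)) / (2*t)
hess_err = np.abs(dinv_limit + Xi @ Y @ Xi).max()
```

Richardson extrapolation would combine several step sizes for higher accuracy; a single centered difference already agrees to several digits here.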

D.2.1 algebraic

$\nabla_x\, x = \nabla_x\, x^T = I \in \mathbb{R}^{k\times k}$ &emsp; $\nabla_X\, X = \nabla_X\, X^T \triangleq \mathcal{I} \in \mathbb{R}^{K\times L\times K\times L}$ (Identity)

$\nabla_x\, \mathbf{1}^T x = \nabla_x\, x^T \mathbf{1} = \mathbf{1} \in \mathbb{R}^k$ &emsp; $\nabla_X\, \mathbf{1}^T X \mathbf{1} = \nabla_X\, \mathbf{1}^T X^T \mathbf{1} = \mathbf{1}\mathbf{1}^T \in \mathbb{R}^{K\times L}$

$\nabla_x\, (Ax - b) = A^T$ &emsp; $\nabla_x\, (x^T A - b^T) = A$

$\nabla_x\, (Ax - b)^T(Ax - b) = 2A^T(Ax - b)$ &emsp; $\nabla^2_x\, (Ax - b)^T(Ax - b) = 2A^T\!A$

$\nabla_x \sqrt{(Ax - b)^T(Ax - b)} = \nabla_x \|Ax - b\|_2 = A^T(Ax - b)/\|Ax - b\|_2$

$\nabla_x\, z^T|Ax - b| = A^T \delta(z)\operatorname{sgn}(Ax - b)$, &ensp; $z_i \neq 0 \Rightarrow (Ax - b)_i \neq 0$

$\nabla_x\, \mathbf{1}^T|Ax - b| = \nabla_x \|Ax - b\|_1 = A^T \operatorname{sgn}(Ax - b)$

$\nabla_x\, \mathbf{1}^T f(|Ax - b|) = A^T \delta\!\left(\dfrac{df(y)}{dy}\right)\operatorname{sgn}(Ax - b)$, &ensp; $y = |Ax - b|$

$\nabla_x\, \big(x^T\!A\,x + 2x^T\!B\,y + y^T\!C\,y\big) = (A + A^T)\,x + 2B\,y$

$\nabla_x\, (x + y)^T A\,(x + y) = (A + A^T)(x + y)$ &emsp; $\nabla^2_x\, \big(x^T\!A\,x + 2x^T\!B\,y + y^T\!C\,y\big) = A + A^T$

$\nabla_X\, a^T\!Xb = \nabla_X\, b^T\!X^T a = ab^T$

$\nabla_X\, a^T\!X^2 b = X^T ab^T + ab^T X^T$ &emsp; $\nabla_X\, a^T\!X^{-1} b = -X^{-T} ab^T X^{-T}$

$\nabla_x\, a^T x^T\! x\, b = 2x\, a^T b$ &emsp; confer (2095)(2159): $\nabla_X\, (X^{-1})_{kl} = \dfrac{\partial X^{-1}}{\partial X_{kl}} = -X^{-1} e_k e_l^T X^{-1}$

$\nabla_x\, a^T x x^T b = (ab^T + ba^T)\,x$ &emsp; $\nabla_X\, a^T\!XX^T b = (ab^T + ba^T)\,X$

$\nabla_x\, a^T x^T\! x\, a = 2x\, a^T a$ &emsp; $\nabla_X\, a^T\!X^T\!Xb = X(ab^T + ba^T)$ &emsp; $\nabla_X\, a^T\!X^T\!Xa = 2Xaa^T$

$\nabla_x\, a^T x x^T a = 2aa^T x$ &emsp; $\nabla_X\, a^T\!XX^T a = 2aa^T X$

$\nabla_x\, a^T y\, x^T b = b\, a^T y$ &emsp; $\nabla_X\, a^T\!Y X^T b = ba^T\!Y$

$\nabla_x\, a^T y^T\! x\, b = y\, b^T a$ &emsp; $\nabla_X\, a^T\!Y^T\!Xb = Y ab^T$

$\nabla_x\, a^T x\, y^T b = a\, b^T y$ &emsp; $\nabla_X\, a^T\!XY^T b = ab^T\!Y$

$\nabla_x\, a^T x^T\! y\, b = y\, a^T b$ &emsp; $\nabla_X\, a^T\!X^T\!Yb = Y ba^T$

algebraic continued

$\dfrac{d}{dt}(X + t\,Y) = Y$

$\dfrac{d}{dt}\, B^T(X + t\,Y)^{-1}A = -B^T(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A$

$\dfrac{d}{dt}\, B^T(X + t\,Y)^{-T}A = -B^T(X + t\,Y)^{-T}\, Y^T\, (X + t\,Y)^{-T} A$

$\dfrac{d}{dt}\, B^T(X + t\,Y)^{\mu}A = \ldots$, &ensp; $-1 \leq \mu \leq 1$, &ensp; $X, Y \in \mathbb{S}^M_+$

$\dfrac{d^2}{dt^2}\, B^T(X + t\,Y)^{-1}A = 2\,B^T(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A$

$\dfrac{d^3}{dt^3}\, B^T(X + t\,Y)^{-1}A = -6\,B^T(X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A$

$\dfrac{d}{dt}\, (X + t\,Y)^T\!A\,(X + t\,Y) = Y^T\!AX + X^T\!AY + 2\,t\,Y^T\!AY$

$\dfrac{d^2}{dt^2}\, (X + t\,Y)^T\!A\,(X + t\,Y) = 2\,Y^T\!AY$

$\dfrac{d}{dt}\, \big((X + t\,Y)^T\!A\,(X + t\,Y)\big)^{-1} = -\big((X + t\,Y)^T\!A\,(X + t\,Y)\big)^{-1}\, (Y^T\!AX + X^T\!AY + 2\,t\,Y^T\!AY)\, \big((X + t\,Y)^T\!A\,(X + t\,Y)\big)^{-1}$

$\dfrac{d}{dt}\, \big((X + t\,Y)A(X + t\,Y)\big) = YAX + XAY + 2\,t\,YAY$

$\dfrac{d^2}{dt^2}\, \big((X + t\,Y)A(X + t\,Y)\big) = 2\,YAY$

D.2.2 trace Kronecker

$\nabla_{\operatorname{vec} X}\, \operatorname{tr}(AXBX^T) = \nabla_{\operatorname{vec} X}\, \operatorname{vec}(X)^T (B^T\!\otimes A)\operatorname{vec} X = (B\otimes A^T + B^T\!\otimes A)\operatorname{vec} X$

$\nabla^2_{\operatorname{vec} X}\, \operatorname{tr}(AXBX^T) = \nabla^2_{\operatorname{vec} X}\, \operatorname{vec}(X)^T (B^T\!\otimes A)\operatorname{vec} X = B\otimes A^T + B^T\!\otimes A$ &ensp; (2076)
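The first- and second-derivative entries for $B^T(X + tY)^{-1}A$ above can be spot-checked by centered differences at a point $t_0$ away from zero, as §D.1.8 suggests; a sketch (not from the text, with illustrative magnitudes of my choosing):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 4
A, B, Y = (rng.standard_normal((n, n)) for _ in range(3))
X = rng.standard_normal((n, n)) + 2 * n * np.eye(n)   # keep X + t Y safely invertible

h = lambda t: B.T @ np.linalg.inv(X + t*Y) @ A
t0, dt = 0.2, 1e-4
M = np.linalg.inv(X + t0*Y)

# table entry: d/dt B^T (X+tY)^{-1} A = -B^T M Y M A
d1_exact = -B.T @ M @ Y @ M @ A
d1_limit = (h(t0 + dt) - h(t0 - dt)) / (2*dt)
d1_err = np.abs(d1_exact - d1_limit).max()

# table entry: d^2/dt^2 B^T (X+tY)^{-1} A = 2 B^T M Y M Y M A
d2_exact = 2 * B.T @ M @ Y @ M @ Y @ M @ A
d2_limit = (h(t0 + dt) - 2*h(t0) + h(t0 - dt)) / dt**2
d2_err = np.abs(d2_exact - d2_limit).max()
```

Evaluating at arbitrary $t_0$ rather than $t = 0$ exercises the remark at the end of §D.1.8 that these identities hold along the whole line.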

D.2.3 trace

$\nabla_x\, \mu x = \mu I$ &emsp; $\nabla_X \operatorname{tr}(\mu X) = \mu\, \nabla_X \operatorname{tr} X = \mu I$

$\dfrac{d}{dx}\, x^{-1} = \nabla_x\, \mathbf{1}^T \delta(x)^{-1} \mathbf{1} = -x^{-2}$

$\nabla_x\, \mathbf{1}^T \delta(x)^{-1} y = -\delta(x)^{-2}\, y$ &emsp; $\nabla_X \operatorname{tr} X^{-1} = -X^{-2T}$ &emsp; $\nabla_X \operatorname{tr}(X^{-1}Y) = \nabla_X \operatorname{tr}(YX^{-1}) = -X^{-T}\, Y^T X^{-T}$

$\dfrac{d}{dx}\, x^{\mu} = \mu x^{\mu-1}$ &emsp; $\nabla_X \operatorname{tr} X^{\mu} = \mu X^{\mu-1}$, &ensp; $X \in \mathbb{S}^M$ &emsp; $\nabla_X \operatorname{tr} X^{j} = jX^{(j-1)T}$

$\nabla_x\, (b - a^T x)^{-1} = (b - a^T x)^{-2}\, a$ &emsp; $\nabla_x\, (b - a^T x)^{\mu} = -\mu (b - a^T x)^{\mu-1}\, a$

$\nabla_x\, x^T y = \nabla_x\, y^T x = y$ &emsp; $\nabla_x\, x^T x = 2x$ &emsp; $\nabla_X \operatorname{tr}\big((B - AX)^{-1}\big) = \big((B - AX)^{-2} A\big)^T$

$\nabla_X \operatorname{tr}(X^T Y) = \nabla_X \operatorname{tr}(Y X^T) = \nabla_X \operatorname{tr}(Y^T X) = \nabla_X \operatorname{tr}(X Y^T) = Y$

$\nabla_X \operatorname{tr}(X^T X) = \nabla_X \operatorname{tr}(X X^T) = 2X$

$\nabla_X \operatorname{tr}(AXBX^T) = \nabla_X \operatorname{tr}(XBX^T\!A) = A^T\!XB^T + AXB$

$\nabla_X \operatorname{tr}(AXBX) = \nabla_X \operatorname{tr}(XBXA) = A^T\!X^T\!B^T + B^T\!X^T\!A^T$

$\nabla_X \operatorname{tr}(AXAXAXAX) = 4(AXAXAXA)^T$ &emsp; $\nabla_X \operatorname{tr}(AXAXAX) = 3(AXAXA)^T$ &emsp; $\nabla_X \operatorname{tr}(AXAX) = 2(AXA)^T$ &emsp; $\nabla_X \operatorname{tr}(AX) = A^T$

$\nabla_X \operatorname{tr}(XAXAXAXA) = \nabla_X \operatorname{tr}(XAXAXA)$, etc., likewise

$\nabla_X \operatorname{tr}(Y X^k) = \nabla_X \operatorname{tr}(X^k Y) = \displaystyle\sum_{i=0}^{k-1} \big(X^i\, Y X^{k-1-i}\big)^T$

$\nabla_X \operatorname{tr}(X^T Y Y^T X X^T Y Y^T X) = 4\,Y Y^T X X^T Y Y^T X$

$\nabla_X \operatorname{tr}(X Y Y^T X^T X Y Y^T X^T) = 4\,X Y Y^T X^T X Y Y^T$

$\nabla_X \operatorname{tr}(Y^T X X^T Y) = \nabla_X \operatorname{tr}(X^T Y Y^T X) = 2\,Y Y^T X$

$\nabla_X \operatorname{tr}(Y$
