Newton And Quasi-Newton Methods - Argonne National Laboratory


Newton and Quasi-Newton Methods
GIAN Short Course on Optimization: Applications, Algorithms, and Computation
Sven Leyffer
Argonne National Laboratory
September 12-24, 2016

Outline
1. Quadratic Models and Newton's Method
   - Modifying the Hessian to Ensure Descent
2. Quasi-Newton Methods
   - The Rank-One Quasi-Newton Update
   - The BFGS Quasi-Newton Update
   - Limited-Memory Quasi-Newton Methods

Quadratic Models and Newton's Method
Consider the unconstrained optimization problem:
    minimize_{x ∈ R^n}  f(x),
where f: R^n → R is twice continuously differentiable.
Motivation for Newton:
- Steepest descent is easy, but can be slow.
- Quadratics approximate a nonlinear f(x) better.
- Faster local convergence.
- More "robust" methods.

Quadratic Models and Newton's Method
    minimize_{x ∈ R^n}  f(x)
Main Idea Behind Newton
- A quadratic function approximates a nonlinear f(x) well.
- First-order conditions of quadratics are easy to solve.
Consider minimizing a quadratic function (wlog constant term 0):
    minimize_x  q(x) = (1/2) x^T H x + b^T x
The first-order conditions, ∇q(x) = 0, are
    H x = -b    ... a linear system of equations.
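
As a small illustration (not part of the slides), a minimal MATLAB sketch with an arbitrary positive definite H and vector b, solving the first-order conditions with backslash:

    % Minimize q(x) = 0.5*x'*H*x + b'*x by solving H*x = -b.
    H = [3 1; 1 2];            % arbitrary positive definite Hessian (assumption)
    b = [-1; 4];
    x_star = H \ (-b);         % minimizer of the quadratic
    residual = H*x_star + b    % gradient at x_star, zero up to round-off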

Quadratic Models and Newton's Method
    minimize_{x ∈ R^n}  f(x)
Newton's method uses a truncated Taylor series:
    f(x^(k) + d) = f^(k) + g^(k)^T d + (1/2) d^T H^(k) d + o(‖d‖²),
where a = o(‖d‖²) means that a/‖d‖² → 0 as ‖d‖₂ → 0.
Notation Convention
Functions evaluated at x^(k) are identified by superscripts:
    f^(k) := f(x^(k))
    g^(k) := g(x^(k)) := ∇f(x^(k))
    H^(k) := H(x^(k)) := ∇²f(x^(k))

Quadratic Models and Newton's Method
    minimize_{x ∈ R^n}  f(x)
Newton's method defines a quadratic approximation at x^(k),
    q^(k)(d) := f^(k) + g^(k)^T d + (1/2) d^T H^(k) d,
and steps to the minimum of q^(k)(d).
If H^(k) is positive definite, solve a linear system:
    min_d q^(k)(d)   ⇔   ∇q^(k)(d) = 0   ⇔   H^(k) d = -g^(k),
... then set x^(k+1) := x^(k) + d.

Simple Version of Newton's Method
    minimize_{x ∈ R^n}  f(x)
Simple Newton Line-Search Method
Given x^(0), set k = 0.
repeat
    Solve H^(k) s^(k) = -g(x^(k)) for the Newton direction.
    Find step length α_k := Armijo(f(x), x^(k), s^(k)).
    Set x^(k+1) := x^(k) + α_k s^(k) and k = k + 1.
until x^(k) is a (local) optimum.
See Matlab demo.
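
The Matlab demo itself is not part of this transcription. The following is a minimal MATLAB sketch of the method above, saved as newton_linesearch.m, with a simple backtracking Armijo loop; the handles f, g, H and the parameters c1, tol, maxit are illustrative assumptions, not taken from the slides.

    function x = newton_linesearch(f, g, H, x, tol, maxit)
    % Minimal Newton line-search sketch: f, g, H are handles for the
    % objective, gradient, and Hessian; x is the starting point.
    c1 = 1e-4;                           % Armijo sufficient-decrease parameter
    for k = 1:maxit
        gk = g(x);
        if norm(gk) <= tol, break; end   % first-order stopping test
        s = -(H(x) \ gk);                % Newton direction: H(x)*s = -g(x)
        alpha = 1;                       % backtracking (Armijo) line search
        while f(x + alpha*s) > f(x) + c1*alpha*(gk'*s) && alpha > 1e-12
            alpha = alpha/2;
        end
        x = x + alpha*s;
    end
    end

For example, on the function f(x) = x1^4 + x1*x2 + (1 + x2)^2 used later in these slides, one might call it as

    f = @(x) x(1)^4 + x(1)*x(2) + (1 + x(2))^2;
    g = @(x) [4*x(1)^3 + x(2); x(1) + 2*(1 + x(2))];
    H = @(x) [12*x(1)^2, 1; 1, 2];
    x = newton_linesearch(f, g, H, [1; 1], 1e-8, 50);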

Theory of Newton's Method
The Newton direction is a descent direction if H^(k) is positive definite:
Lemma
If H^(k) is positive definite, then s^(k) from the solve H^(k) s^(k) = -g(x^(k)) is a descent direction.
Proof.
Drop the superscripts (k) for simplicity.
H is positive definite
⇒ the inverse H^{-1} exists and is positive definite
⇒ g^T s = g^T H^{-1} (-g) < 0
⇒ s is a descent direction.


Theory of Newton's Method
Newton's method converges quadratically ... steepest descent only linearly.
Theorem
Assume f(x) is twice continuously differentiable and that H(x) is Lipschitz,
    ‖H(x) - H(y)‖ ≤ L ‖x - y‖,
near a local minimum x*. If x^(k) is sufficiently close to x*, and if H* is positive definite, then Newton's method converges quadratically and α_k = 1.
This is a remarkable result:
- Near a local solution, we do not need a line search.
- Convergence is quadratic ... the number of correct significant digits doubles at every iteration.

Illustrative Example of Newton's Method
Convergence and failure of Newton: f(x) = x1^4 + x1*x2 + (1 + x2)^2
(figure not reproduced in this transcription)

Discussion of Newton's Method I
The full Newton step may fail to reduce f(x). E.g.
    minimize_x  f(x) = x² - (1/4) x⁴.
Starting at x^(0) = ±√(2/5) creates alternating iterates √(2/5) and -√(2/5).
Remedy: Use a line search.
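
A quick MATLAB check of this cycling behavior, using the example as reconstructed above (the iteration count is arbitrary):

    % Full Newton steps cycle on f(x) = x^2 - x^4/4 starting from sqrt(2/5).
    fp  = @(x) 2*x - x^3;          % f'(x)
    fpp = @(x) 2 - 3*x^2;          % f''(x)
    x = sqrt(2/5);
    for k = 1:6
        x = x - fp(x)/fpp(x);      % full Newton step (alpha = 1)
        fprintf('k = %d, x = %+8.5f\n', k, x);   % alternates +/- sqrt(2/5)
    end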

Discussion of Newton's Method II
Newton's method solves a linear system at every iteration.
- Can be computationally expensive if n is large.
- Remedy: Apply iterative solvers, e.g. conjugate gradients.
Newton's method needs first and second derivatives.
- Finite differences are computationally expensive.
- Use automatic differentiation (AD) for the gradient.
  ... the Hessian is harder; get efficient Hessian-vector products H^(k) v instead.
- Remedy: Code efficient gradients, or use AD tools.
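
One inexpensive way to obtain Hessian-vector products without forming H^(k) (a standard trick, not spelled out on the slide) is a forward difference of the gradient. A minimal MATLAB sketch, where the gradient handle g and the step size h are assumptions:

    % Approximate H(x)*v by a forward difference of the gradient:
    %   H(x)*v is approximately (g(x + h*v) - g(x)) / h   for small h.
    hessvec = @(g, x, v, h) (g(x + h*v) - g(x)) / h;

    % Example with f(x) = x1^4 + x1*x2 + (1 + x2)^2 from these slides:
    g  = @(x) [4*x(1)^3 + x(2); x(1) + 2*(1 + x(2))];
    Hv = hessvec(g, [1; 1], [1; 0], 1e-6)   % close to [12; 1] = H([1;1])*[1;0]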

Discussion of Newton's Method III
Problem, if the Hessian H^(k) is not positive definite:
- The Newton direction may not be defined.
- If H^(k) is singular, then H^(k) s = -g^(k) is not well defined:
  either H^(k) s = -g^(k) has no solution,
  or H^(k) s = -g^(k) has infinitely many solutions!
- Even if the Newton direction exists, it may not reduce f(x)
  ⇒ Newton's method fails even with a line search.

Discussion of Newton's Method IV
Problem, if the Hessian H^(k) has indefinite curvature. Consider
    minimize_x  f(x) = x1^4 + x1*x2 + (1 + x2)^2.
Starting Newton at x^(0) = 0, we get
    x^(0) = (0, 0)^T,   g^(0) = (0, 2)^T,   H^(0) = [0 1; 1 2],   s^(0) = (-2, 0)^T.
Line search from x^(0) in the direction s^(0):
    f(x^(0) + α s^(0)) = (-2α)^4 + (1 + 0)^2 = 16α^4 + 1 ≥ 1 = f(x^(0))
for all α ≥ 0, hence we cannot decrease f(x) ⇒ α_k → 0
⇒ Newton's method stalls.
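
A short MATLAB verification of the numbers above (the list of trial step lengths is arbitrary):

    % The Newton step from x = 0 cannot decrease f = x1^4 + x1*x2 + (1+x2)^2.
    f = @(x) x(1)^4 + x(1)*x(2) + (1 + x(2))^2;
    g = @(x) [4*x(1)^3 + x(2); x(1) + 2*(1 + x(2))];
    H = @(x) [12*x(1)^2, 1; 1, 2];
    x0 = [0; 0];
    s  = -(H(x0) \ g(x0));                 % Newton direction (-2, 0)
    for a = [1 0.5 0.1 0.01]
        fprintf('alpha = %.2f, f = %.6f\n', a, f(x0 + a*s));   % always >= f(x0) = 1
    end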

Failure of Newton's Method
Steepest descent works fine on this example.
Remedy: Modify the Hessian to make it positive definite.

Modifying the Hessian to Ensure Descent I
Newton's method can fail if H^(k) is not positive definite.
To modify the Hessian, estimate the smallest eigenvalue, λ_min(H^(k)).
Define the modification matrix M_k:
    M_k := max{0, ε - λ_min(H^(k))} I,
where ε > 0 is small, and I ∈ R^{n×n} is the identity matrix.
Use the modified Hessian, H^(k) + M_k, which is positive definite.
Matlab: get the smallest eigenvalue with Lmin = min(eig(H)).
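
A minimal MATLAB sketch of this eigenvalue-based modification, assuming a dense Hessian H and gradient g are already in the workspace and using an illustrative threshold epsilon:

    % Shift the Hessian so its smallest eigenvalue is at least epsilon.
    epsilon = 1e-6;                       % small threshold (assumption)
    lmin = min(eig(H));                   % smallest eigenvalue of dense H
    tau  = max(0, epsilon - lmin);        % size of the modification
    s    = -((H + tau*eye(size(H,1))) \ g);   % modified Newton direction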

Modifying the Hessian to Ensure Descent II
Alternative modification
Compute Cholesky factors of H^(k):
    H^(k) + M_k = L_k L_k^T,
where L_k is lower triangular with a positive diagonal
⇒ L_k L_k^T is positive definite.
- Choose M_k = 0 if H^(k) is positive definite.
- Choose M_k not unreasonably large.
- Related to L_k D_k L_k^T factors
  ... perform the modification as we solve the Newton system, H^(k) s^(k) = -g(x^(k)).
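
One simple way to realize this idea is to add increasing multiples of the identity until the Cholesky factorization succeeds. The MATLAB sketch below follows that strategy (it is not the modified-Cholesky factorization the slide alludes to); H and g are assumed to be in the workspace, and the initial shift beta is an assumption:

    % Add multiples of the identity to H until chol succeeds, then solve for s.
    tau = 0; beta = 1e-3;                          % initial shift (assumption)
    n = size(H, 1);
    while true
        [L, p] = chol(H + tau*eye(n), 'lower');    % p == 0 means success
        if p == 0, break; end
        tau = max(2*tau, beta);                    % grow the shift and retry
    end
    s = -(L' \ (L \ g));                           % solve (H + tau*I)*s = -g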

Modified Newton Line-Search Method
Given x^(0), set k = 0.
repeat
    Form M_k from an eigenvalue estimate or modified Cholesky factors.
    Get the modified Newton direction: (H^(k) + M_k) s^(k) = -g(x^(k)).
    Get step length α_k := Armijo(f(x), x^(k), s^(k)).
    Set x^(k+1) := x^(k) + α_k s^(k) and k = k + 1.
until x^(k) is a (local) optimum.
The modification H^(k) - λ_min(H^(k)) I biases towards steepest descent:
let μ := (-λ_min(H^(k)))^{-1}, then we solve
    -λ_min(H^(k)) (μ H^(k) + I) s^(k) = -g(x^(k)).
As μ → 0, we recover the steepest-descent direction, s^(k) ∝ -g(x^(k)).

Outline
1. Quadratic Models and Newton's Method
   - Modifying the Hessian to Ensure Descent
2. Quasi-Newton Methods
   - The Rank-One Quasi-Newton Update
   - The BFGS Quasi-Newton Update
   - Limited-Memory Quasi-Newton Methods

Quasi-Newton Methods
Quasi-Newton methods avoid the pitfalls of Newton's method:
1. failure of Newton's method if H^(k) is not positive definite;
2. the need for second derivatives;
3. the need to solve a linear system at every iteration.
We study quasi-Newton and more modern limited-memory quasi-Newton methods:
- overcome the computational pitfalls of Newton;
- retain fast local convergence (almost).
Quasi-Newton methods work with an approximation B^(k) ≈ (H^(k))^{-1}
⇒ the Newton solve becomes a matrix-vector product: s^(k) = -B^(k) g^(k).

Quasi-Newton Methods
Choose an initial approximation, B^(0) = νI. Define
    γ^(k) := g^(k+1) - g^(k)    (gradient difference)
    δ^(k) := x^(k+1) - x^(k)    (iterate difference);
then, for a quadratic q(x) := q_0 + g^T x + (1/2) x^T H x, we get
    γ^(k) = H δ^(k)   ⇔   δ^(k) = H^{-1} γ^(k).
Because B^(k) ≈ (H^(k))^{-1}, we would ideally want B^(k) γ^(k) = δ^(k).
This is not possible, because we need B^(k) to compute x^(k+1); hence we use the
Quasi-Newton Condition
    B^(k+1) γ^(k) = δ^(k).
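
A quick numerical check of the relation γ^(k) = H δ^(k) for a quadratic, using arbitrary data (an illustration, not from the slides):

    % For q(x) = q0 + g0'*x + 0.5*x'*H*x, gradient differences satisfy gamma = H*delta.
    H = [4 1; 1 3];  g0 = [1; -2];        % arbitrary quadratic data (assumption)
    grad = @(x) g0 + H*x;
    x0 = [0; 0];  x1 = [1; 2];
    delta = x1 - x0;
    gamma = grad(x1) - grad(x0);
    norm(gamma - H*delta)                 % zero up to round-off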

Rank-One Quasi-Newton Update
Goal: Find a rank-one update such that B^(k+1) γ^(k) = δ^(k).
Express a symmetric rank-one matrix as an outer product,
    u u^T = [u_1 u, ..., u_n u],
and set B^(k+1) = B^(k) + a u u^T.
Choose a ∈ R and u ∈ R^n such that the update B^(k+1) satisfies
    δ^(k) = B^(k+1) γ^(k) = B^(k) γ^(k) + a u u^T γ^(k)    ... the quasi-Newton condition.
Rewrite the quasi-Newton condition as
    δ^(k) - B^(k) γ^(k) = a u u^T γ^(k) = (a u^T γ^(k)) u.
"Solving" the last equation for u, the quasi-Newton condition implies
    u = (δ^(k) - B^(k) γ^(k)) / (a u^T γ^(k)),
assuming a u^T γ^(k) ≠ 0.

Rank-One Quasi-Newton Update
From the previous page: the quasi-Newton condition implies
    u = (δ^(k) - B^(k) γ^(k)) / (a u^T γ^(k)),   assuming a u^T γ^(k) ≠ 0.
We are looking for an update a u u^T:
- Assume a u^T γ^(k) ≠ 0 (can be monitored).
- Choose u = δ^(k) - B^(k) γ^(k).
- Given this choice of u, we must set a as
      a = 1 / (u^T γ^(k)) = 1 / ((δ^(k) - B^(k) γ^(k))^T γ^(k)).
Double-check that we satisfy the quasi-Newton condition:
    B^(k+1) γ^(k) = B^(k) γ^(k) + a u u^T γ^(k).

Rank-One Quasi-Newton Update
Substituting the values for a and u, we get
    B^(k+1) γ^(k) = B^(k) γ^(k) + (δ^(k) - B^(k) γ^(k)) [(δ^(k) - B^(k) γ^(k))^T γ^(k)] / [(δ^(k) - B^(k) γ^(k))^T γ^(k)]
                  = B^(k) γ^(k) + δ^(k) - B^(k) γ^(k) = δ^(k).
Rank-One Quasi-Newton Update
Assuming that (δ - Bγ)^T γ ≠ 0, we use:
    B^(k+1) = B + (δ - Bγ)(δ - Bγ)^T / ((δ - Bγ)^T γ).
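
A minimal MATLAB sketch of one such rank-one (SR1) update of the inverse-Hessian approximation B; the skip tolerance guarding against a small denominator is an assumption:

    % One symmetric rank-one update of B from the pair (delta, gamma).
    r   = delta - B*gamma;                      % residual of the quasi-Newton condition
    den = r'*gamma;
    if abs(den) > 1e-8 * norm(r) * norm(gamma)  % skip rule for tiny denominators (assumption)
        B = B + (r*r') / den;                   % now B*gamma = delta holds
    end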

Properties of Rank-One Quasi-Newton Update
Rank-One Quasi-Newton Update
    B^(k+1) = B + (δ - Bγ)(δ - Bγ)^T / ((δ - Bγ)^T γ).
Theorem (Quadratic Termination of Rank-One)
If the rank-one update is well defined, and δ^(1), ..., δ^(n) are linearly independent, then the rank-one method terminates in at most n+1 steps with B^(n+1) = H^{-1} for a quadratic with positive definite Hessian.
Remark (Disadvantages of Rank-One Formula)
1. It does not maintain positive definiteness of B^(k)
   ⇒ steps may not be descent directions.
2. Rank-one breaks down if the denominator is zero or small.

BFGS Quasi-Newton Update
The BFGS rank-two update ... the method of choice.
BFGS Quasi-Newton Update
    B^(k+1) = B - (δ γ^T B + B γ δ^T)/(δ^T γ) + (1 + γ^T B γ / δ^T γ) δ δ^T/(δ^T γ).
... works well with a low-accuracy line search.
Theorem (BFGS Update is Positive Definite)
If δ^T γ > 0, then the BFGS update remains positive definite.
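
A minimal MATLAB sketch of one BFGS update of the inverse-Hessian approximation B from a pair (delta, gamma), skipping the update when the curvature condition fails:

    % One BFGS update of B (requires delta'*gamma > 0 to stay positive definite).
    dg = delta'*gamma;
    if dg > 0
        B = B - (delta*(gamma'*B) + (B*gamma)*delta') / dg ...
              + (1 + (gamma'*B*gamma)/dg) * (delta*delta') / dg;
    end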

Picture of BFGS Quasi-Newton Update
We can visualize the BFGS update ... (figure not reproduced in this transcription)



Convergence of BFGS Updates
Question (Convergence of BFGS with Wolfe Line Search)
Does BFGS converge for nonconvex f(x) with a Wolfe line search?
Wolfe Line-Search Conditions
A Wolfe line search finds α_k such that
    f(x^(k) + α_k s^(k)) ≤ f^(k) + δ α_k g^(k)^T s^(k)
    g(x^(k) + α_k s^(k))^T s^(k) ≥ σ g^(k)^T s^(k).
Unfortunately, the answer is no!
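
A minimal MATLAB sketch that checks the two Wolfe conditions for a trial step alpha along a direction s, given handles f and g and a point x in the workspace; the parameter values 1e-4 and 0.9 are common choices and are assumptions here:

    % Check the Wolfe conditions for a trial step alpha along direction s.
    delta = 1e-4;  sigma = 0.9;            % Wolfe parameters (assumption)
    gs = g(x)'*s;                          % directional derivative (negative for descent)
    sufficient_decrease = f(x + alpha*s) <= f(x) + delta*alpha*gs;
    curvature_condition = g(x + alpha*s)'*s >= sigma*gs;
    wolfe_ok = sufficient_decrease && curvature_condition;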

Dai [2013] Example of Failure of BFGS
Constructs a "perfect 4D example" for the BFGS method:
steps s^(k) and gradients g^(k) satisfy
    s^(k) = τ [R1 0; 0 R2] s^(k-1)   and   g^(k) = [R1 0; 0 R2] g^(k-1),
where τ is a parameter, and R1, R2 are rotation matrices
    R1 = [cos α  -sin α; sin α  cos α]   and   R2 = [cos β  -sin β; sin β  cos β].
One can show that:
- α_k = 1 satisfies the Wolfe or Armijo line search;
- f(x) is a polynomial of degree 38 (strongly convex along s^(k));
- the iterates converge to a circle around the vertices of an octagon
  ... not stationary points.

Limited-Memory Quasi-Newton Methods
Disadvantage of quasi-Newton: storage and computation are O(n²).
- Quasi-Newton matrices are dense (no sparse updates).
- Storage and computation of O(n²) are prohibitive for large n
  ... e.g. inverse problems from geology with 10^12 unknowns.
Limited-memory methods are a clever way to re-write quasi-Newton:
- store only the m << n most recent difference pairs, m ≈ 10;
- cost per iteration is only O(nm), not O(n²).

Limited-Memory Quasi-Newton Methods
Recall the BFGS update:
    B^(k+1) = B - (δ γ^T B + B γ δ^T)/(δ^T γ) + (1 + γ^T B γ / δ^T γ) δ δ^T/(δ^T γ)
            = B - (δ γ^T B + B γ δ^T)/(δ^T γ) + (γ^T B γ) δ δ^T/(δ^T γ)² + δ δ^T/(δ^T γ).
Rewrite the BFGS update as (substitute and prove for yourself!)
    B^(k+1)_BFGS = V_k^T B V_k + ρ_k δ δ^T,
where
    ρ_k = 1/(δ^T γ)   and   V_k = I - ρ_k γ δ^T.
Recur the update back to the initial matrix, B^(0) ≻ 0.
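
A quick MATLAB check, with random data and purely illustrative, that the compact form V_k^T B V_k + ρ_k δδ^T reproduces the BFGS formula:

    % Verify that V'*B*V + rho*delta*delta' equals the BFGS update numerically.
    n = 4;
    A = randn(n);  B = A*A' + eye(n);          % random SPD starting matrix
    delta = randn(n,1);  gamma = randn(n,1);
    if delta'*gamma < 0, gamma = -gamma; end   % ensure delta'*gamma > 0
    rho = 1/(delta'*gamma);
    V   = eye(n) - rho*(gamma*delta');
    B1  = V'*B*V + rho*(delta*delta');
    B2  = B - (delta*(gamma'*B) + (B*gamma)*delta')*rho ...
            + (1 + rho*(gamma'*B*gamma)) * rho*(delta*delta');
    norm(B1 - B2)                              % zero up to round-off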

Limited-Memory Quasi-Newton Methods
Idea: Apply m << n quasi-Newton updates at iteration k, corresponding to the difference pairs (δ_i, γ_i) for i = k-m, ..., k-1:
    B^(k) = [V_{k-1}^T · · · V_{k-m}^T] B^(0) [V_{k-m} · · · V_{k-1}]
          + ρ_{k-m} [V_{k-1}^T · · · V_{k-m+1}^T] δ^(k-m) δ^(k-m)^T [V_{k-m+1} · · · V_{k-1}]
          + · · ·
          + ρ_{k-1} δ^(k-1) δ^(k-1)^T
... can be implemented recursively!

Limited-Memory Quasi-Newton Methods
Recursive procedure to compute the BFGS direction, s:
Limited-Memory BFGS Search-Direction Computation
Given initial B^(0) and memory m, set gradient q = ∇f(x^(k)).
for i = k-1, ..., k-m do
    Set α_i = ρ_i δ^(i)^T q
    Update gradient: q = q - α_i γ^(i)
end
Apply initial quasi-Newton matrix: r = B^(0) q
for i = k-m, ..., k-1 do
    Set β = ρ_i γ^(i)^T r
    Update direction: r = r + δ^(i) (α_i - β)
end
Return search direction: s^(k) := -r = -B^(k) g^(k)
The cost of the recursion is O(4nm) if B^(0) is diagonal.
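
A minimal MATLAB sketch of the two-loop recursion above. Here S and Y are assumed to be n-by-m matrices whose columns hold the stored pairs δ^(i) and γ^(i) (oldest first), and B0diag is the diagonal of the initial matrix B^(0); these names are illustrative, not from the slides.

    function s = lbfgs_direction(grad, S, Y, B0diag)
    % Two-loop recursion: returns s = -B*grad, where B is the limited-memory
    % BFGS inverse-Hessian approximation built from the stored pairs.
    m = size(S, 2);
    rho = zeros(m, 1);  alpha = zeros(m, 1);
    for i = 1:m
        rho(i) = 1 / (S(:,i)'*Y(:,i));
    end
    q = grad;
    for i = m:-1:1                      % newest pair down to oldest
        alpha(i) = rho(i) * (S(:,i)'*q);
        q = q - alpha(i)*Y(:,i);
    end
    r = B0diag .* q;                    % apply the initial quasi-Newton matrix
    for i = 1:m                         % oldest pair up to newest
        beta = rho(i) * (Y(:,i)'*r);
        r = r + S(:,i)*(alpha(i) - beta);
    end
    s = -r;                             % search direction s = -B*grad
    end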

General Quasi-Newton Methods
Given any of the updates discussed, the quasi-Newton algorithm is:
General Quasi-Newton (qN) Line-Search Method
Given x^(0), set k = 0.
repeat
    Get the quasi-Newton direction, s^(k) = -B^(k) g^(k).
    Step length α_k := Armijo(f(x), x^(k), s^(k)).
    Set x^(k+1) := x^(k) + α_k s^(k).
    Form γ^(k), δ^(k); update the qN matrix, B^(k+1); set k = k + 1.
until x^(k) is a (local) optimum.

Summary: Newton and Quasi-Newton Methods
Methods for unconstrained optimization:
    minimize_x  f(x)
- A quadratic model provides a better approximation of f(x).
- Newton's method minimizes the quadratic over the step d:
      minimize_d  q^(k)(d) := f^(k) + g^(k)^T d + (1/2) d^T H^(k) d.
- Modify if H^(k) is not positive definite (no descent): H^(k) + M_k ≻ 0.
- Converges quadratically (near the solution).
- Quasi-Newton methods avoid the need for the Hessian H^(k):
  - update a quasi-Newton approximation B^(k) ≈ (H^(k))^{-1};
  - limited-memory version for large-scale optimization.
