Methods for Non-linear Least Squares Problems


2nd Edition, April 2004

K. Madsen, H.B. Nielsen, O. Tingleff
Informatics and Mathematical Modelling, Technical University of Denmark

Contents
1. Introduction and Definitions
2. Descent Methods
   2.1. The Steepest Descent Method
   2.2. Newton's Method
   2.3. Line Search
   2.4. Trust Region and Damped Methods
3. Non-linear Least Squares Problems
   3.1. The Gauss–Newton Method
   3.2. The Levenberg–Marquardt Method
   3.3. Powell's Dog Leg Method
   3.4. A Hybrid Method: L–M and Quasi–Newton
   3.5. A Secant Version of the L–M Method
   3.6. A Secant Version of the Dog Leg Method
   3.7. Final Remarks
Appendix
References
Index

1. Introduction and Definitions

In this booklet we consider the following problem.

Definition 1.1. Least Squares Problem.
Find x*, a local minimizer for¹⁾

    F(x) = ½ Σ_{i=1}^m (f_i(x))² ,

where f_i : ℝⁿ → ℝ, i = 1, …, m, are given functions, and m ≥ n.

Example 1.1. An important source of least squares problems is data fitting. As an example consider the data points (t₁, y₁), …, (t_m, y_m) together with the fitting model

    M(x, t) = x₃ e^{x₁ t} + x₄ e^{x₂ t} .

[Figure 1.1: Data points {(t_i, y_i)} (marked by +) and the model M(x*, t) (full line).]

The model depends on the parameters x = [x₁, x₂, x₃, x₄]ᵀ. We assume that there exists an x† so that

    y_i = M(x†, t_i) + ε_i ,

where the {ε_i} are (measurement) errors on the data ordinates, assumed to behave like "white noise". For any choice of x we can compute the residuals

    f_i(x) = y_i − M(x, t_i) = y_i − x₃ e^{x₁ t_i} − x₄ e^{x₂ t_i} ,   i = 1, …, m .

For a least squares fit the parameters are determined as the minimizer x* of the sum of squared residuals. This is a problem of the form in Definition 1.1 with n = 4. The graph of M(x*, t) is shown by the full line in Figure 1.1.

A least squares problem is a special variant of the more general problem: given a function F : ℝⁿ → ℝ, find an argument of F that gives the minimum value of this so-called objective function or cost function.

Definition 1.2. Global Minimizer.
Given F : ℝⁿ → ℝ, find

    x⁺ = argmin_x { F(x) } .

This problem is very hard to solve in general, and we only present methods for the simpler problem of finding a local minimizer for F: an argument vector which gives a minimum value of F inside a certain region whose size is given by δ, where δ is a small, positive number.

Definition 1.3. Local Minimizer.
Given F : ℝⁿ → ℝ, find x* so that

    F(x*) ≤ F(x)   for   ‖x − x*‖ ≤ δ .

In the remainder of this introduction we shall discuss some basic concepts in optimization, and Chapter 2 is a brief review of methods for finding a local minimizer for general cost functions. For more details we refer to Frandsen et al. (2004). In Chapter 3 we give methods that are specially tuned for least squares problems.

1) The factor ½ in the definition of F(x) has no effect on x*. It is introduced for convenience; see Chapter 3.
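The residuals and objective function of Example 1.1 are straightforward to evaluate. The following Python sketch is not part of the booklet; the synthetic data and the "true" parameter vector x_dagger are made up for illustration, generated without noise so that the residuals vanish at x_dagger.

```python
import numpy as np

def residuals(x, t, y):
    """f_i(x) = y_i - M(x, t_i) for the model M(x,t) = x3*exp(x1*t) + x4*exp(x2*t)."""
    x1, x2, x3, x4 = x
    return y - x3 * np.exp(x1 * t) - x4 * np.exp(x2 * t)

def F(x, t, y):
    """Objective F(x) = 0.5 * sum of squared residuals (Definition 1.1)."""
    f = residuals(x, t, y)
    return 0.5 * f @ f

# Hypothetical noise-free data generated from known parameters x_dagger,
# so F(x_dagger) is zero up to rounding.
t = np.linspace(0.0, 1.0, 10)
x_dagger = np.array([-4.0, -5.0, 4.0, -4.0])
y = x_dagger[2] * np.exp(x_dagger[0] * t) + x_dagger[3] * np.exp(x_dagger[1] * t)

print(F(x_dagger, t, y))        # ~0: residuals vanish at the true parameters
print(F(x_dagger + 0.1, t, y))  # positive away from the minimizer
```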

We assume that the cost function F is differentiable and so smooth that the following Taylor expansion is valid,²⁾

    F(x + h) = F(x) + hᵀg + ½ hᵀH h + O(‖h‖³) ,        (1.4a)

where g is the gradient,

    g ≡ F′(x) = [ ∂F/∂x₁ (x), …, ∂F/∂x_n (x) ]ᵀ ,        (1.4b)

and H is the Hessian,

    H ≡ F″(x) = [ ∂²F/(∂x_i ∂x_j) (x) ] .        (1.4c)

If x* is a local minimizer and ‖h‖ is sufficiently small, then we cannot find a point x* + h with a smaller F-value. Combining this observation with (1.4a) we get

Theorem 1.5. Necessary condition for a local minimizer.
If x* is a local minimizer, then g* ≡ F′(x*) = 0.

We use a special name for arguments that satisfy the necessary condition:

Definition 1.6. Stationary point.
If g_s ≡ F′(x_s) = 0, then x_s is said to be a stationary point for F.

Thus, a local minimizer is also a stationary point, but so is a local maximizer. A stationary point which is neither a local maximizer nor a local minimizer is called a saddle point. In order to determine whether a given stationary point is a local minimizer or not, we need to include the second order term in the Taylor series (1.4a). Inserting x_s we see that

    F(x_s + h) = F(x_s) + ½ hᵀH_s h + O(‖h‖³)   with   H_s = F″(x_s) .        (1.7)

From the definition (1.4c) of the Hessian it follows that any H is a symmetric matrix. If we require that H_s is positive definite, then its eigenvalues are greater than some number δ > 0 (see Appendix A), and

    hᵀH_s h > δ ‖h‖² .

This shows that for ‖h‖ sufficiently small the third term on the right-hand side of (1.7) will be dominated by the second. This term is positive, so that we get

Theorem 1.8. Sufficient condition for a local minimizer.
Assume that x_s is a stationary point and that F″(x_s) is positive definite. Then x_s is a local minimizer.

If H_s is negative definite, then x_s is a local maximizer. If H_s is indefinite (ie it has both positive and negative eigenvalues), then x_s is a saddle point.

2) Unless otherwise specified, ‖·‖ denotes the 2-norm, ‖h‖ = √(h₁² + ⋯ + h_n²).
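The classification of a stationary point by the eigenvalues of the Hessian can be sketched in a few lines of Python (this is an illustration, not part of the booklet; the helper name and the tolerance are made up):

```python
import numpy as np

def classify_stationary_point(H, tol=1e-12):
    """Classify a stationary point from the eigenvalues of the symmetric Hessian H."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "minimizer"       # H positive definite: Theorem 1.8
    if np.all(eig < -tol):
        return "maximizer"       # H negative definite
    if eig.min() < -tol and eig.max() > tol:
        return "saddle point"    # indefinite: eigenvalues of both signs
    return "inconclusive"        # (near-)singular Hessian: second order test fails

print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 1.0]])))   # minimizer
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -1.0]])))  # saddle point
```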

2. Descent Methods

All methods for non-linear optimization are iterative: from a starting point x₀ the method produces a series of vectors x₁, x₂, …, which (hopefully) converges to x*, a local minimizer for the given function, see Definition 1.3. Most methods have measures which enforce the descending condition

    F(x_{k+1}) < F(x_k) .        (2.1)

This prevents convergence to a maximizer and also makes it less probable that we converge towards a saddle point. If the given function has several minimizers, the result will depend on the starting point x₀. We do not know which of the minimizers will be found; it is not necessarily the minimizer closest to x₀.

In many cases the method produces vectors which converge towards the minimizer in two clearly different stages. When x₀ is far from the solution we want the method to produce iterates which move steadily towards x*. In this "global stage" of the iteration we are satisfied if the errors do not increase except in the very first steps, ie

    ‖e_{k+1}‖ < ‖e_k‖   for   k > K ,

where e_k denotes the current error,

    e_k ≡ x_k − x* .        (2.2)

In the final stage of the iteration, where x_k is close to x*, we want faster convergence. We distinguish between

    Linear convergence:      ‖e_{k+1}‖ ≤ a‖e_k‖   when ‖e_k‖ is small; 0 < a < 1 ,   (2.3a)
    Quadratic convergence:   ‖e_{k+1}‖ = O(‖e_k‖²)  when ‖e_k‖ is small ,            (2.3b)
    Superlinear convergence: ‖e_{k+1}‖/‖e_k‖ → 0   for k → ∞ .                       (2.3c)

The methods presented in this lecture note are descent methods which satisfy the descending condition (2.1) in each step of the iteration. One step from the current iterate consists in

1. finding a descent direction h_d (discussed below), and
2. finding a step length giving a good decrease in the F-value.

Thus an outline of a descent method is

Algorithm 2.4. Descent method
    begin
        k := 0;  x := x₀;  found := false          {starting point}
        while (not found) and (k < k_max)
            h_d := search_direction(x)             {from x and downhill}
            if (no such h_d exists)
                found := true                      {x is stationary}
            else
                α := step_length(x, h_d)           {from x in direction h_d}
                x := x + α h_d;  k := k + 1        {next iterate}
        end
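Algorithm 2.4 is easy to sketch in code. The following Python sketch (not from the booklet) instantiates it with the steepest descent direction h_sd = −F′(x) and an exact line search, on a hypothetical quadratic test function; all names and parameter values are made up for illustration:

```python
import numpy as np

def descent_method(dFdx, x0, search_direction, step_length, kmax=500, tol=1e-8):
    """Sketch of Algorithm 2.4: repeat x := x + alpha*h_d until F'(x) ~ 0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(kmax):
        g = dFdx(x)
        if np.linalg.norm(g) <= tol:   # gradient (numerically) zero: x is stationary
            break
        h = search_direction(g)        # from x and downhill
        alpha = step_length(x, h)      # from x in direction h
        x = x + alpha * h              # next iterate
    return x

# Demo on the quadratic F(x) = 0.5 * x^T A x with A = diag(1, 9).
A = np.diag([1.0, 9.0])
grad = lambda x: A @ x
steepest = lambda g: -g                               # h_sd = -F'(x)
exact_ls = lambda x, h: -(grad(x) @ h) / (h @ A @ h)  # exact minimizer of F(x + alpha*h)

x_min = descent_method(grad, [3.0, 1.0], steepest, exact_ls)
print(x_min)   # converges to the minimizer [0, 0]
```

For a quadratic the exact line search has the closed form used above; for general F one would use the (soft) line search discussed in Section 2.3.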
Consider the variation of the F-value along the half line starting at x and with direction h. From the Taylor expansion (1.4a) we see that

    F(x + αh) = F(x) + α hᵀF′(x) + O(α²)
              ≈ F(x) + α hᵀF′(x)   for α sufficiently small.        (2.5)

We say that h is a descent direction if F(x + αh) is a decreasing function of α at α = 0. This leads to the following definition.

Definition 2.6. Descent direction.
h is a descent direction for F at x if hᵀF′(x) < 0.

If no such h exists, then F′(x) = 0, showing that in this case x is stationary. Otherwise, we have to choose α, ie how far we should go from x in the direction given by h_d, so that we get a decrease in the value of the objective function. One way of doing this is to find (an approximation to)

    α_e = argmin_{α>0} { F(x + αh) } .        (2.7)

The process is called line search, and is discussed in Section 2.3. First, however, we shall introduce two methods for computing a descent direction.

2.1. The Steepest Descent Method

From (2.5) we see that when we perform a step αh with positive α, then the relative gain in function value satisfies

    lim_{α→0} [F(x) − F(x + αh)] / (α‖h‖) = −(1/‖h‖) hᵀF′(x) = −‖F′(x)‖ cos θ ,

where θ is the angle between the vectors h and F′(x). This shows that we get the greatest gain rate if θ = π, ie if we use the steepest descent direction h_sd given by

    h_sd = −F′(x) .        (2.8)

The method based on (2.8) (ie h_d = h_sd in Algorithm 2.4) is called the steepest descent method or gradient method. The choice of descent direction is "the best" (locally), and we could combine it with an exact line search (2.7). A method like this converges, but the final convergence is linear and often very slow. Examples in Frandsen et al. (2004) show how the steepest descent method with exact line search and finite computer precision can fail to find the minimizer of a second degree polynomial. For many problems, however, the method has quite good performance in the initial stage of the iterative process.

Considerations like this have led to the so-called hybrid methods, which – as the name suggests – are based on two different methods: one which is good in the initial stage, like the gradient method, and another which is good in the final stage, like Newton's method; see the next section. A major problem with a hybrid method is the mechanism which switches between the two methods when appropriate.

2.2. Newton's Method

We can derive this method from the condition that x* is a stationary point. According to Definition 1.6 it satisfies F′(x*) = 0. This is a nonlinear system of equations, and from the Taylor expansion

    F′(x + h) = F′(x) + F″(x) h + O(‖h‖²)
              ≈ F′(x) + F″(x) h   for ‖h‖ sufficiently small

we derive Newton's method: find h_n as the solution to

    H h_n = −F′(x)   with   H = F″(x) ,        (2.9a)

and compute the next iterate by

    x := x + h_n .        (2.9b)

Suppose that H is positive definite; then it is nonsingular (implying that (2.9a) has a unique solution), and uᵀH u > 0 for all nonzero u. Thus, by multiplying with h_nᵀ on both sides of (2.9a) we get

    0 < h_nᵀH h_n = −h_nᵀF′(x) ,        (2.10)

showing that h_n is a descent direction: it satisfies the condition in Definition 2.6.

Newton's method is very good in the final stage of the iteration, where x is close to x*. One can show (see Frandsen et al. (2004)) that if the Hessian at the solution is positive definite (the sufficient condition in Theorem 1.8 is satisfied) and if we are at a position inside the region around x* where

F″(x) is positive definite, then we get quadratic convergence (defined in (2.3)). On the other hand, if x is in a region where F″(x) is negative definite everywhere, and where there is a stationary point, the basic Newton method (2.9) would converge (quadratically) towards this stationary point, which is a maximizer. We can avoid this by requiring that all steps taken are in descent directions.

We can build a hybrid method, based on Newton's method and the steepest descent method. According to (2.10) the Newton step is guaranteed to be downhill if F″(x) is positive definite, so a sketch of the central section of this hybrid algorithm could be

    if F″(x) is positive definite
        h := h_n
    else
        h := h_sd
    x := x + αh        (2.11)

Here, h_sd is the steepest descent direction and α is found by line search; see Section 2.3. A good tool for checking a matrix for positive definiteness is Cholesky's method (see Appendix A) which, when successful, is also used for solving the linear system in question. Thus, the check for definiteness is almost for free.

In Section 2.4 we introduce some methods where the computation of the search direction h_d and step length α is done simultaneously, and give a version of (2.11) without line search. Such hybrid methods can be very efficient, but they are hardly ever used. The reason is that they need an implementation of F″(x), and for complicated application problems this is not available. Instead we can use a so-called quasi-Newton method, based on a series of matrices which gradually approach H* = F″(x*). In Section 3.4 we present such a method. Also see Chapter 5 in Frandsen et al. (2004).

2.3. Line Search

Given a point x and a descent direction h, the next iteration step is a move from x in direction h. To find out how far to move, we study the variation of the given function along the half line from x in the direction h,

    ϕ(α) = F(x + αh) ,   x and h fixed, α ≥ 0 .        (2.12)

[Figure 2.1: Variation of the cost function y = ϕ(α) along the search line, starting at y = ϕ(0) with negative slope.]

Our h being a descent direction ensures that

    ϕ′(0) = hᵀF′(x) < 0 ,

indicating that if α is sufficiently small, we satisfy the descending condition (2.1), which is equivalent to

    ϕ(α) < ϕ(0) .

Often, we are given an initial guess on α, eg α = 1 with Newton's method. Figure 2.1 illustrates that three different situations can arise:

1. α is so small that the gain in value of the objective function is very small. α should be increased.
2. α is too large: ϕ(α) ≥ ϕ(0). Decrease α in order to satisfy the descent condition (2.1).
3. α is close to the minimizer¹⁾ of ϕ(α). Accept this α-value.

1) More precisely: the smallest local minimizer of ϕ. If we increase α beyond the interval shown in Figure 2.1, it may well happen that we get close to another local minimum for F.
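The hybrid choice of direction in (2.11), with Cholesky doubling as the definiteness check, can be sketched as follows (a Python illustration, not from the booklet; the function name and test matrices are made up):

```python
import numpy as np

def hybrid_direction(g, H):
    """Sketch of (2.11): use the Newton step when F''(x) is positive definite,
    otherwise fall back to steepest descent. Cholesky serves both as the
    definiteness test and as the solver for H h_n = -g."""
    try:
        L = np.linalg.cholesky(H)   # raises LinAlgError iff H is not positive definite
        h_n = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        return h_n, "newton"
    except np.linalg.LinAlgError:
        return -g, "steepest descent"

# Positive definite Hessian: the Newton step solves H h = -g.
g = np.array([1.0, -2.0])
H_pd = np.array([[2.0, 0.0], [0.0, 4.0]])
h, kind = hybrid_direction(g, H_pd)
print(kind, h)     # newton [-0.5  0.5]

# Indefinite Hessian: fall back to h_sd = -g.
H_indef = np.array([[1.0, 0.0], [0.0, -1.0]])
h2, kind2 = hybrid_direction(g, H_indef)
print(kind2, h2)   # steepest descent [-1.  2.]
```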

An exact line search is an iterative process producing a series α₁, α₂, …. The aim is to find the true minimizer α_e defined in (2.7), and the algorithm stops when the iterate α_s satisfies

    |ϕ′(α_s)| ≤ τ |ϕ′(0)| ,

where τ is a small, positive number. In the iteration we can use approximations to the variation of ϕ(α) based on the computed values of

    ϕ(α_k) = F(x + α_k h)   and   ϕ′(α_k) = hᵀF′(x + α_k h) .

See Sections 2.5 – 2.6 in Frandsen et al. (2004) for details.

Exact line search can waste much computing time: when x is far from x*, the search direction h may be far from the direction x* − x, and there is no need to find the true minimum of ϕ very accurately. This is the background for the so-called soft line search, where we accept an α-value if it does not fall in categories 1 or 2 listed above. We use a stricter version of the descending condition (2.1), viz

    ϕ(α_s) ≤ ϕ(0) + γ₁ · ϕ′(0) · α_s   with   0 < γ₁ < 1 .        (2.13a)

This ensures that we are not in case 2. Case 1 corresponds to the point (α, ϕ(α)) being too close to the starting tangent, and we supplement with the condition

    ϕ′(α_s) ≥ γ₂ · ϕ′(0)   with   γ₁ < γ₂ < 1 .        (2.13b)

If the starting guess on α satisfies both these criteria, then we accept it as α_s. Otherwise, we have to iterate as sketched for exact line search. Details can be seen in Section 2.5 of Frandsen et al. (2004).

2.4. Trust Region and Damped Methods

Assume that we have a model L of the behaviour of F in the neighbourhood of the current iterate x,

    F(x + h) ≈ L(h) ≡ F(x) + hᵀc + ½ hᵀB h ,        (2.14)

where c ∈ ℝⁿ and the matrix B ∈ ℝⁿˣⁿ is symmetric. The basic ideas of this section may be generalized to other forms of the model, but in this booklet we only need the form of L given in (2.14). Typically, the model is a second order Taylor expansion of F around x, like the first three terms in the right-hand side of (1.4a), or L(h) may be an approximation to this expansion. It is generally true that such a model is good only when h is sufficiently small. We shall introduce two methods that include this aspect in the determination of a step h, which is a descent direction and which can be used with α = 1 in Algorithm 2.4.

In a trust region method we assume that we know a positive number ∆ such that the model is sufficiently accurate inside a ball with radius ∆, centered at x, and determine the step as

    h = h_tr ≡ argmin_{‖h‖≤∆} { L(h) } .        (2.15)

In a damped method the step is determined as

    h = h_dm ≡ argmin_h { L(h) + ½ µ hᵀh } ,        (2.16)

where the damping parameter µ ≥ 0. The term ½ µ hᵀh = ½ µ ‖h‖² is seen to penalize large steps.

The central part of Algorithm 2.4 based on one of these methods has the form

    Compute h by (2.15) or (2.16)
    if F(x + h) < F(x)
        x := x + h
    Update ∆ or µ        (2.17)

This corresponds to α = 1 if the step h satisfies the descending condition (2.1). Otherwise, α = 0, ie we do not move.²⁾ However, we are not stuck at x (unless x = x*): by a proper modification of ∆ or µ we aim at having better luck in the next iteration step.

2) There are versions of these methods that include a proper line search to find a point x + αh with smaller F-value, and information gathered during the line search is used in the updating of ∆ or µ. For many problems such versions use fewer iteration steps but a larger accumulated number of function values.
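The acceptance conditions (2.13a) and (2.13b) of the soft line search can be sketched as follows. This Python illustration is not from the booklet: the bisection-style refinement, the parameter values, and the test function are made up, and a production version would use the interpolation described in Frandsen et al. (2004).

```python
import numpy as np

def soft_line_search(phi, dphi, alpha0=1.0, gamma1=1e-3, gamma2=0.9, kmax=50):
    """Accept the first alpha satisfying both
    (2.13a) phi(a) <= phi(0) + gamma1*dphi(0)*a   (not case 2: step too long), and
    (2.13b) dphi(a) >= gamma2*dphi(0)             (not case 1: step too short)."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    a, lo, hi = alpha0, 0.0, np.inf
    for _ in range(kmax):
        if phi(a) > phi0 + gamma1 * dphi0 * a:   # case 2: decrease a
            hi = a
        elif dphi(a) < gamma2 * dphi0:           # case 1: increase a
            lo = a
        else:
            return a                             # case 3: accept
        a = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * a
    return a

# phi(alpha) = F(x + alpha*h) for F(x) = 0.5*x^2 at x = 1 with h = -1 (descent direction):
phi = lambda a: 0.5 * (1.0 - a) ** 2
dphi = lambda a: -(1.0 - a)
alpha = soft_line_search(phi, dphi, alpha0=10.0)   # deliberately too-long first guess
print(alpha)   # 1.25: shrunk until both conditions hold
```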

Since L(h) is assumed to be a good approximation to F(x + h) for h sufficiently small, the reason why the step failed is that h was too large, and it should be reduced. Further, if the step is accepted, it may be possible to use a larger step from the new iterate and thereby reduce the number of steps needed before we reach x*.

The quality of the model with the computed step can be evaluated by the so-called gain ratio

    ϱ = [F(x) − F(x + h)] / [L(0) − L(h)] ,        (2.18)

ie the ratio between the actual and predicted decrease in function value. By construction the denominator is positive, and the numerator is negative if the step was not downhill – it was too large and should be reduced.

With a trust region method we monitor the step length by the size of the radius ∆. The following updating strategy is widely used:

    if ϱ < 0.25
        ∆ := ∆/2
    elseif ϱ > 0.75
        ∆ := max{∆, 3‖h‖}        (2.19)

Thus, if ϱ < ¼, we decide to use smaller steps, while ϱ > ¾ indicates that it may be possible to use larger steps. A trust region algorithm is not sensitive to minor changes in the thresholds 0.25 and 0.75, the divisor p₁ = 2 or the factor p₂ = 3, but it is important that the numbers p₁ and p₂ are chosen so that the ∆-values cannot oscillate.

In a damped method a small value of ϱ indicates that we should increase the damping factor and thereby increase the penalty on large steps. A large value of ϱ indicates that L(h) is a good approximation to F(x + h) for the computed h, and the damping may be reduced. A widely used strategy is the following, which is similar to (2.19) and was originally proposed by Marquardt (1963):

    if ϱ < 0.25
        µ := µ·2
    elseif ϱ > 0.75
        µ := µ/3        (2.20)

Again, the method is not sensitive to minor changes in the thresholds 0.25 and 0.75 or the numbers p₁ = 2 and p₂ = 3, but it is important that the numbers p₁ and p₂ are chosen so that the µ-values cannot oscillate. Experience shows that the discontinuous changes across the thresholds 0.25 and 0.75 can give rise to a "flutter" (illustrated in Example 3.7) that can slow down convergence, and it was demonstrated in Nielsen (1999) that the following strategy in general outperforms (2.20):

    if ϱ > 0
        µ := µ·max{⅓, 1 − (2ϱ − 1)³};  ν := 2
    else
        µ := µ·ν;  ν := 2·ν        (2.21)

The factor ν is initialized to ν = 2. Notice that a series of consecutive failures results in rapidly increasing µ-values.

[Figure 2.2: µ_new/µ as a function of ϱ: updating by (2.21) with ν = 2 (full line) and Marquardt's strategy (2.20) (dashed line).]
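The two updating strategies (2.20) and (2.21) translate directly into code. A Python sketch (not from the booklet; function names are made up), including the behaviour under consecutive failed steps:

```python
def update_mu_marquardt(mu, rho):
    """Marquardt's strategy (2.20)."""
    if rho < 0.25:
        return mu * 2.0
    if rho > 0.75:
        return mu / 3.0
    return mu

def update_mu_nielsen(mu, nu, rho):
    """Nielsen's strategy (2.21): continuous in rho, with rapidly growing mu
    after consecutive failures. Returns the new (mu, nu)."""
    if rho > 0:                                  # step accepted
        return mu * max(1.0 / 3.0, 1.0 - (2.0 * rho - 1.0) ** 3), 2.0
    return mu * nu, 2.0 * nu                     # step rejected

mu, nu = 1.0, 2.0
mu, nu = update_mu_nielsen(mu, nu, rho=0.9)      # good step: mu shrinks
print(mu, nu)
mu, nu = update_mu_nielsen(mu, nu, rho=-0.5)     # failed step: mu grows
mu, nu = update_mu_nielsen(mu, nu, rho=-0.5)     # consecutive failure: grows faster
print(mu, nu)
```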

2.4.1. Computation of the step. In a damped method the step is computed as a stationary point for the function

    ψ_µ(h) = L(h) + ½ µ hᵀh .

This means that h_dm is a solution to

    ψ′_µ(h) = L′(h) + µh = 0 ,

and from the definition of L(h) in (2.14) we see that this is equivalent to

    (B + µI) h_dm = −c ,        (2.22)

where I is the identity matrix. If µ is sufficiently large, the symmetric matrix B + µI is positive definite (shown in Appendix A), and then it follows from Theorem 1.8 that h_dm is a minimizer for ψ_µ.

Example 2.1. In a damped Newton method the model L(h) is given by c = F′(x) and B = F″(x), and (2.22) takes the form

    (F″(x) + µI) h_dn = −F′(x) .

h_dn is the so-called damped Newton step. If µ is very large, then

    h_dn ≈ −(1/µ) F′(x) ,

ie a short step in a direction close to the steepest descent direction. On the other hand, if µ is very small, then h_dn is close to the Newton step h_n. Thus, we can think of the damped Newton method as a hybrid between the steepest descent method and Newton's method. We return to damped methods in Section 3.2.

In a trust region method the step h_tr is the solution to a constrained optimization problem,

    minimize L(h)   subject to   hᵀh ≤ ∆² .        (2.23)

It is outside the scope of this booklet to discuss this problem in any detail (see Madsen et al. (2004) or Section 4.1 in Nocedal and Wright (1999)). We just want to mention a few properties.

If the matrix B in (2.14) is positive definite, then the unconstrained minimizer of L is the solution to

    B h = −c ,

and if this is sufficiently small (if it satisfies hᵀh ≤ ∆²), then it is the desired step, h_tr. Otherwise, the constraint is active, and the problem is more complicated. With an argument similar to the one we used about exact line search in Section 2.3, we can see that we do not have to compute the true solution to (2.23), and in Sections 3.3 and 3.4 we present two ways of computing an approximation to h_tr.

Finally, we present two similarities between a damped method and a trust region method in the case where B is positive definite: in case the unconstrained minimizer is outside the trust region, it can be shown (Theorem 2.11 in Madsen et al. (2004)) that there exists a λ > 0 such that

    B h_tr + c = −λ h_tr .        (2.24a)

By reordering this equation and comparing it with (2.22) we see that h_tr is identical with the damped step h_dm computed with the damping parameter µ = λ. On the other hand, one can also show (Theorem 5.11 in Frandsen et al. (2004)) that if we compute h_dm for a given µ ≥ 0, then

    h_dm = argmin_{‖h‖≤‖h_dm‖} { L(h) } ,        (2.24b)

ie h_dm is equal to h_tr corresponding to the trust region radius ∆ = ‖h_dm‖. Thus, the two classes of methods are closely related, but there is not a simple formula for the connection between the ∆- and µ-values that give the same step.
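The damped step (2.22) and its limiting behaviour for small and large µ (Example 2.1) can be verified numerically. A Python sketch, not from the booklet; the model matrices B and c are made up for illustration:

```python
import numpy as np

def damped_step(B, c, mu):
    """Solve (B + mu*I) h_dm = -c, the damped step (2.22)."""
    return np.linalg.solve(B + mu * np.eye(len(c)), -c)

# A model with c = F'(x) and B = F''(x) (damped Newton, Example 2.1):
B = np.array([[3.0, 1.0], [1.0, 2.0]])
c = np.array([1.0, -1.0])

h_newton = np.linalg.solve(B, -c)     # mu -> 0 limit: the Newton step
h_small_mu = damped_step(B, c, 1e-8)
h_large_mu = damped_step(B, c, 1e8)   # mu large: h ~ -(1/mu) c, steepest descent direction

print(h_small_mu)   # close to h_newton
print(h_large_mu)   # a tiny step along -c
```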

3. Non-linear Least Squares Problems

In the remainder of this lecture note we shall discuss methods for nonlinear least squares problems. Given a vector function f : ℝⁿ → ℝᵐ with m ≥ n, we want to minimize ‖f(x)‖, or equivalently to find

    x* = argmin_x { F(x) } ,        (3.1a)

where

    F(x) = ½ Σ_{i=1}^m (f_i(x))² = ½ ‖f(x)‖² = ½ f(x)ᵀf(x) .        (3.1b)

Least squares problems can be solved by general optimization methods, but we shall present special methods that are more efficient. In many cases they achieve better than linear convergence, sometimes even quadratic convergence, even though they do not need implementation of second derivatives.

In the description of the methods in this chapter we shall need formulas for derivatives of F. Provided that f has continuous second partial derivatives, we can write its Taylor expansion as

    f(x + h) = f(x) + J(x)h + O(‖h‖²) ,        (3.2a)

where J ∈ ℝᵐˣⁿ is the Jacobian. This is a matrix containing the first partial derivatives of the function components,

    (J(x))_{ij} = ∂f_i/∂x_j (x) .        (3.2b)

As regards F : ℝⁿ → ℝ, it follows from the first formulation in (3.1b) that¹⁾

    ∂F/∂x_j (x) = Σ_{i=1}^m f_i(x) · ∂f_i/∂x_j (x) .        (3.3)

Thus, the gradient (1.4b) is

    F′(x) = J(x)ᵀ f(x) .        (3.4a)

We shall also need the Hessian of F. From (3.3) we see that the element in position (j, k) is

    ∂²F/(∂x_j ∂x_k) (x) = Σ_{i=1}^m ( ∂f_i/∂x_j (x) · ∂f_i/∂x_k (x) + f_i(x) · ∂²f_i/(∂x_j ∂x_k) (x) ) ,

showing that

    F″(x) = J(x)ᵀJ(x) + Σ_{i=1}^m f_i(x) f_i″(x) .        (3.4b)

Example 3.1. The simplest case of (3.1) is when f(x) has the form

    f(x) = b − Ax ,

where the vector b ∈ ℝᵐ and matrix A ∈ ℝᵐˣⁿ are given. We say that this is a linear least squares problem. In this case J(x) = −A for all x, and from (3.4a) we see that

    F′(x) = −Aᵀ(b − Ax) .

This is zero for x* determined as the solution to the so-called normal equations,

    (AᵀA) x* = Aᵀb .        (3.5)

1) If we had not used the factor ½ in the definition (3.1b), we would have got an annoying factor of 2 in a lot of expressions.
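The normal equations (3.5) and the QR-based alternative discussed next in Example 3.1 can be compared on a tiny problem. This NumPy sketch is not from the booklet (which uses MATLAB's A\b); the data are made up, and NumPy's `qr` returns the "economy" factorization so the back substitution uses all of Qᵀb:

```python
import numpy as np

# A small overdetermined linear least squares problem f(x) = b - A x (m=4, n=2).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.1, 3.9, 6.1, 8.0])

# Normal equations (3.5): (A^T A) x = A^T b.
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# Orthogonal transformation: QR factorization, then back substitution R x = (Q^T b).
# This route is more accurate than the normal equations when A is ill conditioned.
Q, R = np.linalg.qr(A)              # Q is m-by-n with orthonormal columns, R is n-by-n
x_qr = np.linalg.solve(R, Q.T @ b)

print(x_ne)
print(x_qr)   # the two solutions agree on this well-conditioned problem
```

At the least squares solution the residual b − Ax* is orthogonal to the columns of A, which is exactly the statement Aᵀ(b − Ax*) = 0.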

The problem can be written in the form Ax ≈ b, and alternatively we can solve it via orthogonal transformation: find an orthogonal matrix Q such that

    QᵀA = [ R ]
          [ 0 ] ,

where R ∈ ℝⁿˣⁿ is upper triangular. The solution is found by back substitution in the system²⁾

    R x* = (Qᵀb)_{1:n} .

This method is more accurate than the solution via the normal equations. In MATLAB, suppose that the arrays A and b hold the matrix A and the vector b, respectively. Then the command A\b returns the least squares solution computed via orthogonal transformation.

As the title of the booklet suggests, we assume that f is nonlinear, and we shall not discuss linear problems in detail. We refer to Chapter 2 in Madsen and Nielsen (2002) or Section 5.2 in Golub and Van Loan (1996).

Example 3.2. In Example 1.1 we saw a nonlinear least squares problem arising from data fitting. Another application is in the solution of nonlinear systems of equations,

    f(x*) = 0 ,   where   f : ℝⁿ → ℝⁿ .

We can use Newton–Raphson's method: from an initial guess x₀ we compute x₁, x₂, … by the following algorithm, which is based on seeking h so that f(x + h) = 0 and ignoring the term O(‖h‖²) in (3.2a):

    Solve J(x_k) h_k = −f(x_k) for h_k
    x_{k+1} = x_k + h_k .        (3.6)

Here, the Jacobian J is given by (3.2b). If J(x*) is nonsingular, then the method has quadratic final convergence, ie if d_k = ‖x_k − x*‖ is small, then ‖x_{k+1} − x*‖ = O(d_k²). However, if x_k is far from x*, then we risk getting even further away.

We can reformulate the problem in a way that enables us to use all the "tools" that we are going to present in this chapter: a solution of (3.6) is a global minimizer of the function F defined by (3.1),

    F(x) = ½ ‖f(x)‖² ,

since F(x*) = 0 and F(x) > 0 if f(x) ≠ 0. We may eg replace the updating of the approximate solution in (3.6) by

    x_{k+1} = x_k + α_k h_k ,

where α_k is found by line search applied to the function ϕ(α) = F(x_k + α h_k).

As a specific example we shall consider the following problem, taken from Powell (1970),

    f(x) = [ x₁ ;  10x₁/(x₁ + 0.1) + 2x₂² ] ,

with x* = 0 as the only solution. The Jacobian is

    J(x) = [ 1 , 0 ;  (x₁ + 0.1)⁻² , 4x₂ ] ,

which is singular at the solution.

If we take x₀ = [3, 1]ᵀ and use the above algorithm with exact line search, then the iterates converge to x_c ≈ [1.8016, 0]ᵀ, which is not a solution. On the other hand, it is easily seen that the iterates given by Algorithm (3.6) are x_k = [0, y_k]ᵀ with y_{k+1} = ½ y_k, ie we have linear convergence to the solution. In a number of examples we shall return to this problem to see how different methods handle it.

2) An expression like u_{p:q} is used to denote the subvector with elements u_i, i = p, …, q. The ith row and jth column of a matrix A are denoted A_{i,:} and A_{:,j}, respectively.
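The behaviour of the basic Newton–Raphson iteration (3.6) on Powell's problem can be checked numerically. A Python sketch, not from the booklet: as the text describes, the first component is driven to zero in one step (its residual is linear), after which the second component is halved in each iteration.

```python
import numpy as np

def f_powell(x):
    """Powell's problem: f(x) = [x1, 10*x1/(x1 + 0.1) + 2*x2^2], with x* = 0."""
    return np.array([x[0], 10.0 * x[0] / (x[0] + 0.1) + 2.0 * x[1] ** 2])

def J_powell(x):
    """Jacobian (3.2b); note it is singular at the solution x* = 0."""
    return np.array([[1.0, 0.0],
                     [(x[0] + 0.1) ** -2, 4.0 * x[1]]])

# Newton-Raphson (3.6): solve J(x_k) h_k = -f(x_k), then x_{k+1} = x_k + h_k.
x = np.array([3.0, 1.0])
for _ in range(20):
    x = x + np.linalg.solve(J_powell(x), -f_powell(x))

print(x)   # x1 ~ 0 after one step; x2 halves each iteration (linear convergence)
```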

3.1. The Gauss–Newton Method

This method is the basis of the very efficient methods we will describe in the next sections. It is based on implemented first derivatives of the components of the vector function. In special cases it can give quadratic convergence as the Newton method does for general optimization, see Frandsen et al. (2004).

The Gauss–Newton method is based on a linear approximation to the components of f (a linear model of f) in the neighbourhood of x: for small ‖h‖ we see from the Taylor expansion (3.2) that

    f(x + h) ≈ ℓ(h) ≡ f(x) + J(x)h .        (3.7a)

Inserting this in the definition (3.1) of F we see that

    F(x + h) ≈ L(h) ≡ ½ ℓ(h)ᵀℓ(h)
             = ½ fᵀf + hᵀJᵀf + ½ hᵀJᵀJh
             = F(x) + hᵀJᵀf + ½ hᵀJᵀJh        (3.7b)

(with f = f(x) and J = J(x)). The Gauss–Newton step h_gn minimizes L(h),

    h_gn = argmin_h { L(h) } .        (3.8)

It is easily seen that the gradient and the Hessian of L are

    L′(h) = Jᵀf + JᵀJh ,   L″(h) = JᵀJ .

Comparison with (3.4a) shows that L′(0) = F′(x). Further, the matrix L″(h) is independent of h; it is symmetric, and if J has full rank it is also positive definite. This implies that L(h) has a unique minimizer, which can be found by solving

    (JᵀJ) h_gn = −Jᵀf .        (3.9)

This is a descent direction for F since

    h_gnᵀF′(x) = h_gnᵀ(Jᵀf) = −h_gnᵀ(JᵀJ) h_gn < 0 .        (3.10)

Thus, we can use h_gn for h_d in Algorithm 2.4. The typical step is

    Solve (JᵀJ) h_gn = −Jᵀf
    x := x + α h_gn        (3.11)

where α is found by line search. The classical Gauss–Newton method uses α = 1 in all steps. The method with line search can be shown to have guaranteed convergence, provided that

a) {x | F(x) ≤ F(x₀)} is bounded, and
b) the Jacobian J(x) has full rank in all steps.

To see how the convergence differs from Newton's method, we compare the search directions used in the two methods,

    F″(x) h_n = −F′(x)   and   L″(h) h_gn = −L′(0) .

We already remarked at (3.9) that the two right-hand sides are identical, but the coefficient matrices differ:

    F″(x) = L″(h) + Σ_{i=1}^m f_i(x) f_i″(x) .        (3.12)

Therefore, if f(x*) = 0, then L″(h) ≈ F″(x) for x close to x*, and we get quadratic convergence also with the Gauss–Newton method. We can expect superlinear convergence if the functions {f_i} have small curvatures or if the {|f_i(x*)|} are small, but in general we must expect linear convergence. It is remarkable that the value of F(x*) controls the convergence speed.

Example 3.3. Consider the simple problem with n = 1, m = 2 given by

    f(x) = [ x + 1 ;  λx² + x − 1 ] ,   F(x) = ½(x + 1)² + ½(λx² + x − 1)² .

It follows that

    F′(x) = 2λ²x³ + 3λx² − 2(λ − 1)x ,

so x = 0 is a stationary point for F. Now,

    F″(x) = 6λ²x² + 6λx − 2(λ − 1) .

In particular F″(0) = 2(1 − λ), so for λ < 1 the point x = 0 is a local minimizer of F.
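The classical Gauss–Newton iteration (3.9)/(3.11) with α = 1 is easy to try out on a small data-fitting problem. This Python sketch is not from the booklet: the one-exponential model M(x, t) = x₁·exp(x₂·t) and its noise-free data are made up so that f(x*) = 0, the situation in which the discussion above predicts (near) quadratic final convergence.

```python
import numpy as np

# Noise-free data from the model M(x, t) = x1 * exp(x2 * t), true x* = [2, -0.7].
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = 2.0 * np.exp(-0.7 * t)

def f(x):
    """Residuals f_i(x) = y_i - M(x, t_i)."""
    return y - x[0] * np.exp(x[1] * t)

def J(x):
    """Jacobian (3.2b) of the residual vector."""
    e = np.exp(x[1] * t)
    return np.column_stack((-e, -x[0] * t * e))

x = np.array([1.0, 0.0])                                 # starting guess
for _ in range(20):
    h = np.linalg.solve(J(x).T @ J(x), -J(x).T @ f(x))   # Gauss-Newton step (3.9)
    x = x + h                                            # classical step, alpha = 1

print(x)   # ~ [2, -0.7]
```

In practice one would guard this loop with a line search or damping (Section 3.2), since the undamped step can diverge from a poor starting guess.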
