
Faster Lagrangian-Based Methods in Convex Optimization

Shoham Sabach    Marc Teboulle†

October 27, 2020

Abstract

In this paper, we aim at unifying, simplifying, and improving the convergence rate analysis of Lagrangian-based methods for convex optimization problems. We first introduce the notion of nice primal algorithmic map, which plays a central role in the unification and in the simplification of the analysis of all Lagrangian-based methods. Equipped with a nice primal algorithmic map, we then introduce a versatile generic scheme, which allows for the design and analysis of Faster LAGrangian (FLAG) methods with a new provably sublinear rate of convergence expressed in terms of function values and feasibility violation of the original (non-ergodic) generated sequence. To demonstrate the power and versatility of our approach and results, we show that all well-known iconic Lagrangian-based schemes admit a nice primal algorithmic map, and hence share the new faster rate of convergence results within their corresponding FLAG.

2010 Mathematics Subject Classification: 90C25, 65K05.

Keywords: Augmented Lagrangian, Lagrangian multiplier methods, proximal multiplier algorithms, convex composite minimization, alternating direction method of multipliers, non-smooth optimization, fast non-ergodic global rate of convergence.

1 Introduction

Over the last decade, we have been witnessing a resurgence of Lagrangian-based methods. This surge of interest is due to the flexibility that they provide in exploiting special problem structures that abound in modern and disparate applications (e.g., image science, machine learning, communication systems). The Augmented Lagrangian methods (also known as multiplier methods), which are a typical prototype and were invented about half a century ago by Hestenes [16] and Powell [20], are at the core of the analysis and development of many extensions of Lagrangian-based schemes capable of tackling optimization problems with various types of structures and constraints.
This is reflected by a rich and very large volume of literature on various practical and theoretical aspects of many Lagrangian-based methods. Two classical references providing the central developments and ideas underlying Lagrangian-based and related decomposition methods are the books of Bertsekas [3], and Bertsekas and Tsitsiklis [4], which include a comprehensive list of relevant references for these developments until the nineties. For more modern developments, results, and applications, see, e.g., the more recent works [5, 23, 7, 22], and references therein.

A recent and voluminous literature on Lagrangian-based methods has focused on rate of convergence results for the so-called Alternating Direction Method of Multipliers [12, 11, 10], with an almost absolute predominance of ergodic rate of convergence results. Since the literature on this topic is very wide, let us just mention the first two central works [6, 14], which obtained an ergodic rate of $O(1/N)$. For many more rate of convergence results of the ergodic type, and relevant references, we refer the reader to, e.g., [8], and the very recent monograph [18], which also includes an up-to-date extensive bibliography.

Faculty of Industrial Engineering and Management, The Technion, Haifa, 32000, Israel. E-mail: ssabach@ie.technion.ac.il.
† School of Mathematical Sciences, Tel-Aviv University, Ramat-Aviv 69978, Israel. E-mail: teboulle@post.tau.ac.il. This research was partially supported by the Israel Science Foundation, under ISF Grants 1844-16 and 2619-20.

The focus of this paper is on faster and non-ergodic global rate of convergence analysis for Lagrangian-based methods applied to convex optimization, where the rate is measured in terms of function values and feasibility violation of the original produced sequence. The literature obtaining non-ergodic rate of convergence results remains very limited, in particular in terms of the measures just mentioned. For instance, in [15], the authors obtained a non-ergodic $O(1/N)$ result for the squared distance between two successive primal iterates, while in [6], a non-ergodic $O(1/N^2)$ result, in the strongly convex setting, is proven for the squared distance between the primal iterates and an optimal primal solution. One exception is the very recent work [17], which derives a new non-ergodic $O(1/N)$ rate of convergence result in terms of function values and feasibility violation of the original produced sequence for the specific Linearized ADMM, through a rather involved and nontrivial analysis (compare with the results developed in Section 5 for this particular algorithm).

The analysis of Lagrangian-based methods is usually intricate and complicated, which results in lengthy proofs. Recently, in [22], we have shown that all well-known Lagrangian-based methods and their decomposition variants can be captured and analyzed through a single scheme (which we called the Perturbed Proximal Method of Multipliers; see details in [22]). It allows us to derive an ergodic $O(1/N)$ rate of convergence in terms of function values and feasibility violation. Much like most of the existing literature, the approach we developed there was limited to Lagrangian-based methods with classical sublinear ergodic type results, and hence naturally leads to the question: Can we develop an algorithmic framework capable of producing non-ergodic rates as well as faster Lagrangian-based schemes? In this paper, we provide a positive answer, within a surprisingly simple and elegant approach.
This is developed in four main sections whose contents are now briefly outlined. The formulation of the problem under study, some basic preliminaries on Lagrangians in the convex setting, and some important examples of convex optimization problems which fit our model can be found in Section 2. The starting point of our study is that all Lagrangian-based methods can be simply described along the following protocol. Consider the corresponding augmented Lagrangian of the problem at hand, which consists of primal and dual (multiplier) variables; building a Lagrangian-based method requires choosing updating rules for both the primal and the dual variables. Since, in all Lagrangian-based methods, the dual variable is updated via an explicit formula, the main difference between Lagrangian-based methods is encapsulated in the choice of a primal algorithmic map which updates the primal variables. This choice is governed by the problem's structure and data information, which must be best exploited, and can be seen as a single step of any appropriate optimization method that is applied to the augmented Lagrangian itself, or a variation of it. For instance, the classical Augmented Lagrangian method [16, 20] is a typical example whereby the primal algorithmic map is nothing else but an exact minimization applied to the augmented Lagrangian itself associated to problem (P). Adopting this point of view for Lagrangian-based methods, we introduce in Section 3 the notion of nice primal algorithmic map, which plays a central role in unifying the analysis of all Lagrangian-based methods into a unitary and simple framework. In turn, equipped with a nice primal algorithmic map, we propose a generic algorithmic framework, called FLAG for Faster LAGrangian based methods, that aims at achieving faster (as well as classical) convergence rates of all Lagrangian-based methods at once.
FLAG, which is equipped with a nice primal algorithmic map, allows for the design and analysis of faster schemes with a new provable sublinear non-ergodic rate of $O(1/N^2)$ in the strongly convex case and a non-ergodic $O(1/N)$ in the convex case (see Theorem 4.1 and Theorem 4.2, respectively). As a by-product, we also derive classical and faster ergodic results in Corollaries 4.1 and 4.2; see Section 4. To illustrate the power and versatility of FLAG and our new results, in Section 5, we consider various block and other models of problem (P) and demonstrate that all well-known iconic schemes, e.g., Linearized Proximal Method of Multipliers, ADMM and its linearized versions, Jacobi type decomposition, Predictor Corrector Proximal Multipliers, and many more, all admit a nice primal algorithmic map, and hence can be used in FLAG to obtain new rate of convergence results. This provides a significant list of faster schemes for fundamental well-known Lagrangian-based methods.

2 Problem Formulation

In this paper, we consider the linearly constrained convex optimization model

    $\min \left\{ \Psi(x) : Ax = b,\ x \in \mathbb{R}^n \right\}$,    (P)

where $\Psi : \mathbb{R}^n \to (-\infty, \infty]$ is a proper, lower semi-continuous and convex function, $A : \mathbb{R}^n \to \mathbb{R}^m$ is a linear mapping, and $b \in \mathbb{R}^m$. Throughout this work, the feasible set of problem (P) is denoted by $\mathcal{F} := \{x \in \mathbb{R}^n : Ax = b\}$.

Despite its apparent simplicity, with additional structural/data information on the objective function or/and on the linear mapping (see examples below), the optimization model (P) essentially captures a very wide range of convex optimization problems arising in many modern applications. Since we are interested in analyzing this model also in the strongly convex case, we instead assume that $\Psi(\cdot)$ is $\sigma$-strongly convex with $\sigma \geq 0$ (i.e., when $\sigma = 0$ we recover the convex setting). At this juncture, it is important to mention that in many applied problems the function $\Psi$ is defined by the sum of two or more functions, see, e.g., [7]. Therefore, in such cases, to warrant the $\sigma$-strong convexity of $\Psi(\cdot)$, it is enough to assume that one of the functions is $\sigma$-strongly convex; see Section 5.2 for more details.

Let us illustrate a few basic and important instances of model (P).

Example 2.1 (Linear composite model). The linear composite optimization model, which naturally appears in many applications and very much fuels the study of Lagrangian-based methods, is given by

    $\min_{u \in \mathbb{R}^p} \left\{ f(u) + g(Au) \right\}$,    (2.1)

where $f : \mathbb{R}^p \to (-\infty, \infty]$ and $g : \mathbb{R}^q \to (-\infty, \infty]$ are proper, lower semi-continuous and convex functions, and $A : \mathbb{R}^p \to \mathbb{R}^q$ is a linear mapping. Rewriting problem (2.1) as an equivalent linearly constrained optimization problem (which requires the definition of an auxiliary variable) yields

    $\min_{u \in \mathbb{R}^p, v \in \mathbb{R}^q} \left\{ f(u) + g(v) : Au = v \right\}$.

We easily see that it fits into model (P), with $x = (u^T, v^T)^T$, $\Psi(x) := f(u) + g(v)$ and $Ax \equiv Au - v$ (i.e., $b = 0$).

Example 2.2 (Block linear constrained model). Another useful and more general model, which includes (2.1), is the following:

    $\min_{u \in \mathbb{R}^p, v \in \mathbb{R}^q} \left\{ f(u) + g(v) : Au + Bv = b \right\}$,    (2.2)

where $f : \mathbb{R}^p \to (-\infty, \infty]$ and $g : \mathbb{R}^q \to (-\infty, \infty]$ are proper, lower semi-continuous and convex functions, and $A : \mathbb{R}^p \to \mathbb{R}^m$ and $B : \mathbb{R}^q \to \mathbb{R}^m$ are linear mappings. Again, we can easily see that it fits into model (P), with $x = (u^T, v^T)^T$, $\Psi(x) := f(u) + g(v)$ and $Ax \equiv Au + Bv$.

Example 2.3 (Additive smooth/non-smooth composite model). A very classical model in optimization is of additive composite nature, including the sum of two functions, one which is smooth and one which is not necessarily smooth. Here we consider this model with an additional linear constraint, that is,

    $\min_{x \in \mathbb{R}^n} \left\{ f(x) + h(x) : Ax = b \right\}$,    (2.3)

where $f : \mathbb{R}^n \to (-\infty, \infty]$ is a proper, lower semi-continuous and convex function, while $h : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function with a Lipschitz continuous gradient. Again, we easily see that it fits into model (P), with $\Psi(x) := f(x) + h(x)$ being the sum of a smooth function and a non-smooth function. This is additional structure in the objective function that can be beneficially exploited when developing an adequate Lagrangian-based method, as we discuss in detail in Section 5.3.
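The splitting behind Example 2.1 can be checked mechanically: stacking $x = (u, v)$ and forming the constraint matrix $[A, -I]$ reproduces the constraint $Au - v = 0$ of model (P). A small numerical sketch (our own illustration, not code from the paper):

```python
import numpy as np

# Example 2.1 rewrites min f(u) + g(Au) as min {f(u) + g(v) : Au - v = 0},
# which fits model (P) with x = (u, v), Psi(x) = f(u) + g(v),
# constraint matrix [A, -I], and right-hand side b = 0.
p, q = 3, 2
A = np.arange(6.0).reshape(q, p)           # the linear map A : R^p -> R^q
A_big = np.hstack([A, -np.eye(q)])         # constraint matrix of model (P)
b_big = np.zeros(q)                        # right-hand side b = 0

u = np.array([1.0, -2.0, 0.5])
v = A @ u                                  # feasible choice: v = Au
x = np.concatenate([u, v])                 # stacked variable x = (u, v)

# x is feasible for (P) exactly when Au - v = 0:
print(np.allclose(A_big @ x - b_big, 0))   # True
```

The same stacking idea, with $[A, B]$ in place of $[A, -I]$, covers the block model (2.2).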

We conclude this section with some basic properties of Lagrangians in convex optimization. For the equality constrained problem given in model (P), the Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^m \to (-\infty, \infty]$ is defined by

    $L(x, y) = \Psi(x) + \langle y, Ax - b \rangle$,    (2.4)

where $y \in \mathbb{R}^m$ is the Lagrange multiplier associated with the linear constraint $Ax = b$. In this study we will also use the augmented Lagrangian associated to problem (P), defined by

    $L_\rho(x, y) := L(x, y) + \frac{\rho}{2}\|Ax - b\|^2 = \Psi(x) + \langle y, Ax - b \rangle + \frac{\rho}{2}\|Ax - b\|^2$,    (2.5)

where $\rho > 0$ is a penalty parameter. Clearly, when $\rho = 0$ we recover the Lagrangian itself, i.e., $L \equiv L_0$. Throughout this work we make the following standing assumption.

Assumption A. The Lagrangian $L$ has a saddle point, that is, there exists a pair $(x^*, y^*)$ such that

    $L(x^*, y) \leq L(x^*, y^*) \leq L(x, y^*), \quad \forall\, x \in \mathbb{R}^n,\ y \in \mathbb{R}^m$.    (2.6)

It is well-known (see [21]) that $(x^*, y^*)$ is a saddle point of $L$ if and only if the following conditions hold:

(i) $x^*$ is a solution of problem (P), i.e., $\Psi(x^*)$ is finite valued and $Ax^* = b$.

(ii) The multiplier $y^*$ is an optimal solution of the dual problem associated to problem (P), given by

    $d^* = \sup_{y \in \mathbb{R}^m} \left\{ d(y) := -\Psi^*(-A^T y) - \langle y, b \rangle \right\}$,    (D)

and $p^* := L(x^*, y^*) = d^*$, where $\Psi^*(\cdot)$ stands for the Fenchel conjugate function of $\Psi$ (see [21]).

Thanks to the convexity of problem (P), recall that the existence of an optimal dual Lagrange multiplier $y^*$ attached to the constraint $Ax = b$ is ensured under the standard constraint qualification for problem (P), which can be formulated as follows: there exists $\bar{x} \in \mathrm{ri}(\mathrm{dom}\, \Psi)$ satisfying $A\bar{x} = b$, and then strong duality holds, i.e., $p^* = d^* \in \mathbb{R}$. Moreover, the set of optimal dual solutions is non-empty, convex and compact, see, e.g., [21].

We end this section with some notations that will be used throughout the paper. We denote by $\mathbb{S}^n_+$ and $\mathbb{S}^n_{++}$ the sets of all $n \times n$ positive semi-definite matrices and positive definite matrices, respectively. Given $P \in \mathbb{S}^n_+$, $\|u\|_P := \sqrt{\langle u, Pu \rangle}$ stands for the semi-norm of $u \in \mathbb{R}^n$.
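For concreteness, the dual objective appearing in (D) follows from minimizing the Lagrangian (2.4) over $x$; a short derivation (our own, via the standard conjugacy argument):

```latex
d(y) = \inf_{x \in \mathbb{R}^n} L(x, y)
     = \inf_{x} \left\{ \Psi(x) + \langle y, Ax - b \rangle \right\}
     = -\sup_{x} \left\{ \langle -A^{T} y, x \rangle - \Psi(x) \right\} - \langle y, b \rangle
     = -\Psi^{*}(-A^{T} y) - \langle y, b \rangle.
```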
For any $u, v, w \in \mathbb{R}^n$, we define the following quantity:

    $\Delta_P(u, v, w) := \frac{1}{2}\left(\|u - v\|_P^2 - \|u - w\|_P^2\right)$.    (2.7)

In the case that $P = I_n$, where $I_n$ denotes the $n \times n$ identity matrix, we simply denote $\Delta := \Delta_{I_n}$.

3 A Unified Framework for Lagrangian-based Methods

The starting point of our study is that all Lagrangian-based methods can be simply described along the following unified approach. Considering the corresponding augmented Lagrangian of the problem at hand, which consists of primal and dual (multiplier) variables, building a Lagrangian-based method requires choosing updating rules for both the primal and the dual variables. Since, in all Lagrangian-based methods, the dual variable is updated via an explicit formula (see (3.2) below), the main difference between Lagrangian-based methods is encapsulated in the choice of a primal algorithmic map that updates the primal variables. Therefore, any Lagrangian-based method updates a couple $(z, y)$ by first choosing a primal algorithmic map $\mathcal{P}$, and will update the primal and dual (multiplier) variables via

    $z^+ = \mathcal{P}(z, y)$,    (3.1)
    $y^+ = y + \mu\rho\left(Az^+ - b\right)$,    (3.2)
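A useful feature of the quantity (2.7) is that it telescopes under summation, which is exactly how it enters the convergence proofs of Section 4. A quick numeric sanity check (illustrative, not from the paper):

```python
import numpy as np

# The quantity (2.7) telescopes under summation:
#   Delta_P(u, v0, v1) + Delta_P(u, v1, v2) + ... = Delta_P(u, v0, vN).
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
P = M.T @ M + np.eye(n)                 # a positive definite P

def delta_P(u, v, w):
    # Delta_P(u, v, w) := (1/2)(||u - v||_P^2 - ||u - w||_P^2)
    dv, dw = u - v, u - w
    return 0.5 * (dv @ P @ dv - dw @ P @ dw)

u = rng.standard_normal(n)
vs = [rng.standard_normal(n) for _ in range(6)]   # v^0, ..., v^5
total = sum(delta_P(u, vs[k], vs[k + 1]) for k in range(5))
print(np.isclose(total, delta_P(u, vs[0], vs[-1])))   # True
```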

where $\mu > 0$ is a scaling parameter. The update of the primal variable, which depends on the algorithmic map $\mathcal{P}$, can be seen as a single step of any optimization method that is applied to the augmented Lagrangian itself or a variation of it. For example, the classical Augmented Lagrangian method [16, 20] is a typical prototype where the mapping $\mathcal{P}$ is nothing but an exact minimization applied to the augmented Lagrangian itself.

Adopting this point of view of Lagrangian-based methods, we propose a general framework that aims at achieving convergence rates (classical and fast) of all Lagrangian-based methods at once. Considering the optimization problem (P) with the data $[\Psi, A, b, \sigma]$ and its associated augmented Lagrangian $L_\rho(\cdot)$, which is defined in (2.5), we introduce the notion of nice primal algorithmic map, which plays a central role in unifying the analysis of all Lagrangian-based methods into a single and simple framework.

Definition 3.1 (Nice primal algorithmic map). Given $(z, \lambda) \in \mathbb{R}^n \times \mathbb{R}^m$ and the parameters $\rho, t > 0$, we let $\rho_t = \rho$ and $\tau_t = t^{-1}$ (when $\sigma = 0$), or $\rho_t = \rho t$ and $\tau_t = t$ (when $\sigma > 0$). A primal algorithmic map $\mathrm{Prim}_t : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, which is applied on the augmented Lagrangian $L_{\rho_t}(z, \lambda)$ and generates $z^+$ via $z^+ = \mathrm{Prim}_t(z, \lambda)$, is called nice if there exist a parameter $\delta \in (0, 1]$ and matrices $P, Q \in \mathbb{S}^n_+$ such that for any $\xi \in \mathcal{F}$ we have

    $L_{\rho_t}(z^+, \lambda) - L_{\rho_t}(\xi, \lambda) \leq \tau_t \Delta_P(\xi, z, z^+) - \frac{\tau_t}{2}\|z^+ - z\|_Q^2 - \frac{\sigma}{2}\|\xi - z^+\|^2 - \frac{\delta\rho_t}{2}\|Az^+ - b\|^2$.    (3.3)

The essence of our ability to both unify the analysis and accelerate the convergence rate of all Lagrangian-based methods at once comes from the algorithmic framework that we introduce now. This framework crucially uses as an input a nice primal algorithmic map $\mathrm{Prim}_t(\cdot)$ with the induced parameter $\delta \in (0, 1]$ and matrices $P, Q \in \mathbb{S}^n_+$.

FLAG – Faster LAGrangian based method

1. Input: The problem data $[\Psi, A, b, \sigma]$, and a nice primal algorithmic map $\mathrm{Prim}_t(\cdot)$.

2. Initialization: Set $\mu \in (0, \delta]$ and $\rho > 0$. Start with any $(x^0, z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^m$, $t_0 = 1$.
3. Iterations: The sequences $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ and $\{t_k\}_{k \in \mathbb{N}}$ are generated as follows: for each $k \geq 0$,

3.1. If $\sigma = 0$, let $\tau_k = t_k^{-1}$ and $\rho_k = \rho$, or, if $\sigma > 0$, let $\tau_k = t_k$ and $\rho_k = \rho t_k$. Compute

    $\lambda^k = y^k + \rho_k(t_k - 1)\left(Ax^k - b\right)$.    (3.4)

3.2. Update the sequence $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ by

    $z^{k+1} = \mathrm{Prim}_k(z^k, \lambda^k)$,    (3.5)
    $y^{k+1} = y^k + \mu\rho_k\left(Az^{k+1} - b\right)$,    (3.6)
    $x^{k+1} = \left(1 - t_k^{-1}\right)x^k + t_k^{-1}z^{k+1}$.    (3.7)

3.3. Update the sequence $\{t_k\}_{k \in \mathbb{N}}$ by solving the equation $t_{k+1}^p = t_k^p + t_{k+1}^{p-1}$, i.e.,

    $t_{k+1} = t_k + 1$, for $p = 1$ (convex case), or $t_{k+1} = \left(1 + \sqrt{1 + 4t_k^2}\right)/2$, for $p = 2$ (strongly convex case).    (3.8)
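To make the scheme concrete, here is a minimal sketch of FLAG in the strongly convex case ($p = 2$) on a tiny quadratic program, $\min \{\frac{1}{2}\|x - c\|^2 : Ax = b\}$, using exact minimization of the augmented Lagrangian as the primal map (the classical Augmented Lagrangian choice mentioned above). The values $\mu = 1$, $\rho = 1$ and the niceness constant $\delta = 1$ for this map are illustrative assumptions, not values prescribed by the paper:

```python
import numpy as np

# FLAG sketch, strongly convex case (p = 2, sigma = 1), on
#   min { 0.5*||x - c||^2 : Ax = b }.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])
c = np.array([1.0, 2.0, 3.0])            # Psi(x) = 0.5*||x - c||^2
rho, mu, N = 1.0, 1.0, 100               # illustrative choices (assumed delta = 1)

def prim(lam, rho_k):
    # Exact minimizer of L_{rho_k}(., lam):
    # (I + rho_k A^T A) z = c - A^T lam + rho_k A^T b
    M = np.eye(3) + rho_k * A.T @ A
    return np.linalg.solve(M, c - A.T @ lam + rho_k * A.T @ b)

x = z = np.zeros(3)
y = np.zeros(1)
t = 1.0
for k in range(N):
    rho_k = rho * t                              # sigma > 0: rho_k = rho * t_k
    lam = y + rho_k * (t - 1.0) * (A @ x - b)    # auxiliary multiplier (3.4)
    z = prim(lam, rho_k)                         # primal step (3.5)
    y = y + mu * rho_k * (A @ z - b)             # dual step (3.6)
    x = (1.0 - 1.0 / t) * x + (1.0 / t) * z      # averaging step (3.7)
    t = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0 # p = 2 recursion (3.8)

x_star = c - 1.0                                 # closed-form solution [0, 1, 2]
print(abs((A @ x - b)[0]) < 1e-2)                                        # True
print(abs(0.5 * np.sum((x - c) ** 2) - 0.5 * np.sum((x_star - c) ** 2)) < 1e-2)  # True
```

Both the feasibility violation and the function values gap of the original (non-ergodic) sequence $\{x^k\}$ shrink at the fast rate, consistent with Theorem 4.1 below.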

Remark 3.1 (A few comments on the algorithmic framework FLAG). (i) Borrowing ideas from well-known accelerated first order minimization methods (see, for instance, [19, 1, 2]), here the sequence $\{t_k\}_{k \in \mathbb{N}}$ plays a key role in accelerating the input map, i.e., the nice primal algorithmic map $\mathrm{Prim}_t(\cdot)$. To this end, the augmented parameter $\rho_k$ and the prox parameter $\tau_k$ are determined and chosen through the recursion which defines the sequence $\{t_k\}_{k \in \mathbb{N}}$ (cf. (3.8)).

(ii) Clearly, setting $t_k = 1$ for all $k \in \mathbb{N}$ in FLAG implies $\rho_k = \rho$, $\lambda^k = y^k$, and $x^k = z^k$, thus recovering the classical basic Lagrangian-based method discussed above.

(iii) A main new feature in our algorithmic framework FLAG is the introduction of the fundamental auxiliary sequence $\{\lambda^k\}_{k \in \mathbb{N}}$, which will enable us to derive new faster rate of convergence results in terms of the original sequence produced by FLAG. As we shall see in Section 4.3, when $\lambda^k$ coincides with $y^k$, only ergodic type rates (classical and fast) will be obtained.

In FLAG, once the nice algorithmic map $\mathrm{Prim}_t(\cdot)$ has been chosen, i.e., a parameter $\delta \in (0, 1]$ and matrices $P, Q \in \mathbb{S}^n_+$ have been determined, Definition 3.1 translates in terms of the iterative sequence $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ produced by FLAG as follows: for each $k \geq 0$ and $\xi \in \mathcal{F}$,

    $L_{\rho_k}(z^{k+1}, \lambda^k) - L_{\rho_k}(\xi, \lambda^k) \leq \tau_k \Delta_P(\xi, z^k, z^{k+1}) - \frac{\tau_k}{2}\|z^{k+1} - z^k\|_Q^2 - \frac{\sigma}{2}\|\xi - z^{k+1}\|^2 - \frac{\delta\rho_k}{2}\|Az^{k+1} - b\|^2$,    (3.9)

where $\tau_k = t_k^{-1}$ and $\rho_k = \rho$ (when $\sigma = 0$), or $\tau_k = t_k$ and $\rho_k = \rho t_k$ (when $\sigma > 0$).

To conclude this part, and before we prove that the property of being a nice primal algorithmic map (as defined in Definition 3.1) is all we need to guarantee rate of convergence results (classical and fast), we would like to note that there is no need to enter into the mechanics of the proofs to be presented next in order to derive theoretical guarantees of newly developed/analyzed Lagrangian-based methods: it suffices to prove that the primal algorithmic map at hand is nice!
This will be extensively illustrated in Section 5, where, after providing a recipe for obtaining the rate of a given method, we will prove that all iconic Lagrangian-based schemes admit a nice primal map, and hence through FLAG will generate new faster schemes.

4 Convergence Analysis of FLAG

The section is divided into three parts, where we start with two key lemmas, which will serve us in deriving the main non-ergodic rate of convergence results in Section 4.2, followed by the ergodic type results in Section 4.3.

4.1 Main Pillars of the Analysis

Before proceeding, we recall the following fundamental Pythagoras three-points identity that is frequently used below. Given any matrix $P \in \mathbb{S}^n_+$, we have

    $2\langle u - w, P(w - v) \rangle = \|u - v\|_P^2 - \|u - w\|_P^2 - \|v - w\|_P^2, \quad \forall\, u, v, w \in \mathbb{R}^n$.    (4.1)

We also recall the main parameter sequences introduced in FLAG, which, for convenience, will be used often and systematically as follows:

    $\rho_k = \rho t_k^{p-1}\ (\rho > 0)$ and $t_k^p = t_{k-1}^p + t_k^{p-1}\ (t_{-1} = 0$ and $t_0 = 1)$,

where $p = 1$ (if $\sigma = 0$) or $p = 2$ (if $\sigma > 0$), and $\tau_k = t_k^{-1}$ if $\sigma = 0$, or $\tau_k = t_k$ if $\sigma > 0$. We will also often use the following expression: for any $k \geq 0$ and any $\xi \in \mathcal{F}$, we define

    $s_k := L_{\rho t_{k-1}^p}(x^k, \eta) - L_{\rho t_{k-1}^p}(\xi, \eta) = L_{\rho t_{k-1}^p}(x^k, \eta) - \Psi(\xi)$.    (4.2)
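The three-points identity (4.1) is elementary but does all the bookkeeping in the proofs below; a numeric sanity check (our own illustration):

```python
import numpy as np

# Check of the three-points identity (4.1):
# 2<u - w, P(w - v)> = ||u - v||_P^2 - ||u - w||_P^2 - ||v - w||_P^2.
rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
P = M.T @ M                               # a positive semi-definite P

def sq(d):
    # squared semi-norm ||d||_P^2
    return d @ P @ d

u, v, w = (rng.standard_normal(n) for _ in range(3))
lhs = 2.0 * (u - w) @ P @ (w - v)
rhs = sq(u - v) - sq(u - w) - sq(v - w)
print(np.isclose(lhs, rhs))               # True
```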

We start with the first useful technical lemma.

Lemma 4.1. Let $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ be a sequence generated by FLAG. Then, for any $\xi \in \mathcal{F}$, $\eta \in \mathbb{R}^m$ and $k \geq 0$, we have

    $L_{\rho_k}(z^{k+1}, \eta) - L_{\rho_k}(\xi, \eta) \leq \tau_k \Delta_P(\xi, z^k, z^{k+1}) - \frac{\sigma}{2}\|\xi - z^{k+1}\|^2 + \frac{1}{\mu\rho_k}\Delta(\eta, y^k, y^{k+1}) - \rho t_{k-1}^p\langle Ax^k - b, Az^{k+1} - b \rangle$.

Proof. Since $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ is generated by FLAG, we obtain from (3.9), after omitting the non-negative term $(\tau_k/2)\|z^{k+1} - z^k\|_Q^2$, that

    $L_{\rho_k}(z^{k+1}, \lambda^k) - L_{\rho_k}(\xi, \lambda^k) \leq \tau_k \Delta_P(\xi, z^k, z^{k+1}) - \frac{\sigma}{2}\|\xi - z^{k+1}\|^2 - \frac{\delta\rho_k}{2}\|Az^{k+1} - b\|^2$.    (4.3)

From the multiplier update (3.6) and the three-points identity (4.1), we obtain, for all $\eta \in \mathbb{R}^m$,

    $\langle \eta - y^k, Az^{k+1} - b \rangle = \frac{1}{\mu\rho_k}\langle \eta - y^k, y^{k+1} - y^k \rangle = \frac{1}{\mu\rho_k}\left[\Delta(\eta, y^k, y^{k+1}) + \frac{1}{2}\|y^{k+1} - y^k\|^2\right] = \frac{1}{\mu\rho_k}\Delta(\eta, y^k, y^{k+1}) + \frac{\mu\rho_k}{2}\|Az^{k+1} - b\|^2$.    (4.4)

Using the update rule of the sequence $\{t_k\}_{k \in \mathbb{N}}$ (cf. (3.8)), we have that $\rho_k(t_k - 1) = \rho t_{k-1}^p$, and thus

    $\lambda^k = y^k + \rho_k(t_k - 1)\left(Ax^k - b\right) = y^k + \rho t_{k-1}^p\left(Ax^k - b\right)$.

Using this together with (4.4) yields, for all $\eta \in \mathbb{R}^m$, that

    $\langle \eta - \lambda^k, Az^{k+1} - b \rangle = \langle \eta - y^k, Az^{k+1} - b \rangle - \rho t_{k-1}^p\langle Ax^k - b, Az^{k+1} - b \rangle = \frac{1}{\mu\rho_k}\Delta(\eta, y^k, y^{k+1}) + \frac{\mu\rho_k}{2}\|Az^{k+1} - b\|^2 - \rho t_{k-1}^p\langle Ax^k - b, Az^{k+1} - b \rangle$.    (4.5)

Adding (4.3) to (4.5) yields (recall that $\mu \leq \delta$) the desired result.

From this result, we will get the (classical and fast) ergodic rate of convergence results to be presented in Section 4.3 (see Corollaries 4.1 and 4.2). However, in order to obtain the non-ergodic results in Section 4.2, the following result is the main step towards the derivation of the promised (classical and fast) rate of convergence results.

Lemma 4.2. Let $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ be a sequence generated by FLAG. Then, for any $\xi \in \mathcal{F}$, $\eta \in \mathbb{R}^m$ and $k \geq 0$, we have

    $t_k^p s_{k+1} - t_{k-1}^p s_k \leq \frac{\tau_k\rho_k}{\rho}\Delta_P(\xi, z^k, z^{k+1}) - \frac{\sigma\rho_k}{2\rho}\|\xi - z^{k+1}\|^2 + \frac{1}{\mu\rho}\Delta(\eta, y^k, y^{k+1})$,

where $s_k = L_{\rho t_{k-1}^p}(x^k, \eta) - L_{\rho t_{k-1}^p}(\xi, \eta)$ (cf. (4.2)).

Proof. Since $x^{k+1} = (1 - t_k^{-1})x^k + t_k^{-1}z^{k+1}$ (cf. (3.7)), we obtain from the convexity of $\Psi(\cdot)$ that

    $\Psi(x^{k+1}) \leq (1 - t_k^{-1})\Psi(x^k) + t_k^{-1}\Psi(z^{k+1})$.

Therefore, multiplying both sides by $t_k^p$ (recalling that $t_k^p = t_k^{p-1} + t_{k-1}^p$), we obtain

    $t_k^p\left[\Psi(x^{k+1}) - \Psi(\xi)\right] \leq t_{k-1}^p\left[\Psi(x^k) - \Psi(\xi)\right] + t_k^{p-1}\left[\Psi(z^{k+1}) - \Psi(\xi)\right]$.

In addition, using again (3.7) yields that

    $t_k^p\langle \eta, Ax^{k+1} - b \rangle = t_{k-1}^p\langle \eta, Ax^k - b \rangle + t_k^{p-1}\langle \eta, Az^{k+1} - b \rangle$.

Combining these two facts yields (using the definition of the Lagrangian $L$)

    $t_k^p\left[L(x^{k+1}, \eta) - L(\xi, \eta)\right] \leq t_{k-1}^p\left[L(x^k, \eta) - L(\xi, \eta)\right] + t_k^{p-1}\left[L(z^{k+1}, \eta) - L(\xi, \eta)\right]$.    (4.6)

Now, we will again use that $x^{k+1} = (1 - t_k^{-1})x^k + t_k^{-1}z^{k+1}$ to obtain

    $\|Ax^{k+1} - b\|^2 = (1 - t_k^{-1})^2\|Ax^k - b\|^2 + t_k^{-2}\|Az^{k+1} - b\|^2 + 2t_k^{-1}(1 - t_k^{-1})\langle Ax^k - b, Az^{k+1} - b \rangle$.

Multiplying both sides of the above equality by $\rho t_k^{2p}/2$ (recalling $\rho_k = \rho t_k^{p-1}$ and $t_k^p = t_k^{p-1} + t_{k-1}^p$) yields

    $\frac{\rho t_k^{2p}}{2}\|Ax^{k+1} - b\|^2 = \frac{\rho t_{k-1}^{2p}}{2}\|Ax^k - b\|^2 + \frac{\rho_k t_k^{p-1}}{2}\|Az^{k+1} - b\|^2 + \rho_k t_{k-1}^p\langle Ax^k - b, Az^{k+1} - b \rangle$.    (4.7)

Therefore, adding (4.6) to (4.7) yields (recall that $\|A\xi - b\|^2 = 0$), with $s_k = L_{\rho t_{k-1}^p}(x^k, \eta) - L_{\rho t_{k-1}^p}(\xi, \eta)$,

    $t_k^p s_{k+1} - t_{k-1}^p s_k \leq t_k^{p-1}\left[L_{\rho_k}(z^{k+1}, \eta) - L_{\rho_k}(\xi, \eta)\right] + \rho_k t_{k-1}^p\langle Ax^k - b, Az^{k+1} - b \rangle$.

From Lemma 4.1, after multiplying both sides by $t_k^{p-1}$ (recall that $\rho_k = \rho t_k^{p-1}$), we obtain that

    $t_k^{p-1}\left[L_{\rho_k}(z^{k+1}, \eta) - L_{\rho_k}(\xi, \eta)\right] \leq \frac{\tau_k\rho_k}{\rho}\Delta_P(\xi, z^k, z^{k+1}) - \frac{\sigma\rho_k}{2\rho}\|\xi - z^{k+1}\|^2 + \frac{1}{\mu\rho}\Delta(\eta, y^k, y^{k+1}) - \rho_k t_{k-1}^p\langle Ax^k - b, Az^{k+1} - b \rangle$.

By combining the last two inequalities, we thus get

    $t_k^p s_{k+1} - t_{k-1}^p s_k \leq \frac{\tau_k\rho_k}{\rho}\Delta_P(\xi, z^k, z^{k+1}) - \frac{\sigma\rho_k}{2\rho}\|\xi - z^{k+1}\|^2 + \frac{1}{\mu\rho}\Delta(\eta, y^k, y^{k+1})$,

which proves the desired result.

We conclude this section with the following simple result, which summarizes some useful properties of the sequence $\{t_k\}_{k \in \mathbb{N}}$ (cf. (3.8)) that will be used below.

Lemma 4.3. Let $\{t_k\}_{k \in \mathbb{N}}$ be the sequence of positive numbers generated from $t_0 = 1$ via the equation $t_{k+1}^p = t_k^p + t_{k+1}^{p-1}$, with $p = 1$ or $p = 2$. Then, for any $k \geq 0$, we have:

(i) For $p = 1$, $t_k = k + 1$.

(ii) For $p = 2$, $t_k \geq (k + 1)/2$. Moreover, one has $t_{k+1}^2 - t_k^2 \leq 2t_k$.

Proof. The first item is immediate, as well as the first part of the second item (see, for instance, [2, Lemma 4.3]). Moreover, note that by the given recursion $t_{k+1}^2 = t_k^2 + t_{k+1}$, one also has $t_{k+1} - t_k = t_{k+1}/(t_k + t_{k+1}) < 1$, and hence it follows that $t_{k+1}^2 - t_k^2 = t_{k+1} < t_k + 1 \leq 2t_k$, since $t_k \geq 1$.

In the rest of this section we aim at proving rate of convergence results for FLAG.
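The two properties of Lemma 4.3(ii) are easy to confirm numerically for the $p = 2$ recursion (a quick sanity check of our own, not part of the paper's argument):

```python
import math

# For t_0 = 1 and t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2 (the p = 2 case of (3.8)),
# check that t_k >= (k+1)/2 and t_{k+1}^2 - t_k^2 <= 2 t_k for every k.
t = [1.0]
for k in range(200):
    t.append((1.0 + math.sqrt(1.0 + 4.0 * t[-1] ** 2)) / 2.0)

ok_growth = all(t[k] >= (k + 1) / 2.0 for k in range(len(t)))
ok_gap = all(t[k + 1] ** 2 - t[k] ** 2 <= 2.0 * t[k] + 1e-9 for k in range(len(t) - 1))
print(ok_growth and ok_gap)   # True
```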
We will focus, in this paper, on the iteration complexity of Lagrangian-based methods, using the following two classical measures at a given iterate $k \in \mathbb{N}$:

(i) Function values gap, in terms of $\Psi(x^k) - \Psi(x^*)$.

(ii) Feasibility violation of the constraints of problem (P), in terms of $\|Ax^k - b\|$.

When discussing the measures mentioned above for the iterates produced by FLAG, we will also distinguish our results between rates expressed in terms of the original produced sequence or of the ergodic sequence, as defined below in Subsection 4.3.

4.2 Non-Ergodic Rate of Convergence

Throughout the rest of this section, recalling the fact that the set of optimal dual solutions is compact (see Section 2), we take $c > 0$ to be a constant for which $c \geq 2\|y^*\|$, where $y^*$ is an optimal solution of the dual problem.

We are now ready to prove our main result in the strongly convex setting (i.e., $p = 2$) on the sequence itself.

Theorem 4.1 (Fast non-ergodic function values and feasibility violation rates). Let $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ be a sequence generated by FLAG. Suppose that $\sigma > 0$ and $0 \prec P \preceq (\sigma/2)I_n$. Then, for any optimal solution $x^*$ of problem (P), we have

    $\Psi(x^N) - \Psi(x^*) \leq \frac{B_{\rho,c}(x^*)}{2N^2}$,    (4.8)

    $\|Ax^N - b\| \leq \frac{B_{\rho,c}(x^*)}{cN^2}$,    (4.9)

where $B_{\rho,c}(x^*) := 4\left[\|x^* - z^0\|_P^2 + \frac{1}{\mu\rho}\left(\|y^0\| + c\right)^2\right]$.

Proof. From Lemma 4.2, after substituting $\xi = x^*$ and $p = 2$ (recall that $\rho_k = \rho t_k$ and $\tau_k = t_k$), we obtain

    $t_k^2 s_{k+1} - t_{k-1}^2 s_k \leq t_k^2\Delta_P(x^*, z^k, z^{k+1}) - \frac{\sigma t_k}{2}\|x^* - z^{k+1}\|^2 + \frac{1}{\mu\rho}\Delta(\eta, y^k, y^{k+1})$.

Using the definitions of $\Delta_P$ and $\Delta$ (see (2.7)) and the fact that $P \preceq (\sigma/2)I_n$, we have that

    $t_k^2 s_{k+1} - t_{k-1}^2 s_k \leq \frac{t_k^2}{2}\left(\|x^* - z^k\|_P^2 - \|x^* - z^{k+1}\|_P^2\right) - t_k\|x^* - z^{k+1}\|_P^2 + \frac{1}{2\mu\rho}\left(\|\eta - y^k\|^2 - \|\eta - y^{k+1}\|^2\right)$
    $\leq \frac{1}{2}\left(t_k^2\|x^* - z^k\|_P^2 - t_{k+1}^2\|x^* - z^{k+1}\|_P^2\right) + \frac{1}{2\mu\rho}\left(\|\eta - y^k\|^2 - \|\eta - y^{k+1}\|^2\right)$,

where the last inequality follows from Lemma 4.3(ii). Summing this inequality for all $k = 0, 1, \ldots, N - 1$ (recall that $t_{-1} = 0$ and $t_0 = 1$), it follows (where we use the definition of $s_N$, see (4.2), for the first inequality below) that

    $t_{N-1}^2\left[\Psi(x^N) - \Psi(x^*) + \langle \eta, Ax^N - b \rangle\right] \leq t_{N-1}^2 s_N - t_{-1}^2 s_0 \leq \frac{1}{2}\|x^* - z^0\|_P^2 + \frac{1}{2\mu\rho}\|\eta - y^0\|^2$.

Therefore, taking the maximum of both sides over $\|\eta\| \leq c$ proves (recall that $t_{N-1}^2 \geq N^2/4$, by Lemma 4.3(ii)) that

    $\Psi(x^N) - \Psi(x^*) + c\|Ax^N - b\| \leq \frac{B_{\rho,c}(x^*)}{2N^2} =: \beta$,

from which the estimate (4.8) immediately follows. Moreover, since $(x^*, y^*)$ is a saddle point of problem (P), with $c \geq 2\|y^*\|$, using the above inequality it follows that

    $c\|Ax^N - b\| \leq \Psi(x^*) - \Psi(x^N) + \beta \leq \langle y^*, Ax^N - b \rangle + \beta \leq \frac{c}{2}\|Ax^N - b\| + \beta$,

and the desired estimate (4.9) is proved.

Our main result in the convex setting (i.e., $\sigma = 0$ and $p = 1$) on the sequence itself is recorded next.

Theorem 4.2 (Non-ergodic function values and feasibility violation rates).
Let $\{(x^k, z^k, y^k)\}_{k \in \mathbb{N}}$ be a sequence generated by FLAG, and suppose that $\sigma = 0$. Then, for any optimal solution $x^*$ of problem (P), we have

    $\Psi(x^N) - \Psi(x^*) \leq \frac{B_{\rho,c}(x^*)}{2N}$,    (4.10)

    $\|Ax^N - b\| \leq \frac{B_{\rho,c}(x^*)}{cN}$,    (4.11)

where $B_{\rho,c}(x^*) := 2\left[\|x^* - z^0\|_P^2 + \frac{1}{\mu\rho}\left(\|y^0\| + c\right)^2\right]$.

Proof. From Lemma 4.2, after substituting $\xi = x^*$, $\sigma = 0$ and $p = 1$ (recall that $\rho_k = \rho$ and $\tau_k = t_k^{-1}$), we obtain

    $t_k s_{k+1} - t_{k-1} s_k \leq t_k^{-1}\Delta_P(x^*, z^k, z^{k+1}) + \frac{1}{\mu\rho}\Delta(\eta, y^k, y^{k+1})$.

Using the definitions of $\Delta_P$ and $\Delta$ (see (2.7)), we have that

    $t_k s_{k+1} - t_{k-1} s_k \leq \frac{1}{2t_k}\left(\|x^* - z^k\|_P^2 - \|x^* - z^{k+1}\|_P^2\right) + \frac{1}{2\mu\rho}\left(\|\eta - y^k\|^2 - \|\eta - y^{k+1}\|^2\right)$
    $\leq \frac{1}{2}\left(\frac{1}{t_k}\|x^* - z^k\|_P^2 - \frac{1}{t_{k+1}}\|x^* - z^{k+1}\|_P^2\right) + \frac{1}{2\mu\rho}\left(\|\eta - y^k\|^2 - \|\eta - y^{k+1}\|^2\right)$,

where the second inequality follows from the fact that $t_k^{-1} \geq t_{k+1}^{-1}$. Summing this inequality for all $k = 0, 1, \ldots, N - 1$ (recall that $t_{-1} = 0$ and $t_0 = 1$), it follows that

    $t_{N-1}\left[\Psi(x^N) - \Psi(x^*) + \langle \eta, Ax^N - b \rangle\right] \leq t_{N-1} s_N - t_{-1} s_0 \leq \frac{1}{2}\|x^* - z^0\|_P^2 + \frac{1}{2\mu\rho}\|\eta - y^0\|^2$.

Therefore, taking the maximum of both sides over $\|\eta\| \leq c$ proves (recall that $t_{N-1} = N$, by Lemma 4.3(i)) that

    $\Psi(x^N) - \Psi(x^*) + c\|Ax^N - b\| \leq \frac{B_{\rho,c}(x^*)}{2N}$,

and the desired results follow exactly as done at the end of the proof of Theorem 4.1.

4.3 Classical Ergodic Rate of Convergence

Although deriving ergodic type results was not our primary goal, to conclude this section we further illustrate the versatility of our framework by providing the fast and classical ergodic rate of convergence results. Before doing so, it should be noted that in the setting of producing such weaker results of ergodic type, two main features of our main framework FLAG are simplified as follows:

(i) The auxiliary sequence $\{\lambda^k\}_{k \in \mathbb{N}}$ is simply replaced by the multiplier sequence $\{y^k\}_{k \in \mathbb{N}}$, i.e., the sequence $\{x^k\}_{k \in \mathbb{N}}$ plays no role. Thus, $\{z^k\}_{k \in \mathbb{N}}$ is the only primal sequence, so that the main iteration (3.5) now reads $z^{k+1} = \mathrm{Prim}_k(z^k, y^k)$.

(ii) Only the parameters $\rho_k$ and $\tau_k$ are still used, to include both the fast and classical results.

According to the above, since we replace the auxiliary multiplier $\lambda^k$ with $y^k$, it is easy to verify from the proof of Lemma 4.1 (by combining (4.3) and (4.4) there) that in this case we have

    $L_{\gamma_k}(z^{k+1}, \eta) - L_{\gamma_k}(\xi, \eta) \leq \tau_k\Delta_P(\xi, z^k, z^{k+1}) - \frac{\sigma}{2}\|\xi - z^{k+1}\|^2 + \frac{1}{\mu\rho_k}\Delta(\eta, y^k, y^{k+1})$,    (4.12)

where $\gamma_k := (1 + \delta - \mu)\rho_k \geq 0$.
Therefore, in this case we can consider scaling parameters $\mu$ in the larger interval $(0, 1 + \delta]$, instead of $(0, \delta]$ as stated in the original version of FLAG above.

We begin with the result in the strongly convex setting (i.e., $p = 2$).

Corollary 4.1 (Fast ergodic function values and feasibility violation rates). Let $\{(z^k, y^k)\}_{k \in \mathbb{N}}$ be a sequence generated by FLAG. Suppose that $\sigma > 0$ and $0 \prec P \preceq (\sigma/2)I_n$. Then, for any optimal solution $z^*$ of problem (P), the following holds for the ergodic sequence $\bar{z}^N = t_{N-1}^{-2}\sum_{k=0}^{N-1} t_k z^{k+1}$:

    $\Psi(\bar{z}^N) - \Psi(z^*) \leq \frac{B_{\rho,c}(z^*)}{2N^2}$,

    $\|A\bar{z}^N - b\| \leq \frac{B_{\rho,c}(z^*)}{cN^2}$,

where $B_{\rho,c}(z^*) := 4\left[\|z^* - z^0\|_P^2 + \frac{1}{\mu\rho}\left(\|y^0\| + c\right)^2\right]$.
