Convex Optimization: Modeling and Algorithms


Convex Optimization: Modeling and Algorithms
Lieven Vandenberghe
Electrical Engineering Department, UC Los Angeles
Tutorial lectures, 21st Machine Learning Summer School
Kyoto, August 29-30, 2012

Convex optimization — MLSS 2012

Introduction
- mathematical optimization
- linear and convex optimization
- recent history

Mathematical optimization

    minimize    f_0(x_1, ..., x_n)
    subject to  f_1(x_1, ..., x_n) ≤ 0
                ...
                f_m(x_1, ..., x_n) ≤ 0

- a mathematical model of a decision, design, or estimation problem
- finding a global solution is generally intractable
- even simple-looking nonlinear optimization problems can be very hard

The famous exception: Linear programming

    minimize    c_1 x_1 + · · · + c_n x_n
    subject to  a_11 x_1 + · · · + a_1n x_n ≤ b_1
                ...
                a_m1 x_1 + · · · + a_mn x_n ≤ b_m

- widely used since Dantzig introduced the simplex algorithm in 1948
- since the 1950s, many applications in operations research, network optimization, finance, engineering, combinatorial optimization, ...
- extensive theory (optimality conditions, sensitivity analysis, ...)
- there exist very efficient algorithms for solving linear programs

Convex optimization problem

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,   i = 1, ..., m

- objective and constraint functions are convex: for 0 ≤ θ ≤ 1

      f_i(θx + (1 − θ)y) ≤ θ f_i(x) + (1 − θ) f_i(y)

- can be solved globally, with similar (polynomial-time) complexity as LPs
- surprisingly many problems can be solved via convex optimization
- provides tractable heuristics and relaxations for non-convex problems

History

- 1940s: linear programming

      minimize    c^T x
      subject to  a_i^T x ≤ b_i,   i = 1, ..., m

- 1950s: quadratic programming
- 1960s: geometric programming
- 1990s: semidefinite programming, second-order cone programming, quadratically constrained quadratic programming, robust optimization, sum-of-squares programming, ...

New applications since 1990

- linear matrix inequality techniques in control
- support vector machine training via quadratic programming
- semidefinite programming relaxations in combinatorial optimization
- circuit design via geometric programming
- ℓ1-norm optimization for sparse signal reconstruction
- applications in structural optimization, statistics, signal processing, communications, image processing, computer vision, quantum information theory, finance, power distribution, ...

Advances in convex optimization algorithms

interior-point methods
- 1984 (Karmarkar): first practical polynomial-time algorithm for LP
- 1984-1990: efficient implementations for large-scale LPs
- around 1990 (Nesterov & Nemirovski): polynomial-time interior-point methods for nonlinear convex programming
- since 1990: extensions and high-quality software packages

first-order algorithms
- fast gradient methods, based on Nesterov's methods from the 1980s
- extend to certain nondifferentiable or constrained problems
- multiplier methods for large-scale and distributed optimization

Overview

1. Basic theory and convex modeling
   - convex sets and functions
   - common problem classes and applications
2. Interior-point methods for conic optimization
   - conic optimization
   - barrier methods
   - symmetric primal-dual methods
3. First-order methods
   - (proximal) gradient algorithms
   - dual techniques and multiplier methods

Convex optimization — MLSS 2012

Convex sets and functions
- convex sets
- convex functions
- operations that preserve convexity

Convex set

contains the line segment between any two points in the set:

    x_1, x_2 ∈ C,  0 ≤ θ ≤ 1   ⟹   θx_1 + (1 − θ)x_2 ∈ C

[figure: one convex set and two non-convex sets]

Basic examples

- affine set: solution set of linear equations Ax = b
- halfspace: solution of one linear inequality a^T x ≤ b (a ≠ 0)
- polyhedron: solution of finitely many linear inequalities Ax ⪯ b
- ellipsoid: solution of a positive definite quadratic inequality

      (x − x_c)^T A (x − x_c) ≤ 1    (A positive definite)

- norm ball: solution of ‖x‖ ≤ R (for any norm)
- positive semidefinite cone: S^n_+ = {X ∈ S^n | X ⪰ 0}

the intersection of any number of convex sets is convex

Example of intersection property

    C = {x ∈ R^n : |p(t)| ≤ 1 for |t| ≤ π/3}

where p(t) = x_1 cos t + x_2 cos 2t + · · · + x_n cos nt

[figures: p(t) over t, and the set C in the (x_1, x_2)-plane]

C is the intersection of infinitely many halfspaces, hence convex

Convex function

domain dom f is a convex set and Jensen's inequality holds:

    f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1

[figure: the chord between (x, f(x)) and (y, f(y)) lies above the graph of f]

f is concave if −f is convex

Examples

- linear and affine functions are convex and concave
- exp x, −log x, x log x are convex
- x^α is convex for x > 0 and α ≥ 1 or α ≤ 0; −x^α is convex for 0 ≤ α ≤ 1
- norms are convex
- quadratic-over-linear function x^T x / t is convex in x, t for t > 0
- geometric mean (x_1 x_2 · · · x_n)^{1/n} is concave for x ⪰ 0
- log det X is concave on the set of positive definite matrices
- log(e^{x_1} + · · · + e^{x_n}) is convex

Epigraph and sublevel set

epigraph: epi f = {(x, t) | x ∈ dom f, f(x) ≤ t}

a function is convex if and only if its epigraph is a convex set

[figure: epigraph of a convex function f]

sublevel sets: C_α = {x ∈ dom f | f(x) ≤ α}

the sublevel sets of a convex function are convex (the converse is false)

Differentiable convex functions

differentiable f is convex if and only if dom f is convex and

    f(y) ≥ f(x) + ∇f(x)^T (y − x)   for all x, y ∈ dom f

[figure: the graph of f lies above its tangent f(x) + ∇f(x)^T (y − x) at (x, f(x))]

twice differentiable f is convex if and only if dom f is convex and

    ∇²f(x) ⪰ 0   for all x ∈ dom f
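A quick numerical sanity check of the second-order condition (not a proof), sketched in Python: it evaluates the Hessian of the log-sum-exp function from the examples above at a few random points and confirms the smallest eigenvalue is nonnegative. The use of numpy and this particular test are illustrative assumptions, not part of the lecture.

import numpy as np

def lse_hessian(x):
    # Hessian of f(x) = log(sum_i exp(x_i)) is diag(p) - p p^T with p = softmax(x)
    p = np.exp(x - x.max())
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
for _ in range(5):
    H = lse_hessian(rng.normal(size=4))
    print(np.linalg.eigvalsh(H).min() >= -1e-12)   # prints True: Hessian is PSD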

Establishing convexity of a function

1. verify the definition
2. for twice differentiable functions, show ∇²f(x) ⪰ 0
3. show that f is obtained from simple convex functions by operations that preserve convexity
   - nonnegative weighted sum
   - composition with affine function
   - pointwise maximum and supremum
   - minimization
   - composition
   - perspective

Positive weighted sum & composition with affine function

nonnegative multiple: αf is convex if f is convex, α ≥ 0

sum: f_1 + f_2 is convex if f_1, f_2 are convex (extends to infinite sums, integrals)

composition with affine function: f(Ax + b) is convex if f is convex

examples
- logarithmic barrier for linear inequalities

      f(x) = −∑_{i=1}^m log(b_i − a_i^T x)

- (any) norm of an affine function: f(x) = ‖Ax + b‖

Pointwise maximum

    f(x) = max{f_1(x), ..., f_m(x)}

is convex if f_1, ..., f_m are convex

example: sum of the r largest components of x ∈ R^n

    f(x) = x_[1] + x_[2] + · · · + x_[r]

is convex (x_[i] is the ith largest component of x)

proof:

    f(x) = max{x_{i_1} + x_{i_2} + · · · + x_{i_r} | 1 ≤ i_1 < i_2 < · · · < i_r ≤ n}

Pointwise supremum

    g(x) = sup_{y ∈ A} f(x, y)

is convex if f(x, y) is convex in x for each y ∈ A

examples
- maximum eigenvalue of a symmetric matrix

      λ_max(X) = sup_{‖y‖_2 = 1} y^T X y

- support function of a set C

      S_C(x) = sup_{y ∈ C} y^T x

Minimization

    h(x) = inf_{y ∈ C} f(x, y)

is convex if f(x, y) is convex in (x, y) and C is a convex set

examples
- distance to a convex set C: h(x) = inf_{y ∈ C} ‖x − y‖
- optimal value of a linear program as a function of the righthand side

      h(x) = inf_{y : Ay ⪯ x} c^T y

  follows by taking

      f(x, y) = c^T y,    dom f = {(x, y) | Ay ⪯ x}

Composition

composition of g : R^n → R and h : R → R:

    f(x) = h(g(x))

f is convex if
- g convex, h convex and nondecreasing
- g concave, h convex and nonincreasing

(if we assign h(x) = ∞ for x ∉ dom h)

examples
- exp g(x) is convex if g is convex
- 1/g(x) is convex if g is concave and positive

Vector composition

composition of g : R^n → R^k and h : R^k → R:

    f(x) = h(g(x)) = h(g_1(x), g_2(x), ..., g_k(x))

f is convex if
- g_i convex, h convex and nondecreasing in each argument
- g_i concave, h convex and nonincreasing in each argument

(if we assign h(x) = ∞ for x ∉ dom h)

example
- log ∑_{i=1}^m exp g_i(x) is convex if the g_i are convex

Perspective

the perspective of a function f : R^n → R is the function g : R^n × R → R,

    g(x, t) = t f(x/t)

g is convex if f is convex, with dom g = {(x, t) | x/t ∈ dom f, t > 0}

examples
- perspective of f(x) = x^T x is the quadratic-over-linear function

      g(x, t) = x^T x / t

- perspective of the negative logarithm f(x) = −log x is the relative entropy

      g(x, t) = t log t − t log x

Conjugate function

the conjugate of a function f is

    f*(y) = sup_{x ∈ dom f} (y^T x − f(x))

[figure: f*(y) is the maximum gap between the linear function y^T x and f(x); the supporting line with slope y meets the vertical axis at (0, −f*(y))]

f* is convex (even if f is not)

Examples

- convex quadratic function (Q ≻ 0)

      f(x) = (1/2) x^T Q x,        f*(y) = (1/2) y^T Q^{-1} y

- negative entropy

      f(x) = ∑_{i=1}^n x_i log x_i,    f*(y) = ∑_{i=1}^n e^{y_i − 1}

- norm

      f(x) = ‖x‖,    f*(y) = 0 if ‖y‖_* ≤ 1, ∞ otherwise

- indicator function (C convex)

      f(x) = I_C(x) = 0 if x ∈ C, ∞ otherwise,    f*(y) = sup_{x ∈ C} y^T x

Convex optimization — MLSS 2012

Convex optimization problems
- linear programming
- quadratic programming
- geometric programming
- second-order cone programming
- semidefinite programming

Convex optimization problem

    minimize    f_0(x)
    subject to  f_i(x) ≤ 0,   i = 1, ..., m
                Ax = b

f_0, f_1, ..., f_m are convex functions

- feasible set is convex
- locally optimal points are globally optimal
- tractable, in theory and practice

Linear program (LP)

    minimize    c^T x + d
    subject to  Gx ⪯ h
                Ax = b

- inequality is componentwise vector inequality
- convex problem with affine objective and constraint functions
- feasible set is a polyhedron

[figure: polyhedron P with optimizer x* in the direction −c]

Piecewise-linear minimization

    minimize    f(x) = max_{i=1,...,m} (a_i^T x + b_i)

[figure: f(x) as the pointwise maximum of the affine functions a_i^T x + b_i]

equivalent linear program

    minimize    t
    subject to  a_i^T x + b_i ≤ t,   i = 1, ..., m

an LP with variables x ∈ R^n, t ∈ R
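A minimal sketch of this epigraph reformulation, assuming the cvxpy modeling package and random placeholder data for a_i, b_i (neither appears in the slides):

import cvxpy as cp
import numpy as np

m, n = 20, 5
rng = np.random.default_rng(0)
A, b = rng.normal(size=(m, n)), rng.normal(size=m)    # rows a_i^T and offsets b_i

x, t = cp.Variable(n), cp.Variable()
prob = cp.Problem(cp.Minimize(t), [A @ x + b <= t])   # a_i^T x + b_i <= t for all i
prob.solve()
print(prob.value, cp.max(A @ x + b).value)            # the two values coincide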

ℓ1-Norm and ℓ∞-norm minimization

ℓ1-norm approximation and equivalent LP (‖y‖_1 = ∑_k |y_k|)

    minimize  ‖Ax − b‖_1

    minimize    ∑_i y_i
    subject to  −y ⪯ Ax − b ⪯ y

ℓ∞-norm approximation and equivalent LP (‖y‖_∞ = max_k |y_k|)

    minimize  ‖Ax − b‖_∞

    minimize    y
    subject to  −y 1 ⪯ Ax − b ⪯ y 1

(1 is the vector of ones)
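A sketch of the ℓ1 problem and its LP reformulation, again assuming cvxpy and synthetic data; both formulations return the same optimal value:

import cvxpy as cp
import numpy as np

m, n = 200, 80
rng = np.random.default_rng(1)
A, b = rng.normal(size=(m, n)), rng.normal(size=m)

x = cp.Variable(n)
direct = cp.Problem(cp.Minimize(cp.norm1(A @ x - b)))      # minimize ||Ax - b||_1
direct.solve()

x2, y = cp.Variable(n), cp.Variable(m)
lp = cp.Problem(cp.Minimize(cp.sum(y)),                    # explicit LP form
                [-y <= A @ x2 - b, A @ x2 - b <= y])
lp.solve()
print(direct.value, lp.value)                              # equal up to solver tolerance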

example: histograms of the residuals Ax − b (with A of size 200 × 80) for

    x_ls = argmin ‖Ax − b‖_2,    x_ℓ1 = argmin ‖Ax − b‖_1

[figures: histograms of (A x_ls − b)_k and (A x_ℓ1 − b)_k]

the 1-norm residual distribution is wider, with a high peak at zero

Robust regression

[figure: data points and the two fitted lines]

- 42 points (t_i, y_i) (circles), including two outliers
- function f(t) = α + βt fitted using the 2-norm (dashed) and the 1-norm

Linear discrimination

- given a set of points {x_1, ..., x_N} with binary labels s_i ∈ {−1, 1}
- find a hyperplane a^T x + b = 0 that strictly separates the two classes

    a^T x_i + b > 0   if s_i = 1
    a^T x_i + b < 0   if s_i = −1

homogeneous in a, b, hence equivalent to the linear inequalities (in a, b)

    s_i (a^T x_i + b) ≥ 1,   i = 1, ..., N

Approximate linear separation of non-separable sets

    minimize  ∑_{i=1}^N max{0, 1 − s_i(a^T x_i + b)}

- a piecewise-linear minimization problem in a, b; equivalent to an LP
- can be interpreted as a heuristic for minimizing the number of misclassified points
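A sketch of this hinge-loss problem with cvxpy on synthetic two-class data (the package and the data are assumptions); cp.pos(u) implements max{0, u}:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
N, n = 100, 2
X = np.vstack([rng.normal( 1.0, 1.0, size=(N // 2, n)),    # class +1
               rng.normal(-1.0, 1.0, size=(N // 2, n))])   # class -1
s = np.hstack([np.ones(N // 2), -np.ones(N // 2)])

a, b = cp.Variable(n), cp.Variable()
loss = cp.sum(cp.pos(1 - cp.multiply(s, X @ a + b)))       # sum_i max{0, 1 - s_i(a^T x_i + b)}
cp.Problem(cp.Minimize(loss)).solve()
print(a.value, b.value)

Adding the γ‖a‖_2² term of the support vector classifier two slides below turns this LP into a QP with the same structure.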

Quadratic program (QP)

    minimize    (1/2) x^T P x + q^T x + r
    subject to  Gx ⪯ h

- P ∈ S^n_+, so the objective is convex quadratic
- minimizes a convex quadratic function over a polyhedron

[figure: level curves of f_0 over the polyhedron P, with optimizer x*]

Linear program with random cost

    minimize    c^T x
    subject to  Gx ⪯ h

- c is a random vector with mean c̄ and covariance Σ
- hence, c^T x is a random variable with mean c̄^T x and variance x^T Σ x

expected cost-variance trade-off

    minimize    E c^T x + γ var(c^T x) = c̄^T x + γ x^T Σ x
    subject to  Gx ⪯ h

γ > 0 is the risk-aversion parameter
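A sketch of the mean-variance trade-off as a QP in cvxpy; the package, the random data, and the simple box constraints standing in for Gx ⪯ h are all illustrative assumptions:

import cvxpy as cp
import numpy as np

n = 10
rng = np.random.default_rng(3)
c_bar = rng.normal(size=n)                    # mean cost vector
F = rng.normal(size=(n, n))                   # Sigma = F F^T is the covariance
gamma = 1.0                                   # risk-aversion parameter

x = cp.Variable(n)
objective = c_bar @ x + gamma * cp.sum_squares(F.T @ x)   # x^T Sigma x = ||F^T x||_2^2
constraints = [x <= 1, x >= -1]               # stand-in for Gx <= h
cp.Problem(cp.Minimize(objective), constraints).solve()
print(x.value)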

Robust linear discrimination

    H_1 = {z | a^T z + b = 1}
    H_{−1} = {z | a^T z + b = −1}

distance between the hyperplanes is 2/‖a‖_2

to separate two sets of points by maximum margin,

    minimize    ‖a‖_2² = a^T a
    subject to  s_i (a^T x_i + b) ≥ 1,   i = 1, ..., N

a quadratic program in a, b

Support vector classifier

    minimize  γ‖a‖_2² + ∑_{i=1}^N max{0, 1 − s_i(a^T x_i + b)}

[figures: resulting classifiers for γ = 0 and γ = 10]

equivalent to a quadratic program

Kernel formulation

    minimize  f(Xa) + ‖a‖_2²

- variables a ∈ R^n
- X ∈ R^{N×n} with N ≤ n and rank N

change of variables

    y = Xa,    a = X^T (XX^T)^{-1} y

- a is the minimum-norm solution of Xa = y

gives a convex problem with N variables y

    minimize  f(y) + y^T Q^{-1} y

Q = XX^T is the kernel matrix

Total variation signal reconstruction

    minimize  ‖x̂ − x_cor‖_2² + γ φ(x̂)

- x_cor = x + v is a corrupted version of the unknown signal x, with noise v
- variable x̂ (the reconstructed signal) is an estimate of x
- φ : R^n → R is a quadratic or total variation smoothing penalty

    φ_quad(x̂) = ∑_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,    φ_tv(x̂) = ∑_{i=1}^{n−1} |x̂_{i+1} − x̂_i|
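A sketch of the two reconstructions with cvxpy (assumed package) on a synthetic piecewise-constant signal with added noise; cp.diff computes the first differences used in both penalties:

import cvxpy as cp
import numpy as np

n = 200
rng = np.random.default_rng(6)
x_true = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])   # one sharp jump
x_cor = x_true + 0.1 * rng.normal(size=n)                      # corrupted signal
gamma = 5.0

x_hat = cp.Variable(n)
fit = cp.sum_squares(x_hat - x_cor)
quad = cp.Problem(cp.Minimize(fit + gamma * cp.sum_squares(cp.diff(x_hat))))
tv = cp.Problem(cp.Minimize(fit + gamma * cp.norm1(cp.diff(x_hat))))
quad.solve(); x_quad = x_hat.value.copy()
tv.solve();   x_tv = x_hat.value.copy()
# x_tv keeps the jump at n/2 sharp, while x_quad smears it out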

example: x_cor, and reconstruction with quadratic and t.v. smoothing

[figures: corrupted signal x_cor, quadratic reconstruction, t.v. reconstruction]

- quadratic smoothing smooths out noise and sharp transitions in the signal
- total variation smoothing preserves sharp transitions in the signal

Geometric programming

posynomial function

    f(x) = ∑_{k=1}^K c_k x_1^{a_1k} x_2^{a_2k} · · · x_n^{a_nk},    dom f = R^n_{++}

with c_k > 0

geometric program (GP)

    minimize    f_0(x)
    subject to  f_i(x) ≤ 1,   i = 1, ..., m

with f_i posynomial

Geometric program in convex form

change variables to y_i = log x_i, and take the logarithm of cost and constraints

geometric program in convex form:

    minimize    log ∑_{k=1}^K exp(a_0k^T y + b_0k)
    subject to  log ∑_{k=1}^K exp(a_ik^T y + b_ik) ≤ 0,   i = 1, ..., m

with b_ik = log c_ik
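A tiny GP sketched with cvxpy (an assumption; the posynomials below are made up for illustration). The package's gp=True mode applies the same log change of variables internally:

import cvxpy as cp

x, y = cp.Variable(pos=True), cp.Variable(pos=True)
f0 = x * y**-1 + 2 * y                          # posynomial objective
constraints = [x**-1 * y <= 1,                  # posynomial constraints f_i(x) <= 1
               (x * y)**-1 <= 1]
prob = cp.Problem(cp.Minimize(f0), constraints)
prob.solve(gp=True)                             # solved after the convex-form transformation
print(x.value, y.value, prob.value)             # optimum is x = y = 1 with value 3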

Second-order cone program (SOCP)

    minimize    f^T x
    subject to  ‖A_i x + b_i‖_2 ≤ c_i^T x + d_i,   i = 1, ..., m

- ‖·‖_2 is the Euclidean norm ‖y‖_2 = (y_1² + · · · + y_n²)^{1/2}
- constraints are nonlinear, nondifferentiable, convex

constraints are inequalities w.r.t. the second-order cone:

    {y | (y_1² + · · · + y_{p−1}²)^{1/2} ≤ y_p}

[figure: the second-order cone in R³]

Robust linear program (stochastic)

    minimize    c^T x
    subject to  prob(a_i^T x ≤ b_i) ≥ η,   i = 1, ..., m

- a_i random and normally distributed with mean ā_i, covariance Σ_i
- we require that x satisfies each constraint with probability exceeding η

[figure: feasible sets for η = 10%, η = 50%, η = 90%]

SOCP formulation

the 'chance constraint' prob(a_i^T x ≤ b_i) ≥ η is equivalent to the constraint

    ā_i^T x + Φ^{-1}(η) ‖Σ_i^{1/2} x‖_2 ≤ b_i

Φ is the (unit) normal cumulative distribution function

[figure: Φ(t) with the quantile Φ^{-1}(η) marked]

the robust LP is a second-order cone program for η ≥ 0.5

Robust linear program (deterministic)

    minimize    c^T x
    subject to  a_i^T x ≤ b_i for all a_i ∈ E_i,   i = 1, ..., m

- a_i uncertain but bounded by the ellipsoid E_i = {ā_i + P_i u | ‖u‖_2 ≤ 1}
- we require that x satisfies each constraint for all possible a_i

SOCP formulation

    minimize    c^T x
    subject to  ā_i^T x + ‖P_i^T x‖_2 ≤ b_i,   i = 1, ..., m

follows from

    sup_{‖u‖_2 ≤ 1} (ā_i + P_i u)^T x = ā_i^T x + ‖P_i^T x‖_2
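A cvxpy sketch of this SOCP; the package, the random data, and the extra norm bound that keeps the toy problem bounded are all assumptions:

import cvxpy as cp
import numpy as np

m, n = 5, 3
rng = np.random.default_rng(7)
a_bar = rng.normal(size=(m, n))                         # nominal constraint rows
P = [0.1 * rng.normal(size=(n, n)) for _ in range(m)]   # ellipsoid shapes
b = rng.normal(size=m) + 5.0
c = rng.normal(size=n)

x = cp.Variable(n)
robust = [a_bar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
prob = cp.Problem(cp.Minimize(c @ x), robust + [cp.norm(x, 2) <= 10])
prob.solve()
print(prob.value, x.value)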

Examples of second-order cone constraints

convex quadratic constraint (A = LL^T positive definite)

    x^T A x + 2 b^T x + c ≤ 0
        ⟺
    ‖L^T x + L^{-1} b‖_2 ≤ (b^T A^{-1} b − c)^{1/2}

extends to positive semidefinite singular A

hyperbolic constraint

    x^T x ≤ yz,  y, z ≥ 0
        ⟺
    ‖(2x, y − z)‖_2 ≤ y + z,   y, z ≥ 0

Examples of SOC-representable constraints

positive powers

    x^{1.5} ≤ t, x ≥ 0   ⟺   ∃z : x² ≤ tz,  z² ≤ x,  x, z ≥ 0

- the two hyperbolic constraints can be converted to SOC constraints
- extends to powers x^p for rational p > 1

negative powers

    x^{−3} ≤ t, x > 0   ⟺   ∃z : 1 ≤ tz,  z² ≤ tx,  x, z ≥ 0

- the two hyperbolic constraints on the r.h.s. can be converted to SOC constraints
- extends to powers x^p for rational p < 0

Semidefinite program (SDP)

    minimize    c^T x
    subject to  x_1 A_1 + x_2 A_2 + · · · + x_n A_n ⪯ B

- A_1, A_2, ..., A_n, B are symmetric matrices
- inequality X ⪯ Y means Y − X is positive semidefinite, i.e.,

      z^T (Y − X) z = ∑_{i,j} (Y_ij − X_ij) z_i z_j ≥ 0   for all z

- includes many nonlinear constraints as special cases

Geometry

[figure: the set of (x, y, z) with the matrix [[x, y], [y, z]] positive semidefinite]

- a nonpolyhedral convex cone
- the feasible set of a semidefinite program is the intersection of the positive semidefinite cone in high dimension with planes

Examples

    A(x) = A_0 + x_1 A_1 + · · · + x_m A_m    (A_i ∈ S^n)

eigenvalue minimization, and the equivalent SDP:

    minimize  λ_max(A(x))

    minimize    t
    subject to  A(x) ⪯ tI

matrix-fractional function, and the equivalent SDP:

    minimize    b^T A(x)^{-1} b
    subject to  A(x) ≻ 0

    minimize    t
    subject to  [ A(x)  b ]
                [ b^T   t ]  ⪰ 0
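A sketch of the eigenvalue-minimization example with cvxpy (the package and the random symmetric data are assumptions); its lambda_max atom is reformulated internally as the LMI A(x) ⪯ tI shown above:

import cvxpy as cp
import numpy as np

n = 4
rng = np.random.default_rng(8)
def sym(M): return 0.5 * (M + M.T)
A0, A1, A2 = (sym(rng.normal(size=(n, n))) for _ in range(3))

x = cp.Variable(2)
Ax = A0 + x[0] * A1 + x[1] * A2
prob = cp.Problem(cp.Minimize(cp.lambda_max(Ax)))
prob.solve()
# check: the optimal value equals the largest eigenvalue of A(x*) computed numerically
print(prob.value, np.linalg.eigvalsh(A0 + x.value[0] * A1 + x.value[1] * A2).max())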

Matrix norm minimization

    A(x) = A_0 + x_1 A_1 + x_2 A_2 + · · · + x_n A_n    (A_i ∈ R^{p×q})

matrix norm approximation (‖X‖_2 = max_k σ_k(X)), and the equivalent SDP:

    minimize  ‖A(x)‖_2

    minimize    t
    subject to  [ tI      A(x) ]
                [ A(x)^T  tI   ]  ⪰ 0

nuclear norm approximation (‖X‖_* = ∑_k σ_k(X)), and the equivalent SDP:

    minimize  ‖A(x)‖_*

    minimize    (tr U + tr V)/2
    subject to  [ U     A(x)^T ]
                [ A(x)  V      ]  ⪰ 0
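A sketch of both norm approximations with cvxpy (assumed); its sigma_max and normNuc atoms correspond to the two SDP reformulations on this slide, and the data matrices below are random placeholders:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
A0, A1, A2 = (rng.normal(size=(6, 4)) for _ in range(3))

x = cp.Variable(2)
Ax = A0 + x[0] * A1 + x[1] * A2
spectral = cp.Problem(cp.Minimize(cp.sigma_max(Ax)))   # ||A(x)||_2
nuclear = cp.Problem(cp.Minimize(cp.normNuc(Ax)))      # ||A(x)||_*
spectral.solve(); nuclear.solve()
print(spectral.value, nuclear.value)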

Semidefinite relaxation

semidefinite programming is often used
- to find good bounds for nonconvex polynomial problems, via relaxation
- as a heuristic for finding good suboptimal points

example: Boolean least-squares

    minimize    ‖Ax − b‖_2²
    subject to  x_i² = 1,   i = 1, ..., n

- a basic problem in digital communications
- could check all 2^n possible values of x ∈ {−1, 1}^n ...
- an NP-hard problem, and very hard in general

Lifting

Boolean least-squares problem

    minimize    x^T A^T A x − 2 b^T A x + b^T b
    subject to  x_i² = 1,   i = 1, ..., n

reformulation: introduce a new variable Y = xx^T

    minimize    tr(A^T A Y) − 2 b^T A x + b^T b
    subject to  Y = xx^T
                diag(Y) = 1

- cost function and second constraint are linear (in the variables Y, x)
- first constraint is nonlinear and nonconvex
- ... still a very hard problem

Relaxation

replace Y = xx^T with the weaker constraint Y ⪰ xx^T to obtain the relaxation

    minimize    tr(A^T A Y) − 2 b^T A x + b^T b
    subject to  Y ⪰ xx^T
                diag(Y) = 1

- convex; can be solved as a semidefinite program, since

      Y ⪰ xx^T   ⟺   [ Y    x ]
                      [ x^T  1 ]  ⪰ 0

- the optimal value gives a lower bound for the Boolean LS problem
- if Y = xx^T at the optimum, we have solved the exact problem
- otherwise, we can use randomized rounding:
  generate z from N(x, Y − xx^T) and take x̂ = sign(z)
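A sketch of the relaxation and the randomized rounding step with cvxpy and numpy (both assumed, as are the problem sizes); the lifted variables are stacked as Z = [[Y, x], [x^T, 1]]:

import cvxpy as cp
import numpy as np

n, m = 10, 20
rng = np.random.default_rng(9)
A, b = rng.normal(size=(m, n)), rng.normal(size=m)

Z = cp.Variable((n + 1, n + 1), symmetric=True)
Y, x = Z[:n, :n], Z[:n, n]
objective = cp.trace(A.T @ A @ Y) - 2 * (A.T @ b) @ x + b @ b
sdp = cp.Problem(cp.Minimize(objective),
                 [Z >> 0, cp.diag(Y) == 1, Z[n, n] == 1])
sdp.solve()                                   # optimal value is a lower bound

# randomized rounding: sample z ~ N(x, Y - x x^T) and take sign(z)
x_v, Y_v = x.value, Y.value
cov = Y_v - np.outer(x_v, x_v) + 1e-9 * np.eye(n)   # small ridge for numerical safety
best = min(np.sum((A @ np.sign(rng.multivariate_normal(x_v, cov)) - b) ** 2)
           for _ in range(100))
print(sdp.value, best)                        # SDP bound <= best Boolean value found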

Example

[figure: histogram of ‖Ax − b‖_2 / (SDP bound) over 1000 randomized solutions, with the SDP bound and the LS solution marked]

- n = 100: the feasible set has 2^100 ≈ 10^30 points
- histogram of 1000 randomized solutions from the SDP relaxation

Overview

1. Basic theory and convex modeling
   - convex sets and functions
   - common problem classes and applications
2. Interior-point methods for conic optimization
   - conic optimization
   - barrier methods
   - symmetric primal-dual methods
3. First-order methods
   - (proximal) gradient algorithms
   - dual techniques and multiplier methods

Convex optimization — MLSS 2012

Conic optimization
- definitions and examples
- modeling
- duality

Generalized (conic) inequalities

conic inequality: a constraint x ∈ K with K a convex cone in R^m

we require that K is a proper cone:
- closed
- pointed: does not contain a line (equivalently, K ∩ (−K) = {0})
- with nonempty interior: int K ≠ ∅ (equivalently, K + (−K) = R^m)

notation

    x ⪰_K y   ⟺   x − y ∈ K,        x ≻_K y   ⟺   x − y ∈ int K

the subscript in ⪰_K is omitted if K is clear from the context

Cone linear program

    minimize    c^T x
    subject to  Ax ⪯_K b

if K is the nonnegative orthant, this is a (regular) linear program

widely used in recent literature on convex optimization
- modeling: a small number of 'primitive' cones is sufficient to express most convex constraints that arise in practice
- algorithms: a convenient problem format when extending interior-point algorithms for linear programming to convex optimization

Norm cone

    K = {(x, y) ∈ R^{m−1} × R | ‖x‖ ≤ y}

[figure: the norm cone in R³]

for the Euclidean norm this is the second-order cone (notation: Q^m)

Second-order cone program

    minimize    c^T x
    subject to  ‖B_k0 x + d_k0‖_2 ≤ B_k1 x + d_k1,   k = 1, ..., r

cone LP formulation: express the constraints as Ax ⪯_K b with

    K = Q^{m_1} × · · · × Q^{m_r}

    A = − [ B_10 ]        b = [ d_10 ]
          [ B_11 ]            [ d_11 ]
          [  ⋮   ]            [  ⋮   ]
          [ B_r0 ]            [ d_r0 ]
          [ B_r1 ]            [ d_r1 ]

(assuming B_k0, d_k0 have m_k − 1 rows)

Vector notation for symmetric matrices
