A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models

Laurent El Ghaoui
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. /TechRpts/2007/EECS-2007-146.html
December 10, 2007

Copyright 2007, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement: Research supported in part by NSF grant DMS-0625371.

A Convex Upper Bound on the Log-Partition Function for Binary Graphical Models

Laurent El Ghaoui (elghaoui@eecs.berkeley.edu)
December 10, 2007

Abstract

We consider the problem of bounding from above the log-partition function corresponding to second-order Ising models for binary distributions. We introduce a new bound, the cardinality bound, which can be computed via convex optimization. The corresponding error on the log-partition function is bounded above by twice the distance, in model parameter space, to a class of "standard" Ising models, for which variable inter-dependence is described via a simple mean field term. In the context of maximum-likelihood, using the new bound instead of the exact log-partition function, while constraining the distance to the class of standard Ising models, leads not only to a good approximation to the log-partition function, but also to a model that is parsimonious and easily interpretable. We compare our bound with the log-determinant bound introduced by Wainwright and Jordan (2006), and show that when the ℓ1-norm of the model parameter vector is small enough, the latter is outperformed by the new bound.

Research supported in part by NSF grant DMS-0625371.

Contents

1 Introduction
  1.1 Problem statement
  1.2 Prior work
  1.3 Main results and outline

2 The Cardinality Bound
  2.1 The maximum bound
  2.2 The cardinality bound
  2.3 Quality analysis

3 The Pseudo Maximum-Likelihood Problem
  3.1 Tractable formulation
  3.2 Dual and interpretation
  3.3 Ensuring quality via bounds on Q

4 Links with the Log-Determinant Bound
  4.1 The log-determinant bounds
  4.2 Comparison with the maximum bound
  4.3 Summary of comparison results
  4.4 A numerical experiment

5 The Case of Standard Ising Models
  5.1 Log-partition function and maximum-likelihood problem
  5.2 Maximum and cardinality bound
  5.3 The relaxed log-determinant bound

6 Extensions
  6.1 Partition bounds
  6.2 Worst-case probability bounds

A Lipschitz Continuity of the Log-Partition Function
B Dual to the Log-Determinant Relaxation
C Valid Equalities

1 Introduction

1.1 Problem statement

This note is motivated by the problem of fitting binary distributions to experimental data. In the second-order Ising model, the fitted distribution p is assumed to have the parametric form

    p(x; Q, q) = exp(x^T Q x + q^T x − Z(Q, q)),  x ∈ {0,1}^n,

where Q = Q^T ∈ R^{n×n} and q ∈ R^n contain the parameters of the model, and Z(Q, q), the normalization constant, is called the log-partition function of the model. Noting that x^T Q x + q^T x = x^T (Q + D(q)) x for every x ∈ {0,1}^n, we will without loss of generality assume that q = 0, and denote by Z(Q) the corresponding log-partition function

    Z(Q) := log ( Σ_{x ∈ {0,1}^n} exp[x^T Q x] ).    (1)

In the Ising model, the maximum-likelihood approach to fitting data leads to the problem

    min_{Q ∈ 𝒬} Z(Q) − Tr QS,    (2)

where 𝒬 is a subset of the set S^n of symmetric matrices, and S ∈ S^n is the empirical second-moment matrix. When 𝒬 = S^n, the dual to (2) is the maximum entropy problem

    max_p H(p) :  p ∈ P,  S = Σ_{x ∈ {0,1}^n} p(x) x x^T,    (3)

where P is the set of distributions with support in {0,1}^n, and H is the entropy

    H(p) = − Σ_{x ∈ {0,1}^n} p(x) log p(x).    (4)

The constraints of problem (3) define a polytope in R^{2^n} called the marginal polytope.

For general Q's, computing the log-partition function is NP-hard. Hence, except for special choices of Q, the maximum-likelihood problem (2) is also NP-hard. It is thus desirable to find computationally tractable approximations to the log-partition function, such that the resulting maximum-likelihood problem is also tractable. In this regard, convex upper bounds on the log-partition function are of particular interest, and are our focus here: convexity usually brings about computational tractability, while using upper bounds yields a parameter Q that is suboptimal for the exact problem.

Using an upper bound in lieu of Z(Q) in (2) leads to a problem we will generically refer to as the pseudo maximum-likelihood problem. This corresponds to a relaxation of the maximum-entropy problem, which is (3) when 𝒬 = S^n.
Such relaxations may involve two ingredients: an upper bound on the entropy, and an outer approximation to the marginal polytope.
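For small n, the log-partition function (1) can be evaluated by direct enumeration, which serves as a ground truth when checking the bounds developed below. The following sketch (plain NumPy; the function name is ours, for illustration only) also verifies the reduction to q = 0 used above:

```python
import itertools
import numpy as np

def log_partition(Q, q=None):
    """Brute-force evaluation of the log-partition function (1), in O(2^n) time.
    An optional linear term q^T x is handled explicitly, so we can check the
    identity x^T Q x + q^T x = x^T (Q + D(q)) x on {0,1}^n."""
    n = Q.shape[0]
    vals = [x @ Q @ x + (0.0 if q is None else q @ x)
            for x in map(np.array, itertools.product([0, 1], repeat=n))]
    m = max(vals)
    return m + np.log(sum(np.exp(v - m) for v in vals))  # stabilized log-sum-exp

rng = np.random.default_rng(0)
n = 4
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2
q = rng.standard_normal(n)
assert np.isclose(log_partition(np.zeros((n, n))), n * np.log(2))      # Z(0) = n log 2
assert np.isclose(log_partition(Q, q), log_partition(Q + np.diag(q)))  # q absorbed w.l.o.g.
```

The enumeration is exponential in n, of course; it is only meant as a reference implementation against which the polynomial-time bounds can be compared.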

1.2 Prior work

Due to the vast applicability of Ising models, the problem of approximating their log-partition function, and the related maximum-likelihood problem, has received considerable attention in the literature for decades, first in statistical physics, and more recently in machine learning.

The so-called log-determinant bound has been recently introduced, for a large class of Markov random fields, by Wainwright and Jordan [2]. (Their paper provides an excellent overview of the prior work, in the general context of graphical models.) The log-determinant bound is based on an upper bound on the differential entropy of a continuous random variable, which is attained for a Gaussian distribution. The log-determinant bound enjoys good tractability properties, both for the computation of the log-partition function and in the context of the maximum-likelihood problem (2). A recent paper by Ravikumar and Lafferty [1] discusses using bounds on the log-partition function to estimate marginal probabilities for a large class of graphical models, which adds extra motivation for the present study.

1.3 Main results and outline

The main purpose of this note is to introduce a new upper bound on the log-partition function that is computationally tractable. The new bound is convex in Q, and leads to a restriction of the maximum-likelihood problem that is also tractable. Our development crucially involves a specific class of Ising models, which we'll refer to as standard Ising models, in which the model parameter Q has the form Q = µI + λ11^T, where λ, µ are arbitrary scalars. Such models are indeed standard in statistical physics: the first term µI describes interaction with the external magnetic field, and the second (λ11^T) is a simple mean field approximation to ferro-magnetic coupling.

For standard Ising models, the log-partition function has a computationally tractable, closed-form expression (see section 5.1). Our bound is constructed so as to be exact in the case of standard Ising models.
In fact, the error between our bound and the true value of the log-partition function is bounded above by twice the ℓ1-norm distance from the model parameter Q to the class of standard Ising models.

The outline of the note reflects our main results: in section 2, we introduce our bound, and show that the approximation error is bounded above by the distance to the class of standard Ising models. We discuss in section 3 the use of our bound in the context of the maximum-likelihood problem (2) and its dual (3). In particular, we discuss how imposing a bound on the distance to the class of standard Ising models may be desirable, not only to obtain an accurate approximation to the log-partition function, but also to find a parsimonious model having good interpretability properties. We then compare the new bound with the log-determinant bound of Wainwright and Jordan in section 4. We show that our new bound outperforms the log-determinant bound when the norm ‖Q‖₁ is small enough (less than 0.08n), and provide numerical experiments supporting the claim that our comparison analysis is quite conservative: our bound appears to be better over a wide range of values of ‖Q‖₁. Extensions of the approach are discussed in section 6.

Notation. Throughout the note, n is a fixed integer. For k ∈ {0, ..., n}, define Ω_k := {x ∈ {0,1}^n : Card(x) = k}. Let c_k = |Ω_k| denote the cardinality of Ω_k, and π_k := 2^{−n} c_k the probability of Ω_k under the uniform distribution.

For a distribution p, the notation E_p refers to the corresponding expectation operator, and Prob_p(S) to the probability of the event S under p. The set P is the set of distributions with

support on {0,1}^n.

For X ∈ R^{n×n}, the notation ‖X‖₁ denotes the sum of the absolute values of the elements of X, and ‖X‖_∞ the largest of these values. The set S^n is the set of symmetric matrices, S^n_+ the set of symmetric positive semidefinite matrices. We use the notation X ⪰ 0 for the statement X ∈ S^n_+. If x ∈ R^n, D(x) is the diagonal matrix with x on its diagonal. If X ∈ R^{n×n}, d(X) is the n-vector formed with the diagonal elements of X. Finally, 𝒳 is the set {(X, x) ∈ S^n × R^n : d(X) = x}, and 𝒳̄ := {(X, x) ∈ S^n × R^n : X ⪰ xx^T, d(X) = x}.

2 The Cardinality Bound

2.1 The maximum bound

To ease our derivation, we begin with a simple bound based on replacing each term in the log-partition function by its maximum over {0,1}^n. This leads to an upper bound on the log-partition function:

    Z(Q) ≤ n log 2 + φ_max(Q),   where   φ_max(Q) := max_{x ∈ {0,1}^n} x^T Q x.

Computing the above quantity is in general NP-hard. Starting with the expression

    φ_max(Q) = max { Tr QX : (X, x) ∈ 𝒳, rank(X) = 1 },

and relaxing the rank constraint leads to the upper bound φ_max(Q) ≤ ψ_max(Q), where ψ_max(Q) is defined via a semidefinite program:

    ψ_max(Q) = max { Tr QX : (X, x) ∈ 𝒳̄ },    (5)

where 𝒳̄ = {(X, x) ∈ S^n × R^n : X ⪰ xx^T, d(X) = x}. For later reference, we note the dual form:

    ψ_max(Q) = min_{t,ν} { t :  [ D(ν) − Q , ½ν ; ½ν^T , t ] ⪰ 0 }    (6)
             = min_ν { ¼ ν^T (D(ν) − Q)^{−1} ν :  D(ν) ≻ Q }.    (7)

The corresponding bound on the log-partition function, referred to as the maximum bound, is

    Z(Q) ≤ Z_max(Q) := n log 2 + ψ_max(Q).

The complexity of this bound (using interior-point methods) is roughly O(n³).

Let us make a few observations before proceeding. First, the maximum bound is a convex function of Q, which is important in the context of the maximum-likelihood problem (2). Second, we have Z_max(Q) ≤ n log 2 + ‖Q‖₁, which follows from (5), together with the fact that any matrix X that is feasible for that problem satisfies ‖X‖_∞ ≤ 1. Finally, we observe that the function Z_max is Lipschitz continuous, with constant 1 with respect to the ℓ1-norm. (As seen in appendix A, the

same property holds for the log-partition function Z itself.) Indeed, for every pair of symmetric matrices Q, R we have the sub-gradient inequality

    Z_max(R) ≥ Z_max(Q) + Tr X^opt (R − Q),

where X^opt is any optimal variable for problem (5). Since any feasible X satisfies ‖X‖_∞ ≤ 1, we can bound the term Tr X^opt (R − Q) from below by −‖Q − R‖₁, and after exchanging the roles of Q and R, obtain the desired result.

2.2 The cardinality bound

For every k ∈ {0, ..., n}, consider the subset of variables with cardinality k, Ω_k = {x ∈ {0,1}^n : Card(x) = k}. This defines a partition of {0,1}^n, thus

    Z(Q) = log ( Σ_{k=0}^n Σ_{x ∈ Ω_k} exp[x^T Q x] ).

We can refine the maximum bound by replacing the terms in the log-partition function by their maximum over Ω_k, leading to

    Z(Q) ≤ log ( Σ_{k=0}^n c_k exp[φ_k(Q)] ),

where, for k ∈ {0, ..., n}, c_k = |Ω_k|, and

    φ_k(Q) := max_{x ∈ Ω_k} x^T Q x.

Computing φ_k(Q) for arbitrary k ∈ {0, ..., n} is NP-hard. Based on the identity

    φ_k(Q) = max { Tr QX : (X, x) ∈ 𝒳, 1^T x = k, 1^T X 1 = k², rank X = 1 },    (8)

and using rank relaxation as before, we obtain the bound φ_k(Q) ≤ ψ_k(Q), where

    ψ_k(Q) = max { Tr QX : (X, x) ∈ 𝒳̄, 1^T x = k, 1^T X 1 = k² }.    (9)

Note that the two added equality constraints seem redundant: indeed they are in the original problem (8), but not in the relaxed counterpart (9). (See appendix C for a case in which considering only the first equality constraint in the above leads to a strictly worse bound.)

We have obtained the cardinality bound, defined as

    Z_card(Q) := log ( Σ_{k=0}^n c_k exp[ψ_k(Q)] ).

The complexity of computing ψ_k(Q) (using interior-point methods) is roughly O(n³). The upper bound Z_card(Q) is computed via n + 1 semidefinite programs of the form (9). Hence, its complexity is roughly O(n⁴).
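For small n, both bounds can be checked against the exact log-partition function by enumeration, using the exact maximizations φ_k and φ_max in place of their SDP relaxations ψ_k and ψ_max (the resulting quantities lower-bound Z_card and Z_max, but remain upper bounds on Z, so the ordering below still holds). A NumPy sketch, for illustration only:

```python
import itertools
from math import comb
import numpy as np

def logsumexp(terms):
    m = max(terms)
    return m + np.log(sum(np.exp(t - m) for t in terms))

def Z(Q):
    """Exact log-partition function, by enumeration over {0,1}^n."""
    n = Q.shape[0]
    return logsumexp([x @ Q @ x for x in map(np.array, itertools.product([0, 1], repeat=n))])

def phi_k(Q, k):
    """max of x^T Q x over Omega_k (enumeration; the paper relaxes this to the SDP psi_k)."""
    n = Q.shape[0]
    return max(np.array(x) @ Q @ np.array(x)
               for x in itertools.product([0, 1], repeat=n) if sum(x) == k)

def card_bound(Q):
    """log sum_k c_k exp[phi_k(Q)]: the cardinality bound with exact inner maximizations."""
    n = Q.shape[0]
    return logsumexp([np.log(comb(n, k)) + phi_k(Q, k) for k in range(n + 1)])

def max_bound(Q):
    """n log 2 + phi_max(Q): the maximum bound with the exact phi_max."""
    n = Q.shape[0]
    return n * np.log(2) + max(phi_k(Q, k) for k in range(n + 1))

rng = np.random.default_rng(1)
n = 5
Q = rng.standard_normal((n, n)); Q = (Q + Q.T) / 2
# ordering: Z <= cardinality bound <= maximum bound <= n log 2 + ||Q||_1
assert Z(Q) <= card_bound(Q) + 1e-9
assert card_bound(Q) <= max_bound(Q) + 1e-9
assert max_bound(Q) <= n * np.log(2) + np.abs(Q).sum() + 1e-9
```

Replacing the enumerations by the semidefinite programs (5) and (9) turns this into the polynomial-time procedure described in the text.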

Problem (9) admits the dual form

    ψ_k(Q) = min_{t,µ,ν,λ} { t + kµ + k²λ :  [ D(ν) + µI + λ11^T − Q , ½ν ; ½ν^T , t ] ⪰ 0 }.    (10)

The fact that ψ_k(Q) ≤ ψ_max(Q) for every k is obtained upon setting λ = µ = 0 in the semidefinite programming problem (10). In fact, we have

    ψ_k(Q) = min_{µ,λ} { kµ + k²λ + ψ_max(Q − µI − λ11^T) }.    (11)

The above expression can be directly obtained from the following chain, valid for every µ, λ:

    φ_k(Q) = kµ + k²λ + φ_k(Q − µI − λ11^T)
           ≤ kµ + k²λ + φ_max(Q − µI − λ11^T)
           ≤ kµ + k²λ + ψ_max(Q − µI − λ11^T).

As seen in section 5, in the case of standard Ising models, that is, if Q has the form µI + λ11^T for some scalars µ, λ, then the bound ψ_k(Q) is exact. Since the values of x^T Q x when x ranges over Ω_k are constant, the cardinality bound is also exact.

By construction, Z_card(Q) is guaranteed to be better (lower) than Z_max(Q), since the latter is obtained upon replacing ψ_k(Q) by its upper bound ψ_max(Q) for every k. The cardinality bound thus satisfies

    Z(Q) ≤ Z_card(Q) ≤ Z_max(Q) ≤ n log 2 + ‖Q‖₁.    (12)

Using the same technique as in the context of the maximum bound, we can show that each function ψ_k is Lipschitz-continuous, with constant 1 with respect to the ℓ1-norm. Using the Lipschitz continuity of positively weighted log-sum-exp functions (with constant 1 with respect to the ℓ∞-norm), we deduce that Z_card is also Lipschitz-continuous: for every pair of symmetric matrices Q, R,

    |Z_card(Q) − Z_card(R)| = | log(Σ_k c_k exp[ψ_k(Q)]) − log(Σ_k c_k exp[ψ_k(R)]) |
                            ≤ max_{0≤k≤n} |ψ_k(Q) − ψ_k(R)|
                            ≤ ‖Q − R‖₁,

as claimed.

2.3 Quality analysis

We now seek to establish conditions on the model parameter Q which guarantee that the approximation error Z_card(Q) − Z(Q) is small. The analysis relies on the fact that, for standard Ising models, the error is zero.

We begin by establishing an upper bound on the difference between the maximal and minimal values of x^T Q x when x ∈ Ω_k. We have the bound

    min_{x ∈ Ω_k} x^T Q x ≥ η_k(Q) := min { Tr QX : (X, x) ∈ 𝒳̄, 1^T x = k, 1^T X 1 = k² }.

In the same fashion as for the quantity ψ_k(Q), we can express η_k(Q) as

    η_k(Q) = max_{µ,λ} { kµ + k²λ + ψ_min(Q − µI − λ11^T) },

where ψ_min(Q) := min { Tr QX : (X, x) ∈ 𝒳̄ }. Based on this expression, we have, for every k:

    0 ≤ ψ_k(Q) − η_k(Q)
      ≤ min_{λ,µ,λ′,µ′} { k(µ − µ′) + k²(λ − λ′) + ψ_max(Q − µI − λ11^T) − ψ_min(Q − µ′I − λ′11^T) }
      ≤ min_{λ,µ} { ψ_max(Q − µI − λ11^T) − ψ_min(Q − µI − λ11^T) }
      ≤ 2 min_{λ,µ} ‖Q − µI − λ11^T‖₁,

where we have used the fact that, for every symmetric matrix R, we have

    0 ≤ ψ_max(R) − ψ_min(R) = max_{(X,x),(Y,y) ∈ 𝒳̄} Tr R(X − Y)
      ≤ max_{‖X‖_∞ ≤ 1, ‖Y‖_∞ ≤ 1} Tr R(X − Y) ≤ 2‖R‖₁.

Using again the Lipschitz continuity properties of the weighted log-sum-exp function, we obtain that for every Q, the absolute error between Z(Q) and Z_card(Q) is bounded as follows:

    0 ≤ Z_card(Q) − Z(Q) ≤ log(Σ_k c_k exp[ψ_k(Q)]) − log(Σ_k c_k exp[η_k(Q)])
      ≤ max_{0≤k≤n} (ψ_k(Q) − η_k(Q))
      ≤ 2 D_st(Q),   D_st(Q) := min_{λ,µ} ‖Q − µI − λ11^T‖₁.    (13)

Thus, a measure of quality is D_st(Q), the distance, in ℓ1-norm, between the model and the class of standard Ising models. Note that this measure is easily computed, in O(n² log n) time, by first setting λ to be the median of the values Q_ij, 1 ≤ i < j ≤ n, and then setting µ to be the median of the values Q_ii − λ, i = 1, ..., n.

We summarize our findings so far with the following theorem:

Theorem 1 (Cardinality bound) The cardinality bound is

    Z_card(Q) := log ( Σ_{k=0}^n c_k exp[ψ_k(Q)] ),

where ψ_k(Q), k = 0, ..., n, is defined via the semidefinite program (9), which can be solved in O(n³). The approximation error is bounded above by twice the distance (in ℓ1-norm) to the class of standard Ising models:

    0 ≤ Z_card(Q) − Z(Q) ≤ 2 min_{λ,µ} ‖Q − µI − λ11^T‖₁.
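The median recipe for D_st just described can be sketched as follows (a minimal NumPy implementation; we check it on a case where the answer is known, and against the trivial choice µ = λ = 0):

```python
import numpy as np

def dist_to_standard(Q):
    """D_st(Q) = min over (mu, lambda) of ||Q - mu I - lambda 11^T||_1, via the
    median recipe: lambda = median of the off-diagonal entries, then
    mu = median of Q_ii - lambda. Runs in O(n^2 log n)."""
    n = Q.shape[0]
    lam = np.median(Q[~np.eye(n, dtype=bool)])   # off-diagonal entries, flattened
    mu = np.median(np.diag(Q) - lam)
    R = Q - mu * np.eye(n) - lam * np.ones((n, n))
    return np.abs(R).sum(), mu, lam

rng = np.random.default_rng(3)
n = 6
# a standard Ising model is at distance zero
mu0, lam0 = -0.4, 0.25
d, mu, lam = dist_to_standard(mu0 * np.eye(n) + lam0 * np.ones((n, n)))
assert np.isclose(d, 0.0) and np.isclose(mu, mu0) and np.isclose(lam, lam0)
# the recipe is never worse than mu = lambda = 0, i.e. D_st(Q) <= ||Q||_1
R = rng.standard_normal((n, n)); R = (R + R.T) / 2
assert dist_to_standard(R)[0] <= np.abs(R).sum() + 1e-9
```

Since the median minimizes a sum of absolute deviations, each of the two median steps can only decrease the corresponding group of terms in the ℓ1-norm, which is why the recipe always improves on µ = λ = 0.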

3 The Pseudo Maximum-Likelihood Problem

3.1 Tractable formulation

Using the bound Z_card(Q) in lieu of Z(Q) in the maximum-likelihood problem (2) leads to a convex restriction of that problem, referred to as the pseudo maximum-likelihood problem. This problem can be cast as

    min_{t,µ,λ,ν,Q}  log ( Σ_{k=0}^n c_k exp[t_k + kµ_k + k²λ_k] ) − Tr QS
    s.t.  Q ∈ 𝒬,
          [ D(ν_k) + µ_k I + λ_k 11^T − Q , ½ν_k ; ½ν_k^T , t_k ] ⪰ 0,  k = 0, ..., n.

The complexity of this bound is XXX. For numerical reasons, and without loss of generality, it is advisable to scale the c_k's and replace them by π_k := 2^{−n} c_k ∈ [0, 1].

3.2 Dual and interpretation

When 𝒬 = S^n, the dual to the above problem is

    max_{(Y_k, y_k, q_k)_{k=0}^n}  −D(q‖π) :  S = Σ_{k=0}^n Y_k,  q ≥ 0,  q^T 1 = 1,
        [ Y_k , y_k ; y_k^T , q_k ] ⪰ 0,  d(Y_k) = y_k,
        1^T y_k = k q_k,  1^T Y_k 1 = k² q_k,  k = 0, ..., n,

where π is the distribution on {0, ..., n} with π_k = Prob_u(Ω_k) = 2^{−n} c_k (u the uniform distribution), and D(q‖π) is the relative entropy (Kullback-Leibler divergence) between the distributions q and π:

    D(q‖π) := Σ_{k=0}^n q_k log (q_k / π_k).

To interpret this dual, we assume without loss of generality q > 0, and use the variables X_k := q_k^{−1} Y_k, x_k := q_k^{−1} y_k. We obtain the equivalent (non-convex) formulation

    max_{(X_k, x_k, q_k)_{k=0}^n}  −D(q‖π) :  S = Σ_{k=0}^n q_k X_k,  q ≥ 0,  q^T 1 = 1,    (14)
        (X_k, x_k) ∈ 𝒳̄,  1^T x_k = k,  1^T X_k 1 = k²,  k = 0, ..., n.

The above problem can be obtained as a relaxation of the dual of the exact maximum-likelihood problem (2), which is the maximum entropy problem (3). The relaxation involves two steps: one is to form an outer approximation to the marginal polytope, the other is to find an upper bound on the entropy function (4).

First observe that we can express any distribution p on {0,1}^n as

    p(x) = Σ_{k=0}^n q_k p_k(x),    (15)

where

    q_k = Prob_p(Ω_k) = Σ_{x ∈ Ω_k} p(x),   p_k(x) = q_k^{−1} p(x) if x ∈ Ω_k, and 0 otherwise.

Note that the functions p_k are valid distributions on {0,1}^n, with support in Ω_k.

To obtain an outer approximation to the marginal polytope, we then write the moment-matching equality constraint in problem (3) as

    S = E_p xx^T = Σ_{k=0}^n q_k X_k,

where the X_k's are the second-order moment matrices with respect to p_k:

    X_k = E_{p_k} xx^T = q_k^{−1} Σ_{x ∈ Ω_k} p(x) xx^T.

To relax the constraints in the maximum-entropy problem (3), we simply use the valid constraints

    X_k ⪰ x_k x_k^T,  d(X_k) = x_k,  1^T x_k = k,  1^T X_k 1 = k²,

where x_k is the mean under p_k:

    x_k = E_{p_k} x = q_k^{−1} Σ_{x ∈ Ω_k} p(x) x.

This process yields exactly the constraints of the relaxed problem (14).

To finalize our relaxation, we now form an upper bound on the entropy function (4). To this end, we use the fact that, since each p_k has support in Ω_k, its entropy is bounded above by log |Ω_k|, as follows:

    H(p) = − Σ_{x ∈ {0,1}^n} p(x) log p(x)
         = − Σ_{k=0}^n Σ_{x ∈ Ω_k} p(x) log p(x)
         = − Σ_{k=0}^n Σ_{x ∈ Ω_k} q_k p_k(x) log(q_k p_k(x))
         = Σ_{k=0}^n q_k (H(p_k) − log q_k)
         ≤ Σ_{k=0}^n q_k (log c_k − log q_k)        (c_k = 2^n π_k)
         = − Σ_{k=0}^n q_k log (q_k / π_k) + n log 2,

which is, up to a constant, the objective of problem (14).

3.3 Ensuring quality via bounds on Q

We consider the (exact) maximum-likelihood problem (2), with 𝒬 = {Q = Q^T : ‖Q‖₁ ≤ ε}:

    min_{Q = Q^T}  Z(Q) − Tr QS :  ‖Q‖₁ ≤ ε,    (16)

and its convex relaxation:

    min_{Q = Q^T}  Z_card(Q) − Tr QS :  ‖Q‖₁ ≤ ε.    (17)

The feasible sets of problems (16) and (17) are the same, and on it the difference between the objective functions is uniformly bounded by 2ε. Thus, any ε-suboptimal solution of the relaxation (17) is guaranteed to be 3ε-suboptimal for the exact problem (16).

In practice, the ℓ1-norm constraint in (17) encourages sparsity of Q, hence the interpretability of the model. It also has good properties in terms of the generalization error. As seen above, the constraint also implies a better approximation to the exact problem (16). All these benefits come at the expense of goodness-of-fit, as the constraint reduces the expressive power of the model. This is an illustration of the intimate connections between the computational and statistical properties of the model.

A more accurate bound on the approximation error can be obtained by imposing the following constraint on Q and two new variables λ, µ:

    ‖Q − µI − λ11^T‖₁ ≤ ε.

We can draw similar conclusions as before. Here, the resulting model will not be sparse, in the sense of having many elements of Q equal to zero. However, it will still be quite interpretable, as the bound above will encourage the number of off-diagonal elements of Q that differ from their median to be small.

A yet more accurate control on the approximation error can be induced by the constraints ψ_k(Q) − η_k(Q) ≤ ε for every k, each of which can be expressed via LMI constraints. The corresponding constrained relaxation of the maximum-likelihood problem has the form

    min  log ( Σ_{k=0}^n c_k exp[t_k⁺ + kµ_k⁺ + k²λ_k⁺] ) − Tr QS
    s.t. [ D(ν_k⁺) + µ_k⁺ I + λ_k⁺ 11^T − Q , ½ν_k⁺ ; ½(ν_k⁺)^T , t_k⁺ ] ⪰ 0,  k = 0, ..., n,
         [ D(ν_k⁻) − µ_k⁻ I − λ_k⁻ 11^T + Q , ½ν_k⁻ ; ½(ν_k⁻)^T , t_k⁻ ] ⪰ 0,  k = 0, ..., n,
         t_k⁺ + k(µ_k⁺ − µ_k⁻) + k²(λ_k⁺ − λ_k⁻) + t_k⁻ ≤ ε,  k = 0, ..., n,

with variables t±, µ±, λ±, ν± and Q. Using this model instead of the ones we saw previously, we sacrifice less on the front of the approximation to the true likelihood, at the expense of increased computational effort.

4 Links with the Log-Determinant Bound

4.1 The log-determinant bounds

The bound in Wainwright and Jordan [2] is based on an upper bound on the (differential) entropy of a continuous random variable, which is attained for a Gaussian distribution. It has the form Z(Q) ≤ Z_ld(Q), with

    Z_ld(Q) := αn + max_{(X,x) ∈ 𝒳̄} { Tr QX + ½ log det(X − xx^T + (1/12) I) },    (18)

where α := (1/2) log(2πe) ≈ 1.42. Wainwright and Jordan suggest to further relax this bound to one which is easier to compute:

    Z_ld(Q) ≤ Z_rld(Q) := αn + max_{(X,x) ∈ 𝒳} { Tr QX + ½ log det(X − xx^T + (1/12) I) }.    (19)

Like Z and the bounds examined previously, the bounds Z_ld and Z_rld are Lipschitz-continuous, with constant 1 with respect to the ℓ1-norm. The proof starts with the representations above, and exploits the fact that ‖Q‖₁ is an upper bound on Tr QX when (X, x) ∈ 𝒳̄.

The dual of the log-determinant bound has the form (see appendix B)

    Z_ld(Q) = (n/2) log π − (1 + log 2)/2
              + min_{t,ν,F,g,h} { t + h + (1/12) Tr(D(ν) − Q − F)
                                  − ½ log det [ D(ν) − Q − F , ½ν + g ; (½ν + g)^T , t ] }
              s.t. [ F , g ; g^T , h ] ⪰ 0.    (20)

The relaxed counterpart Z_rld(Q) is obtained upon setting F, g, h to zero in the dual above:

    Z_rld(Q) = (n/2) log π − (1 + log 2)/2
               + min_{t,ν} { t + (1/12) Tr(D(ν) − Q) − ½ log det [ D(ν) − Q , ½ν ; ½ν^T , t ] }.

Using Schur complements to eliminate the variable t, we further obtain

    Z_rld(Q) = (n/2) log π + min_ν { ¼ ν^T (D(ν) − Q)^{−1} ν + (1/12) Tr(D(ν) − Q) − ½ log det(D(ν) − Q) }.    (21)

4.2 Comparison with the maximum bound

We first note the similarity in structure between the dual form (7) defining the maximum bound and the expression (21) for the relaxed log-determinant bound.

Despite these connections, the log-determinant bound is neither better nor worse than the cardinality or maximum bounds. As seen later in section 5, for some special choices of Q, for example when Q is diagonal, the cardinality bound is exact, while the log-determinant one is not. Conversely, one can choose Q so that Z_card(Q) ≥ Z_ld(Q), so no bound dominates the other. The same can be said for Z_max(Q) (see section 4.4 for numerical examples).

However, when we impose an extra condition on Q, namely a bound on its ℓ1-norm, more can be said. The analysis is based on the case Q = 0, and exploits the Lipschitz continuity of the bounds with respect to the ℓ1-norm.

As seen in section 5, for Q = 0, the relaxed log-determinant bound writes

    Z_rld(0) = (n/2) log(2πe/3),

so that, with Z_max(0) = n log 2,

    Z_rld(0) − Z_max(0) = (n/2) log(πe/6) > 0.
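The constants in this comparison can be sanity-checked numerically (assuming the closed-form values Z_rld(0) = (n/2) log(2πe/3) and Z_max(0) = n log 2 used in the analysis):

```python
import numpy as np

n = 20
zrld0 = (n / 2) * np.log(2 * np.pi * np.e / 3)  # Z_rld(0)
zmax0 = n * np.log(2)                            # Z_max(0) = n log 2
gap0 = (n / 2) * np.log(np.pi * np.e / 6)        # claimed gap Z_rld(0) - Z_max(0)
assert np.isclose(zrld0 - zmax0, gap0)           # log(2pi e/3) - log 4 = log(pi e/6)
assert gap0 > 0                                  # the relaxed log-det bound is worse at Q = 0

# the Lipschitz argument gives Z_rld >= Z_max whenever ||Q||_1 <= (n/4) log(pi e/6)
threshold = (n / 4) * np.log(np.pi * np.e / 6)
assert 0.08 * n <= threshold < 0.089 * n         # (n/4) log(pi e/6) ~ 0.0885 n
```

So the simpler condition ‖Q‖₁ ≤ 0.08n is a slightly more conservative (hence still sufficient) version of the exact threshold.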

Now invoke the Lipschitz continuity properties of the bounds Z_rld and Z_max to obtain

    Z_rld(Q) − Z_max(Q) = (Z_rld(Q) − Z_rld(0)) + (Z_rld(0) − Z_max(0)) + (Z_max(0) − Z_max(Q))
                        ≥ −2‖Q‖₁ + (Z_rld(0) − Z_max(0))
                        = −2‖Q‖₁ + (n/2) log(πe/6).

This proves that if ‖Q‖₁ ≤ (n/4) log(πe/6), then the relaxed log-determinant bound Z_rld(Q) is worse (larger) than the maximum bound Z_max(Q). Since (n/4) log(πe/6) ≈ 0.088n, we can strengthen the above condition to the simpler ‖Q‖₁ ≤ 0.08n.

4.3 Summary of comparison results

To summarize our findings:

Theorem 2 (Comparison) We have, for every Q:

    Z(Q) ≤ Z_card(Q) ≤ Z_max(Q) ≤ n log 2 + ‖Q‖₁.

In addition, we have Z_max(Q) ≤ Z_rld(Q) whenever ‖Q‖₁ ≤ 0.08n.

4.4 A numerical experiment

We now illustrate our findings on the comparison between the log-determinant bounds and the cardinality and maximum bounds. We set the size of our model to be n = 20, and for a range of values of a parameter ρ, generate N = 10 random instances of Q with ‖Q‖₁ = ρ. Figure 1 shows the average values of the bounds, as well as the associated error bars. Clearly, the new bound outperforms the log-determinant bounds for a wide range of values of ρ. Our predicted threshold value of ‖Q‖₁ below which the new bound is provably better, namely ρ = 0.08n = 1.6, is seen to be very conservative with respect to the observed threshold of ρ ≈ 30. On the other hand, we observe that for large values of ‖Q‖₁, the log-determinant bounds do behave better. Across the range of ρ, we note that the log-determinant bound is indistinguishable from its relaxed counterpart.

5 The Case of Standard Ising Models

In this section, we examine the special case of standard Ising models, for which Q = µI + λ11^T for some µ, λ ∈ R.

[Figure 1: Comparison between the various bounds as a function of the ℓ1-norm of the model parameter matrix Q, for randomly generated instances, with n = 20.]

5.1 Log-partition function and maximum-likelihood problem

For standard Ising models, we have

    Z(Q) = log ( Σ_{k=0}^n Σ_{x ∈ Ω_k} exp[µ(x^T x) + λ(1^T x)²] )
         = log ( Σ_{k=0}^n Σ_{x ∈ Ω_k} exp[µk + λk²] )
         = log ( Σ_{k=0}^n c_k exp[µk + λk²] ).

When λ = 0, the expression reduces to

    Z(µI) = n log(1 + exp(µ)).

For standard Ising models, the maximum-likelihood problem also has a tractable expression. Given samples x^(i), i = 1, ..., N, we form the sufficient statistics

    δ_k := N^{−1} Card{ i : x^(i) ∈ Ω_k },  k = 0, ..., n.

The vector δ contains the empirical probabilities that the random variable belongs to Ω_k. The maximum-likelihood problem reads

    max_θ  δ^T θ − Z(θ) :  θ_k = kµ + λk²,  k = 0, ..., n,

where, with a slight abuse of notation, Z(θ) := log ( Σ_{k=0}^n c_k exp[θ_k] ).
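The closed form above can be cross-checked against direct enumeration for small n (an illustrative NumPy sketch; the function names are ours):

```python
import itertools
from math import comb
import numpy as np

def logsumexp(terms):
    m = max(terms)
    return m + np.log(sum(np.exp(t - m) for t in terms))

def Z_standard(n, mu, lam):
    """Closed-form log-partition for Q = mu I + lam 11^T:
    log sum_k C(n,k) exp(mu k + lam k^2)."""
    return logsumexp([np.log(comb(n, k)) + mu * k + lam * k * k for k in range(n + 1)])

def Z_brute(Q):
    """Exact log-partition by enumeration over {0,1}^n, for comparison."""
    n = Q.shape[0]
    return logsumexp([x @ Q @ x for x in map(np.array, itertools.product([0, 1], repeat=n))])

n, mu, lam = 6, -0.4, 0.15
Q = mu * np.eye(n) + lam * np.ones((n, n))
assert np.isclose(Z_standard(n, mu, lam), Z_brute(Q))                 # closed form matches
assert np.isclose(Z_standard(n, mu, 0.0), n * np.log1p(np.exp(mu)))   # lambda = 0 case
```

The closed form requires only n + 1 terms, versus 2^n for the enumeration, which is what makes the standard Ising class tractable.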

(In the absence of the constraints, the optimizer is θ_k = log(δ_k / c_k), k = 0, ..., n, and the value of the problem is Σ_k δ_k log(δ_k / c_k).) The corresponding maximum entropy problem is in the space of distributions of a random variable k ∈ {0, ..., n}:

    max_p  − Σ_{k=0}^n p_k log p_k :  E_p(k) = E_δ(k),  E_p(k²) = E_δ(k²).

In this problem, the random variable is the cardinality k ∈ {0, ..., n} of the original random variable in {0,1}^n. The objective is the entropy of the distribution of k, and the constraints correspond to a matching condition on the first and second moments, with respect to the empirical probability distribution of the cardinality k.

5.2 Maximum and cardinality bound

We now examine the behavior of the cardinality bound for standard Ising models, again with Q = µI + λ11^T. We have

    φ_k(Q) = max_{x ∈ Ω_k} x^T Q x = µk + λk².

The corresponding bound ψ_k(Q) is exact:

    ψ_k(Q) = max_{(X,x) ∈ 𝒳̄} { µ Tr X + λ 1^T X 1 :  1^T x = k,  1^T X 1 = k² } = kµ + λk².

In this case then, the cardinality bound Z_card(Q) is exact.

Let us examine the maximum bound. When Q = µI + λ11^T, we have Z_max(Q) = n log 2 + ψ_max(Q), with

    ψ_max(Q) = min_ν ¼ ν^T (D(ν) − µI − λ11^T)^{−1} ν.

By symmetry, we can without loss of generality set ν = ξ1 for some ξ ∈ R. Setting v = ξ − µ, we obtain

    ψ_max(Q) = min_{v > nλ} ¼ (v + µ)² 1^T (vI − λ11^T)^{−1} 1
             = min_{v > nλ} n(v + µ)² / (4(v − nλ))
             = n(µ + nλ)₊,

where z₊ := max(z, 0). Note that

    φ_max(Q) = max_{x ∈ {0,1}^n} x^T Q x = max_{k ∈ {0,...,n}} (µk + λk²)

is not, in general, equal to its upper bound ψ_max(Q) in this case, unless λ and µ have the same sign. We have obtained

    Z_max(µI + λ11^T) = n(log 2 + (µ + nλ)₊),

which is not exact. Therefore, in general the maximum bound is not exact for standard Ising models.
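The tightness claim in the same-sign case can be checked numerically for small n (a sketch; since x^T Q x depends on x only through its cardinality k, φ_max reduces to a maximization over k = 0, ..., n):

```python
import numpy as np

def phi_max_standard(n, mu, lam):
    """phi_max(mu I + lam 11^T) = max_k (mu k + lam k^2), k = 0..n."""
    return max(mu * k + lam * k * k for k in range(n + 1))

n = 8
rng = np.random.default_rng(4)
for _ in range(200):
    mu, lam = rng.standard_normal(2)
    if mu * lam >= 0:  # lambda and mu of the same sign: phi_max = n (mu + n lam)_+
        assert np.isclose(phi_max_standard(n, mu, lam), n * max(mu + n * lam, 0.0))
```

When µ, λ ≥ 0 the maximum is attained at k = n, giving n(µ + nλ); when µ, λ ≤ 0 it is attained at k = 0, giving 0; in both cases the value coincides with n(µ + nλ)₊.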

5.3 The relaxed log-determinant bound

For the relaxed log-determinant bound, in the case of standard Ising models, we observe that, by symmetry, we can without loss of generality assume that ν = ξ1 for some ξ ∈ R in (21). With v = ξ − µ, and with

    1^T (vI − λ11^T)^{−1} 1 = n / (v − nλ),   log det(vI − λ11^T) = n log v + log(1 − nλ/v),

we obtain

    Z_rld(Q) = (n/2) log π + min_{v > nλ} { n(v + µ)² / (4(v − nλ)) + (n/12)(v − λ) − ((n−1)/2) log v − ½ log(v − nλ) }.

When Q = 0, this expression reduces to

    Z_rld(0) = (n/2) log(2πe/3).

So, the relaxed log-determinant bound is not exact for standard Ising models, even for Q = 0. How the log-determinant bound behaves in the scaled identity case is still unclear, but numerical experiments suggest that it is equal to its relaxed counterpart for standard Ising models.

6 Extensions

6.1 Partition bounds

The cardinality bound can be interpreted as a special case of a class of bounds which we call partition bounds. Such bounds are themselves closely linked to a very general class of bounds based on worst-case probability analysis, as seen in section 6.2.

Consider a partition D = (D_k)_{k=0}^K of the set {0,1}^n into K + 1 disjoint subsets D_k, k = 0, ..., K (K ≤ 2^n − 1). We can express Z(Q) as

    Z(Q) = log ( Σ_{k=0}^K Σ_{x ∈ D_k} exp[x^T Q x] ).

Define φ(Q; D_k) := max_{x ∈ D_k} x^T Q x, and replace each term for x ∈ D_k by its upper bound, to get an upper bound on Z:

    Z(Q) ≤ Z_D(Q) := log ( Σ_{k=0}^K |D_k| exp[φ(Q; D_k)] ),    (22)

where |D_k| denotes the cardinality of the set D_k. Evaluating

