2y ago

35 Views

1 Downloads

383.88 KB

12 Pages

Transcription

1014IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 7, JULY 1999Fast and Exact Simultaneous Gate and Wire Sizingby Lagrangian RelaxationChung-Ping Chen, Chris C. N. Chu, and D. F. Wong, Member, IEEEAbstract— This paper considers simultaneous gate and wiresizing for general very large scale integrated (VLSI) circuitsunder the Elmore delay model. We present a fast and exactalgorithm which can minimize total area subject to maximumdelay bound. The algorithm can be easily modified to giveexact algorithms for optimizing several other objectives (e.g.,minimizing maximum delay or minimizing total area subject toarrival time specifications at all inputs and outputs). No previousalgorithm for simultaneous gate and wire sizing can guaranteeexact solutions for general circuits. Our algorithm is an iterativeone with a guarantee on convergence to global optimal solutions.It is based on Lagrangian relaxation and “one-gate/wire-at-atime” greedy optimizations, and is extremely economical and fast.For example, we can optimize a circuit with 27 648 gates andwires in 11.53 min using under 23 Mbytes memory on a PC witha 333-MHz Pentium II processor.Index Terms—Gate sizing, interconnect, Lagrangian relaxation,performance optimization, wire sizing.I. INTRODUCTIONSINCE the invention of integrated circuits almost 40 yearsago, gate sizing has always been an effective techniqueto achieve desirable circuit performance. As technology continues to scale down, total number of gates and interconnectswithin a die grows over millions. In such increasingly denseintegrated circuits, a significant portion of the total circuitdelay comes from the interconnects. Therefore, developingefficient algorithms which can handle large scale gate andinterconnect optimization problems are of great importance.In the past, gate delay was the dominant factor in determining circuit performance. Thus, gate and transistor sizing havebeen extensively studied in the literature [6], [12], [16], [17],[23]. As interconnect delay plays an increasingly importantrole in determining circuit performance, wire sizing has beenan active research topic in the past few years [2], [4], [7], [9],[19], [22].Since gate sizes affect wire-sizing solutions and wire sizesaffect gate-sizing solutions, it is beneficial to simultaneouslysize both gates and wires. Several results on simultaneous gateand wire sizing have been reported [2], [7], [8], [18], [20].Chen et al. [2], Cong and Koh [8], Menezes et al. [18], andMenezes et al. [20] considered a single routing tree togetherManuscript received July 22, 1998; revised November 24, 1998. Thiswork supported in part by the Texas Advanced Research Program underGrant 003658288 and by a grant from the Intel Corporation. This paper wasrecommended by Associate Editor K.-T. Cheng.The authors are with the Department of Computer Sciences, University of Texas at Austin, Austin, TX 78712 (e-mail: ccp@cs.utexas.edu;cnchu@cs.utexas.edu; wong@cs.utexas.edu).Publisher Item Identifier S 0278-0070(99)05038-1.with the driving gate. For simultaneous gate and wire sizing forgeneral circuits, Cong and He [7] minimized a weighted delayusing a greedy sizing technique in conjunction with dynamicprogramming. The algorithm cannot guarantee to give exactsolutions for objectives such as minimizing total area subjectto maximum delay bound or minimizing maximum delay.In this paper, we consider simultaneous gate and wire sizingfor general very large scale integrated (VLSI) circuits underthe Elmore delay model. We present a fast and exact algorithmwhich can minimize total area subject to maximum delaybound. It can be easily modified to give algorithms for optimizing several other objectives (e.g., minimizing maximum delayor minimizing total area subject to arrival time specificationsat all inputs and outputs). Convergence to global optimalsolutions is guaranteed for all cases. Our algorithm is basedon the Lagrangian relaxation technique, which transforms theproblem into a sequence of subproblems called the Lagrangianrelaxation subproblems. We show that each subproblem canbe greatly simplified by the Kuhn–Tucker conditions and canbe solved by an efficient “one-gate/wire-at-a-time” greedyalgorithm. So our algorithm is extremely economical and fast.For example, we can optimize a circuit with 27 648 gates andwires in 11.53 min using under 23 Mbytes memory on a PCwith a 333-MHz Pentium II processor.We notice that the gate and wire sizing problem is similarto the transistor sizing problem. In this paper, our problemis formulated as a geometric program [10]. Fishburn andDunlop [12] have shown a long time ago that the transistorsizing problem can also be formulated as a similar geometric program. However, it would be very slow to solvethe geometric program by some general-purpose geometricprogramming solver. So instead of solving it exactly, Fishburnand Dunlop proposed TILOS [12], which is based on anefficient sensitivity-based heuristic.Marple [16], [17] solved the geometric program by theLagrangian augmentation technique. Lagrangian augmentation, like Lagrangian relaxation, is a general technique forconstrained nonlinear optimization. The difference betweenthem is that Lagrangian augmentation adds to the Lagrangiana penalty term that helps to steer the solution toward thefeasible region. However, we demonstrate in this paper thatfor the device sizing problems, Lagrangian relaxation is amuch better technique. Without the penalty term, tremendoussimplification and efficient greedy algorithm to the Lagrangianrelaxation subproblems are possible. Hence, our approach ismuch faster. At the same time, rapid convergence to globaloptimal solutions is still guaranteed.0278–0070/99 10.00 1999 IEEEAuthorized licensed use limited to: Iowa State University. Downloaded on April 15,2010 at 14:41:55 UTC from IEEE Xplore. Restrictions apply.

CHEN et al: SIMULTANEOUS GATE AND WIRE SIZING BY LAGRANGIAN RELAXATIONSapatnekar et al. [23] later transformed the geometric program into a convex program and solved it by a sophisticatedgeneral-purpose convex programming solver based on interiorpoint method. This is the best-known previous algorithm thatcan guarantee exact transistor sizing solutions. Again, as weexplore the special structure of the geometric program, ourtailored algorithm is much faster than the algorithm in [23].For example, the largest test circuit in [23] has 832 transistorsand the reported runtime and memory are 9 hours (on a SunSPARCstation 1) and 11 Mbytes, respectively. Note that for aproblem of similar size (864), our approach only needs 1.3 s ofruntime (on a PC with a 333-MHz Pentium II processor) and1.15 Mbytes of memory. According to the SPEC benchmarkresults,1 our machine is roughly 40 times faster than theslowest model of Sun SPARCstation 1. Taking the speeddifference of the machines into account, our algorithm is about600 times faster than the general-purpose solver for a smallcircuit. For larger circuits, we expect the speedup to be evenmore significant.The rest of this paper is organized as follows. In Section II,we introduce some notations and terminology that we use inthis paper. In Section III, we present our algorithm for theproblem of minimizing total area subject to maximum delaybound. In Section IV, we show how to modify our algorithm tominimize maximum delay, to handle arrival time specificationsat all inputs and outputs, to consider power dissipation and touse a more accurate gate model. In Section V, experimentalresults to show the runtime and storage requirements of ouralgorithm are presented.II. PRELIMINARIESIn this section, we define some notations and terminologythat we use in this paper.For a general VLSI circuit, we can ignore all latches andoptimize its combinational subcircuits. Therefore, we focuson combinational circuits below.Given a combinational circuit with input drivers, outputgates or wire segments, the gate sizes or theloads, andsegment widths are allowed to be varied in order to optimize, letbe the driver resistancesome objective. For, letbe the loadof the th input driver. Forcapacitance of the th output load. See Fig. 1 for an illustrationof a circuit.A gate, a wire segment, or an input driver is called acomponent. In order to unify the notations that we introducelater, imagine that two factitious components are added to thecircuit. The first one is called an output component whichconsists of all the output loads. The second one is called aninput component which connects to all the input drivers. Leta node be a connection point between two components or theoutput point of the output component. Note that the output ofeach component should connect to a distinct node. So it is easycomponents andnodes.to see that there are. We label the nodes by indexesLetas follows. The node with index 0 is the output point of the, the node with index isoutput component. For1 SPECtable; ftp://ftp.cdf.toronto.edu/pub/spectable.1015Fig. 1. An illustration of a circuit.Fig. 2. An illustration of the circuit in Fig. 1 with node indexes andcomponent indexes. The factitious input component and output componentare also shown.the one connecting to the th output load. For,the node with index is a connection point among the gatesand wire segments. The indexes are assigned in such a waythat if node and node are connected to an input and the. Foroutput of some component, respectively, then, the node with index is the one connectingth input driver. The node with indexis theto theoutput point of the input component. It is not difficult to seethat if we view the circuit as a directed acyclic graph, the nodeindex assignment is a reverse topological ordering of the graph.such thatWe also label the components by indexesthe output of the component with index is connected to node. See Fig. 2 for an illustration of the circuit in Fig. 1 withfactitious components, node indexes, and component indexes., let inputbe the set of indexesForof components directly connected to the inputs of component. For, let outputbe the set of indexes ofcomponents directly connected to the output of component. For example, for the circuit in Fig. 2, input(0),, output(6), input(8), andinput(6). Note thatinputif and only ifoutput(8)output .Let be the set of component indexes of gates in the circuit.be the set of component indexes of wire segmentsLetandin the circuit. For the circuit in Fig. 2,.For the purpose of delay calculation, we model componentsas resistance-capacitance (RC) circuits. If component is a), it is modeled as a switch-level RC circuit asgate (i.e.,shown in Fig. 3. See [24] for a reference of this model. Letbe the gate size. Then the output resistance, andAuthorized licensed use limited to: Iowa State University. Downloaded on April 15,2010 at 14:41:55 UTC from IEEE Xplore. Restrictions apply.

1016IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 7, JULY 1999Letbe the delay associated with resistor . Webyrepresent a signal path passing through resistors. Let be the set of all possible pathsthe setfrom node to node 0 (i.e., from an input driver to an output, the Elmore delay along pathload). Then for anyis.Fig. 3. The model of component i, which is a gate, by a switch-level RCcircuit. Note that riri xi and ci ci xi fi , where ri , ci , and fi arethe unit size output resistance, the unit size gate area capacitance and the gateperimeter capacitance of gate i, respectively. Although the gate shown hereis a two-input AND gate, the model can be easily generalized for any gatewith any number of input pins. Fig. 4. The model of component i, which is a wire segment, by a -typeRC circuit. Note that riri xi and ci ci xi fi , where ri , ci , and fiare the unit width wire resistance, the unit width wire area capacitance, andthe wire fringing capacitance of segment i, respectively. the input capacitance of a pin, where , ,andare the unit size output resistance, the unit size gatearea capacitance and the gate perimeter capacitance of gate ,respectively. (To simplify the notations, we assume the inputcapacitances of all input pins of a gate are the same. We alsoignore the intrinsic gate delay. It is clear that all our resultswill still hold without these assumptions.)), it is modeledIf component is a wire segment (i.e.,as a -type RC circuit as shown in Fig. 4. Letbe the, andsegment width. Then the segment resistance, where , , andthe segment capacitanceare the unit width wire resistance, the unit width wire areacapacitance and the wire fringing capacitance of segment ,, letandbe, respectively,respectively. Forthe lower bound and upper bound of the value of , i.e.,.Elmore delay model [11] is used for delay calculation.Basically, the Elmore delay along a signal path is the sum ofthe delays associated with the resistors in the path, where thedelay associated with a resistor is equal to its resistance timesits downstream capacitance. For our case, each component(except the two factitious components) contains a resistor.such thatWe label the resistors by indexesresistor is the one inside component . For convenience, for, let(i.e., the driver resistanceof theth input driver). So for, the, letbe theresistance of resistor is . Fordownstream capacitance of resistor . Fig. 5 shows the circuitin Fig. 2 after replacing the components by the RC models.The resistance of each resistor is marked in the figure. Also,the regions corresponding to the downstream capacitances ofresistors 5 and 12 are shaded.III. MINIMIZING TOTAL AREASUBJECT TO MAXIMUM DELAY BOUNDIn this section, we solve the problem of minimizing the totalcomponent area with respect to component sizessubject to the constraint that the maximum delay from anyinput driver to any output load is at most some constant(i.e.,is a bound on the arrival time at node 0).In Section III-A, we first show how to formulate the problem as a constrained optimization problem with a polynomial number of constraints. We call this formulation the).is a geometric program. Thereprimal problem (are many standard methods for solving geometric programs, we[10]. However, because of the special structure ofshow that it can be solved very efficiently by Lagrangianrelaxation. Lagrangian relaxation is a general technique forsolving constrained optimization problems. We outline thebasic idea of Lagrangian relaxation below. More details canbe found in [1], [13], and [14].In Lagrangian relaxation, “troublesome” constraints are“relaxed” and incorporated into the objective function aftermultiplying them by constants called Lagrange multipliers, onemultiplier for each constraint. For each fixed vector of theLagrange multipliers introduced, we have a new optimizationproblem (which should be easier to solve because it is freeof troublesome constraints) called the Lagrangian relaxation(). It can be shownsubproblem associated withthat there exists a vector such that the optimal solution ofis also the optimal solution of the original constrained. The problem of finding such aoptimization problem). Sovector is called the Lagrangian dual problem (and, then the optimalif we can solve bothwill be given bywhereis thesolution of.optimal solution ofis relaxed to obtain theIn Section III-B, we show howand the corresponding. In Section III-C, weuse the Kuhn–Tucker conditions (see [1] for a reference) toderive a set of optimality conditions on . We show that the.optimality conditions can be used to greatly simplify. In Section III-D,We called the simplified version(i.e.,) for any fixedwe show how to solvebyvector . In Section III-E, we describe how to solvethe classical method of subgradient optimization.A. Problem FormulationFor each , the area of component is proportional to itssize . Therefore, the total component area can be written asfor some constants. Then the problemof minimizing total area subject to maximum delay bound canAuthorized licensed use limited to: Iowa State University. Downloaded on April 15,2010 at 14:41:55 UTC from IEEE Xplore. Restrictions apply.

CHEN et al: SIMULTANEOUS GATE AND WIRE SIZING BY LAGRANGIAN RELAXATION1017Fig. 5. Illustration of the circuit in Fig. 2 after replacing the gates and wire segments by the RC models. The resistance of each resistor is marked in thefigure. Also, the regions corresponding to the downstream capacitances of resistors 5 and 12 are shaded.problemhas a unique global minimum and no other localin the following.minimum. We consider the formulationbe formulated directly asMinimizeB. Lagrangian RelaxationSubject toHowever, the number of possible signal paths from nodeto node 0 (and, hence, the number of constraints in themathematical program above) can be exponential in . So thisdirect formulation is impractical.This difficulty can be handled by the classical techniqueof partitioning the constraints on path delay into constraintstoon delay across components. We associate a variablerepresents the arrival time at node (i.e., theeach node .maximum delay from node to node ). Then it is not difficultto see that the mathematical program below, which we called), is equivalent to the mathematicalthe primal problem (program above:sinceWe relax all the constraints on arrival time ofthey are difficult to handle. The simple constraints on theare not relaxed. They are handledcomponent sizesin the Lagrangian relaxation subproblem.Following the Lagrangian relaxation procedure, we introduce a nonnegative value called the Lagrange multiplier forinput(i.e.,each constraint on arrival time. For all), we introducefor the constraint.and for allinput , we introduceForfor the constraint. For, wefor the constraint. Let be a vector ofintroduceall the Lagrange multipliers introduced. Letand. Let:MinimizeSubject toinputoutputsandinputinputsThen the Lagrangian relaxation subproblem associated withthe Lagrange multipliers isMinimizeSubject toNote that the number of constraints inis polynomial inand . Also note that for the problem, the objectivefunction is a posynomial [10] and the constraints can berewritten in the form of polynomials. It is well known thatunder a variable transformation, the problem is convex. SoLet the functionbe the optimal value of the problem. We define the Lagrangian dual problem as follows:MaximizeSubject toAuthorized licensed use limited to: Iowa State University. Downloaded on April 15,2010 at 14:41:55 UTC from IEEE Xplore. Restrictions apply.

1018IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 18, NO. 7, JULY 1999As said in Section III-A,can be transformed into a convexproblem. So [2, Theorem 6.2.4] implies that if is the optimal, then the optimal solution ofwillsolution of.also optimizeC. Simplification ofHere, we use the Kuhn–Tucker conditions to derive a set ofoptimality conditions on . Then we show that the optimality.conditions can be used to greatly simplify, let and be the Lagrange multipliers forForand, respectively. Considerthe constraintsofthe Lagrangian [1]The Kuhn–Tucker conditions implyforat the optimal solution of. In other words, theLagrange multipliers corresponding to the optimal solution ofmust satisfy the conditionsfor. So we can consider those Lagrange multipliers only., we obtain the following conditionsBy settingon .Optimality Conditions on :forWe show in Lemma 1 below how the optimality conditions. Let:on can be used to simplifysatisfies the optimality conditions on ., solvingis equivalentLemma 1: For anyto solvingMinimizeSubject towhere,, andProof: By rearranging the terms,ten as follows:for.can be rewrit-So by substituting the optimality conditions on, we getinto(1)Authorized licensed use limited to: Iowa State University. Downloaded on April 15,2010 at 14:41:55 UTC from IEEE Xplore. Restrictions apply.

CHEN et al: SIMULTANEOUS GATE AND WIRE SIZING BY LAGRANGIAN RELAXATIONNote thatin (1) no longer depends on . Also noteis a constant. So if letthat, then minimizingis the same as mi

the unit size output resistance, the unit size gate area capacitance and the gate perimeter capacitance of gate i , respectively. Although the gate shown here is a two-input AND gate, the model can be easily generalized for any gate with any number of input pins. Fig. 4. The model of component i , which is a wire segment, by a -type RC circuit.

Related Documents: