Cost-Effective Design of Scalable High-Performance Systems Using Active and Passive Interposers

Dylan Stow, Yuan Xie
Electrical and Computer Engineering
University of California, Santa Barbara
Santa Barbara, California
{dstow, yuanxie}@ece.ucsb.edu

Taniya Siddiqua, Gabriel H. Loh
AMD Research
Advanced Micro Devices, Inc.
Bellevue, Washington
{taniya.siddiqua, gabriel.loh}@amd.com

Abstract—Cutting-edge high-performance systems demand larger and denser processors, but future lithographic nodes are expected to introduce higher manufacturing costs and yield challenges. Die-level integration technologies like passive interposer-based 2.5D have demonstrated the potential for cost reductions through die partitioning and yield improvement, but system performance and scalability may be impacted. Alternatively, active interposer technology, the intersection of 3D and 2.5D methodologies, can provide higher-performance interconnect networks to integrate chiplets, but the active interposer die is itself subject to cost and yield concerns. In this work, we perform a cost and performance comparison between traditional monolithic 2D SoCs, 2.5D passive interposers, and 2.5D/3D active interposers to demonstrate the trade-offs between the interposer types for current and future high-performance systems. This work introduces a multi-die core-binning cost model to demonstrate the yield improvements from interposer-based die partitioning of large multi-core processors. The relative cost and performance scaling trade-offs of passive and active interposer dies are then compared for the target systems, demonstrating that both methodologies can indeed provide cost-effective integration for different system requirements. Finally, this work demonstrates how the extra "prepaid" silicon area of the interposers can be leveraged for fault tolerance to improve yield and cost-effectiveness.
In summary, this work concludes that both active and passive interposers can cost-effectively improve the functional and parametric yield of high-performance systems, together providing a cost versus performance space to meet a range of design requirements.

I. INTRODUCTION

As outlined in the ITRS 2.0 roadmap [2] [4], the datacenter and microserver markets demand increasingly performant and localized processing, with a roughly 3× increase in available memory and 4× increase in the number of processor cores per socket and rack unit, respectively, over the next ten years. Similarly, the push for high-performance exascale supercomputing will likely require complex heterogeneous SoCs with many cores and integrated memory to provide sufficient bandwidth and data localization to meet efficiency requirements [20]. Modern manycore server processors, such as the 32-core AMD "Epyc" processor, demonstrate that the industry is indeed moving in these directions to meet datacenter and microserver demands.

Unfortunately, the ability to meet these demands with conventional process scaling is becoming increasingly difficult and expensive. The Moore's Law target cadence is already slipping, with almost all foundries no longer able to meet the desired transistor scaling rates in the most recent nodes [12] and future process roadmaps slowing for each new node. Increased process complexity has led to more expensive fabrication and longer manufacturing cycle times [13], and as transistor cost reduction slows, yield and endurance challenges grow, and cost per area increases [24] [28], it becomes increasingly costly to meet the market requirements for denser, larger integrated circuits.

Fig. 1. Transition from monolithic manycore CPU to interposer-based 2.5D system with multiple chiplets.

As shown in Figure 1, an alternative solution to traditional monolithic SoC integration is the usage of die-level integration methods like Through Silicon Via (TSV)-based 3D and interposer-based 2.5D methodologies.
Manufacturing yield can be improved by partitioning the SoC into multiple chiplets, ideally with identical modular structure to reduce design and mask cost, and by bonding these chiplets through high-yield, high-bandwidth, chip-to-chip interconnects. 3D integration has long been studied as a solution to improve yield and performance, but die stacking requires significant EDA changes and leads to thermal density challenges. Interposer-based 2.5D integration, however, has already come to market for several high-end devices, including the AMD Radeon R9 GPUs with High Bandwidth Memory integration for improved performance, efficiency, and footprint [15] and the Virtex-7 FPGA from Xilinx [19] with multiple FPGA slices and heterogeneous transceiver chiplets for improved yield, configurability, and performance. However, the usage of interposers has so far been limited to these cases, while the wider high-performance market could stand to benefit from interposer adoption. In a recent analysis of a cost-driven design methodology, both 2.5D and 3D designs were shown to have lower post-yield manufacturing costs than 2D SoCs for midsize and large systems [22], but only 2.5D designs were cost-effective for high-power designs, while 3D suffered from increased packaging and cooling costs when thermal management was considered [23].

Fig. 2. Illustrative two-chiplet system, integrated with microbumps using (a) passive interposer with only passive interconnect and TSVs, and (b) active interposer with active CMOS logic.

Although interposers can be utilized for partitioning and integration, the metal-only nature of current passive interposers potentially limits their ability to provide sufficient bandwidth and latency for new high-performance systems. Active interposers [14] are an emerging combination of 2.5D and 3D integration that balances the simplified design methodology and thermal management of passive 2.5D but leverages standard CMOS processes to integrate active transistor devices into the interposer for faster repeated interconnect and flexible Network-on-Chip (NoC) for better chiplet connectivity [8]. Active interposers have been demonstrated to improve signaling and efficiency over passive interposers [10], [11], and functional samples with active NoC have recently been fabricated [26].

The transition from a passive to active interposer increases the interposer cost overhead due to additional process complexity, and the active interposer itself could become a large, low-yield die that increases system cost. To date, no active interposers have been adopted in commercial designs due to these cost concerns. As such, all recent active interposer work has focused on "minimally active" interposers [9], [26] with only a small percentage of the available area utilized to minimize yield losses.
Some work has gone as far as simplifying the transistors to minimize the number of extra process steps, at the expense of transistor functionality [25]. Yet in all of these minimally active designs, a large and costly active CMOS die is being produced and paid for, but little effective area is being utilized.

This work explores the benefits and trade-offs of active and passive interposer-based design for high-performance systems. First, the yield and performance benefits of interposer-enabled die partitioning are demonstrated in Section II through the use of a novel core-binning 2.5D cost model. Following this justification for interposer-enabled partitioning, Sections III and IV provide guidance on interposer technology selection through analysis of active and passive interposers on the metrics of performance scalability and cost overheads. Further, fault-tolerant methods are proposed to reduce active interposer cost overhead without increasing total system footprint. This work conflicts with prior assumptions about active interposer cost-effectiveness and demonstrates the feasibility, with proper technology selection, of both active and passive interposer design methodologies to provide cost reductions and high-bandwidth integration for a broad range of high-performance systems.

II. THE CASE FOR INTERPOSERS: YIELD AND BINNING IMPROVEMENT FROM DIE PARTITIONING

Modern and future performance-targeted systems will span the wide market range from desktop CPUs and GPUs used for virtual reality and workstation applications, to exascale processors for the most demanding scientific and big data computations. Unlike the mobile and IoT markets, these high-end systems have significantly larger die sizes and thus more difficult yield challenges. For consumer devices, such as an eight-core desktop and workstation processor, manufacturability translates to improved performance per dollar.
For manycore server processors [1] or future exascale processors [20], improved yield and reduced manufacturing costs allow for lower total cost of ownership and wider market share. These cost reductions, pushed down to the consumers and warehouse-scale providers, allow for the proliferation of higher-performance processing, thus expanding the range of achievable software solutions across the field.

In this section, we demonstrate how interposer-enabled die partitioning can result in significant manufacturability improvements and cost and functionality benefits across the range of performance-targeting circuits, motivating the transition away from monolithic SoC integration. First, a manufacturing yield and cost model is presented for interposers and 2.5D systems. To improve the model accuracy for large-area circuits, a novel core-binning defect model and a chiplet matching strategy are developed. The application of these models on two case studies demonstrates how interposer-based partitioning can greatly improve yield and increase the number of high-margin, high-performance fully-enabled chips, especially if future processes exhibit yield challenges.

A. Manufacturing Cost Model for SoC and Interposer Systems

The cost of a single semiconductor die can be estimated by using only the die area and the process technology. The choice of the process technology has a major impact on the die cost, determining the cost per wafer C_wafer and the density of critical defects D0. Performance-targeted circuits historically adopt the most recently available process technologies to leverage the latest improvements in transistor density and speed, although it remains to be seen how future technologies will scale in cost, yield, and reliability. The defect density D0 of a new process is initially high, but it decreases, generally by 2-5× for historical technologies [6], over several years as the process matures.
Using the negative binomial yield model [21], the yield of an individual die with critical area A can be calculated as:

Y_{die} = (1 + A D_0 / \alpha)^{-\alpha}    (1)

where α is a process-dependent clustering parameter, frequently between 1 (high defect clustering) and 3 (moderate defect clustering).¹

¹ Poisson yield, with uniform defect distribution, is overly pessimistic for large dies [6], but can be approximated with α = 10.

For logic-dominated dies, the critical area A is commonly assigned to be the total area of the integrated

circuit. With the die yield, die area, and wafer diameter φ_wafer, the number of dies per wafer N_die is found with:

N_{die} = \pi (\phi_{wafer}/2)^2 / A_{die} - \pi \phi_{wafer} / \sqrt{2 A_{die}}    (2)

The manufacturing cost per die is then calculated as:

C_{die} = C_{wafer} / (Y_{die} N_{die})    (3)

where C_wafer is the process-dependent wafer cost.

These three equations are sufficient for modeling the manufacturing cost of a single 2D semiconductor die, but a 2.5D interposer-based system introduces additional cost overheads. Unlike stacked 3D integration, the primary active dies in a 2.5D system do not require thinning or through-silicon via (TSV) creation. The dies are bonded to the interposer using face-to-face (metal-to-metal) bonding through microbumps, copper pillars, or micropads. This bonding process does, however, introduce an extra process complexity that translates into fabrication cost and a potential failure point that can influence yield. Thankfully, bonding assembly yields have been consistently demonstrated at greater than 99% success rates [11], [15]. Of course, the interposer-based system must also include the cost of the interposer itself, which can again be calculated like a standard die using Equations 1-3, with adjustment only to the wafer cost C_wafer. A passive interposer only has TSVs (to connect to the substrate) and several layers of metal interconnect, so the wafer cost is significantly lower than a comparable CMOS process technology (explored in detail later in Section IV-B). An active interposer would be fabricated by using an existing CMOS process technology and by then adding TSVs, resulting in a higher cost per die than a passive interposer given the same size and yield.
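Equations 1-3 together form a complete per-die cost estimator. The Python sketch below is written for this summary; the 300 mm wafer diameter and the wafer-cost argument are illustrative assumptions, not values from the paper. It reproduces the 36% and 12.5% example yields quoted in Section II-B:

```python
import math

def die_yield(area_cm2: float, d0: float, alpha: float = 3.0) -> float:
    """Eq. 1: negative binomial yield, Y = (1 + A*D0/alpha)^(-alpha)."""
    return (1.0 + area_cm2 * d0 / alpha) ** (-alpha)

def dies_per_wafer(wafer_diam_mm: float, die_area_mm2: float) -> float:
    """Eq. 2: gross dies per wafer, with an edge-loss correction term."""
    return (math.pi * (wafer_diam_mm / 2.0) ** 2 / die_area_mm2
            - math.pi * wafer_diam_mm / math.sqrt(2.0 * die_area_mm2))

def cost_per_die(c_wafer: float, area_mm2: float, d0: float,
                 alpha: float = 3.0, wafer_diam_mm: float = 300.0) -> float:
    """Eq. 3: cost of one good die = wafer cost / (yield * dies per wafer)."""
    y = die_yield(area_mm2 / 100.0, d0, alpha)  # convert mm^2 to cm^2
    return c_wafer / (y * dies_per_wafer(wafer_diam_mm, area_mm2))

y_mature = die_yield(6.0, 0.2)  # ~0.364: the 36% GPU-die example (600 mm^2)
y_future = die_yield(6.0, 0.5)  # 0.125: the 12.5% emerging-process example
```

Because yield falls superlinearly with area, one 600 mm² die costs more than twice as much per good die as a 300 mm² die in this model, which is the silicon-cost headroom that die partitioning exploits (before interposer and bonding overheads are charged back).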
The total manufacturing cost of a 2.5D system with n chiplets and one interposer is calculated with:

C_{2.5D} = [ \sum_{i=1}^{n} (C_i / y_i + C_{bond,i}) + C_{int} / y_{int} ] / Y_{bond}^{n+1}    (4)

where C_int and y_int are the interposer silicon cost and yield. In this work, we assume that Known Good Die (KGD) testing is performed on each chiplet before bonding to the interposer, which is necessary for improving system yield and reducing manufacturing cost [14].

Nonrecurring costs like engineering effort for design and verification, or production of mask sets, can also contribute to total cost per die, especially when volumes are low. Because the high-performance systems under examination already require the largest design effort and are appropriately marketed in large volumes, we assume that any nonrecurring costs are sufficiently amortized across volume or are minimally changed between integration approaches.

B. A Core-Binning Yield Model for Modular Circuits

The yield model in Equation 1, although commonly used in prior work, is not representative of the fabrication of large-area integrated circuits.

Fig. 3. (a) An eight-core die in which 100% of the die can be flexibly disabled for binning. (b) A representative system where only the cores, which make up 50% of the area, can be disabled for binning.

With chip sizes that can approach the reticle limit, the yield for a defect-free die can be very low, even for mature process nodes. For example, according to Equation 1 with α = 3, a 600 mm² GPU die in a mature process node with defect density D0 = 0.2 cm⁻² [17] would have a die yield (before parametric variation) of only 36%. For an emerging process with D0 = 0.5 cm⁻², yield is only 12.5%! In order to improve revenue and produce more functional parts, leading manufacturers of CPUs, GPUs, and other high-performance circuits rely on binning at the core-unit level.
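Before moving to the binning model, the assembly-level cost of Equation 4 above can be sketched directly. This follows the reconstructed equation, with each chiplet charged its known-good-die cost C_i/y_i plus a bonding cost, the interposer charged C_int/y_int, and n+1 bond-yield terms in the denominator; the dollar figures in the example are illustrative assumptions only:

```python
def cost_2p5d(chiplet_costs, chiplet_yields, bond_costs,
              c_int, y_int, y_bond=0.99):
    """Eq. 4: total manufacturing cost of a 2.5D system with n chiplets
    and one interposer, assuming known-good-die (KGD) testing of each
    chiplet before the bonding steps."""
    n = len(chiplet_costs)
    tested_dies = sum(c / y + cb for c, y, cb
                      in zip(chiplet_costs, chiplet_yields, bond_costs))
    interposer = c_int / y_int
    return (tested_dies + interposer) / y_bond ** (n + 1)

# Two identical $20 chiplets at 90% yield, $2 bond cost each,
# and a $15 interposer at 95% yield (all numbers assumed):
total = cost_2p5d([20, 20], [0.9, 0.9], [2, 2], c_int=15, y_int=0.95)
```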
If a defect is present in a modular core, the impacted segment of the die is disabled and the chip is sold with reduced functionality at a lower price.²

In order to model the distribution of defects between and within the dies, we utilize the derivation equation of the negative binomial yield model, shown below in Equation 5.

P_{defect}(d) = \frac{\Gamma(d+\alpha)}{d! \, \Gamma(\alpha)} \cdot \frac{\beta^d}{(\beta+1)^{d+\alpha}}    (5)

The probability that a die has d defects is calculated using the gamma function Γ(x) and constant β defined as:

\beta = D_0 A / \alpha    (6)

Within the relatively local area of a single die, it is assumed that defects are randomly distributed (Poisson) across the cores and uncore area. Multiple defects may fall into the same core, resulting in more functional cores after binning. The probability of a die with d defects and c binnable modular cores to have g good, functional cores is:

P_{good}(g) = \binom{c}{c-g} S(d, c-g) (c-g)! / c^d    (7)

where S(d, c−g) is the Stirling number of the second kind. Equation 7 assumes that the whole die is partitionable for binning. In real designs, non-modular uncore units like interconnect fabric and system management contribute significant die area and are not easily disabled. Figure 3 shows an eight-core processor with (a) fully partitionable die area and (b) 50% binnable core area and 50% critical uncore area, representative of modern designs. Equation 7 can be expanded to account for the non-modular critical area percentage η:

P'_{good}(g) = P_{good}(g) (1 - \eta)^d    (8)

² Although mobile systems have grown in heterogeneous complexity, high-performance systems continue to scale along modular units, with benefits to design and software effort. For simplicity, the analysis here addresses the most common homogeneous systems, but it is similarly applicable to heterogeneous systems of sufficient modularity.
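Equations 5-8 can be combined numerically to produce the binned yield distributions used in the following subsections. Below is a self-contained Python sketch; the inclusion-exclusion summation for S(d, k) is the standard identity (not from the paper), and the defaults mirror the eight-core example parameters (α = 3, η = 0.5):

```python
from math import comb, factorial, gamma

def p_defects(d, area_cm2, d0, alpha=3.0):
    """Eqs. 5-6: negative binomial probability of exactly d defects."""
    beta = d0 * area_cm2 / alpha
    return (gamma(d + alpha) / (factorial(d) * gamma(alpha))
            * beta ** d / (beta + 1.0) ** (d + alpha))

def stirling2(d, k):
    """Stirling number of the second kind via inclusion-exclusion."""
    return sum((-1) ** (k - j) * comb(k, j) * j ** d
               for j in range(k + 1)) // factorial(k)

def p_good(g, d, c):
    """Eq. 7: d defects on c cores leave exactly g defect-free cores."""
    k = c - g  # cores hit by at least one defect
    return comb(c, k) * stirling2(d, k) * factorial(k) / c ** d

def binned_yield(g, c, area_cm2, d0, eta=0.5, alpha=3.0, dmax=40):
    """Eqs. 5-8 combined: yield of dies binned to exactly g good cores,
    requiring all defects to miss the critical uncore fraction eta."""
    return sum(p_defects(d, area_cm2, d0, alpha)
               * p_good(g, d, c) * (1.0 - eta) ** d
               for d in range(dmax))
```

Summing `binned_yield` over the bin boundaries (pairs of cores for the eight-core study) gives the per-bin distributions plotted in Figure 4.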

Fig. 4. Yield distribution of binned dies after manufacturing for each functional core count bin: (a) 8-core, D0 = 0.2 cm⁻²; (b) 8-core, D0 = 0.5 cm⁻²; (c) 32-core, D0 = 0.2 cm⁻²; (d) 32-core, D0 = 0.5 cm⁻².

C. Core Binning and Cost Results for Eight-Core Processor

By taking the sum of products of Equations 5 and 8 across all defect counts, we can determine the yield distribution for each number of functional cores. We first apply our models to investigate a mainstream eight-core desktop/workstation consumer processor with A = 200 mm², α = 3, and η = 0.5, as shown in Figure 3b. Binning is performed at multiples of two cores, as in modern commercially available processors. For the two-chiplet design, a greedy matching process is used to produce as many fully-enabled processors as possible. A per-chiplet bond yield Y_bond = 99% [15] is included in the two-chiplet system yield distribution to reflect pessimistic integration losses. Binned yield distribution results are shown in Figure 4 (a) and (b) for two potential defect rates: a mature process with D0 = 0.2 cm⁻² and a cutting-edge process, or potentially a low-yield future process, with D0 = 0.5 cm⁻². The yield improvement from chiplet partitioning and KGD testing translates to a reduction in unsalvageable chips and an increase in the number of fully enabled, high-margin chips.
At the defect rates for a mature process and an emerging process, the number of fully functional cores is estimated to increase by 1.18× and 1.46×, and the number of failing systems decreases by 0.64× and 0.62×.

TABLE I
NORMALIZED PRICE PER CORE COUNT OF EXISTING CONSUMER PROCESSORS AT TWO SPEED BINS.

Core count   Target   Slow
2-core       1        0.8
4-core       1.7      1.5
6-core       2.5      2
8-core       5        3.7

To measure the total utility of these improvements to yield and functionality, we can utilize the estimated price of equivalent commodity processors as a representative value metric. Table I lists normalized, approximate price ratios for each core count at two speed bins based on previously published consumer devices [3]. To model parametric yield, which can also be improved through die partitioning and known-good-die matching [9], a Gaussian frequency distribution is assumed for each core, with any cores with frequency below one standard deviation of the mean binned to "Slow" and average and faster cores binned to "Target." Under this simple parametric model, about half of the four-core chiplets will achieve the target speed, while only a quarter of the eight-core chips can meet the target. Through a combination of functional and parametric yield improvements, the utility value metric of the two-chiplet system is improved by 20.8% when D0 = 0.2 cm⁻² and by 41.4% when D0 = 0.5 cm⁻².

D. Core Binning and Cost Results for 32-Core Processor

While modest yield improvements are seen from chiplet partitioning for the consumer processor at mature defect densities, increasingly significant gains are seen for larger-area circuits like server processors that exhibit greater yield challenges. Yield distributions for an example 32-core server processor with A = 600 mm² are shown in Figure 4 (c) and (d) for the same D0 = 0.2 cm⁻² and D0 = 0.5 cm⁻², respectively.
Die partitioning results in a 0.42× reduction in failing chips and a 1.98× improvement in the number of fully enabled chips for the mature process, and a 0.42× reduction in failures and a very sizable 3.94× improvement in full enablement for the emerging process.

III. INTERPOSER SELECTION: PERFORMANCE AND SCALABILITY

In the previous section, significant improvements in manufacturability are shown from the chiplet partitioning of large monolithic systems. This technique can be enabled by multiple emerging packaging technologies, but the requirements for high bandwidth, high efficiency, and low latency in performance-targeting systems are difficult to achieve with coarse-featured package-level integration techniques. The fine-featured die-level integration of passive or active interposers, however, is able to concurrently meet these performance goals. Within this interposer design space, circuit-level differences between active and passive interposers determine the feasible NoC architecture designs and resulting performance. In this section, we analyze these interposer NoC architectures in terms of scalability, area overhead, and link frequency in order to assist designers in the proper interposer technology selection to meet system requirements.

A. Active and Passive Interposer NoC Design

The interconnect-only nature of passive interposers, versus the embedded routers and low-latency repeated wires of active interposers, leads to major differences in NoC design between

the two interposer types.

Fig. 5. NoC integration topology for passive and active interposer.

Fig. 6. Scale comparison of 40 µm pitch microbump arrays to 256-bit and 512-bit flit-width routers in 16nm and 65nm technologies (256-bit: 0.41 mm² microbump array, 0.33 mm² 16nm router, 4.47 mm² 65nm router; 512-bit: 0.82 mm² array, 1.08 mm² 16nm router, 17.7 mm² 65nm router; chiplet: 50 mm²).

For the passive interposer, all routers must be fabricated into the chiplet dies, contributing chiplet area overhead. Each network link is driven from the output channel through the microbumps into the passive interposer, where it travels along a long unbuffered interconnect link before again passing through a microbump to the receiving router input channel. With routers in the chiplets, all inter-chiplet NoC links, in all directions, must pass through these die-die connections, which often include electrostatic discharge (ESD) protection overheads [25]. The active interposer, however, only needs to add a single high-bandwidth hop from a chiplet node to an on-interposer router. Within the active interposer, the flit can be passed between routers without the overhead of die-die microbump transmissions. Additionally, repeaters along the links can reduce interconnect transmission delay and increase the achievable network frequency.
The increased design flexibility of the active interposers, with reduced constraints on microbump utilization and router placement, presents a wide range of network architecture opportunities to meet performance requirements [9], which for exascale systems may be multiple terabytes per second of memory bandwidth [20]. The network architecture differences between interposer types are demonstrated in Figure 5.

1) Router and Microbump Technology Scalability: One necessary design consideration for interposer-based NoC is the area scalability of the microbump arrays versus the area of the process technology-dependent routers. Modern microbump technology is standardizing on 40 µm pitch, with potential reduction to 5 µm pitch in the future [18]. At current pitches, a 512-bit link spans an area of at least 0.82 mm² (not including any local microbump allocation for power or clocks), and a 256-bit link is half this area at 0.41 mm². A 5x5 router for a passive interposer will have 2 unidirectional links internal to the chiplet and 8 through-interposer links for the 4 cardinal directions, thus requiring 8 microbump arrays of the link width. This is still a reasonably small percentage of the peak available chiplet bandwidth (only 13% of even a small 50 mm² chiplet), but it could limit the number of routers per chiplet. Of more significant concern is the scalability of microbump pitch with router size. Using the McPAT modeling framework for quick and repeatable estimation, the areas for a 5x5 NoC router can be generated for a range of process technologies from 16nm to 65nm and beyond [16]. Figure 6 illustrates potential scaling issues for both active and passive interposers. For passive interposers, the area of a router in a modern 16nm process is slightly smaller than the area of a single microbump array of the same width, but because 8 unidirectional links are required between the chiplet and passive interposer, sufficient fan-out wiring must be added, further consuming chiplet resources.
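The microbump-array figures above follow directly from the 40 µm pitch. A quick arithmetic check, assuming a simple square-pitch signal-only array with no power or clock bumps:

```python
PITCH_MM = 0.040  # 40 um microbump pitch

def array_area_mm2(link_bits: int) -> float:
    """Area of a signal-only microbump array at a square pitch."""
    return link_bits * PITCH_MM ** 2

a512 = array_area_mm2(512)  # 0.8192 mm^2, matching the ~0.82 mm^2 above
a256 = array_area_mm2(256)  # 0.4096 mm^2, matching the ~0.41 mm^2 above

# A passive-interposer 5x5 router needs 8 through-interposer link arrays;
# at uniform pitch the area fraction equals the bump-count fraction:
overhead = 8 * a512 / 50.0  # ~0.13 of a 50 mm^2 chiplet (the 13% above)
```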
For an active interposer in an aging technology node like 65 nm, the router area can be an order of magnitude larger than a single microbump array. This still facilitates low-overhead communication between the interposer and chiplet, but it limits the number of routers in the active interposer when older processes are selected.

Fig. 7. NoC circuit differences between (a) passive and (b) active interposers.

Fig. 8. Maximum bitrate versus link distance for the passive interposer, active interposer with same load capacitance as passive interposer, and active interposer without ESD load overhead.

2) Link Frequency in Active and Passive Interposers: As discussed in Section III-A, the lack of active devices in the passive interposer requires that routers are placed in the chiplet dies and that links must route through the die-die microbumps and across longer unrepeated interconnect. The circuit models for the different interposer types are shown in Figure 7 for the passive interposer link, with microbump RC, and for an active interposer link with N repeaters. To achieve high bandwidth and low latency routing, the active interposer has the advantage of lower RC (without die-die

connections) and reduced interconnect delay from repeaters. Further, the die-die interconnect needs ESD protection on the bumps to protect the circuit during manufacturing, resulting in additional capacitive load for each passive interposer link. To model the difference in link delay and maximum network frequency, the circuits were simulated in HSPICE using the 65nm PTM models for transistors and interconnect [29]. For each specified link distance, the drivers and repeaters were optimized to minimize link delay. Maximum bitrate results are shown in Figure 8 for interconnect settings with 350 nm wire width and spacing [18], 1.2 µm thickness, starting driver width of 2x, and maximum repeater width of 64x. To demonstrate sensitivity, two curves are shown for the active interposer: one with the same capacitive load as the passive interposer with ESD protection overhead (200 fF) and one with a lower load of 50 fF. The microbumps, with self-capacitance of only 15 fF [18], introduce limited overhead compared to the lengthy interconnect. The repeaters, however, provide a significant advantage to the active interposer, which is able to achieve several times less delay than the passive interposer for the same link length. The active interposer can thus provide a greater range of NoC performance, with reduced latency links for higher network frequency or longer physical links at the same frequency.

Fig. 9. Critical defects (large) cause shorts and cuts in the interconnect, while smaller defects are non-critical.

IV. INTERPOSER SELECTION: COST AND YIELD OVERHEAD

As demonstrated in Section II, the partitioning of a large monolithic SoC into multiple chiplets can result in significant improvements to yield and functionality. Active and passive interposers are able to provide high-bandwidth NoCs for chiplet reintegration to meet a range of performance requirements, as shown in Section III.
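The repeater advantage described in Section III-A2 can be illustrated with a first-order Elmore-delay sketch. This is not the paper's HSPICE/PTM setup; all RC values below are rough assumptions chosen only to show the quadratic-versus-linear delay scaling of unrepeated versus repeated links:

```python
# Assumed first-order parameters (illustrative only, not from the paper)
R_W = 1.0e3         # wire resistance per mm, ohms
C_W = 0.2e-12       # wire capacitance per mm, farads
R_DRV = 2.0e3       # effective driver/repeater output resistance, ohms
C_LOAD = 200e-15    # receiver input + ESD load, farads

def unrepeated_delay(length_mm: float) -> float:
    """Elmore delay of a single driver into a distributed RC wire plus load."""
    rw, cw = R_W * length_mm, C_W * length_mm
    return 0.69 * (R_DRV * (cw + C_LOAD) + rw * C_LOAD) + 0.38 * rw * cw

def repeated_delay(length_mm: float, n_seg: int) -> float:
    """Same link split into n_seg repeated segments: the distributed-wire
    term now grows linearly with length instead of quadratically."""
    return n_seg * unrepeated_delay(length_mm / n_seg)

# At 12 mm, four repeaters roughly halve the link delay in this model
slow = unrepeated_delay(12.0)
fast = repeated_delay(12.0, 4)
```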
Unfortunately, interposer fabrication and chiplet bonding add manufacturing cost overheads that may diminish the total system cost benefits. Additionally, although active interposers demonstrate lower link latency, higher bitrates, and more flexible NoC architectures, the extra process and design complexity versus passive interposers translates to further cost and yield overheads. In this section, we analyze the relative magnitudes of these overheads versus system cost improvements across a range of interposer technology choices. We find that active interposers are indeed consistently more expensive than passive interposers, but that with proper technology selection they are both cost-effective integration solutions for high-performance systems. Further, based on the presented yield and cost breakdowns, the "prepaid" vacant area of the interposer is leveraged for fault tolerance to reduce the active interposer cost overhead. In order to meet system requirements, system designers can leverage the analysis and techniques in this section to balance the cost versus performance trade-offs between active and passive interposers.

Fig. 10. Chiplet partitions and 2×4 NoC meshes for (a) eight-core (200 mm², 256-bit) and (b) 32-core (600 mm², 512-bit) systems.

A. Interconnect Yield Model for Interposers

Unlike most silicon circuits, a passive interposer is primarily metal interconnect, surrounded by vacant space. An active interposer is similar in design, but may also have sparse logic activity for routers and repeaters. The prior assumptions in Equation 1 for critical area and defect density are inaccurate for interconnect yield, since a wider route is instead more resilient to small defects that would disrupt minimally-sized features. As shown in Figure 9, failures occur as shorts between wires (in the same or adjacent layers) or as open cuts [5].
Large wires and spacings require larger-sized defects to cause a failure, and historically densities for larger defect sizes drop quickly compared to the critical feature size [6]. Maximum
