Computer Science 246 Advanced Computer Architecture

Computer Science 246
Advanced Computer Architecture
Spring 2008
Harvard University
Instructor: Prof. David Brooks
dbrooks@eecs.harvard.edu

Why worry about power dissipation?
- Battery life
- Thermal issues: affect cooling, packaging, reliability, timing
- Environment

Power awareness is needed across all computing platforms
- Mobile/portable (cell phones, laptops, PDA)
  - Battery life is critical
- Desktops/set-top (PCs and game machines)
  - Packaging cost is critical
- Servers (mainframes and compute farms)
  - Packaging limits
  - Volumetric (performance density)

Modeling and Design
- First component (modeling/measurement): come up with a way to:
  - Diagnose where power is going in your system
  - Quantify potential savings
- Second component (design)
  - Try out lots of ideas
- This class will focus on both of these at many levels of the computing hierarchy

How CMOS Transistors Work

MOS Transistors are Switches

Static CMOS

Basic Logic Gates

CMOS Water Analogy
- Electron: water molecule
- Charge: weight of water
- Voltage: height
- Current: flow rate
- Capacitance: container cross-section
(Think of power plants that store energy by pumping water into towers)

Liquid Inverter
- Capacitance at input
  - Gates of NMOS, PMOS
  - Metal interconnect
- Capacitance at output
  - Fanout (# connections) to other gates
  - “Diffusion” capacitance of transistor
  - Metal interconnect
- NMOS conducts when water level is above switching threshold
- PMOS conducts below
- No conduction after container full

Inverter Signal Propagation (1)

Inverter Signal Propagation (2)

Delay and Energy Definitions
- Propagation delay
  - Time to fill output container to 50%
  - Time to charge output capacitor to 50%
- Switching energy
  - Weight * height of water moved
  - Charge * voltage of charge transferred

Delay and Power Observations
- Load capacitance increases delay
  - High fanout (gates attached to output)
  - Interconnection
- Higher current can increase speed
  - Increasing transistor width raises currents but also raises capacitance
- Energy per switching event independent of current
  - Depends on amount of charge moved, not rate
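
The delay and energy observations above can be checked numerically. The following is a minimal Python sketch that assumes a first-order RC model of the gate output, which the slides do not state: the driving transistor is treated as a fixed resistance, so the time to reach 50% of the supply is R * C * ln 2. The function names and example values are hypothetical.

```python
import math


def propagation_delay(r_ohms, c_farads):
    """Time for an RC node to charge to 50% of the supply (assumed RC model)."""
    return r_ohms * c_farads * math.log(2)


def switching_energy(c_farads, vdd):
    """Charge moved (C * Vdd) times the voltage it is moved through."""
    return c_farads * vdd * vdd


c_load = 10e-15  # hypothetical 10 fF load
vdd = 1.5        # hypothetical supply voltage in volts

# Halving the drive resistance (more current) halves the delay,
# but the switching energy stays the same.
for r_drive in (10e3, 5e3):
    delay = propagation_delay(r_drive, c_load)
    energy = switching_energy(c_load, vdd)
    print(f"R={r_drive:.0f} ohm  delay={delay:.3e} s  energy={energy:.3e} J")
```

In this model a stronger driver changes only the delay, which matches the observation that energy depends on the amount of charge moved and not on the rate.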

Feedback-based Latch
- Pros:
  - Holds data as long as power applied
  - Actively drives output (can be fast)
- Con:
  - Fairly big (5 transistors)
- Can be used for latches or SRAM cells

Charge-based Latch
- Pros:
  - Small: 1 transistor, 1 capacitor (may be gate of transistor)
- Cons:
  - Charge “leaks” off capacitor ( 1ms)
  - Reads can be destructive (a read must be followed by a write)
- Can be used for latches or DRAM cells

Power: The Basics
- Dynamic power vs. static power
  - Dynamic: “switching” power
  - Static: “leakage” power
  - Dynamic power dominates, but static power is increasing in importance
  - Trends in each
- Static power: steady, per-cycle energy cost
- Dynamic power: capacitive and short-circuit
  - Capacitive power: charging/discharging at transitions from 0→1 and 1→0
  - Short-circuit power: power due to brief short-circuit current during transitions
- Most research focuses on capacitive, but recent work on others

Dynamic (Capacitive) Power Dissipation
[Figure labels: I, V_IN, V_OUT, C_L]
- Data dependent: a function of switching activity

Capacitive Power Dissipation
Power = ½ C V^2 A f
- Capacitance (C): function of wire length, transistor size
- Supply voltage (V): has been dropping with successive fab generations
- Activity factor (A): how often, on average, do wires switch?
- Clock frequency (f): increasing

Lowering Dynamic Power
- Reducing Vdd has a quadratic effect
  - Has a negative (roughly linear) effect on performance, however
- Lowering CL
  - May improve performance as well
  - Keep transistors small (keeps intrinsic capacitance (gate and diffusion) small)
- Reduce switching activity
  - A function of signal transition stats and clock rate
  - Clock gating idle units
  - Impacted by logic and architecture decisions
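
To make the three levers concrete, here is a small Python sketch of the ½ C V^2 A f relation from the capacitive power slide. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def dynamic_power(c_farads, vdd, activity, freq_hz):
    """Capacitive switching power: 1/2 * C * V^2 * A * f."""
    return 0.5 * c_farads * vdd ** 2 * activity * freq_hz


# Hypothetical baseline: 1 nF switched capacitance, 1.5 V, A = 0.2, 1 GHz.
base = dynamic_power(1e-9, 1.5, 0.2, 1e9)

lower_vdd = dynamic_power(1e-9, 1.35, 0.2, 1e9)   # Vdd reduced by 10%
smaller_cl = dynamic_power(0.8e-9, 1.5, 0.2, 1e9)  # capacitance reduced by 20%
gated = dynamic_power(1e-9, 1.5, 0.1, 1e9)         # activity halved, e.g. by clock gating

print(f"baseline:   {base:.3f} W")
print(f"lower Vdd:  {lower_vdd / base:.2f}x")
print(f"smaller CL: {smaller_cl / base:.2f}x")
print(f"gated:      {gated / base:.2f}x")
```

A 10% supply reduction cuts power by roughly 19% because of the squared term, while capacitance and activity reductions scale power linearly.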

Short-Circuit Power Dissipation
[Figure labels: I_SC, V_IN, V_OUT, C_L]
- Short-circuit current caused by finite-slope input signals
- Direct current path between VDD and GND when both NMOS and PMOS transistors are conducting

Short-Circuit Power Dissipation
Power_SC = t_sc * V * I_peak
- Power determined by
  - Duration and slope of input signal, t_sc
  - I_peak, determined by transistor sizes, process technology, C_L
- Short-circuit power can be minimized
  - Try to match rise/fall times of input and output signals
  - Have not seen many architectural solutions here
- Good news: relatively, Power_SC is shrinking

Leakage Currents
[Figure labels: V_IN, V_OUT, I_Sub, I_gate, C_L]
I_DSub = k * e^(-q*V_T / (a*k_a*T))
- Subthreshold currents grow exponentially with increases in temperature, decreases in threshold voltage
  - But threshold voltage scaling is key to circuit performance!
- Gate leakage primarily dependent on gate oxide thickness, biases
- Both types of leakage heavily dependent on stacking and input pattern
- More on leakage later in the semester
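
The exponential dependence on threshold voltage and temperature can be explored with a short Python sketch. It assumes the subthreshold relation has the form k * exp(-q * V_T / (a * k_a * T)) and that k_a is the Boltzmann constant; the prefactor `k` and the fitting parameter `a` are placeholders with hypothetical values, so only ratios between calls are meaningful.

```python
import math

Q = 1.602176634e-19  # elementary charge in coulombs
K_B = 1.380649e-23   # Boltzmann constant in J/K (assumed meaning of k_a)


def subthreshold_current(v_t, temp_k, k=1.0, a=1.5):
    """Relative subthreshold current; k and a are hypothetical placeholders."""
    return k * math.exp(-Q * v_t / (a * K_B * temp_k))


reference = subthreshold_current(v_t=0.30, temp_k=300.0)
lower_vt = subthreshold_current(v_t=0.20, temp_k=300.0)
hotter = subthreshold_current(v_t=0.30, temp_k=350.0)

print(f"V_T lowered by 100 mV: {lower_vt / reference:.1f}x leakage")
print(f"Temperature +50 K:     {hotter / reference:.1f}x leakage")
```

Under these assumptions both a lower threshold and a higher temperature increase leakage, which is the tension the slide points out: threshold scaling helps performance but raises static power.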

Gate vs. Subthreshold Leakage Trends
From Mukhopadhyay, et al., TVLSI ‘03

Lowering Static Power
- Design-time decisions
  - Use fewer, smaller transistors; stack when possible to minimize contacts with Vdd/Gnd
  - Multithreshold process technology (multiple oxides too!)
    - Use “high-Vt” slow transistors whenever possible
- Dynamic techniques
  - Reverse-body bias (dynamically adjust threshold)
    - Low-leakage sleep mode (maintain state), e.g. XScale
  - Vdd-gating (cut voltage/gnd connection to circuits)
    - Near zero-leakage sleep mode
    - Lose state, overheads to enable/disable

What do we mean by Power?
- Max power: artificial code generating max CPU activity
- Worst-case app trace: practical applications worst-case
- Thermal power: running average of worst-case app power over a time period corresponding to thermal time constant
- Average power: long-term average of typical apps (minutes)
- Transient power: variability in power consumption for supply net
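
The difference between these definitions is easiest to see on a sampled power trace. The sketch below is a hypothetical illustration: the trace values and the window length (standing in for the thermal time constant) are made up, and taking the largest running average as the thermal figure is one reasonable reading of the definition above, not something the slides specify.

```python
def peak_power(trace):
    """Largest single sample in the trace."""
    return max(trace)


def average_power(trace):
    """Long-term average over the whole trace."""
    return sum(trace) / len(trace)


def thermal_power(trace, window):
    """Largest running average over `window` consecutive samples."""
    if window <= 0 or window > len(trace):
        raise ValueError("window must be between 1 and len(trace)")
    running = sum(trace[:window])
    best = running
    for i in range(window, len(trace)):
        running += trace[i] - trace[i - window]  # slide the window by one sample
        best = max(best, running)
    return best / window


trace_watts = [10, 50, 60, 20, 10, 70]  # hypothetical samples
print(peak_power(trace_watts))          # 70
print(average_power(trace_watts))
print(thermal_power(trace_watts, 2))    # 55.0
```

The thermal figure sits between the peak and the long-term average, which is why it is the relevant number for sizing cooling.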

Power vs. Energy
- Power consumption in watts
  - Rate at which energy is consumed over time
  - Determines battery life in hours
  - Sets packaging limits
- Energy efficiency in joules
  - Energy = power * delay (joules = watts * seconds)
  - Lower energy number means less power to perform a computation at same frequency

Power vs. Energy

Power vs. Energy
- Power-delay product (PDP) = Pavg * t
  - PDP is the average energy consumed per switching event
- Energy-delay product (EDP) = PDP * t
  - Takes into account that one can trade increased delay for lower energy/operation
- Energy-delay^2 product (EDDP) = EDP * t
- Why do we need so many formulas?!!?
  - We want a voltage-invariant efficiency metric! Why?
  - Power = ½ C V^2 A f, Performance ~ f (and V)

E vs. EDP vs. ED2P
- Power ~ C V^2 f ~ V^3 (fixed microarch/design)
- Performance ~ f ~ V (fixed microarch/design)
  (For the nominal voltage range, f varies approx. linearly with V)
- Comparing processors that can only use freq/voltage scaling as the primary method of power control:
  - (perf)^3 / power, or MIPS^3 / W or SPEC^3 / W, is a fair metric to compare energy efficiencies
  - This is an ED2P metric. We could also use (CPI)^3 * W for a given application
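
The claim that ED2P is voltage-invariant follows from the two proportionalities above: energy = power * delay scales as V^2, EDP as V, and ED2P as V^0. The Python sketch below checks this numerically. It assumes f is exactly proportional to V and uses hypothetical unit constants (`c`, `k`, `work`), so only the trend across voltages matters.

```python
def metrics(vdd, c=1.0, k=1.0, work=1.0):
    """Return (energy, EDP, ED2P) for a fixed design under voltage scaling."""
    freq = k * vdd                 # assumption: f scales linearly with V
    power = c * vdd ** 2 * freq    # P ~ C V^2 f
    delay = work / freq            # time to finish a fixed amount of work
    energy = power * delay         # PDP
    edp = energy * delay
    ed2p = edp * delay
    return energy, edp, ed2p


for vdd in (1.0, 0.9, 0.8):
    energy, edp, ed2p = metrics(vdd)
    print(f"V={vdd:.1f}  E={energy:.3f}  EDP={edp:.3f}  ED2P={ed2p:.3f}")
```

Energy and EDP both improve just by lowering the voltage, so they reward a design for where it sits on the voltage curve. ED2P stays constant, which is what makes it a fair comparison between designs that differ only in their voltage/frequency operating point.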

E vs. EDP vs. ED2P
- Currently have a processor design: 80W, 1 BIPS, 1.5V, 1GHz
- Want to reduce power, willing to lose some performance
- Cache optimization:
  - IPC decreases by 10%, reduces power by 20%
  - Final processor: 900 MIPS, 64W
  - Relative E = MIPS/W (higher is better) = 14/12.5 = 1.125x
- Energy is better, but is this a “better” processor?

Not necessarily
- 80W, 1 BIPS, 1.5V, 1GHz
- Cache optimization:
  - IPC decreases by 10%, reduces power by 20%
  - Final processor: 900 MIPS, 64W
  - Relative E = MIPS/W (higher is better) = 14/12.5 = 1.125x
  - Relative EDP = MIPS^2/W = 1.01x
  - Relative ED2P = MIPS^3/W = .911x
- What if we just adjust frequency/voltage on processor?
  - How to reduce power by 20%?
  - P ~ C V^2 F ~ C V^3
  - Drop voltage by 7% (and also freq): .93 * .93 * .93 = .8x
- So for equal power (64W):
  - Cache optimization = 900 MIPS
  - Simple voltage/frequency scaling = 930 MIPS
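
The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch using the slide's numbers; the helper name `relative_metrics` is hypothetical, and the voltage-scaling step assumes P ~ V^3 with performance proportional to V, as stated above.

```python
def relative_metrics(base_mips, base_watts, new_mips, new_watts):
    """Ratios of MIPS^n / W (n = 1, 2, 3) for a new design versus a baseline."""
    result = {}
    for name, n in (("E", 1), ("EDP", 2), ("ED2P", 3)):
        new_score = new_mips ** n / new_watts
        base_score = base_mips ** n / base_watts
        result[name] = new_score / base_score
    return result


cache_opt = relative_metrics(1000, 80, 900, 64)
for name, ratio in cache_opt.items():
    print(f"relative {name}: {ratio:.3f}x")

# Voltage/frequency scaling to the same 64 W budget, assuming P ~ V^3:
# the voltage (and frequency) scale factor is the cube root of the power ratio.
scale = (64 / 80) ** (1 / 3)
dvfs_mips = 1000 * scale
print(f"voltage/frequency scale: {scale:.3f}, performance: {dvfs_mips:.0f} MIPS")
```

The exact cube root gives a scale factor slightly below 0.93, so the slide's 930 MIPS is a rounded figure. Either way, plain voltage/frequency scaling beats the cache optimization's 900 MIPS at the same power, which is why the ED2P ratio comes out below 1.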

Analysis Abstraction Levels
[Table columns: Abstraction Level, Analysis Capacity, Analysis Accuracy, Analysis Speed, Analysis Resources, Energy]
[Table rows, highest to lowest abstraction: Application, Behavioral, Architectural (RTL), Logic (Gate), Transistor (Circuit)]

Power/Performance abstractions
- Low-level: Hspice, PowerMill
- Medium-level: RTL models
- Architecture-level: PennState SimplePower, Intel Tempest, Princeton Wattch, IBM PowerTimer, Umich/Colorado PowerAnalyzer

Low-level models: Hspice
- Extracted netlists from circuit/layout descriptions
  - Diffusion, gate, and wiring capacitance is modeled
- Analog simulation performed
  - Detailed device models used
  - Large systems of equations are solved
  - Can estimate dynamic and leakage power dissipation within a few percent
  - Slow, only practical for 10-100K transistors
- PowerMill (Synopsys) is similar but about 10x faster

Medium-level models: RTL
- Logic simulation obtains switching events for every signal
  - Structural VHDL or Verilog with zero or unit delay timing models
- Capacitance estimates performed
  - Device capacitance
    - Gate sizing estimates performed, similar to synthesis
  - Wiring capacitance
    - Wire load estimates performed, similar to placement and routing
- Switching event and capacitance estimates provide dynamic power estimates
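
The last step, combining switching events with capacitance estimates, can be sketched as follows. This is a hypothetical Python illustration and not the interface of any real RTL power tool: it assumes each recorded toggle of a signal costs ½ C V^2, and the signal names, capacitances, and toggle counts are invented.

```python
def rtl_dynamic_power(toggle_counts, capacitances, vdd, sim_time_s):
    """Average dynamic power from per-signal toggle counts and capacitances."""
    energy = 0.0
    for signal, toggles in toggle_counts.items():
        # Each transition on this signal is charged 1/2 * C * V^2.
        energy += 0.5 * capacitances[signal] * vdd ** 2 * toggles
    return energy / sim_time_s


# Hypothetical output of a logic simulation and a capacitance estimator.
toggles = {"alu_out": 100, "bus_req": 50}
caps_farads = {"alu_out": 2e-15, "bus_req": 4e-15}

power = rtl_dynamic_power(toggles, caps_farads, vdd=1.0, sim_time_s=1e-6)
print(f"estimated dynamic power: {power:.3e} W")
```

The accuracy of such an estimate depends almost entirely on the capacitance inputs, which is why the slide stresses gate sizing and wire load estimation.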

Architecture level models
- Two major classes:
  - Cycle/event-based: arch. level power models interfaced with cycle-driven performance simulation
  - Instruction-based: measurement/characterization based on instruction usage and interactions
- Components of arch. level power model
  - Could be based on circuit schematic measurements/extrapolation
  - Or capacitance models
- Both may need to consider
  - Circuit design styles
  - Clock gating styles and unit usage statistics
  - Signal transition statistics
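
A cycle-based model of the kind described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of Wattch or any other listed tool: the per-unit energies, the unit names, and the `idle_fraction` parameter (a simple stand-in for a clock gating style, charging idle units a fixed fraction of their access energy) are all hypothetical.

```python
def cycle_based_power(access_trace, energy_per_access, idle_fraction, freq_hz):
    """Average power from a per-cycle record of which units were used."""
    total_energy = 0.0
    for accessed_units in access_trace:
        for unit, energy in energy_per_access.items():
            if unit in accessed_units:
                total_energy += energy                  # unit active this cycle
            else:
                total_energy += energy * idle_fraction  # gated, residual cost
    cycles = len(access_trace)
    return total_energy * freq_hz / cycles


# Hypothetical unit energies (joules per access) and a three-cycle usage trace
# as a performance simulator might report it.
unit_energy = {"alu": 1e-9, "cache": 2e-9}
trace = [{"alu"}, set(), {"alu", "cache"}]

power = cycle_based_power(trace, unit_energy, idle_fraction=0.1, freq_hz=1e9)
print(f"average power: {power:.2f} W")
```

Setting `idle_fraction` to 1.0 models a design with no clock gating, and 0.0 models ideal gating, so the same usage statistics can be evaluated under different gating styles.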

Paper Readings
- Background material (available on website)
  - “Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors,” IEEE MICRO.
  - “Power: A First-Class Architectural Design Constraint,” IEEE Computer.
