Computer Architecture? - University Of Pittsburgh


Welcome to CS2410!
CS2410: Computer Architecture
Technology, software, performance, and cost issues
Sangyeun Cho
Computer Science Department
University of Pittsburgh

This is a grad-level introduction to Computer Architecture.
Let's take a look at the course info:
- Sheet
- Schedule

Computer architecture?
A Computer Science discipline that explores:
- Principles and practices to exploit characteristics of hardware & software artifacts relevant for computer systems hardware design;
- Computer hardware design itself; and
- Changing interaction between hardware and software

Goals:
- Sustain the historic computer performance improvement rate (what is performance?) and expand a computer's capabilities
- Keep the cost down

Computer architecture?
[Figure: layer diagram. "Application pull" comes from applications (e.g., games) at the top; "technology push" comes from semiconductor technologies at the bottom. Layers in between: software layers (operating systems, compiler); Instruction Set Architecture ("architecture"); Processor Organization ("microarchitecture"); VLSI Implementation ("physical hardware"). Additional label: architect.]

Today's topics
- Technology trends: "switches", impact of CMOS scaling
- Application trends
- Performance: benchmarks, summarizing performance measurements, quantitative approach to computer design
- Cost: IC chip cost

Uniprocessor performance
$\text{Performance} = 1 / \text{Time}$
$\text{Time} = \text{IC} \times \text{CPI} \times \text{CCT}$
- Instructions/program
  - Also called "instruction count" (IC above)
  - Represents how many (dynamic) instructions are required to finish the program
  - Highly depends on "architecture"
- Clocks/instruction
  - Also called CPI (Clocks Per Instruction)
  - Depends on pipelining and "microarchitecture" implementation
- Time/clock
  - Also called clock cycle time (CCT above; the inverse of frequency)
  - Highly depends on circuit & VLSI chip realization

Uniprocessor performance trend
[Figure: uniprocessor performance trend]

Uniprocessor performance hurdles
- Maximum power dissipation: 100W to 150W
- Little instruction-level parallelism left
- Little-changing memory latency

"We are dedicating all of our future product development to multicore designs. This is a sea change in computing." Paul Otellini, President, Intel (2004)
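The performance equation above can be tried out with a few lines of Python. This is only a sketch: the instruction count, CPI, and clock frequency below are hypothetical numbers chosen for illustration, not measurements from the course.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """Execution time in seconds: Time = IC x CPI x CCT."""
    return instruction_count * cpi * clock_cycle_time_s


def performance(time_s):
    """Performance is defined as the inverse of execution time."""
    return 1.0 / time_s


# Hypothetical workload: 2 billion dynamic instructions, CPI of 1.5, 2 GHz clock.
ic = 2_000_000_000
cpi = 1.5
cct = 1.0 / 2e9  # clock cycle time is the inverse of frequency

base = cpu_time(ic, cpi, cct)
better_cpi = cpu_time(ic, cpi / 2, cct)  # e.g., a microarchitecture change halves CPI

print(f"baseline time: {base:.3f} s")
print(f"time with halved CPI: {better_cpi:.3f} s")
print(f"performance ratio: {performance(better_cpi) / performance(base):.2f}")
```

Because the three factors multiply, halving any one of them (with the other two unchanged) halves the execution time.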

How does technology scaling help?
$\text{Time} = (\text{inst. count}) \times (\text{clocks per inst.}) \times (\text{clock cycle time})$
- Faster circuit
  - Scaling makes transistors not only smaller but also faster
  - Faster clock, hence smaller clock cycle time
- More transistors
  - Larger L2 caches (relatively simple design change), hence smaller CPI
  - Design changes enabled by scaling:
    - Deep pipeline using more pipeline registers
    - Superscalar pipeline using more functional units
    - Larger, more sophisticated branch predictors
    - Multicores

Moore's note (1965)
[Figure: Moore's note (1965)]

Switches
- Building block for digital logic: NAND, NOR, NOT, ...
- Technology advances have provided designers with switches that are
  - Faster;
  - Lower power;
  - More reliable (e.g., vacuum tube vs. transistor); and
  - Smaller.
- Nano-scale technologies will not continue promising the same good properties

History of switches
- Called "relay"; Mark I (1944)
- Vacuum tubes; ENIAC (1946, 18k tubes)
- Bell lab. (1947); Kilby's first IC (1958)
- Solid-state MOS devices

MOS transistors
[Figure: MOS transistors]
Today's chips heavily depend on CMOS (complementary MOS)-style logic design.

MOS transistor scaling
[Figure: MOS transistor scaling]

Impact of MOS transistor scaling
In general:
- Smaller transistors (i.e., density doubling with each new generation)
- Faster transistors (latency ∝ L)
- Roughly constant wire delay (relatively slow wires!)
- Lower supply voltage (lower dynamic power)
Downside:
- Increased global wire delay
- Increased power density (W/cm²)
- Increased leakage power
- Increased susceptibility to noise and transient errors
- On-chip variation
- Cost of manufacturing

Global wire delay
[Figure: global wire delay]
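The link between lower supply voltage and lower dynamic power can be illustrated with the standard CMOS switching-power relation, $P_{dyn} = a \times C \times V_{dd}^2 \times f$. The slides do not state this formula; it is brought in here as an assumption, and the activity factor, capacitance, and voltages below are hypothetical.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power in watts: activity x C x Vdd^2 x f (textbook CMOS model)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical chip: same switched capacitance and clock, two supply voltages.
p_old = dynamic_power(activity=0.5, capacitance_f=1e-9, vdd_v=1.2, freq_hz=2e9)
p_new = dynamic_power(activity=0.5, capacitance_f=1e-9, vdd_v=1.0, freq_hz=2e9)

print(f"dynamic power at 1.2 V: {p_old:.3f} W")
print(f"dynamic power at 1.0 V: {p_new:.3f} W")
print(f"ratio: {p_new / p_old:.3f}")  # equals (1.0 / 1.2) squared
```

Since voltage enters squared, a modest voltage reduction yields a larger proportional saving in dynamic power; leakage power, listed above as a downside, is not captured by this model.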

Power density
[Figure: power density]

Productivity
[Figure: productivity]

Component-level performance trend
- Four key components in a computer system (Patterson):
  - Disks
  - Memory
  - Network
  - Processors
- Compare 1980 Archaic (or "Nostalgic") vs. 2000 Modern (or "Newfangled")
- Metric
  - Bandwidth: # operations or events per unit time
  - Latency: elapsed time for a single operation or event

Disk: Archaic vs. Modern

                  CDC Wren I, 1983     Seagate 373453, 2003
  Rotation speed  3,600 RPM            15,000 RPM (4x)
  Capacity        0.03 GB              73.4 GB (2,500x)
  Tracks/inch     800                  64,000 (80x)
  Bits/inch       9,550                533,000 (60x)
  Platters        Three 5.25"          Four 2.5"
  Bandwidth       0.6 MB/s             86 MB/s (140x)
  Latency         48.3 ms              5.7 ms (8x)
  Cache           none                 8MB

Memory: Archaic vs. Modern

                  1980 DRAM (asynchronous)    2000 Double Data Rate Synchr. (clocked) DRAM
  Capacity        0.06 Mbits/chip             256.00 Mbits/chip (4000X)
  Transistors     64,000 xtors, 35 mm²        256,000,000 xtors, 204 mm²
  Data bus        16-bit per module           64-bit per DIMM (4X)
  Pins            16 pins/chip                66 pins/chip
  Bandwidth       13 Mbytes/sec               1600 Mbytes/sec (120X)
  Latency         225 ns                      52 ns (4X)
  Block transfer  none                        Block transfers (page mode)

LANs: Archaic vs. Modern

                    Ethernet 802.3        Ethernet 802.3ae
  Year of standard  1978                  2003
  Link speed        10 Mbits/s            10,000 Mbits/s (1000X)
  Latency           3000 µsec             190 µsec (15X)
  Media             Shared media          Switched media
  Cable             Coaxial cable         Category 5 copper wire

[Figure: cable cross-sections. Coaxial cable: plastic covering, braided outer conductor, insulator, copper core. Twisted pair: copper, 1mm thick, twisted to avoid antenna effect; "Cat 5" is 4 twisted pairs in bundle.]

CPUs: Archaic vs. Modern

                  1982 Intel 80286             2001 Intel Pentium 4
  Clock           12.5 MHz                     1500 MHz (120X)
  Peak rate       2 MIPS (peak)                4500 MIPS (peak) (2250X)
  Latency         320 ns                       15 ns (20X)
  Transistors     134,000 xtors, 47 mm²        42,000,000 xtors, 217 mm²
  Data bus, pins  16-bit data bus, 68 pins     64-bit data bus, 423 pins
  Organization    Microcode interpreter,       3-way superscalar, dynamic translation to RISC,
                  separate FPU chip            superpipelined (22 stage), out-of-order execution
  Caches          (no caches)                  On-chip 8KB data caches, 96KB instr. trace cache,
                                               256KB L2 cache

Latency lags bandwidth (last 20 years)
- CPU: 21x vs. 2250x ("memory wall")
- Ethernet: 16x vs. 1000x
- Memory module: 4x vs. 120x
- Disk: 8x vs. 143x
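The latency and bandwidth improvement factors quoted above can be recomputed from the raw archaic and modern figures. The sketch below copies those figures into a dictionary (the dictionary layout and function names are illustrative choices) and also checks whether the bandwidth gain exceeds the square of the latency gain for each component.

```python
# (archaic value, modern value) pairs copied from the comparisons above.
# Latency units: ns for CPU and memory, microseconds for Ethernet, ms for disk.
COMPONENTS = {
    "CPU":           {"latency": (320.0, 15.0),   "bandwidth": (2.0, 4500.0)},
    "Ethernet":      {"latency": (3000.0, 190.0), "bandwidth": (10.0, 10000.0)},
    "Memory module": {"latency": (225.0, 52.0),   "bandwidth": (13.0, 1600.0)},
    "Disk":          {"latency": (48.3, 5.7),     "bandwidth": (0.6, 86.0)},
}


def latency_gain(name):
    """Latency improves when it shrinks, so the gain is old / new."""
    old, new = COMPONENTS[name]["latency"]
    return old / new


def bandwidth_gain(name):
    """Bandwidth improves when it grows, so the gain is new / old."""
    old, new = COMPONENTS[name]["bandwidth"]
    return new / old


for name in COMPONENTS:
    lat = latency_gain(name)
    bw = bandwidth_gain(name)
    print(f"{name:14s} latency {lat:5.1f}x   bandwidth {bw:7.1f}x   "
          f"bandwidth > latency^2: {bw > lat ** 2}")
```

The recomputed factors round to the ones in the summary list (for example, 320 ns / 15 ns is about 21x), and in all four rows the bandwidth gain is larger than the latency gain squared.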

Rule of thumb: latency lagging BW
- In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4
- (Capacity improves faster than bandwidth)
- In other words, bandwidth improves by more than the square of the improvement in latency

Cost trend
- Learning curve
  - Time
  - Change in yield
- Volume
  - Decreases cost, increases efficiency
- "Shrinking" by deploying next-generation technology (without changing the design itself)
- Commoditization
  - Standards push this
  - Multiple vendors compete

IC (Integrated Circuit) cost
$\text{Cost of IC} = \dfrac{\text{Cost of production}}{\text{Final test yield}}$
$\text{Cost of production} = \text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}$
$\text{Cost of die} = \dfrac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$
$\text{Dies per wafer} = \dfrac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \dfrac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$

IC (Integrated Circuit) cost
Cost of production along the time line:
- NRE (Non-Recurring Engineering) cost
  - R&D
  - Mask
- Chip production
  - "Front end"
  - "Back end": packaging, etc.
- Test cost
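The IC cost relations above chain together naturally in code. The following is a minimal sketch: the wafer size, die area, wafer cost, die yield, and test and packaging costs are hypothetical inputs, and the die yield is simply taken as a given number here.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area over die area, minus dies lost along the wafer edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def cost_of_die(wafer_cost, dies, die_yield):
    """Cost of die = cost of wafer / (dies per wafer x die yield)."""
    return wafer_cost / (dies * die_yield)


def cost_of_ic(die_cost, test_cost, packaging_cost, final_test_yield):
    """Cost of IC = (die + testing + packaging and final test) / final test yield."""
    return (die_cost + test_cost + packaging_cost) / final_test_yield


# Hypothetical example: 30 cm wafer, 1.5 cm^2 die, $5000 wafer, 50% die yield.
dies = int(dies_per_wafer(30.0, 1.5))   # whole dies only; about 416 here
die = cost_of_die(5000.0, dies, 0.5)
total = cost_of_ic(die, test_cost=1.0, packaging_cost=3.0, final_test_yield=0.95)

print(f"dies per wafer: {dies}")
print(f"cost per good die: ${die:.2f}")
print(f"cost per IC: ${total:.2f}")
```

Note how the die area appears twice: a larger die reduces the number of dies per wafer and, through the die yield, also reduces the fraction of those dies that work.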

IC (Integrated Circuit) cost
$\text{Die yield} = \text{Wafer yield} \times \left(1 + \dfrac{\text{Defect density} \times \text{Die area}}{\alpha}\right)^{-\alpha}$
- Defect density: # defects in unit area
- Defect density × die area will then be the average # of defects per die
- α: manufacturing complexity
  - 2006 CMOS process: α = 4.0

Performance analysis
- Which computer is faster for what you want to do?
  - Time matters
  - Workload matters
  - Throughput (jobs/sec) vs. latency (sec/job)
  - Single processor vs. multiprocessor
  - Pentium4 @2GHz vs. Pentium4 @4GHz
- Commonly used techniques
  - Direct measurement
  - Simulation
  - Analytical modeling

Performance analysis
- Combination of
  - Measurement
  - Interpretation
  - Communication
- Considerations in performance analysis
  - Perturbation
  - Accuracy
  - Reproducibility

Performance report
- Reproducibility
  - Provide all necessary details so that others can reproduce the same result
  - Machine configuration, compiler flags, ...
- Single number is attractive, but
  - It does not show how a new feature affects different programs
  - It may in fact mislead; a technique good for a program may be bad for others
- Overall performance vs. specific aspects
- Choice of metric
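The die yield formula above is easy to evaluate numerically. This sketch assumes a wafer yield of 100% and a hypothetical defect density of 0.4 defects per cm²; only α = 4.0 comes from the slide.

```python
def die_yield(wafer_yield, defect_density_per_cm2, die_area_cm2, alpha=4.0):
    """Die yield = wafer yield x (1 + defect density x die area / alpha) ^ (-alpha)."""
    defects_per_die = defect_density_per_cm2 * die_area_cm2
    return wafer_yield * (1.0 + defects_per_die / alpha) ** (-alpha)


# Hypothetical process: 0.4 defects/cm^2, perfect wafer yield, alpha = 4.0.
for area in (0.5, 1.0, 1.5, 2.25):
    y = die_yield(1.0, 0.4, area)
    print(f"die area {area:4.2f} cm^2 -> die yield {y:.2f}")
```

Yield falls faster than linearly with die area, which is one reason large dies are disproportionately expensive.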

Performance analysis techniques
- Direct measurement
  - Can provide the best result: no simplifying assumptions
  - Not flexible (difficult to change parameters)
  - Prone to perturbation (if instrumented)
  - Made much easier these days by using performance counters
- Simulation
  - Very flexible
  - Time consuming
  - Difficult to model details and validate
- Analytical modeling
  - Quick insight for overall behaviors
  - Limited applicability
  - Used to confine simulation scope, validate simulations, etc.

Performance metrics
- (Preferably) single number that essentially extracts a desired characteristic
  - Cache hit rate
  - AMAT (Average Memory Access Time)
  - IPC (Instructions Per Cycle)
  - Time (or delay)
  - Energy-delay product

Comparing two
- "X is n times faster than Y"
  $n = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = \dfrac{\text{Performance}_X}{\text{Performance}_Y}$
- Two different machines
- Two different options (e.g., memory sizes) on a machine

Benchmarks
- Real programs
- Benchmark suites: a set of real applications
  - SPEC CPU 2006 (desktop and servers)
  - EEMBC, SPECjvm (embedded)
  - TPC-C, TPC-H, SPECjbb, ECperf (servers)
- Kernels: important pieces of code from real applications
  - Livermore loops, ...
- Toy programs: small programs that we easily understand
  - Quicksort, Sieve of Eratosthenes, ...
- Synthetic program: to mimic a program behavior "uniformly"
  - Dhrystone, Whetstone, ...
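The "X is n times faster than Y" comparison above reduces to a single division. A small sketch, with hypothetical execution times:

```python
def times_faster(exec_time_x, exec_time_y):
    """n such that 'X is n times faster than Y': time of Y divided by time of X."""
    return exec_time_y / exec_time_x


# Hypothetical measurements: machine X runs the program in 10 s, machine Y in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")

# The same n results from the performance ratio, since performance = 1 / time.
perf_x, perf_y = 1.0 / 10.0, 1.0 / 15.0
print(f"performance ratio: {perf_x / perf_y:.2f}")
```

Keeping the faster machine's time in the denominator avoids the common mistake of reporting the inverse ratio.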

SPEC CPU2006
- 12 integer programs
  - 9 use C
  - 3 use C++
- 17 floating-point programs
  - 3 use C
  - 4 use C++
  - 6 use Fortran
  - 4 use a mixture of C and Fortran
- Package available at /afs/cs.pitt.edu/projects/spec-cpu2006

Summarizing performance results
- Arithmetic mean and weighted arithmetic mean
  - When dealing with times
- Geometric mean
  - When dealing with ratios
  - SPEC CPU uses this method

$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} \text{sample}_i}$
In the case of SPEC, $\text{sample}_i$ is the SPECRatio for program i.

SPEC2k scoring method
- Get execution time of each benchmark
- Get a ratio for each benchmark by dividing the time with that of the reference machine
  - Sun Ultra 5_10, 300MHz SPARC, 256MB memory
  - Its score is 100
- Get a geometric mean of all the computed ratios

Amdahl's law
- Optimization or parallelization usually applies to a portion
- Places "limitation" of the scope of an optimization
- Leads us to focus on "common cases"
  - "Make common case fast and rare case accurate"
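The scoring steps and Amdahl's law above can be sketched together. Two assumptions are made here and should be read as such: the per-benchmark ratio is taken as reference time divided by measured time, scaled so that the reference machine scores 100, and Amdahl's law is written in its usual textbook form, speedup = 1 / ((1 - f) + f / s), where f is the affected fraction of execution time and s is the improvement on that fraction. The benchmark times are made up.

```python
import math


def geometric_mean(samples):
    """n-th root of the product of n samples, computed via logarithms."""
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


def spec_ratio(reference_time_s, measured_time_s, scale=100.0):
    """Ratio against the reference machine; the reference itself scores `scale`."""
    return scale * reference_time_s / measured_time_s


def amdahl_speedup(fraction_affected, improvement):
    """Overall speedup when only a fraction of execution time is improved."""
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / improvement)


# Hypothetical (reference time, measured time) pairs in seconds for three benchmarks.
runs = [(1400.0, 700.0), (1800.0, 450.0), (1100.0, 1100.0)]
ratios = [spec_ratio(ref, measured) for ref, measured in runs]
print("ratios:", [round(r) for r in ratios])
print(f"score (geometric mean): {geometric_mean(ratios):.1f}")

# Amdahl's law: speeding up half of the program by 2x gives far less than 2x overall.
print(f"f=0.5, s=2   -> {amdahl_speedup(0.5, 2.0):.2f}x")
print(f"f=0.9, s=1e6 -> {amdahl_speedup(0.9, 1e6):.2f}x (bounded by 1/(1-f) = 10x)")
```

The second Amdahl example shows the "limitation" the slide refers to: however large the improvement, the unaffected 10% caps the overall speedup below 10x.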

Principle of locality
- Locality found in memory access instructions
  - Temporal locality: if an item is referenced, it will tend to be referenced again soon
  - Spatial locality: if an item is referenced, items whose addresses are close by tend to be referenced soon
- 90/10 locality rule
  - A program executes about 90% of its instructions in 10% of its code
- We will look at how this principle is exploited in various microarchitecture techniques

Performance vs. performance-price
[Figure: performance vs. performance-price]

Killer apps?
- Multimedia applications
- Games
- 3D graphics
- Physics simulation
- Virtual reality
- RMS (Recognition, Mining, and Synthesis)
  - Speech recognition
  - Video mining
  - Voice synthesis
- (Cf.) Software defined radio and other mobile applications

Software defined radio
[Figure: degree of mobility (stationary, walking, driving) vs. user data rate (0.1 to 100 Mbps) for GSM, GPRS, EDGE, CDMA, UMTS, HSxPA, 3GPP-LTE, DECT, BlueTooth, WLAN (IEEE 802.11b), WLAN (IEEE 802.11a/g/n), IEEE 802.16a,d, and "3G Evolution & Beyond 3G" (2010). Label: Siemens.]
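The effect of temporal and spatial locality described above can be seen in a toy cache model. This is a minimal sketch of a direct-mapped cache; the cache geometry (64 lines of 64 bytes) and the three access patterns are assumptions chosen for illustration, not parameters from the course.

```python
def hit_rate(addresses, num_lines=64, block_size=64):
    """Fraction of accesses that hit in a direct-mapped cache of byte addresses."""
    lines = [None] * num_lines          # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_size      # which memory block the address falls in
        index = block % num_lines       # the single line that block may occupy
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block        # miss: fetch the block, evicting the old one
    return hits / len(addresses)


# Spatial locality: consecutive 4-byte words share 64-byte blocks.
sequential = [4 * i for i in range(4096)]
# No locality: every access touches a different block.
strided = [64 * i for i in range(4096)]
# Temporal locality: the same four words are reused over and over.
reused = [0, 4, 8, 12] * 1024

print(f"sequential words: {hit_rate(sequential):.4f}")
print(f"one word per block: {hit_rate(strided):.4f}")
print(f"repeated words: {hit_rate(reused):.4f}")
```

With sequential access, only the first of every 16 words in a block misses, so the hit rate is 15/16; touching one word per block never hits, while reusing the same words misses only once.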

Multimedia performance needs
[Figure: multimedia performance needs (K. Uchiyama, ACSAC '07)]

