CPU Performance Evaluation: Cycles Per Instruction (CPI)


Most computers run synchronously, utilizing a CPU clock running at a constant clock rate (or clock frequency) $f$, where:

$f = 1 / C$ (clock rate = 1 / clock cycle)

[Figure: clock waveform showing cycle 1, cycle 2, cycle 3, each one clock cycle $C$ long]

Units: cycles/sec = Hertz (Hz); MHz = $10^6$ Hz; GHz = $10^9$ Hz.

The CPU clock rate depends on the specific CPU organization (design) and the hardware implementation technology (VLSI) used.

A computer machine (ISA) instruction is comprised of a number of elementary or micro operations, which vary in number and complexity depending on the instruction and the exact CPU organization (design).
– A micro operation is an elementary hardware operation that can be performed during one CPU clock cycle.
– This corresponds to one micro-instruction in microprogrammed CPUs.
– Examples: register operations (shift, load, clear, increment) and ALU operations (add, subtract, etc.).

Thus: a single machine instruction may take one or more CPU cycles to complete. This number is termed the Cycles Per Instruction (CPI).

Instructions Per Cycle: $IPC = 1 / CPI$

Average (or effective) CPI of a program: the average CPI of all instructions executed in the program on a given CPU design.

Reading: 4th Edition: Chapter 1 (1.4, 1.7, 1.8); 3rd Edition: Chapter 4.

EECC550 - Shaaban #1 Lec # 3 Winter 2010 12-7-2010
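The two reciprocal relations on this slide (clock cycle versus clock rate, and IPC versus CPI) can be checked numerically. The following is a minimal sketch; the function and constant names are illustrative and do not come from the lecture.

```python
# Illustrative helpers (names are assumptions, not from the lecture).
MHZ = 10**6  # 1 MHz = 10^6 Hz
GHZ = 10**9  # 1 GHz = 10^9 Hz


def clock_cycle_seconds(clock_rate_hz: float) -> float:
    """Clock cycle time C = 1 / f, in seconds."""
    return 1.0 / clock_rate_hz


def ipc(cpi: float) -> float:
    """Instructions per cycle, IPC = 1 / CPI."""
    return 1.0 / cpi


if __name__ == "__main__":
    c = clock_cycle_seconds(200 * MHZ)
    print(f"C at 200 MHz: {c * 1e9:.2f} ns")
    print(f"IPC at CPI 2.5: {ipc(2.5)}")
```

A 200 MHz clock gives a 5 ns cycle, and a CPI of 2.5 corresponds to an IPC of 0.4.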

Generic CPU Machine Instruction Processing Steps

– Instruction Fetch: Obtain the instruction from program memory. The Program Counter (PC) points to the instruction to be processed.
– Instruction Decode: Determine required actions and instruction size.
– Operand Fetch: Locate and obtain operand data from data memory or registers.
– Execute: Compute result value or status.
– Result Store: Deposit results in storage (data memory or register) for later use.
– Next Instruction: Determine the successor or next instruction (i.e. update the PC to fetch the next instruction to be processed).

CPI = Cycles per instruction

Computer Performance Measures: Program Execution Time

A specific program compiled to run on a specific machine (CPU) "A" has the following parameters:
– The total executed instruction count of the program: $I$
– The average number of cycles per instruction (average or effective CPI): $CPI$
– The clock cycle of machine "A": $C$

How can one measure the performance of this machine (CPU) running this program?
– Intuitively, the machine (or CPU) is said to be faster, or to have better performance running this program, if the total execution time is shorter.
– Thus the inverse of the total measured program execution time is a possible performance measure or metric:

$\text{Performance}_A = 1 / \text{Execution Time}_A$

(Execution time is measured in seconds/program; performance in programs/second.)

How to compare the performance of different machines? What factors affect performance? How to improve performance?

Comparing Computer Performance Using Execution Time

To compare the performance of two machines (or CPUs) "A" and "B" running a given specific program:

$\text{Performance}_A = 1 / \text{Execution Time}_A$
$\text{Performance}_B = 1 / \text{Execution Time}_B$

Machine A is $n$ times faster than machine B (or slower, if $n < 1$) means:

$\text{Speedup} = n = \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution Time}_B}{\text{Execution Time}_A}$

(i.e. speedup is a ratio of performance and has no units.)

Example: for a given program:
– Execution time on machine A: $\text{Execution}_A$ = 1 second
– Execution time on machine B: $\text{Execution}_B$ = 10 seconds

$\text{Speedup} = \text{Performance}_A / \text{Performance}_B = \text{Execution Time}_B / \text{Execution Time}_A = 10 / 1 = 10$

The performance of machine A is 10 times the performance of machine B when running this program, or: machine A is said to be 10 times faster than machine B when running this program.

The two CPUs may target different ISAs provided the program is written in a high-level language (HLL).
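The speedup definition above reduces to a ratio of execution times. A minimal sketch, using the slide's 1-second and 10-second figures; the function names are illustrative assumptions.

```python
def performance(execution_time_s: float) -> float:
    """Performance = 1 / execution time (programs per second)."""
    return 1.0 / execution_time_s


def speedup(time_b_s: float, time_a_s: float) -> float:
    """Speedup of machine A over machine B = Time_B / Time_A (unitless)."""
    return time_b_s / time_a_s


if __name__ == "__main__":
    time_a, time_b = 1.0, 10.0  # seconds, from the slide's example
    print(speedup(time_b, time_a))
    # The same ratio, expressed through performance:
    print(performance(time_a) / performance(time_b))
```

Both expressions give a speedup of 10. A value below 1 would mean machine A is slower than machine B.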

CPU Execution Time: The CPU Equation

A program is comprised of a number of executed instructions, $I$.
– Measured in: instructions/program

The average executed instruction takes a number of cycles per instruction (CPI) to be completed.
– Measured in: cycles/instruction, CPI
– Or Instructions Per Cycle (IPC): $IPC = 1 / CPI$

The CPU has a fixed clock cycle time $C = 1 / \text{clock rate}$.
– Measured in: seconds/cycle
– $C = 1 / f$

CPU execution time is the product of the above three parameters:

$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$

$T = I \times CPI \times C$

where $T$ is the execution time per program in seconds, $I$ is the number of instructions executed, $CPI$ is the average CPI for the program, and $C$ is the CPU clock cycle.

(This equation is commonly known as the CPU performance equation.)

CPU Average CPI/Execution Time

For a given program executed on a given machine (CPU):

$CPI = \text{Total program execution cycles} / \text{Executed instruction count}$ (i.e. average or effective CPI)

$\text{CPU clock cycles} = \text{Instruction count} \times CPI$

$\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle} = \text{Instruction count} \times CPI \times \text{Clock cycle}$

$T = I \times CPI \times C$

where $T$ is the execution time per program in seconds, $I$ is the number of instructions executed, $CPI$ is the average or effective CPI for the program, and $C$ is the CPU clock cycle.

(This equation is commonly known as the CPU performance equation.)

CPU Execution Time: Example

A program is running on a specific machine (CPU) with the following parameters:
– Total executed instruction count, $I$: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction
– CPU clock rate: 200 MHz (clock cycle $C = 5 \times 10^{-9}$ seconds, i.e. 5 nanoseconds)

What is the execution time for this program?

$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$

$\text{CPU time} = \text{Instruction count} \times CPI \times \text{Clock cycle}$
$= 10{,}000{,}000 \times 2.5 \times (1 / \text{clock rate})$
$= 10{,}000{,}000 \times 2.5 \times 5 \times 10^{-9}$
$= 0.125$ seconds

Units: nanosecond = nsec = ns = $10^{-9}$ second; MHz = $10^6$ Hz.

$T = I \times CPI \times C$
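The worked example above can be reproduced with a few lines of code. This is a sketch under the slide's stated parameters; the function name `cpu_time_seconds` is an illustrative choice, not something defined in the lecture.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU performance equation T = I x CPI x C, with C = 1 / clock rate."""
    clock_cycle_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_s


if __name__ == "__main__":
    # Parameters from the slide: I = 10,000,000, CPI = 2.5, clock rate = 200 MHz.
    t = cpu_time_seconds(10_000_000, 2.5, 200e6)
    print(f"T = {t:.3f} seconds")
```

With the slide's parameters this evaluates to 0.125 seconds, matching the hand calculation.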

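Factors Affecting CPU Performance

$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$

$T = I \times CPI \times C$

[Figure: the three terms of the equation (instruction count, cycles per instruction, clock cycle) annotated with the factors that influence them: Instruction Set Architecture (ISA), Organization (CPU Design), Technology (VLSI)]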

Aspects of CPU Execution Time

$\text{CPU time} = \text{Instruction count executed} \times CPI \times \text{Clock cycle}$

$T = I \times CPI \times C$

– Instruction count $I$ (executed) depends on: program used, compiler, ISA.
– CPI (average CPI) depends on: program used, compiler, ISA, CPU organization.
– Clock cycle $C$ depends on: CPU organization, technology (VLSI).

Performance Comparison: Example

From the previous example: a program is running on a specific machine (CPU) with the following parameters:
– Total executed instruction count, $I$: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction
– CPU clock rate: 200 MHz. Thus: $C = 1 / (200 \times 10^6) = 5 \times 10^{-9}$ seconds

Using the same program with these changes:
– A new compiler is used: new executed instruction count, $I$: 9,500,000; new CPI: 3.0
– Faster CPU implementation: new clock rate = 300 MHz. Thus: $C = 1 / (300 \times 10^6) = 3.33 \times 10^{-9}$ seconds

What is the speedup with the changes?

$\text{Speedup} = \frac{\text{Old Execution Time}}{\text{New Execution Time}} = \frac{I_{old} \times CPI_{old} \times \text{Clock cycle}_{old}}{I_{new} \times CPI_{new} \times \text{Clock cycle}_{new}}$

$\text{Speedup} = \frac{10{,}000{,}000 \times 2.5 \times 5 \times 10^{-9}}{9{,}500{,}000 \times 3 \times 3.33 \times 10^{-9}} = \frac{0.125}{0.095} = 1.32$

or 32% faster after the changes.

Clock cycle: $C = 1 / \text{clock rate}$; $T = I \times CPI \times C$
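The before/after comparison can be sketched in code by evaluating the CPU equation twice and taking the ratio. The helper name is an illustrative assumption. The code uses the exact clock cycle 1/(300 x 10^6) rather than the rounded 3.33 ns, so the unrounded ratio differs slightly from the slide's intermediate figures.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """T = I x CPI x C, with C = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Old configuration from the slide.
    t_old = cpu_time_seconds(10_000_000, 2.5, 200e6)
    # New compiler (fewer instructions, higher CPI) and faster clock.
    t_new = cpu_time_seconds(9_500_000, 3.0, 300e6)
    speedup = t_old / t_new
    print(f"old = {t_old:.3f} s, new = {t_new:.3f} s, speedup = {speedup:.2f}")
```

The result rounds to 1.32, in agreement with the slide. Note that the new compiler raises CPI, yet the program still runs faster because the instruction count and the clock cycle both drop.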

Instruction Types & CPI

Given a program with $n$ types or classes of instructions (e.g. ALU, Branch, etc.) executed on a given CPU with the following characteristics, for $i = 1, 2, \ldots, n$:
– $C_i$ = count of instructions of type $i$ executed
– $CPI_i$ = cycles per instruction for type $i$ (depends on CPU design)

Then:

$CPI = \text{CPU clock cycles} / \text{Executed instruction count } I$ (i.e. average or effective CPI)

Where:

$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$

$\text{Executed instruction count } I = \sum_{i=1}^{n} C_i$

$T = I \times CPI \times C$

Instruction Types & CPI: An Example

An instruction set has three instruction classes (e.g. ALU, Branch, etc.). For a specific CPU design:

Instruction class   CPI
A                   1
B                   2
C                   3

Two code sequences have the following instruction counts:

                  Instruction counts for instruction class
Code sequence     A     B     C
1                 2     1     2
2                 4     1     1

CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
CPI for sequence 1 = clock cycles / instruction count = 10 / 5 = 2 (i.e. average or effective CPI)

CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
CPI for sequence 2 = 9 / 6 = 1.5

$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$; $CPI = \text{CPU cycles} / I$
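The per-class calculation above can be sketched as a short function that sums CPI_i x C_i over the classes. The dictionary layout and function names are illustrative assumptions; the numbers are the slide's.

```python
def total_cycles(cpi_by_class: dict, counts: dict) -> int:
    """CPU clock cycles = sum over classes of CPI_i x C_i."""
    return sum(cpi_by_class[cls] * n for cls, n in counts.items())


def average_cpi(cpi_by_class: dict, counts: dict) -> float:
    """Average (effective) CPI = CPU clock cycles / executed instruction count."""
    return total_cycles(cpi_by_class, counts) / sum(counts.values())


if __name__ == "__main__":
    cpi_by_class = {"A": 1, "B": 2, "C": 3}
    sequence_1 = {"A": 2, "B": 1, "C": 2}
    sequence_2 = {"A": 4, "B": 1, "C": 1}
    for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
        print(name, total_cycles(cpi_by_class, seq), average_cpi(cpi_by_class, seq))
```

Sequence 2 executes more instructions (6 versus 5) but needs fewer cycles (9 versus 10), which is why instruction count alone is not a reliable performance indicator.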

Instruction Frequency & CPI

Given a program with $n$ types or classes of instructions with the following characteristics, for $i = 1, 2, \ldots, n$:
– $C_i$ = count of instructions of type $i$ executed
– $CPI_i$ = average cycles per instruction of type $i$
– $F_i$ = frequency or fraction of instruction type $i$ executed = $C_i$ / total executed instruction count = $C_i / I$

Where: executed instruction count $I = \sum_{i=1}^{n} C_i$

Then:

$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$ (i.e. average or effective CPI)

Fraction of total execution time for instructions of type $i$ = $\frac{CPI_i \times F_i}{CPI}$

$T = I \times CPI \times C$

Instruction Type Frequency & CPI: A RISC Example

Given the program profile (executed instruction mix) for a base machine (Reg / Reg), with a typical mix. $CPI_i$ depends on CPU design.

Op       Freq, F_i   CPI_i   CPI_i x F_i   % Time
ALU      50%         1       0.5           23% = 0.5/2.2
Load     20%         5       1.0           45% = 1.0/2.2
Store    10%         3       0.3           14% = 0.3/2.2
Branch   20%         2       0.4           18% = 0.4/2.2
                             Sum = 2.2

$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$ (i.e. average or effective CPI)

CPI = 0.5 x 1 + 0.2 x 5 + 0.1 x 3 + 0.2 x 2 = 0.5 + 1 + 0.3 + 0.4 = 2.2

$T = I \times CPI \times C$
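The table above follows directly from the frequency-weighted CPI formula. A minimal sketch that recomputes the effective CPI and each class's share of execution time from the slide's mix; the data layout and function names are illustrative assumptions.

```python
# (operation, frequency F_i, CPI_i) taken from the slide's RISC mix.
MIX = [
    ("ALU", 0.50, 1),
    ("Load", 0.20, 5),
    ("Store", 0.10, 3),
    ("Branch", 0.20, 2),
]


def effective_cpi(mix) -> float:
    """CPI = sum over classes of CPI_i x F_i."""
    return sum(freq * cpi for _, freq, cpi in mix)


def time_fractions(mix) -> dict:
    """Fraction of execution time per class = CPI_i x F_i / CPI."""
    total = effective_cpi(mix)
    return {op: freq * cpi / total for op, freq, cpi in mix}


if __name__ == "__main__":
    print(f"CPI = {effective_cpi(MIX):.1f}")
    for op, fraction in time_fractions(MIX).items():
        print(f"{op}: {fraction:.0%}")
```

Loads make up only 20% of the executed instructions but account for about 45% of the execution time, so they are the natural first target when trying to lower CPI on this design.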

Metrics of Computer Performance (Measures)

[Figure: levels of a computer system, each annotated with the performance metric used at that level. Hardware levels named in the figure: Datapath, Control, Function Units, Transistors, Wires, Pins]

– Execution time: target workload, SPEC
– (Millions of) instructions per second: MIPS
– (Millions of) floating-point (F.P.) operations per second: MFLOP/s
– Megabytes per second
– Cycles per second (clock rate)

Each metric has a purpose, and each can be misused.

Choosing Programs To Evaluate Performance

Levels of programs or benchmarks that could be used to evaluate performance:
– Actual Target Workload: Full applications that run on the target machine.
– Real Full Program-based Benchmarks: Select a specific mix or suite of programs that are typical of targeted applications or workload (e.g. SPEC95, SPEC CPU2000).
– Small "Kernel" Benchmarks (also called synthetic benchmarks): Key computationally-intensive pieces extracted from real programs. Examples: matrix factorization, FFT, tree search, etc. Best used to test specific aspects of the machine.
– Microbenchmarks: Small, specially written programs to isolate a specific aspect of performance characteristics: processing (integer, floating point), local memory, input/output, etc.

Types of Benchmarks

Actual Target Workload
– Pros: Representative.
– Cons: Very specific. Non-portable. Complex: difficult to run or measure.

Full Application Benchmarks
– Pros: Portable. Widely used. Measurements useful in reality.
– Cons: Less representative than actual workload.

Small "Kernel" Benchmarks
– Pros: Easy to run, early in the design cycle.
– Cons: Easy to "fool" by designing hardware to run them well.

Microbenchmarks
– Pros: Identify peak performance and potential bottlenecks.
– Cons: Peak performance results may be a long way from real application performance.

SPEC: Standard Performance Evaluation Corporation

The most popular and industry-standard set of CPU benchmarks. Programs application domain: engineering and scientific computation.

SPECmarks, 1989:
– 10 programs yielding a single number ("SPECmarks").

SPEC92, 1992:
– SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs).

SPEC95, 1995:
– SPECint95 (8 integer programs): go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
– SPECfp95 (10 floating-point intensive programs): tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
– Performance relative to a Sun SuperSpark I (50 MHz), which is given a score of SPECint95 = SPECfp95 = 1

SPEC CPU2000, 1999:
– CINT2000 (11 integer programs). CFP2000 (14 floating-point intensive programs)
– Performance relative to a Sun Ultra5_10 (300 MHz), which is given a score of SPECint2000 = SPECfp2000 = 100

SPEC CPU2006, 2006:
– CINT2006 (12 integer programs). CFP2006 (17 floating-point intensive programs)
– Performance relative to a Sun Ultra Enterprise 2 workstation with a 296-MHz UltraSPARC II processor, which is given a score of SPECint2006 = SPECfp2006 = 1

All are based on execution time and give the speedup over a reference CPU.
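Since every SPEC score is a speedup over a reference machine, the per-program calculation can be sketched as a scaled ratio of execution times. The sketch below is an illustration under stated assumptions: the function names and all timing values are hypothetical, and the use of a geometric mean to combine per-program ratios is SPEC's published practice rather than something stated on the slide.

```python
import math


def spec_ratio(reference_time_s: float, measured_time_s: float,
               reference_score: float = 1.0) -> float:
    """Per-program score: reference_score x (reference time / measured time).

    reference_score is 1 for SPEC95 and SPEC CPU2006 and 100 for SPEC CPU2000,
    as listed on the slide.
    """
    return reference_score * reference_time_s / measured_time_s


def geometric_mean(values) -> float:
    """Geometric mean, computed via logarithms (assumed aggregation method)."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical (reference time, measured time) pairs in seconds.
    timings = [(1000.0, 250.0), (2000.0, 250.0), (1500.0, 750.0)]
    ratios = [spec_ratio(ref, measured) for ref, measured in timings]
    print(ratios, geometric_mean(ratios))
```

By construction the reference machine scores exactly `reference_score` on every program, which is why the slide quotes a score of 1 or 100 for it.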

SPEC95 Programs

Programs application domain: engineering and scientific computation.

Integer:
Benchmark   Description
go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The Gnu C compiler generating SPARC code
compress    Compresses and decompresses file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program

Floating point:
Benchmark   Description
tomcatv     A mesh generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation

Resulting performance is relative to a Sun SuperSpark I (50 MHz), which is given a score of SPECint95 = SPECfp95 = 1.

Sample SPECint95 (Integer) Results

[Figure: sample SPECint95 results. Sun SuperSpark I (50 MHz) score = 1]

Source URL: http://www.macinfo.de/bench/specmark.html

Sample SPECfp95 (Floating Point) Results

[Figure: sample SPECfp95 results. Sun SuperSpark I (50 MHz) score = 1]

Source URL: http://www.macinfo.de/bench/specmark.html

SPEC CPU2000 Programs

Programs application domain: engineering and scientific computation.

[Table columns: benchmark, language, description. Benchmark names are given in parentheses where they are legible]

CINT2000 (Integer), 11 programs:
– Compression
– FPGA Circuit Placement and Routing
– C Programming Language Compiler
– Combinatorial Optimization
– Game Playing: Chess
– Word Processing
– Computer Visualization
– PERL Programming Language
– Group Theory, Interpreter (254.gap)
– Object-oriented Database (255.vortex)
– Compression (256.bzip2)
– Place and Route (300.twolf)

CFP2000 (Floating Point), 14 programs:
– Fortran 77: Physics / Quantum Chromodynamics
– Fortran 77: Shallow Water Modeling
– Fortran 77: Multi-grid Solver: 3D Potential Field
– Fortran 77: Parabolic / Elliptic Partial Differential Equations
– C: 3-D Graphics Library
– Fortran 90: Computational Fluid Dynamics
– C: Image Recognition / Neural Networks
– C: Seismic Wave Propagation Simulation
– Fortran 90: Image Processing: Face Recognition
– C: Computational Chemistry
– Fortran 90: Number Theory / Primality Testing (189.lucas)
– Fortran 90: Finite-element Crash Simulation (191.fma3d)
– Fortran 77: High Energy Nuclear Physics Accelerator Design (200.sixtrack)
– Fortran 77: Meteorology: Pollutant Distribution (301.apsi)

Integer SPEC CPU2000 Microprocessor Performance 1978-2006

[Figure: integer SPEC CPU2000 microprocessor performance, 1978-2006. Performance is relative to the VAX 11/780, which is given a score of 1]

Top 20 SPEC CPU2000 Results (As of March 2002)

[Table columns: #, MHz, Processor, peak score, base score. Only the processor column is reproduced here, in table order]

#    Top 20 SPECint2000     Top 20 SPECfp2000
1    POWER4                 POWER4
2    Pentium 4              Alpha 21264C
3    Pentium 4 Xeon         UltraSPARC-III Cu
4    Athlon XP              Pentium 4 Xeon
5    Alpha 21264C           Pentium 4
6    Pentium III            Alpha 21264B
7    UltraSPARC-III Cu      Itanium
8    Athlon MP              Alpha 21264A
9    PA-RISC 8700           Athlon XP
10   Alpha 21264B           PA-RISC 8700
11   Athlon                 Athlon MP
12   Alpha 21264A           MIPS R14000
13   MIPS R14000            SPARC64 GP
14   SPARC64 GP             UltraSPARC-III
15   UltraSPARC-III         Athlon
16   PA-RISC 8600           Pentium III
17   POWER RS64-IV          PA-RISC 8600
18   Pentium III Xeon       POWER3-II
19   Itanium                Alpha 21264
20   MIPS …                 MIPS R12000

Performance is relative to a Sun Ultra5_10 (300 MHz), which is given a score of SPECint2000 = SPECfp2000 = 100.

Source: http://www.aceshardware.com/SPECmine/top.jsp

Top 20 SPEC CPU2000 Results (As of October 2006)

[Table columns: #, MHz, Processor, peak score, base score. Only the processor column is reproduced here, in table order]

#    Top 20 SPECint2000     Top 20 SPECfp2000
1    Core 2 Duo EE          POWER5
2    Xeon 51xx              DC Itanium 2
3    Core 2 Duo             Xeon 51xx
4    Xeon 30xx              Core 2 Duo EE
5    Opteron                Xeon 30xx
6    Athlon 64 FX           Itanium 2
7    Opteron AM2            Core 2 Duo
8    POWER5                 POWER5
9    Pentium 4 E            Opteron
10   Pentium 4 Xeon         Opteron AM2
11   Pentium M              Pentium 4 E
12   Pentium D              Athlon 64 FX
13   Core Duo               PowerPC 970MP
14   Pentium 4              SPARC64 V
15   Pentium 4 EE           Pentium 4 Xeon
16   PowerPC 970MP          Pentium D
17   Athlon 64              Pentium 4
18   Pentium 4 Xeon LV      Athlon 64
19   SPARC64 V              POWER4
20   Itanium                Pentium 4

Performance is relative to a Sun Ultra5_10 (300 MHz), which is given a score of SPECint2000 = SPECfp2000 = 100.

Source: http://www.aceshardware.com/SPECmine/top.jsp

SPEC CPU2006 Programs

Programs application domain: engineering and scientific computation.

[Table columns: benchmark, language, description. Benchmark names are given in parentheses where they are legible]

CINT2006 (Integer), 12 programs:
– C: PERL Programming Language
– C: Compression
– C: C Compiler
– C: Combinatorial Optimization
– C: Artificial Intelligence: go
– C: Search Gene Sequence
– C: Artificial Intelligence: chess
– C: Physics: Quantum Computing
– C: Video Compression
– C++: Discrete Event Simulation
– C++: Path-finding Algorithms
– C++: XML Processing

CFP2006 (Floating Point), 17 programs:
– Fortran: Fluid Dynamics
– Fortran: Quantum Chemistry
– C: Physics: Quantum Chromodynamics
– Fortran: Physics/CFD
– C/Fortran: Biochemistry/Molecular Dynamics
– C/Fortran: Physics/General Relativity
– Fortran: Fluid Dynamics
– C++: Biology/Molecular Dynamics
– C++: Finite Element Analysis
– C++: Linear Programming, Optimization
– C++: Image Ray-tracing
– C/Fortran: Structural Mechanics
– Fortran: Computational Electromagnetics
– Fortran: Quantum Chemistry
– C: Fluid Dynamics (470.lbm)
– C/Fortran: Weather Prediction (481.wrf)
– C: Speech recognition (482.sphinx3)

Example Integer SPEC CPU2006 Performance Results

For a 2.5 GHz AMD Opteron X4 model 2356 (Barcelona).

[Table columns: I, CPI, C, T, T on base processor, Score (speedup)]

Performance is relative to the base processor, a 296-MHz UltraSPARC II, which is given a score of SPECint2006 = SPECfp2006 = 1.
