Single-Cycle Processors: Datapath & Control

Arvind
Computer Science & Artificial Intelligence Lab
M.I.T.
Based on the material prepared by Arvind and Krste Asanovic

Instruction Set Architecture (ISA) versus Implementation
ISA is the hardware/software interface
– Defines the set of programmer-visible state
– Defines the instruction format (bit encoding) and instruction semantics
– Examples: MIPS, x86, IBM 360, JVM
Many possible implementations of one ISA
– 360 implementations: model 30 (c. 1964), z900 (c. 2001)
– x86 implementations: 8086 (c. 1978), 80186, 286, 386, 486, Pentium, Pentium Pro, Pentium-4 (c. 2000), AMD Athlon, Transmeta Crusoe, SoftPC
– MIPS implementations: R2000, R4000, R10000, ...
– JVM: HotSpot, PicoJava, ARM Jazelle, ...
6.823 L5-2, Arvind, September 26, 2005

Processor Performance
$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$
– Instructions per program depends on source code, compiler technology, and ISA
– Cycles per instruction (CPI) depends on the ISA and the microarchitecture
– Time per cycle depends on the microarchitecture and the base technology

Microarchitecture           CPI   Cycle time
Microcoded                  >1    short
Single-cycle unpipelined    1     long
Pipelined                   1     short

This lecture covers the single-cycle unpipelined and pipelined designs.
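As a sketch of how the three factors trade off, the snippet below multiplies them for each row of the microarchitecture comparison. The workload size, the microcoded CPI of 4, and the cycle times are hypothetical values chosen only to reflect "short" versus "long" cycles; they are not taken from the lecture.

```python
def exec_time(instructions: int, cpi: float, cycle_time: float) -> float:
    """Time/Program = Instructions/Program * Cycles/Instruction * Time/Cycle."""
    return instructions * cpi * cycle_time

# Hypothetical parameters: microcoded = several short cycles per instruction,
# single-cycle = one long cycle, pipelined = (ideally) one short cycle.
N = 1_000_000
designs = {
    "microcoded": exec_time(N, cpi=4.0, cycle_time=10.0),
    "single-cycle unpipelined": exec_time(N, cpi=1.0, cycle_time=27.0),
    "pipelined": exec_time(N, cpi=1.0, cycle_time=10.0),
}
for name, t in designs.items():
    print(f"{name}: {t:.0f} time units")
```

Holding the instruction count fixed isolates the two factors a microarchitecture controls: CPI and cycle time.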

Microarchitecture: Implementation of an ISA
Controller: reads status lines from the datapath and drives its control points
Datapath
Structure: how components are connected (static)
Behavior: how data moves between components (dynamic)

Hardware Elements
Combinational circuits
– Mux, Demux, Decoder, ALU, ...
– A decoder turns a select input into one active output among O0 ... On-1
– The ALU combines operands A and B into a Result, with the operation chosen by OpSelect:
  - Add, Sub, ...
  - And, Or, Xor, Not, ...
  - GT, LT, EQ, Zero, ...
Synchronous state elements
– Flipflop, Register, Register file, SRAM, DRAM
– A register is a row of flip-flops (D0 ... Dn-1 → Q0 ... Qn-1) sharing the clock and an enable (En)
Edge-triggered: data is sampled at the rising edge
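The distinction between the two kinds of element can be sketched in software: a combinational ALU is a pure function of its current inputs, while an enabled register changes only when a rising clock edge is simulated and En is high. The operation names and the 32-bit width are illustrative assumptions.

```python
class Register:
    """Edge-triggered register with an enable, modeled as a row of D flip-flops."""
    def __init__(self, width: int = 32):
        self.mask = (1 << width) - 1
        self.q = 0  # current output Q

    def rising_edge(self, d: int, en: bool) -> None:
        # D is sampled only at the rising clock edge, and only when enabled.
        if en:
            self.q = d & self.mask

def alu(op: str, a: int, b: int) -> int:
    """Combinational 32-bit ALU: the output depends only on the current inputs."""
    mask = 0xFFFF_FFFF
    ops = {
        "add": lambda: (a + b) & mask,
        "sub": lambda: (a - b) & mask,
        "and": lambda: a & b,
        "or": lambda: a | b,
        "xor": lambda: a ^ b,
        "eq": lambda: int(a == b),
    }
    return ops[op]()

r = Register()
r.rising_edge(alu("add", 2, 3), en=True)   # q becomes 5
r.rising_edge(alu("sub", 2, 3), en=False)  # enable low: q keeps its value
print(r.q)  # 5
```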

Register Files
A register file with two read ports and one write port (2R1W): ReadSel1 (rs1) and ReadSel2 (rs2) choose the registers driven onto ReadData1 (rd1) and ReadData2 (rd2); WriteSel (ws), WriteData (wd) and the write enable (we) update one register at the clock edge. With 32 registers (register 0 ... register 31), each select is 5 bits wide and each data port 32 bits wide.
– No timing issues in reading a selected register
– Register files with a large number of ports are difficult to design
  – Intel's Itanium GPR file has 128 registers with 8 read ports and 4 write ports!!!
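A minimal behavioral sketch of a 2R1W register file, assuming 32 registers of 32 bits: reads return immediately, and the single write port takes effect only on a simulated rising edge with we asserted. This generic model does not hardwire register 0 to zero, which MIPS additionally requires.

```python
class RegisterFile:
    """2R1W register file: combinational reads, write at the rising clock edge."""
    def __init__(self, n: int = 32):
        self.regs = [0] * n

    def read(self, rs1: int, rs2: int) -> tuple[int, int]:
        # Reads are combinational: the outputs follow the select inputs.
        return self.regs[rs1], self.regs[rs2]

    def rising_edge(self, ws: int, wd: int, we: bool) -> None:
        # The single write port updates one register, only when we is asserted.
        if we:
            self.regs[ws] = wd & 0xFFFF_FFFF

rf = RegisterFile()
rf.rising_edge(ws=3, wd=42, we=True)
rf.rising_edge(ws=3, wd=7, we=False)  # ignored: write enable is low
print(rf.read(3, 0))  # (42, 0)
```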

A Simple Memory Model
Reads and writes are always completed in one cycle
– A read can be done any time (i.e., combinational)
– A write is performed at the rising clock edge if it is enabled; the write address and data must be stable at the clock edge
Later in the course we will present a more realistic model of memory.

Implementing MIPS:
Single-cycle per instruction datapath & control logic

The MIPS ISA
Processor State
– 32 32-bit GPRs, R0 always contains a 0
– 32 single-precision FPRs, may also be viewed as 16 double-precision FPRs
– FP status register, used for FP compares & exceptions
– PC, the program counter
– some other special registers
Data types
– 8-bit byte, 16-bit half word
– 32-bit word for integers
– 32-bit word for single-precision floating point
– 64-bit word for double-precision floating point
Load/Store style instruction set
– data addressing modes: immediate & indexed
– branch addressing modes: PC relative & register indirect
– byte addressable memory: big-endian mode
All instructions are 32 bits
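Byte addressability with big-endian ordering means the byte at the lowest address is the most significant byte of a word. The sketch below assembles a word that way; the alignment check reflects the MIPS rule that word loads must be 4-byte aligned, and the `bytes` object standing in for memory is an illustrative assumption.

```python
def load_word_big_endian(mem: bytes, addr: int) -> int:
    """Assemble a 32-bit word from byte-addressed memory, most significant byte first."""
    if addr % 4:
        raise ValueError("MIPS word accesses must be 4-byte aligned")
    return int.from_bytes(mem[addr:addr + 4], byteorder="big")

mem = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(load_word_big_endian(mem, 0)))  # 0x12345678
```

A little-endian machine would read the same four bytes as 0x78563412.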

Instruction Execution
Execution of an instruction involves
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back
and the computation of the address of the next instruction

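Datapath: Reg-Reg ALU Instructions
Figure: the PC addresses the instruction memory and an adder computes PC + 4 (0x4). inst<25:21> and inst<20:16> drive the GPR read selects rs1 and rs2, inst<15:11> drives the write select ws, and the ALU result drives wd. ALU Control decodes the OpCode and inst<5:0>; RegWrite enables the GPR write.
Instruction format (field width in bits, bit positions):
0 (6, 31:26) | rs (5, 25:21) | rt (5, 20:16) | rd (5, 15:11) | 0 (5, 10:6) | func (6, 5:0)
Semantics: rd ← (rs) func (rt)
RegWrite timing?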
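The field split above can be sketched as a decoder plus a one-cycle execute step. The function codes are the standard MIPS ones for addu, subu, and, or; only this small subset is modeled, and the name `shamt` for bits 10:6 comes from MIPS documentation rather than the slide. Writes to R0 are suppressed because R0 always contains 0.

```python
FUNC = {  # a few MIPS R-type function codes
    0x21: lambda a, b: (a + b) & 0xFFFF_FFFF,  # addu
    0x23: lambda a, b: (a - b) & 0xFFFF_FFFF,  # subu
    0x24: lambda a, b: a & b,                  # and
    0x25: lambda a, b: a | b,                  # or
}

def decode_r(inst: int) -> dict:
    """Split a 32-bit R-type instruction into its fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,
        "rs": (inst >> 21) & 0x1F,
        "rt": (inst >> 16) & 0x1F,
        "rd": (inst >> 11) & 0x1F,
        "shamt": (inst >> 6) & 0x1F,
        "func": inst & 0x3F,
    }

def exec_r(inst: int, regs: list[int]) -> None:
    """rd <- (rs) func (rt)."""
    f = decode_r(inst)
    if f["opcode"] != 0:
        raise ValueError("not an R-type instruction")
    result = FUNC[f["func"]](regs[f["rs"]], regs[f["rt"]])
    if f["rd"] != 0:  # R0 always contains 0
        regs[f["rd"]] = result

regs = [0] * 32
regs[1], regs[2] = 7, 5
exec_r(0x00221821, regs)  # addu $3, $1, $2
print(regs[3])  # 12
```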

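Datapath: Reg-Imm ALU Instructions
Figure: as for Reg-Reg, except that the second ALU operand comes from inst<15:0> through the immediate extender (ImmExt, controlled by ExtSel), the destination register is inst<20:16>, and ALU Control is driven by the opcode inst<31:26>.
Instruction format (field width in bits, bit positions):
opcode (6, 31:26) | rs (5, 25:21) | rt (5, 20:16) | immediate (16, 15:0)
Semantics: rt ← (rs) op immediate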
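A sketch of the Reg-Imm path, assuming the standard MIPS opcodes for addiu, andi and ori: the 16-bit immediate is sign-extended for addiu and zero-extended for the logical operations, which is the choice ExtSel makes in hardware.

```python
def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to 32 bits."""
    return (imm | 0xFFFF_0000) if imm & 0x8000 else imm

def uext16(imm: int) -> int:
    """Zero-extend a 16-bit immediate to 32 bits."""
    return imm & 0xFFFF

# opcode -> (ALU operation, extension); a small subset of MIPS
IMM_OPS = {
    0x09: (lambda a, b: (a + b) & 0xFFFF_FFFF, sext16),  # addiu
    0x0C: (lambda a, b: a & b, uext16),                  # andi
    0x0D: (lambda a, b: a | b, uext16),                  # ori
}

def exec_i(inst: int, regs: list[int]) -> None:
    """rt <- (rs) op ext(immediate)."""
    opcode = (inst >> 26) & 0x3F
    rs, rt, imm = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F, inst & 0xFFFF
    op, ext = IMM_OPS[opcode]
    if rt != 0:  # R0 always contains 0
        regs[rt] = op(regs[rs], ext(imm))

regs = [0] * 32
regs[1] = 7
exec_i(0x2424FFFF, regs)  # addiu $4, $1, -1   (sign-extended)
exec_i(0x3425FFFF, regs)  # ori   $5, $1, 0xFFFF (zero-extended)
print(regs[4], hex(regs[5]))  # 6 0xffff
```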

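Conflicts in Merging Datapath
Combining the Reg-Reg and Reg-Imm datapaths creates conflicts:
– the destination register is rd = inst<15:11> for rd ← (rs) func (rt), but rt = inst<20:16> for rt ← (rs) op immediate
– the second ALU operand is (rt) for Reg-Reg instructions, but the extended immediate inst<15:0> for Reg-Imm instructions
– ALU Control must decode both the opcode inst<31:26> and the func field inst<5:0>
Introduce muxes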

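Datapath for ALU Instructions
Two muxes resolve the conflicts:
– RegDst selects the write register: rt / rd
– BSrc selects the second ALU operand: Reg / Imm
ALU Control takes the OpCode inst<31:26> and func inst<5:0> and produces OpSel; ExtSel controls the immediate extender (ImmExt).
Reg-Reg: 0 (6) | rs (5) | rt (5) | rd (5) | 0 (5) | func (6): rd ← (rs) func (rt)
Reg-Imm: opcode (6) | rs (5) | rt (5) | immediate (16): rt ← (rs) op immediate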
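The two muxes are simple selections; the sketch below models them as functions whose first argument plays the role of the control signal. Using the strings "rt"/"rd" and "Reg"/"Imm" as signal values is an illustrative assumption, not a hardware encoding.

```python
def reg_dst_mux(reg_dst: str, inst: int) -> int:
    """RegDst mux: write to rt (inst<20:16>) or rd (inst<15:11>)."""
    return (inst >> 16) & 0x1F if reg_dst == "rt" else (inst >> 11) & 0x1F

def bsrc_mux(bsrc: str, rd2: int, imm_ext: int) -> int:
    """BSrc mux: second ALU operand is the register value or the extended immediate."""
    return rd2 if bsrc == "Reg" else imm_ext

# addu $3, $1, $2 writes rd; addiu $4, $1, -1 writes rt.
print(reg_dst_mux("rd", 0x00221821), reg_dst_mux("rt", 0x2424FFFF))  # 3 4
```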

Datapath for Memory Instructions
Should program and data memory be separate?
Harvard style: separate (Aiken and Mark 1 influence)
– read-only program memory
– read/write data memory
– at some level the two memories have to be the same
Princeton style: the same (von Neumann's influence)
– a Load or Store instruction requires accessing the memory more than once during its execution

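Load/Store Instructions: Harvard Datapath
Figure: the ALU adds the "base" register (rs) and the extended displacement to form the data-memory address (addr). For a store, (rt) drives the data memory's wdata and its write enable (we) is asserted; for a load, the WBSrc mux (ALU / Mem) selects the memory's read data for write-back.
Instruction format (field width in bits, bit positions):
opcode (6, 31:26) | rs (5, 25:21) | rt (5, 20:16) | displacement (16, 15:0)
Addressing mode: (rs) + displacement
– rs is the base register
– rt is the destination of a Load or the source for a Store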
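A sketch of the load/store step in the Harvard style, assuming the standard MIPS opcodes for LW and SW. A Python dict keyed by byte address stands in for the separate data memory, and alignment is not checked; both are simplifications for illustration.

```python
LW, SW = 0x23, 0x2B  # MIPS opcodes for load word / store word

def exec_mem(inst: int, regs: list[int], dmem: dict[int, int]) -> None:
    """Execute LW/SW with the addressing mode (rs) + displacement."""
    opcode = (inst >> 26) & 0x3F
    rs, rt, disp = (inst >> 21) & 0x1F, (inst >> 16) & 0x1F, inst & 0xFFFF
    if disp & 0x8000:                       # sign-extend the displacement
        disp -= 0x1_0000
    addr = (regs[rs] + disp) & 0xFFFF_FFFF  # ALU computes base + displacement
    if opcode == SW:
        dmem[addr] = regs[rt]               # rt is the source for a store
    elif opcode == LW:
        if rt != 0:                         # R0 always contains 0
            regs[rt] = dmem.get(addr, 0)    # rt is the destination for a load
    else:
        raise ValueError("not a load/store instruction")

regs = [0] * 32
regs[1], regs[2] = 100, 55
dmem: dict[int, int] = {}
exec_mem(0xAC220004, regs, dmem)  # sw $2, 4($1)
exec_mem(0x8C230004, regs, dmem)  # lw $3, 4($1)
print(dmem, regs[3])  # {104: 55} 55
```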

MIPS Control Instructions
Conditional (on GPR) PC-relative branch: BEQZ, BNEZ
  opcode (6) | rs (5) | (5) | offset (16)
Unconditional register-indirect jumps: JR, JALR
  opcode (6) | rs (5) | (5) | (16)
Unconditional absolute jumps: J, JAL
  opcode (6) | target (26)
– PC-relative branches add offset×4 to PC+4 to calculate the target address (offset is in words): ±128 KB range
– Absolute jumps append target×4 to PC<31:28> to calculate the target address: 256 MB range
– Jump-&-link stores PC+4 into the link register (R31)
– All control transfers are delayed by 1 instruction; we will worry about the branch delay slot later
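The three address computations can be sketched as follows, ignoring the branch delay slot as the slide does for now. The jump follows the slide in taking the upper bits from PC; the MIPS architecture manual uses the address of the delay-slot instruction (PC+4) instead, and the two differ only when the jump is the last word of a 256 MB region.

```python
MASK32 = 0xFFFF_FFFF

def branch_target(pc: int, offset16: int) -> int:
    """PC-relative: (PC + 4) + sign-extended word offset * 4."""
    off = offset16 - 0x1_0000 if offset16 & 0x8000 else offset16
    return (pc + 4 + off * 4) & MASK32

def jump_target(pc: int, target26: int) -> int:
    """Absolute: PC<31:28> concatenated with target * 4 (28 bits)."""
    return (pc & 0xF000_0000) | ((target26 & 0x3FF_FFFF) << 2)

def link_value(pc: int) -> int:
    """Jump-&-link writes PC + 4 into R31."""
    return (pc + 4) & MASK32

print(hex(branch_target(0x1000, 3)), hex(jump_target(0x1234_5678, 0x100)))
```

The ranges follow from the field widths: a signed 16-bit word offset reaches 2^15 words = 2^17 bytes = 128 KB either way, and a 26-bit word target covers 2^28 bytes = 256 MB.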

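Conditional Branches (BEQZ, BNEZ)
Figure: a second adder computes the branch target from pc+4 and the extended offset, the ALU's zero? output reports the test on (rs), and the PCSrc mux selects between pc+4 and br. Control signals: PCSrc, RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, BSrc.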

Register-Indirect Jumps (JR)
Figure: the PCSrc mux gains a third input, rind, the jump address read from register rs.

Register-Indirect Jump-&-Link (JALR)
Figure: as for JR, and additionally the WBSrc mux can select the PC (pc+4) for write-back so that the return address is written to the link register.

Absolute Jumps (J, JAL)
Figure: the PCSrc mux gains a fourth input, jabs, the absolute jump target.

Harvard-Style Datapath for MIPS
Figure: the complete single-cycle datapath (instruction memory, GPRs, ImmExt, ALU, data memory) with control signals PCSrc (pc+4 / br / rind / jabs), RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel and BSrc, and the ALU's zero? status returned to control.
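The next-PC logic of the complete datapath reduces to a four-way mux. The sketch below models it, together with the BEQZ choice that depends on the zero? status; the string values for PCSrc are illustrative, and the branch delay slot is ignored.

```python
MASK32 = 0xFFFF_FFFF

def beqz_pc_src(rs_value: int) -> str:
    """BEQZ: take the branch (br) when (rs) is zero, otherwise fall through."""
    return "br" if (rs_value & MASK32) == 0 else "pc+4"

def next_pc(pc_src: str, pc: int, br: int, rind: int, jabs: int) -> int:
    """PCSrc mux: choose the next PC among pc+4, br, rind and jabs."""
    choices = {
        "pc+4": (pc + 4) & MASK32,  # sequential execution
        "br": br,                   # PC-relative branch target
        "rind": rind,               # register-indirect target, (rs)
        "jabs": jabs,               # absolute jump target
    }
    return choices[pc_src]

print(hex(next_pc(beqz_pc_src(0), pc=0x100, br=0x200, rind=0, jabs=0)))  # 0x200
```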

Five-minute break to stretch your legs

Single-Cycle Hardwired Control: Harvard architecture
We will assume the clock period is sufficiently long for all of the following steps to be “completed”:
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. data fetch if required
5. register write-back setup time
$$t_C > t_{IFetch} + t_{RFetch} + t_{ALU} + t_{DMem} + t_{RWB}$$
At the rising edge of the following clock, the PC, the register file and the memory are updated.

Hardwired Control is pure Combinational Logic
Inputs: op code, zero?
Outputs: ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, PCSrc

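ALU Control & Immediate Extension
Inputs: Inst<5:0> (Func) and Inst<31:26> (Opcode)
A Decode Map turns the opcode into the select signals:
– OpSel (Func, Op, +, 0?) chooses the ALU operation (ALUop): the one named by Func, the one named by the opcode, an add, or a zero test
– ExtSel (sExt16, uExt16, High16) chooses how the 16-bit immediate is extended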
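A sketch of the two selections. The function-code and opcode tables cover only the MIPS subset used in the earlier examples, and the operation names returned are hypothetical labels. High16 is modeled as placing the immediate in the upper half of the word, presumably for a load-upper-immediate instruction.

```python
def imm_ext(imm16: int, ext_sel: str) -> int:
    """ImmExt unit: ExtSel chooses sExt16, uExt16 or High16."""
    imm16 &= 0xFFFF
    if ext_sel == "sExt16":
        return (imm16 | 0xFFFF_0000) if imm16 & 0x8000 else imm16
    if ext_sel == "uExt16":
        return imm16
    if ext_sel == "High16":  # immediate in the upper 16 bits
        return imm16 << 16
    raise ValueError(ext_sel)

FUNC_NAMES = {0x21: "add", 0x23: "sub", 0x24: "and", 0x25: "or"}  # subset
OPCODE_NAMES = {0x09: "add", 0x0C: "and", 0x0D: "or"}             # subset

def alu_op(op_sel: str, func: int, opcode: int) -> str:
    """ALUop chosen by OpSel (operation names are hypothetical labels)."""
    if op_sel == "Func":  # Reg-Reg: the func field decides
        return FUNC_NAMES[func]
    if op_sel == "Op":    # Reg-Imm: the opcode decides
        return OPCODE_NAMES[opcode]
    if op_sel == "+":     # loads/stores: base + displacement
        return "add"
    if op_sel == "0?":    # branches: zero test on (rs)
        return "zero?"
    raise ValueError(op_sel)
```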

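Hardwired Control Table
Opcode      ExtSel   BSrc   OpSel   MemWrite   RegWrite   WBSrc   RegDst   PCSrc
ALU         *        Reg    Func    no         yes        ALU     rd       pc+4
ALUi        sExt16   Imm    Op      no         yes        ALU     rt       pc+4
ALUiu       uExt16   Imm    Op      no         yes        ALU     rt       pc+4
LW          sExt16   Imm    +       no         yes        Mem     rt       pc+4
SW          sExt16   Imm    +       yes        no         *       *        pc+4
BEQZ z=0    sExt16   *      0?      no         no         *       *        pc+4
BEQZ z=1    sExt16   *      0?      no         no         *       *        br
J           *        *      *       no         no         *       *        jabs
JAL         *        *      *       no         yes        PC      R31      jabs
JR          *        *      *       no         no         *       *        rind
JALR        *        *      *       no         yes        PC      R31      rind
BSrc = Reg / Imm; RegDst = rt / rd / R31; WBSrc = ALU / Mem / PC; PCSrc = pc+4 / br / rind / jabs
z is the zero? status (1 when (rs) = 0); * means don't care.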
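Because the control is pure combinational logic, the table can be sketched as a lookup. Instruction classes (ALU, ALUi, LW, ...) are used as keys in place of real 6-bit opcodes, None stands for a don't-care, and BEQZ is treated as the one row that also reads the zero? status line.

```python
FIELDS = ("ExtSel", "BSrc", "OpSel", "MemWrite", "RegWrite", "WBSrc", "RegDst", "PCSrc")
TABLE = {  # None = don't care (*)
    "ALU":   (None,     "Reg", "Func", False, True,  "ALU", "rd",  "pc+4"),
    "ALUi":  ("sExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "ALUiu": ("uExt16", "Imm", "Op",   False, True,  "ALU", "rt",  "pc+4"),
    "LW":    ("sExt16", "Imm", "+",    False, True,  "Mem", "rt",  "pc+4"),
    "SW":    ("sExt16", "Imm", "+",    True,  False, None,  None,  "pc+4"),
    "BEQZ":  ("sExt16", None,  "0?",   False, False, None,  None,  None),
    "J":     (None,     None,  None,   False, False, None,  None,  "jabs"),
    "JAL":   (None,     None,  None,   False, True,  "PC",  "R31", "jabs"),
    "JR":    (None,     None,  None,   False, False, None,  None,  "rind"),
    "JALR":  (None,     None,  None,   False, True,  "PC",  "R31", "rind"),
}

def control(instr_class: str, zero: bool = False) -> dict:
    """Combinational decode: (instruction class, zero?) -> control signals."""
    signals = dict(zip(FIELDS, TABLE[instr_class]))
    if instr_class == "BEQZ":
        # The only row whose outputs depend on the zero? status line.
        signals["PCSrc"] = "br" if zero else "pc+4"
    return signals

print(control("LW"))
```

No state is kept between calls, mirroring the point that single-cycle hardwired control is a function of the current opcode and status alone.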

Pipelined MIPS
To pipeline MIPS:
– First build MIPS without pipelining, with CPI = 1
– Next, add pipeline registers to reduce cycle time while maintaining CPI = 1

Pipelined Datapath
Figure: the single-cycle datapath divided into fetch, decode & Reg-fetch, execute, memory and write-back phases.
Clock period can be reduced by dividing the execution of an instruction into multiple cycles:
$$t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM}, t_{RW}\} \quad (= t_{DM} \text{ probably})$$
However, CPI will increase unless instructions are pipelined.

An Ideal Pipeline
stage 1 → stage 2 → stage 3 → stage 4
– All objects go through the same stages
– No sharing of resources between any two stages
– Propagation delay through all pipeline stages is equal
– The scheduling of an object entering the pipeline is not affected by the objects in other stages
These conditions generally hold for industrial assembly lines. But can an instruction pipeline satisfy the last condition?
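Under these four conditions each object simply advances one stage per cycle, so object i occupies stage s in cycle i + s, and n objects pass through k stages in k + n − 1 cycles. A small sketch that prints the occupancy chart (object labels are arbitrary):

```python
def ideal_pipeline(n_objects: int, k_stages: int) -> list[list[str]]:
    """Cycle-by-cycle occupancy: object i is in stage s during cycle i + s."""
    cycles = k_stages + n_objects - 1
    table = []
    for c in range(cycles):
        row = []
        for s in range(k_stages):
            i = c - s  # object in stage s during cycle c
            row.append(f"I{i}" if 0 <= i < n_objects else "--")
        table.append(row)
    return table

for c, row in enumerate(ideal_pipeline(n_objects=3, k_stages=4)):
    print(f"cycle {c}: " + " ".join(row))
```

The schedule depends only on when each object entered, which is exactly what the last condition guarantees.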

How to divide the datapath into stages
Suppose memory is significantly slower than other stages. In particular, suppose
tIM = 10 units, tDM = 10 units, tALU = 5 units, tRF = 1 unit, tRW = 1 unit
Since the slowest stage determines the clock, it may be possible to combine some stages without any loss of performance.

Alternative Pipelining
Figure: the datapath divided into fetch, decode & Reg-fetch, execute, memory and write-back phases.
$$t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM}, t_{RW}\} = t_{DM}$$
Write-back stage takes much less time than other stages. Suppose we combined it with the memory phase:
$$t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM} + t_{RW}\} = t_{DM} + t_{RW}$$
⇒ increase the critical path by 10% (from 10 to 11 units with the values above)

Maximum Speedup by Pipelining
Case  Assumptions                               Pipeline   Unpipelined tC   Pipelined tC   Speedup
1     tIM = tDM = 10, tALU = 5, tRF = tRW = 1   4-stage    27               10             2.7
2     tIM = tDM = tALU = tRF = tRW = 5          4-stage    25               10             2.5
3     tIM = tDM = tALU = tRF = tRW = 5          5-stage    25               5              5.0
It is possible to achieve higher speedup with more stages in the pipeline.
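The speedup column can be reproduced by taking the unpipelined clock as the sum of all step latencies and the pipelined clock as the slowest stage, where a stage is a group of steps. The 4-stage grouping used for cases 1 and 2 (register fetch merged with the ALU) is an assumption that matches the pipelined tC of 10; pipeline-register overhead is ignored.

```python
def speedup(latencies: dict[str, int], stages: list[list[str]]) -> tuple[int, int, float]:
    """Return (unpipelined tC, pipelined tC, speedup); register overhead is ignored."""
    t_unpipelined = sum(latencies.values())
    t_pipelined = max(sum(latencies[step] for step in stage) for stage in stages)
    return t_unpipelined, t_pipelined, t_unpipelined / t_pipelined

four_stage = [["IM"], ["RF", "ALU"], ["DM"], ["RW"]]  # assumed grouping

case1 = {"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}
print(speedup(case1, four_stage))  # (27, 10, 2.7)

case2 = {step: 5 for step in case1}
print(speedup(case2, four_stage))                 # (25, 10, 2.5)
print(speedup(case2, [[step] for step in case2]))  # (25, 5, 5.0)
```

Grouping case 1 as [["IM"], ["RF"], ["ALU"], ["DM", "RW"]] instead gives a pipelined tC of 11, the 10% longer critical path that comes from merging write-back into the memory phase.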

Thank you!
