Single-Cycle CPU Datapath Design


Single-Cycle CPU Datapath Design
"The Do-It-Yourself CPU Kit"
CSE 141, S2'06
Jeff Brown

The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Control and Datapath (together the Processor), Memory, Input, Output
Today's Topic: Datapath Design, then Control Design

The Big Picture: The Performance Perspective
Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
Starting today:
– Single cycle processor (execute an entire instruction in one clock cycle):
  Advantage: One clock cycle per instruction
  Disadvantage: long cycle time
ET = Insts * CPI * Cycle Time
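
As a small worked example of the execution-time equation above, the sketch below plugs in numbers for a single-cycle machine, where CPI is 1 by construction. The instruction count and the 8 ns cycle time are made-up illustrative values, not figures from the lecture.

```python
def execution_time_ns(insts, cpi, cycle_time_ns):
    """ET = Insts * CPI * Cycle Time, in nanoseconds."""
    return insts * cpi * cycle_time_ns

# Hypothetical workload and clock; only CPI = 1 comes from the single-cycle design.
INSTS = 1_000_000
CYCLE_TIME_NS = 8.0
et_single_cycle = execution_time_ns(INSTS, cpi=1, cycle_time_ns=CYCLE_TIME_NS)
print(et_single_cycle)
```

With CPI fixed at 1, the only processor-side lever left in this equation is the cycle time, which is why the long cycle is listed as the disadvantage.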

The Processor: Datapath & Control
We're ready to look at an implementation of the MIPS, simplified to contain only:
– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq
Generic Implementation:
– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
All instructions use the ALU after reading the registers
– memory-reference? arithmetic? control flow?

Review: The MIPS Instruction Formats
All MIPS instructions are 32 bits long. The three instruction formats:
R-type:  op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
         6 bits     | 5 bits     | 5 bits     | 5 bits     | 5 bits       | 6 bits
I-type:  op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
         6 bits     | 5 bits     | 5 bits     | 16 bits
J-type:  op [31:26] | target address [25:0]
         6 bits     | 26 bits
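
To make the field boundaries concrete, here is a minimal sketch that slices a 32-bit instruction word into the fields of the three formats. The function name `decode` and the dictionary layout are illustrative choices, not part of the lecture. The example word uses the standard MIPS encoding of add (op = 0, funct = 0x20).

```python
def decode(inst):
    """Split a 32-bit MIPS instruction word into all candidate fields."""
    return {
        "op":     (inst >> 26) & 0x3F,   # bits 31:26
        "rs":     (inst >> 21) & 0x1F,   # bits 25:21
        "rt":     (inst >> 16) & 0x1F,   # bits 20:16
        "rd":     (inst >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (inst >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":  inst & 0x3F,           # bits 5:0   (R-type)
        "imm16":  inst & 0xFFFF,         # bits 15:0  (I-type)
        "target": inst & 0x3FFFFFF,      # bits 25:0  (J-type)
    }

# add $3, $1, $2 assembled field by field.
ADD_3_1_2 = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
fields = decode(ADD_3_1_2)
print(hex(ADD_3_1_2), fields["rs"], fields["rt"], fields["rd"])
```

Which of the extracted fields is meaningful depends on the format, so a real decoder would look at op first. Extracting everything unconditionally mirrors how the hardware simply wires instruction bits to several places at once.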

The MIPS Subset
R-type:
– add rd, rs, rt
– sub, and, or, slt
  op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
  6 bits     | 5 bits     | 5 bits     | 5 bits     | 5 bits       | 6 bits
LOAD and STORE:
– lw rt, rs, imm16
– sw rt, rs, imm16
  op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
  6 bits     | 5 bits     | 5 bits     | 16 bits
BRANCH:
– beq rs, rt, imm16
  op [31:26] | rs [25:21] | rt [20:16] | displacement [15:0]
  6 bits     | 5 bits     | 5 bits     | 16 bits

Where We're Going – The High-level View

Review: Two Types of Logic Components
State Element: inputs A, B, and clk; output C = f(A, B, state)
Combinational Logic: inputs A, B; output C = f(A, B)

Clocking Methodology
[Figure: Clk waveform with Setup and Hold windows around each clock edge; signals are "Don't Care" outside those windows]
All storage elements are clocked by the same clock edge

Storage Element: Register
Register
– Similar to the D Flip Flop except:
  N-bit input and output
  Write Enable input
– Write Enable:
  0: Data Out will not change
  1: Data Out will become Data In (on the clock edge)
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]
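
The Write Enable behavior can be sketched as a tiny behavioral model. The class and method names are illustrative assumptions; `clock_edge` stands in for the active clock edge.

```python
class Register:
    """N-bit register: Data Out changes only on a clock edge with Write Enable = 1."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.data_out = 0

    def clock_edge(self, data_in, write_enable):
        # Write Enable = 0: Data Out will not change.
        # Write Enable = 1: Data Out becomes Data In (truncated to N bits).
        if write_enable:
            self.data_out = data_in & self.mask
        return self.data_out

r = Register()
r.clock_edge(5, write_enable=0)   # ignored
r.clock_edge(5, write_enable=1)   # captured
r.clock_edge(9, write_enable=0)   # ignored, still holds 5
print(r.data_out)
```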

Storage Element: Register File
Register File consists of (32) registers:
– Two 32-bit output buses: Read Data 1 and Read Data 2
– One 32-bit input bus: busW (Write Data)
Register is selected by:
– RR1 selects the register to put on bus "Read Data 1"
– RR2 selects the register to put on bus "Read Data 2"
– WR selects the register to be written via WriteData when RegWrite is 1
Clock input (CLK)
[Figure: register file of 32 32-bit registers with 5-bit RR1, RR2, and WR inputs, 32-bit Write Data input, 32-bit Read Data 1 and Read Data 2 outputs, RegWrite, and Clk]
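
A behavioral sketch of the register file follows, assuming reads are combinational and the write happens on the clock edge when RegWrite is 1. The class and method names are illustrative, and the sketch does not model MIPS register 0 being hardwired to zero.

```python
class RegisterFile:
    """32 x 32-bit registers, two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rr1, rr2):
        # RR1 / RR2 select what appears on Read Data 1 / Read Data 2.
        return self.regs[rr1], self.regs[rr2]

    def clock_edge(self, wr, write_data, reg_write):
        # WR selects the register written via Write Data when RegWrite is 1.
        if reg_write:
            self.regs[wr] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(wr=3, write_data=42, reg_write=1)
rf.clock_edge(wr=3, write_data=7, reg_write=0)   # RegWrite = 0: no change
print(rf.read(3, 0))
```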

Storage Element: Memory
Memory
– Two input buses: WriteData, Address
– One output bus: ReadData
Memory word is selected by:
– Address selects the word to put on ReadData bus
– MemWrite = 1: address selects the memory word to be written via the WriteData bus
Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block: Address valid => ReadData valid after "access time."
[Figure: memory block with Address, 32-bit Write Data, 32-bit Read Data, MemWrite, MemRead, and Clk]
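
The idealized memory can be modeled the same way: reads behave combinationally, and only writes depend on the clock. This is a sketch with illustrative names; the sparse dictionary and the forced word alignment are simplifying assumptions, and access time is not modeled.

```python
class Memory:
    """Word-addressed view of a byte-addressed memory."""

    def __init__(self):
        self.words = {}   # word-aligned byte address -> 32-bit word

    def read(self, address):
        # Combinational read: Address valid => ReadData valid.
        return self.words.get(address & ~0x3, 0)

    def clock_edge(self, address, write_data, mem_write):
        # The clock matters only for writes, and only when MemWrite = 1.
        if mem_write:
            self.words[address & ~0x3] = write_data & 0xFFFFFFFF

mem = Memory()
mem.clock_edge(0x100, 0xDEADBEEF, mem_write=1)
mem.clock_edge(0x104, 0x12345678, mem_write=0)   # MemWrite = 0: ignored
print(hex(mem.read(0x100)), mem.read(0x104))
```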

Register Transfer Language (RTL) is a mechanism for describing the movement and manipulation of data between storage elements:
  R[3] <- R[5] + R[7]
  PC <- PC + 4 + R[5]
  R[rd] <- R[rs] + R[rt]
  R[rt] <- Mem[R[rs] + immed]

Instruction Fetch and Program Counter Management

Overview of the Instruction Fetch Unit
The common RTL operations
– Fetch the Instruction: inst <- mem[PC]
– Update the program counter:
  Sequential Code: PC <- PC + 4
  Branch and Jump: PC <- "something else"
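
The two fetch-unit operations for sequential code can be sketched in a few lines. The instruction memory is modeled as a dictionary keyed by byte address, which is an assumption made for illustration, as are the two example instruction words.

```python
def fetch(instruction_memory, pc):
    """inst <- mem[PC]; PC <- PC + 4 (sequential case only)."""
    inst = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF   # the PC is a 32-bit byte address
    return inst, next_pc

# Hypothetical two-instruction program at addresses 0 and 4.
imem = {0: 0x00221820, 4: 0x8C240008}
inst0, pc1 = fetch(imem, 0)
inst1, pc2 = fetch(imem, pc1)
print(hex(inst0), pc1, hex(inst1), pc2)
```

The branch case replaces only the computation of `next_pc`; the fetch itself is identical.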

Datapath for Register-Register Operations
R[rd] <- R[rs] op R[rt]    Example: add rd, rs, rt
– RR1, RR2, and WR come from the instruction's rs, rt, and rd fields
– ALUoperation and RegWrite: control logic after decoding instruction
  op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
  6 bits     | 5 bits     | 5 bits     | 5 bits     | 5 bits       | 6 bits
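
The register-register transfer can be sketched as one function over a plain list of register values. The operation names stand in for the ALUoperation control signal and all identifiers are illustrative. slt is treated as a signed comparison, which is its standard MIPS meaning.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or":  lambda a, b: a | b,
    "slt": lambda a, b: 1 if to_signed(a) < to_signed(b) else 0,
}

def execute_rtype(regs, op, rd, rs, rt):
    """R[rd] <- R[rs] op R[rt]; RR1 = rs, RR2 = rt, WR = rd, RegWrite = 1."""
    regs[rd] = ALU_OPS[op](regs[rs], regs[rt])

regs = [0] * 32
regs[1], regs[2] = 5, 7
execute_rtype(regs, "add", rd=3, rs=1, rt=2)   # R[3] = 12
execute_rtype(regs, "sub", rd=4, rs=1, rt=2)   # R[4] = -2 as a 32-bit pattern
execute_rtype(regs, "slt", rd=5, rs=4, rt=1)   # -2 < 5, so R[5] = 1
print(regs[3], hex(regs[4]), regs[5])
```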

Datapath for Load Operations
R[rt] <- Mem[R[rs] + SignExt[imm16]]    Example: lw rt, rs, imm16
  op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
  6 bits     | 5 bits     | 5 bits     | 16 bits

Datapath for Store Operations
Mem[R[rs] + SignExt[imm16]] <- R[rt]    Example: sw rt, rs, imm16
  op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
  6 bits     | 5 bits     | 5 bits     | 16 bits
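
Load and store share the same address computation, R[rs] + SignExt[imm16], and differ only in the direction of the transfer. The sketch below shows both with illustrative function names, and with data memory modeled as a dictionary from byte address to word (an assumption for brevity).

```python
def sign_extend16(imm16):
    """SignExt[imm16]: treat a 16-bit field as a two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(regs, rs, imm16):
    return (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF

def lw(regs, mem, rt, rs, imm16):
    """R[rt] <- Mem[R[rs] + SignExt[imm16]]"""
    regs[rt] = mem.get(effective_address(regs, rs, imm16), 0)

def sw(regs, mem, rt, rs, imm16):
    """Mem[R[rs] + SignExt[imm16]] <- R[rt]"""
    mem[effective_address(regs, rs, imm16)] = regs[rt]

regs, mem = [0] * 32, {}
regs[1], regs[2] = 100, 0xDEADBEEF
sw(regs, mem, rt=2, rs=1, imm16=0xFFFC)   # imm16 = -4, so address 96
lw(regs, mem, rt=3, rs=1, imm16=0xFFFC)   # reads the same word back
print(sorted(mem), hex(regs[3]))
```

Note that rt is a destination for lw but a source for sw, which is why the register-file write port needs a choice between rd and rt once R-type instructions share the same datapath.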

Datapath for Branch Operations
Z <- (rs == rt); if Z, PC <- PC + 4 + imm16; else PC <- PC + 4
Example: beq rs, rt, imm16
  op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
  6 bits     | 5 bits     | 5 bits     | 16 bits
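
A sketch of the beq next-PC selection follows. It assumes the comparison is done by subtracting in the ALU and testing for zero, and it uses the byte-address form of the target, PC + 4 + SignExt[imm16] * 4. The function names are illustrative.

```python
def sign_extend16(imm16):
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def beq_next_pc(pc, rs_value, rt_value, imm16):
    """Z <- (R[rs] == R[rt]); choose the branch target if Z, else PC + 4."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0   # ALU subtract, Zero flag
    if zero:
        return (pc + 4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF

print(hex(beq_next_pc(0x100, 5, 5, 3)))       # taken: 0x100 + 4 + 12
print(hex(beq_next_pc(0x100, 5, 6, 3)))       # not taken: 0x100 + 4
print(hex(beq_next_pc(0x100, 1, 1, 0xFFFF)))  # taken, offset -1: branches to itself
```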

Binary Arithmetic for the Next Address
In theory, the PC is a 32-bit byte address into the instruction memory:
– Sequential operation: PC<31:0> = PC<31:0> + 4
– Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
The magic number "4" always comes up because:
– The 32-bit PC is a byte address
– And all our instructions are 4 bytes (32 bits) long
– The 2 LSBs of the 32-bit PC are always zeros
– There is no reason to have hardware to keep the 2 LSBs
In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
– Sequential operation: PC<31:2> = PC<31:2> + 1
– Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
– In either case: Instruction Memory Address = PC<31:2> concat "00"
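
The equivalence of the 32-bit and 30-bit formulations can be checked numerically. The sketch below implements both next-address rules and compares them on a few word-aligned PCs; the function names and the sample values are illustrative.

```python
def sign_extend16(imm16):
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc32(pc, imm16, taken):
    """32-bit byte-address form: PC + 4 (+ SignExt[Imm16] * 4 if branching)."""
    step = 4 + (sign_extend16(imm16) * 4 if taken else 0)
    return (pc + step) & 0xFFFFFFFF

def next_pc30(pc30, imm16, taken):
    """30-bit form on PC<31:2>: PC + 1 (+ SignExt[Imm16] if branching)."""
    step = 1 + (sign_extend16(imm16) if taken else 0)
    return (pc30 + step) & 0x3FFFFFFF

def imem_address(pc30):
    """Instruction Memory Address = PC<31:2> concat "00"."""
    return pc30 << 2

samples = [(pc, imm, taken)
           for pc in (0x0, 0x400, 0xFFFFFFFC)
           for imm in (0x0000, 0x0003, 0x8000, 0xFFFF)
           for taken in (False, True)]
agree = all(imem_address(next_pc30(pc >> 2, imm, t)) == next_pc32(pc, imm, t)
            for pc, imm, t in samples)
print(agree)
```

The two forms agree because multiplying by 4 and shifting left by two bits are the same operation, so dropping the two always-zero LSBs loses no information.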

Putting it All Together: A Single Cycle Datapath
We have everything except control signals
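
To show how the pieces combine, here is a minimal behavioral sketch of the whole subset, with each call to `step` standing for one clock cycle. It is not the lecture's datapath: control is folded into an if/elif chain, memories are dictionaries, and all names are illustrative. The opcode and funct values are the standard MIPS encodings.

```python
MASK32 = 0xFFFFFFFF

def sext16(v):
    return v - 0x10000 if v & 0x8000 else v

def signed32(v):
    return v - (1 << 32) if v & 0x80000000 else v

FUNCT = {  # standard MIPS funct codes for the R-type subset
    0x20: lambda a, b: (a + b) & MASK32,                # add
    0x22: lambda a, b: (a - b) & MASK32,                # sub
    0x24: lambda a, b: a & b,                           # and
    0x25: lambda a, b: a | b,                           # or
    0x2A: lambda a, b: int(signed32(a) < signed32(b)),  # slt
}

def step(pc, regs, imem, dmem):
    """Execute one instruction in one 'cycle' and return the next PC."""
    inst = imem[pc]                                     # instruction fetch
    op, rs, rt = inst >> 26, (inst >> 21) & 31, (inst >> 16) & 31
    rd, funct, imm = (inst >> 11) & 31, inst & 63, sext16(inst & 0xFFFF)
    a, b = regs[rs], regs[rt]                           # register read
    next_pc = (pc + 4) & MASK32
    if op == 0x00:                                      # R-type
        regs[rd] = FUNCT[funct](a, b)
    elif op == 0x23:                                    # lw
        regs[rt] = dmem.get((a + imm) & MASK32, 0)
    elif op == 0x2B:                                    # sw
        dmem[(a + imm) & MASK32] = b
    elif op == 0x04:                                    # beq
        if a == b:
            next_pc = (next_pc + imm * 4) & MASK32
    else:
        raise ValueError(f"unsupported opcode {op:#x}")
    return next_pc

# Hypothetical program: load two words, add them, store the sum.
imem = {
    0:  0x8C010000,   # lw  $1, 0($0)
    4:  0x8C020004,   # lw  $2, 4($0)
    8:  0x00221820,   # add $3, $1, $2
    12: 0xAC030008,   # sw  $3, 8($0)
}
regs, dmem, pc = [0] * 32, {0: 5, 4: 7}, 0
for _ in range(4):
    pc = step(pc, regs, imem, dmem)
print(dmem[8], pc)
```

Every instruction takes exactly one call to `step`, which mirrors CPI = 1. What the model hides is that the cycle must be long enough for the slowest path through all of these stages.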

The R-Format (e.g. add) Datapath

The Load Datapath

The Store Datapath

The beq Datapath

Key Points
CPU is just a collection of state and combinational logic
We just designed a very rich processor, at least in terms of functionality
Performance = Insts * CPI * Cycle Time
– where does the single-cycle machine fit in?

