MIPS: A Microprocessor Architecture

John Hennessy, Norman Jouppi, Steven Przybylski, Christopher Rowen, Thomas Gross, Forest Baskett, and John Gill
Departments of Electrical Engineering and Computer Science
Stanford University
0194-1895/82/0000/0017$00.75 © 1982 IEEE

Abstract

MIPS is a new single chip VLSI microprocessor. It attempts to achieve high performance with the use of a simplified instruction set, similar to those found in microengines. The processor is a fast pipelined engine without pipeline interlocks. Software solutions to several traditional hardware problems, such as providing pipeline interlocks, are used.

Introduction

MIPS (Microprocessor without Interlocked Pipe Stages) is a new general purpose microprocessor architecture designed to be implemented on a single VLSI chip. The main goal of the design is high performance in the execution of compiled code. The architecture is experimental since it is a radical break with the trend of modern computer architectures. The basic philosophy of MIPS is to present an instruction set that is a compiler-driven encoding of the microengine. Thus, little or no decoding is needed and the instructions correspond closely to microcode instructions. The processor is pipelined but provides no pipeline interlock hardware; this function must be provided by software.

The MIPS architecture presents the user with a fast machine with a simple instruction set. This approach has been used by the IBM 801 project [1] and is currently being explored by the RISC project at Berkeley [2]; it is directly opposed to the approach taken by architectures such as the VAX. However, there are significant differences between the RISC approach and the approach used in MIPS:

1. The RISC architecture is simple both in the instruction set and the hardware needed to implement that instruction set. Although the MIPS instruction set has a simple hardware implementation (i.e. it requires a minimal amount of hardware control), the user level instruction set is not as straightforward, and the simplicity of the user level instruction set is secondary to the performance goals.
2. The thrust of the RISC design is towards efficient implementation of a straightforward instruction set. In the MIPS design, high performance from the hardware engine is a primary goal, and the microengine is presented to the end user with a minimal amount of interpretation. This makes most of the microengine's parallelism available at the instruction set level.
3. The RISC project relies on a straightforward instruction set and straightforward compiler technology. MIPS will require more sophisticated compiler technology and will gain significant performance benefits from that technology. The compiler technology allows a microcode-level instruction set to appear like a normal instruction set to both code generators and assembly language programmers.

The MIPS architecture is closer to the 801 architecture in many aspects. In both machines the macroinstruction set maps very directly to the microoperations of the processor. Both processors may be thought of as architectures with micro-level user instruction sets. Microcode is created by compilers and code generators as it is needed to implement complex operations. The primary differences lie in various architectural choices about pipeline design, registers, and opcodes, and in the attempt in the MIPS instruction set to make all the microengine parallelism available at the user instruction set level. These attempts are most visible within MIPS in the following ways: the two-part memory/ALU and ALU/ALU instructions, the explicit pipeline interlocks, and the conditional jump instructions.

MIPS is designed for high performance. To allow the user to get maximum performance, the complexity of individual instructions is minimized. This allows the execution of these instructions at significantly higher speeds. To take advantage of simpler hardware and an instruction set that easily maps to the microinstruction set, additional compiler-type translation is needed. This compiler technology makes a compact and time efficient mapping between higher level constructs and the simplified instruction set. The shifting of the complexity from the hardware to the software has several major advantages:

- The complexity is paid for only once, during compilation. When a user runs his program on a complex architecture, he pays the cost of the architectural overhead each time he runs his program.
- It allows the concentration of energies on the software, rather than constructing a complex hardware engine, which is hard to design, debug, and efficiently utilize. Software is not necessarily easier to construct, but the VLSI environment makes hardware simplicity important.

The design of a high performance VLSI processor is dramatically affected by the technology. Among the most important design considerations are: the effect of pin limitations, available silicon area, and size/speed tradeoffs. Pin limitations force the careful design of a scheme for multiplexing the available pins, especially when data and instruction fetches are overlapped. Area limitations and the speed of off-chip intercommunication require choices between on- and off-chip functions as well as limiting the complete on-chip design. With current state-of-the-art technology either some vital component of the processor (such as memory management) must be off-chip, or the size of the chip will make both its performance and yields unacceptably low. Choosing what functions are migrated off-chip must be done carefully so that the performance effects of the partitioning are minimized. In some cases, through careful design, the effects may be eliminated at some extra cost for high speed off-chip functions.

Speed/complexity/area tradeoffs are perhaps the most important and difficult phenomena to deal with. Additional on-chip functionality requires more area, which also slows down the performance of every other function. This occurs for two equally important reasons: additional control and decoding logic increases the length of the critical path (by increasing the number of active elements in the path), and each additional function increases the length of internal wire delays. In the processor's data path these wire delays can be substantial, since they accumulate both from bus delays, which occur when the data path is lengthened, and control delays, which occur when the decoding and control is expanded or when the data path is widened. In the MIPS architecture we have attempted to control these delays; however, they remain a dominant factor in determining the speed of the processor.

The microarchitecture

Design philosophy

The fastest execution of a task on a microengine would be one in which all resources of the microengine were used at a 100% duty cycle performing a nonredundant and algorithmically efficient encoding of the task. The MIPS microengine attempts to achieve this goal. The user instruction set is an encoding of the microengine that makes a maximum amount of the microengine available. This goal motivated many of the design decisions found in the architecture.

MIPS is a load/store architecture, i.e. data may be operated on only when it is in a register and only load/store instructions access memory. If data operands are used repeatedly in a basic block of code, having them in registers will prevent redundant loads/stores and redundant addressing calculations; this allows higher throughput since more operations directly related to the computation can be performed. The only addressing modes supported are immediate, based with offset, indexed, or base shifted. These addressing modes may require fields from the instruction itself, general registers, and one ALU or shifter operation. Another ALU operation, available in the fourth stage of every instruction, can be used for a (possibly unrelated) computation. Another major benefit derived from the load/store architecture is simplicity of the pipeline structure. The simplified structure has a fixed number of pipestages, each of the same length. Because the stages can be used in varying (but related) ways, pipeline utilization improves. Also, the absence of synchronization between stages of the pipe increases the performance of the pipeline and simplifies the hardware. The simplified pipeline eases the handling of both interrupts and page faults.

Although MIPS is a pipelined processor, it does not have hardware pipeline interlocks. This approach is often seen in low and medium performance microengines. The MIPS five-stage pipeline contains three active instructions at any time; either the odd or even pipestages are active. The major pipestages and their tasks are shown in Table 1.

Table 1: Major pipestages and their functions

Stage                     Mnemonic  Task
Instruction Fetch         IF        Send out the PC, increment it
Instruction Decode        ID        Decode instruction
Operand Decode            OD        Compute effective address and send to memory if load or store; use ALU
Operand Store/Execution   OS/EX     Store: write operand; Execution: use ALU
Operand Fetch             OF        Load: read operand

Interlocks that are required because of dependencies brought out by pipelining are not provided by the hardware. Instead, these interlocks must be statically provided where they are needed by a pipeline reorganizer. This has two benefits:

1. A more regular and faster hardware implementation is possible since it does not have the usual complexity associated with a pipelined machine. Hardware interlocks cause small delays for all instructions, regardless of their relationship to other instructions. Also, interlock hardware tends to be very complex and nonregular [3,4]. The lack of such hardware is especially important for VLSI implementations, where regularity and simplicity are important.
2. Rearranging operations at compile time is better than delaying them at run time. With a good pipeline reorganizer, most cases where interlocks are avoidable should be found and taken advantage of. This results in performance better than a comparable machine with hardware interlocks, since usage of resources will not be delayed. In cases where this is not detected or is not possible, no-ops must be inserted into the code. This does not slow down execution compared to a similar machine with hardware interlocks, but does increase code size. The shifting of work to a reorganizer would be a disadvantage if it took excessive amounts of computation. It appears this is not a problem for our first reorganizer.

In the MIPS pipeline resource usage is permanently allocated to various pipe stages. Rather than having pipeline stages compete for the use of resources through queues or priority schemes, the machine's resources are dedicated to specific stages so that they are 100% utilized. In Figure 1, the allocation of resources to individual pipe stages is shown. When concurrently executing pipe stages are overlaid, all available resources can be used.

Figure 1: Resource Allocation by Pipestage (pipestages IF, ID, OD, OS/EX, and OF plotted against time, showing their use of the instruction memory, ALU, and data memory; the ALU is reserved for use by OD and EX)

Resources of the microengine

The major functional components of the microengine include:

- ALU resources: A high speed, 32-bit carry lookahead ALU with hardware support for multiply and divide; and a barrel shifter with byte insert and extract capabilities. Only one of the ALU resources is usable at a time. Thus within the class of ALU resources, functional units cannot be fully used even when the class itself is used 100%.
- Internal bus resources: Two 32-bit bidirectional busses, each connecting almost all functional components.
- On chip storage: Sixteen 32-bit general-purpose registers.
- Memory resources: Two memory interfaces, one for instructions and one for data. Each of the parts of the memory resource can be 100% utilized (subject to packing and instruction space usage) because either one store or load from data memory and one instruction fetch can occur simultaneously.
- A multistage PC unit: An incrementable current PC with storage of one branch target as well as four previous PC values. These are required by the pipelining of instructions and interrupt and exception handling.

To achieve 100% utilization, primitive operations in the microengine (e.g., load/store, ALU operations) must be completely packed into macroinstructions. This is not possible for three reasons:

1. Dependencies can prevent full usage of the microengine, for example when a sequence of register loads must be done before an ALU operation or when no-ops must be inserted.
2. An encoding that preserved all the parallelism (i.e., the microcontrol word itself) would be too large. This is not a serious problem since many of the possible microinstructions are not useful.
3. The encoding of the microengine presented in the instruction set sacrifices some functional specification for immediate data. In the worst case, space in the instruction word used for loading large immediate values takes up the space normally used for a base register, displacement, and ALU operation specification. In this case the memory interface and ALU cannot be used during the pipe stage for which they are dedicated.

Nevertheless, first results on microengine utilization are encouraging. Many instructions fully utilize the major resources of the machine. Other instructions, such as load immediate, which use few of the resources of the machine, would mandate greatly increased control complexity if overlap with surrounding instructions was attempted in an irregular fashion.

MIPS has one instruction size, and all instructions execute in the same amount of time (one data memory cycle). This choice simplifies the construction of code generators for the architecture (by eliminating many nonobvious code sequences for different functions) and makes the construction of a synchronous regular pipeline much easier. Additionally, the fact that each macroinstruction is a single microinstruction of fixed length and execution time means that a minimum amount of internal state is needed in the processor. The absence of this internal state leads to a faster processor and minimizes the difficulty of supporting interrupts and page faults.

The instruction set

All MIPS instructions are 32 bits. The user instruction set is a compiler-based encoding of the micromachine. Static and dynamic instruction set efficiency, as determined by a code generator, is used to decide what micromachine features to encode into macroinstructions in the architecture. Multiple simple (and possibly unrelated) instruction pieces are packed together into an instruction word. The basic instruction pieces are:

1. ALU pieces - these instructions are all register/register (2 and 3 operand formats). They all use less than 1/2 of an instruction word. Included in this category are byte insert/extract, two bit Booth's multiply step, and one bit nonrestoring divide step, as well as standard ALU and logical operations.
2. Load/store pieces - these instructions load and store memory operands. They use between 16 and 32 bits of an instruction word. When a load instruction is less than 32 bits, it may be packaged with an ALU instruction, which is executed during the Execution stage of the pipeline.
3. Control flow pieces - these include direct jumps and compare instructions with relative jumps. MIPS does not have condition codes, but includes a rich collection of set conditional and compare and jump instructions. The set conditional instructions provide a powerful implementation for conditional expressions. They set a register to all 1's or all 0's based on one of 16 possible comparisons done during the operand decode stage. During the Execution stage an ALU operation is available for logical operations with other booleans. The compare and jump instructions are direct encodings of the micromachine: the operand decode stage computes the address of the branch target and the Execution cycle does the comparison. All branch instructions have a delay in their effect of one instruction; i.e., the next sequential instruction is always executed.
4. Other instructions - these include procedure and interrupt linkage. The procedure linkage instructions also fit easily into the micromachine format of effective address calculation and register-register computation instructions.

MIPS is a word-addressed machine. This provides several major performance advantages over a byte-addressed architecture. First, the use of word addressing simplifies the memory interface since extraction and insertion hardware is not needed. This is particularly important, since instruction and data fetch/store are in a critical path. Second, when byte data (characters) can be handled in word blocks, the computation is much more efficient. Last, the effectiveness of short offsets from a base register is multiplied by a factor of four.

MIPS does not directly support floating point arithmetic. For applications where such computations are infrequent, floating point operations implemented with integer operations and field insertion/extraction sequences should be sufficient. For more intensive applications a numeric co-processor similar to the Intel 8087 would be appropriate.

Systems issues

The key systems issues are the memory system, and internal traps and external interrupt support.

The memory system

The use of memory mapping hardware (off chip in the current design) is needed to support virtual memory. Modern microprocessors (Motorola 68000) are already faced with the problem that the sum of the memory access time and the memory mapping time is too long to allow the [...]

The solution we have chosen to this problem is to separate the data and instruction memory systems. Separation of program and data is a regular practice on many machines; in the MIPS system it allows us to significantly increase performance. Another benefit of the separation is that it allows the use of a cache only for instructions. Because the instruction memory can be treated as read-only memory (except when a program is being loaded), the cache control is simple. The use of an instruction cache allows increased performance by providing more time during the critical instruction decode pipe stage.

Faults and interrupts

The MIPS architecture will support page faults, externally generated interrupts, and internally generated traps (arithmetic overflow). The necessary hardware to handle such things in a pipelined architecture is usually large and complex [3,4]. Furthermore, this is an area where the lack of sufficient hardware support makes the construction of systems software impossible. However, because the MIPS instruction set is not interpreted by a microengine (with its own state), hardware support for page faults and interrupts is significantly simplified.

To handle interrupts and page faults correctly, two important properties are required. First, the architecture must ensure correct shutdown of the pipe, without executing any faulted instructions (such as the instruction which page faulted). Most present microprocessors cannot perform this function correctly (e.g. Motorola 68000, Zilog Z8000, and the Intel 8086). Second, the processor must be able to correctly restore the pipe and continue execution as if the interrupt or fault had not occurred.

These problems are significantly eased in MIPS because of the location of writes within the pipe stages. In MIPS, no instruction that can page fault writes to any storage, either registers or memory, before the fault is detected. The occurrence of a page fault need only turn off writes generated by this and any instructions following it which are already in the pipe. These following instructions also have not written to any storage before the fault occurs. The instruction preceding the faulting instruction is guaranteed to be executable or to fault in a restartable manner even after the instruction following it faults. The pipeline is drained and control is transferred to a general purpose exception handler. To correctly restart execution, three instructions need to be reexecuted. A multistage PC tracks these instructions and aids in correctly executing them.

Software issues
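
Table 1 and the remark that either the odd or even pipestages are active can be checked with a small schedule model. This is a sketch under one assumption the text does not state outright: a new instruction enters IF every second stage-time (ISSUE_INTERVAL = 2 below). Under that assumption every instruction in flight sits in a stage of the same parity, so OD and OS/EX, the two stages that use the ALU, never need it at the same stage-time. The names active and alu_conflicts are hypothetical.

```python
# Hypothetical model of the MIPS pipestage schedule.
STAGES = ["IF", "ID", "OD", "OS/EX", "OF"]  # pipestages in order, from Table 1
ALU_STAGES = {"OD", "OS/EX"}                 # stages whose task is "use ALU" in Table 1
ISSUE_INTERVAL = 2  # ASSUMPTION: a new instruction enters IF every second stage-time


def active(t, n_instr):
    """Return (instruction, stage) pairs occupying the pipeline at stage-time t."""
    pairs = []
    for i in range(n_instr):
        s = t - ISSUE_INTERVAL * i  # stage index of instruction i at time t
        if 0 <= s < len(STAGES):
            pairs.append((i, STAGES[s]))
    return pairs


def alu_conflicts(n_instr):
    """Stage-times at which more than one instruction would need the ALU."""
    horizon = ISSUE_INTERVAL * n_instr + len(STAGES)
    return [t for t in range(horizon)
            if sum(stage in ALU_STAGES for _, stage in active(t, n_instr)) > 1]


if __name__ == "__main__":
    for t in range(8):
        print(t, active(t, 4))
    print("ALU conflicts:", alu_conflicts(100))
```

With an even issue interval, OD (stage index 2) only occurs at even stage-times and OS/EX (index 3) only at odd ones, so alu_conflicts returns an empty list for any number of instructions. That is consistent with dedicating the single ALU to OD and EX instead of arbitrating for it.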

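The pipeline reorganizer is not specified in detail here, so the following is only a sketch of the idea of rearranging operations at compile time and falling back on no-ops. It assumes a single hazard rule, that a register written by a load cannot be read by the very next instruction, and it ignores memory dependences and control flow inside the block. The Instr record, the ld/add/sub opcodes, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: opcode, register written, registers read."""
    op: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()


NOP = Instr("nop")


def load_hazard(first: Instr, second: Instr) -> bool:
    """ASSUMED rule: a loaded register cannot be read by the very next instruction."""
    return first.op == "ld" and first.dest is not None and first.dest in second.srcs


def independent(a: Instr, b: Instr) -> bool:
    """True if a and b may swap order (no RAW, WAR or WAW register dependence)."""
    a_w = {a.dest} - {None}
    b_w = {b.dest} - {None}
    return not (a_w & set(b.srcs) or b_w & set(a.srcs) or a_w & b_w)


def reorganize(block: List[Instr]) -> List[Instr]:
    """Fill each load hazard with a later independent instruction, else a no-op."""
    code = list(block)
    i = 0
    while i < len(code) - 1:
        if load_hazard(code[i], code[i + 1]):
            for j in range(i + 2, len(code)):
                cand = code[j]
                # cand moves ahead of code[i+1 .. j-1], so it must be independent of all
                # of them, and it must not itself read the register being loaded.
                if not load_hazard(code[i], cand) and all(
                        independent(cand, k) for k in code[i + 1:j]):
                    code.insert(i + 1, code.pop(j))
                    break
            else:
                code.insert(i + 1, NOP)  # nothing can be moved: pad with a no-op
        i += 1
    return code


if __name__ == "__main__":
    block = [
        Instr("ld", "r1", ("r2",)),        # r1 <- memory at r2
        Instr("add", "r3", ("r1", "r4")),  # reads r1 right after the load
        Instr("sub", "r5", ("r6", "r7")),  # unrelated: can fill the slot
    ]
    print([x.op for x in reorganize(block)])      # rearranged
    print([x.op for x in reorganize(block[:2])])  # padded with a no-op
```

For the three-instruction block the unrelated sub is moved into the slot after the load, giving ld, sub, add; with only the first two instructions nothing can be moved and a nop is inserted, giving ld, nop, add. This mirrors the trade-off in the text: the rearranged code needs no delay, while the padded code costs code size.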

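The ALU pieces include a two bit Booth's multiply step and a one bit nonrestoring divide step. The text does not define those steps at the register level, so the sketch below uses the textbook forms of the two techniques: radix-4 Booth recoding, which consumes two multiplier bits per step, and nonrestoring division, which produces one quotient bit per step. Each loop iteration stands in for one step; how MIPS assigns registers to the partial product, remainder, and quotient is not described and is not modeled. All function names are hypothetical.

```python
from typing import Tuple


def to_signed(x: int, bits: int = 32) -> int:
    """Interpret the low `bits` bits of x as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def booth_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Signed product using radix-4 Booth recoding: two multiplier bits per step."""
    a = to_signed(multiplicand, bits)
    m = multiplier & ((1 << bits) - 1)  # multiplier as a bit pattern
    product = 0
    prev = 0  # the implicit 0 to the right of bit 0
    for step in range(bits // 2):
        b0 = (m >> (2 * step)) & 1
        b1 = (m >> (2 * step + 1)) & 1
        digit = b0 + prev - 2 * b1             # recoded digit: -2, -1, 0, 1 or 2
        product += (digit * a) << (2 * step)   # add digit * multiplicand * 4**step
        prev = b1
    return product


def nonrestoring_divide(dividend: int, divisor: int, bits: int = 32) -> Tuple[int, int]:
    """Unsigned (quotient, remainder), one quotient bit per step; divisor must be > 0."""
    mask = (1 << bits) - 1
    rem, quo = 0, dividend & mask
    for _ in range(bits):
        top = (quo >> (bits - 1)) & 1  # next dividend bit, shifted into the remainder
        quo = (quo << 1) & mask
        if rem >= 0:
            rem = 2 * rem + top - divisor  # nonnegative remainder: subtract
        else:
            rem = 2 * rem + top + divisor  # negative remainder: add instead of restoring
        if rem >= 0:
            quo |= 1
    if rem < 0:
        rem += divisor  # single final correction
    return quo, rem


if __name__ == "__main__":
    print(booth_multiply(-7, 13))
    print(nonrestoring_divide(100, 7))
```

The point of both schemes is that each step needs only one add or subtract of a shifted operand, which fits a single ALU operation per instruction.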

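The set conditional instructions turn a comparison into a register that is all 1's or all 0's, which a later ALU operation can combine with other booleans. A minimal sketch, assuming 32-bit registers and a hypothetical table of six comparisons (the text says there are 16 but does not list them); operands are compared as plain integers.

```python
import operator

WORD_MASK = 0xFFFF_FFFF  # a 32-bit register with every bit set

# ASSUMPTION: an illustrative subset of the comparisons; the full set is not listed.
CONDITIONS = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def set_cond(cond, a, b):
    """Set conditional: all 1's if the comparison holds, otherwise all 0's."""
    return WORD_MASK if CONDITIONS[cond](a, b) else 0


def in_range(lo, x, hi):
    """lo <= x < hi, combining two set conditionals with an AND instead of branches."""
    return set_cond("le", lo, x) & set_cond("lt", x, hi)


def select(mask, a, b):
    """Branch-free choice: a where mask is all 1's, b where it is all 0's."""
    return (a & mask) | (b & ~mask & WORD_MASK)


if __name__ == "__main__":
    print(hex(in_range(0, 5, 10)))
    print(select(set_cond("gt", 3, 9), 3, 9))  # larger of 3 and 9
```

Because the result is a full-width mask rather than a single bit, it can be fed straight into AND/OR operations, as select shows, without a conditional jump.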

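The one-instruction branch delay means the instruction after a branch executes whether or not the branch is taken. The interpreter below is a sketch of that rule only; the tuple encoding and the addi, bne, and halt operations are hypothetical and are not the MIPS instruction formats.

```python
def run(program, regs, max_steps=10_000):
    """Execute a tiny hypothetical ISA in which every branch has one delay slot."""
    pc, npc = 0, 1  # npc: address of the instruction that executes after pc
    for _ in range(max_steps):
        op, *args = program[pc]
        target = npc + 1  # default successor of npc: fall through
        if op == "halt":
            return regs
        if op == "addi":  # rd <- rs + imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bne":  # compare and jump to dest if rs != rt, after the delay slot
            rs, rt, dest = args
            if regs[rs] != regs[rt]:
                target = dest
        else:
            raise ValueError(f"unknown op {op!r}")
        pc, npc = npc, target  # the instruction at npc (the delay slot) always runs next
    raise RuntimeError("step limit reached")


if __name__ == "__main__":
    program = [
        ("addi", "r1", "r1", 1),  # 0: loop body
        ("bne", "r1", "r2", 0),   # 1: branch back to 0 while r1 != r2
        ("addi", "r3", "r3", 1),  # 2: delay slot, runs after every execution of the branch
        ("halt",),                # 3
    ]
    print(run(program, {"r1": 0, "r2": 3, "r3": 0}))
```

In the example the delay-slot instruction executes once for every execution of the branch, including the final not-taken one, so r3 ends equal to the number of loop iterations.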

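Because MIPS is word addressed, character data is handled with the byte insert and extract operations listed among the ALU pieces rather than by the memory interface. A minimal sketch of that arithmetic, assuming 4-byte words and that byte 0 is the least significant byte of a word (the text does not give the byte order). offset_reach_bytes illustrates the last advantage listed: an offset field of a given width spans four times as many bytes when it counts words. All names are hypothetical.

```python
WORD_BYTES = 4  # 32-bit words
BYTE_MASK = 0xFF
WORD_MASK = 0xFFFF_FFFF


def split_byte_address(byte_addr):
    """(word address, byte index within the word) for a character's byte address."""
    return divmod(byte_addr, WORD_BYTES)


def extract_byte(word, index):
    # ASSUMPTION: byte 0 is the least significant byte of the word.
    return (word >> (8 * index)) & BYTE_MASK


def insert_byte(word, index, value):
    """Return word with byte `index` replaced by value (same byte numbering as above)."""
    shift = 8 * index
    cleared = word & ~(BYTE_MASK << shift) & WORD_MASK
    return cleared | ((value & BYTE_MASK) << shift)


def offset_reach_bytes(offset_bits, word_addressed=True):
    """Bytes covered by an unsigned offset field of the given width."""
    return (1 << offset_bits) * (WORD_BYTES if word_addressed else 1)


if __name__ == "__main__":
    word_addr, idx = split_byte_address(13)
    print(word_addr, idx)
    print(hex(extract_byte(0x44332211, 2)))
    print(hex(insert_byte(0x44332211, 1, 0xAB)))
    print(offset_reach_bytes(8), offset_reach_bytes(8, word_addressed=False))
```

A character update becomes a word load, an insert, and a word store, and the memory interface only ever moves whole words.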