Chapter 2 Assemblers - Sinica

Chapter 2 Assemblers

November 26, 2010

Outline
• Basic Assembler Functions
• Machine-Dependent Assembler Features
• Machine-Independent Assembler Features
• Assembler Design Options
• Implementation Examples

Copyright All Rights Reserved by Yuan-Hao Chang

Basic Assembler Functions

Assemblers
• The fundamental functions that assemblers must perform:
  – Translate mnemonic operation codes to their machine language equivalents.
  – Assign machine addresses to symbolic labels used by the programmer.
• The design of assemblers depends on the machine language because of the existence of different machine instruction formats and codes.

Assembler Directives
• In addition to the mnemonic machine instructions, SIC and SIC/XE include the following assembler directives:

  Directive  Description
  START      Specify name and starting address for the program.
  END        Indicate the end of the source program and (optionally) specify the first executable instruction in the program.
  BYTE       Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant.
  WORD       Generate one-word integer constant.
  RESB       Reserve the indicated number of bytes for a data area.
  RESW       Reserve the indicated number of words for a data area.

Instruction
• An assembly instruction is a statement that is executed at run time. It usually consists of four parts:
  – Label (optional)
  – Instruction (required)
  – Operands (instruction specific)
  – Comment (optional)
• E.g.,
  – ADDLP  ADDR  S, X  . ADD 3 to index value
    (label: ADDLP; instruction: ADDR; operands: S, X; comment: the text after the period)
• The program in the next slide is a routine that reads records from an input device (device code F1) and copies them to an output device (code 05). This program is used throughout the chapter.
  – Subroutine RDREC: read a record into a buffer (buffer size 4096 bytes).
  – Subroutine WRREC: write the record from the buffer to the output device.

[Figure: source listing of the example program, with columns Line number, Label, Instruction, Operand, and Comment. Annotations recoverable from the slide:]
  – Main program: start of the program; RETADR ← (L); read 4096 bytes in each loop; (A) : (ZERO); A ← (EOF); L ← (RETADR); end of the program.
  – JSUB RDREC: L ← (PC); PC ← RDREC. RSUB: PC ← (L).
  – RDREC: X ← (ZERO); A ← (ZERO); read 4096 bytes or reach end-of-record from the INPUT device, reading one byte in each loop; the number of bytes stored in the read buffer BUFFER is recorded.
  – WRREC: write the data in the buffer to the OUTPUT device, writing one byte in each loop.

Simple SIC Assembler
• The translation of source program to object code requires the following functions:
  – 1. Convert mnemonic operation codes to their machine language equivalents.
    - E.g., translate STL to 14 (line 10).
  – 2. Convert symbolic operands to their equivalent machine addresses. (This requires handling forward references.)
    - E.g., translate RETADR to 1033 (line 10): STL RETADR → 141033. RETADR is referenced here but defined later in the program.
  – 3. Build the machine instruction in the proper format.
  – 4. Convert the data constants specified in the source program into their internal machine representations.
    - E.g., translate EOF to 454F46 (line 80): EOF BYTE C'EOF' → 454F46.
  – 5. Write the object program and the assembly listing.
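Functions 1 to 4 can be illustrated with the two examples from the slide. This is a minimal sketch, not the chapter's algorithm: the names `assemble_sic` and `byte_constant` are hypothetical, the tables hold only the entries needed here, and the standard SIC layout (8-bit opcode, 1-bit index flag, 15-bit address) is assumed.

```python
# Hypothetical miniature tables with only the entries used in the slide's examples.
OPTAB = {"STL": 0x14}
SYMTAB = {"RETADR": 0x1033}


def assemble_sic(mnemonic: str, operand: str) -> str:
    """Build one 3-byte SIC instruction: opcode (8 bits), x flag (1 bit), address (15 bits)."""
    indexed = operand.endswith(",X")
    symbol = operand[:-2] if indexed else operand
    word = (OPTAB[mnemonic] << 16) | (0x8000 if indexed else 0) | SYMTAB[symbol]
    return f"{word:06X}"


def byte_constant(operand: str) -> str:
    """Convert a BYTE operand such as C'EOF' or X'F1' into hexadecimal object code."""
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return body.encode("ascii").hex().upper()
    if kind == "X":
        return body.upper()
    raise ValueError(f"unsupported constant: {operand}")


print(assemble_sic("STL", "RETADR"))  # 141033
print(byte_constant("C'EOF'"))        # 454F46
```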

[Figure: the example SIC program with object code, with columns Line number, Machine address, and the source fields. Annotations: the START operand gives the starting address; no object code is generated for addresses 1033 to 2038. This storage is reserved by the loader for use by the program during execution.]

Simple SIC Assembler (Cont.)
• The assembler must process assembler directives (pseudo-instructions).
  – Assembler directives are not translated into machine instructions, although they may have an effect on the object program.
  – Example:
    - BYTE and WORD direct the assembler to generate constants as part of the object program.
    - RESB and RESW instruct the assembler to reserve memory locations without generating data values.
• The assembler must write the generated object code to some output device.
  – The generated object program will be loaded into memory for execution.

Simple SIC Object Program
• The simple SIC object program contains three types of records.
  – Header record: contains the program name, starting address, and length.
    - Col. 1      H
    - Col. 2-7    Program name
    - Col. 8-13   Starting address of object program (hexadecimal)
    - Col. 14-19  Length of object program in bytes (hexadecimal)
  – Text record: contains the translated (i.e., machine code) instructions and data of the program, together with an indication of the addresses where these are to be loaded.
    - Col. 1      T
    - Col. 2-7    Starting address for object code in this record (hexadecimal)
    - Col. 8-9    Length of object code in this record in bytes (hexadecimal)
    - Col. 10-69  Object code, represented in hexadecimal (2 columns per byte of object code)
  – End record: marks the end of the object program and specifies the address in the program where execution is to begin.
    - Col. 1      E
    - Col. 2-7    Address of first executable instruction in object program (hexadecimal)
• To avoid confusion, the term column is used to refer to positions within object program records.
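The column layouts above map directly onto fixed-width string formatting. The following sketch assumes that layout only; the function names, the program name COPY, and the sample values are illustrative and are not taken from the slide.

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col. 1 'H', col. 2-7 name (padded to 6), col. 8-13 start, col. 14-19 length."""
    return f"H{name[:6]:<6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list[str]) -> str:
    """Col. 1 'T', col. 2-7 start address, col. 8-9 byte count, col. 10-69 object code."""
    body = "".join(object_codes)
    if len(body) > 60:  # 60 hex columns = 30 bytes
        raise ValueError("a Text record holds at most 30 bytes of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction: int) -> str:
    """Col. 1 'E', col. 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"


print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, ["141033"]))
print(end_record(0x1000))
```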

Simple SIC Object Program (Cont.)
[Figure: object program for the example. Annotations recoverable from the slide:]
  – Header record: the program's start address and the length of the object program: 2079 - 1000 + 1 = 107A (hex).
  – Text record: the start address of the record and its length (30 bytes).
  – End record: points to the start address of the program.
  – A separator symbol is used to separate fields visually. It is not present in the actual object program.

Two Passes of Assemblers
• Pass 1 (define symbols)
  – The first pass does little more than scan the source program for label definitions and assign addresses.
    - 1. Assign addresses to all statements in the program.
    - 2. Save values (addresses) assigned to all labels for use in Pass 2.
    - 3. Perform some processing of assembler directives.
      · This includes processing that affects address assignment, such as determining the length of data areas defined by BYTE, RESW, etc.
• Pass 2
  – The second pass assembles instructions and generates the object program.
    - 1. Assemble instructions: translate operation codes and look up addresses.
    - 2. Generate data values defined by BYTE, WORD, etc.
    - 3. Perform processing of assembler directives not done during Pass 1.
    - 4. Write the object program and the assembly listing.

Data Structures in Assembling
• Two major internal data structures are used in assembling:
  – Operation Code Table (OPTAB)
    - Used to look up mnemonic operation codes and translate them to their machine language equivalents.
  – Symbol Table (SYMTAB)
    - Used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR)
  – LOCCTR is a variable used to help in the assignment of addresses.
  – LOCCTR is initialized to the beginning address specified in the START statement.
  – After each source statement is processed, the length of the assembled instruction or data area to be generated is added to LOCCTR.
    - When we reach a label in the source program, the current value of LOCCTR gives the address to be associated with that label.

Operation Code Table (OPTAB)
• OPTAB must contain the mnemonic operation code and its machine language equivalent.
• In more complex assemblers, OPTAB might also contain information about instruction format and length.
• OPTAB in two passes:
  – In Pass 1, OPTAB is used to look up and validate operation codes in the source program.
  – In Pass 2, OPTAB is used to translate the operation codes to machine language.
• The SIC/XE machine has instructions of different lengths, so
  – Pass 1 uses OPTAB to find the instruction length for incrementing LOCCTR.
  – Pass 2 uses OPTAB to tell which instruction format to use in assembling the instruction.
• OPTAB is usually organized as a hash table, with the mnemonic operation code as the key.
  – OPTAB is usually predefined and is a static table.
  – A hash table provides fast retrieval with a minimum of searching.
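A static hash table keyed by mnemonic can be sketched with a Python dict. The table below is a small hypothetical excerpt, and the helper `lookup` is an assumed name; it returns the opcode together with the instruction length that Pass 1 would add to LOCCTR, treating a `+` prefix as a request for Format 4.

```python
# Hypothetical excerpt of OPTAB: mnemonic -> (opcode, default instruction format).
OPTAB = {
    "LDA": (0x00, 3),
    "STL": (0x14, 3),
    "JSUB": (0x48, 3),
    "J": (0x3C, 3),
    "COMPR": (0xA0, 2),
    "CLEAR": (0xB4, 2),
    "TIXR": (0xB8, 2),
}


def lookup(mnemonic: str):
    """Return (opcode, length in bytes), or None if the operation code is invalid."""
    extended = mnemonic.startswith("+")
    entry = OPTAB.get(mnemonic[1:] if extended else mnemonic)
    if entry is None:
        return None
    opcode, fmt = entry
    if extended:
        # Only Format 3 instructions can be extended to Format 4.
        return (opcode, 4) if fmt == 3 else None
    return opcode, fmt


print(lookup("STL"))    # (20, 3): opcode 14 hex, 3 bytes
print(lookup("+JSUB"))  # (72, 4): opcode 48 hex, 4 bytes
```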

Symbol Table (SYMTAB)
• SYMTAB includes the name and value (address) for each label in the source program, together with flags to indicate error conditions.
• SYMTAB may also contain other information about the data area or instruction labeled (e.g., type or length).
  – During Pass 1, labels are entered into SYMTAB as they are encountered in the source program, along with their assigned addresses (from LOCCTR).
  – During Pass 2, symbols used as operands are looked up in SYMTAB to obtain the addresses to be inserted in the assembled instructions.
• SYMTAB is usually organized as a hash table for efficiency of insertion and retrieval.
  – Entries in SYMTAB are rarely deleted, so efficiency of deletion is not an important consideration.
  – A prime table length often gives good hashing efficiency.

Communications Between Two Passes
• Certain information must be communicated between the two passes. For this reason,
  – Pass 1 usually writes an intermediate file that contains each source statement with its assigned address, error indicators, etc.
    - The intermediate file retains the results of certain operations performed during Pass 1, so as to avoid repeating many of the table-searching operations.
    - E.g.,
      · The operand field is scanned for symbols and addressing flags.
      · Pointers into OPTAB and SYMTAB may be retained for each operation code and symbol used.

Algorithm for Assembler
• For simplicity, we assume the source lines are written in a fixed format with fields LABEL, OPCODE, and OPERAND.
• We assume that Pass 1 writes four fields to each line of the intermediate file:
  – LOC, LABEL, OPCODE, and OPERAND. (LOC is the assigned machine address.)
• If one of these fields contains a character string that represents a number, we denote its numeric value with the prefix #. E.g., #[OPERAND]
• The simplified algorithm for Pass 1 and Pass 2 of the assembler is listed in the following four slides.

Algorithm for Pass 1
[Figure: pseudocode for Pass 1. Annotations recoverable from the slide:]
  – Handle the first line to initialize the location counter.
  – Comment line check.
  – Handle the LABEL field of the current input line.
  – The instruction length of the SIC machine is 3 bytes.

Algorithm for Pass 1 (Cont.)
[Figure: pseudocode for Pass 1, continued. Annotations recoverable from the slide:]
  – Handle the OPCODE field of the current input line:
    - WORD: declare a WORD constant.
    - RESW: reserve # of words.
    - RESB: reserve # of bytes.
    - BYTE: declare a BYTE constant.
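The Pass 1 steps annotated on these two slides can be sketched as follows. This is a simplified illustration for standard SIC only, not a transcription of the slide's pseudocode: the function names, the tuple-based input format, and the sample program are assumptions, and comment lines are not handled.

```python
# Hypothetical subset of SIC mnemonics; every SIC instruction is 3 bytes long.
SIC_OPCODES = {"STL", "JSUB", "LDA", "COMP", "JEQ", "J", "STA", "RSUB"}


def byte_length(operand: str) -> int:
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    if operand[0] == "C":
        return len(body)
    if operand[0] == "X":
        return (len(body) + 1) // 2
    raise ValueError(f"unsupported constant: {operand}")


def pass1(lines):
    """lines: (label, opcode, operand) tuples. Returns SYMTAB, intermediate lines, length, errors."""
    symtab, intermediate, errors = {}, [], []
    start = locctr = 0
    for index, (label, opcode, operand) in enumerate(lines):
        if index == 0 and opcode == "START":
            start = locctr = int(operand, 16)  # initialize LOCCTR from START
            intermediate.append((locctr, label, opcode, operand))
            continue
        intermediate.append((locctr, label, opcode, operand))
        if opcode == "END":
            break
        if label:  # handle the LABEL field
            if label in symtab:
                errors.append(f"duplicate symbol: {label}")
            else:
                symtab[label] = locctr
        if opcode in SIC_OPCODES:  # handle the OPCODE field
            locctr += 3
        elif opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            errors.append(f"invalid operation code: {opcode}")
    return symtab, intermediate, locctr - start, errors


# Illustrative input only; this is not the chapter's full example program.
program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("", "JSUB", "RDREC"),
    ("EOF", "BYTE", "C'EOF'"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]
symtab, intermediate, length, errors = pass1(program)
print({name: f"{addr:04X}" for name, addr in symtab.items()}, f"{length:X}", errors)
```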

Algorithm for Pass 2
[Figure: pseudocode for Pass 2. Annotations recoverable from the slide:]
  – Output the current line to the listing file, which is used for debugging. Each line of the listing file includes: Line, Loc, LABEL, OPCODE, OPERAND, and Object code.
  – The current line is an instruction.
  – Handle the OPERAND of an instruction: use the operand address to replace the symbol in the OPERAND field.

Algorithm for Pass 2 (Cont.)
[Figure: pseudocode for Pass 2, continued. Annotations recoverable from the slide:]
  – Handle the OPERAND of an instruction: there is no symbol in the OPERAND field.
  – The current line declares a BYTE or WORD constant.
  – The current Text record is full, so finalize it and initialize a new Text record.
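The last annotation, finalizing a full Text record and starting a new one, can be sketched as below. The function name and input format are assumptions: each item is an address paired with its hexadecimal object code, or `None` for storage reserved by RESB/RESW. Starting a new record after reserved storage is also assumed here, since reserved bytes produce no object code.

```python
def pack_text_records(items, max_bytes=30):
    """Group (address, hex object code or None) items into Text records."""
    records = []
    start, body = None, ""

    def flush():
        # Finalize the current Text record, if it holds any object code.
        nonlocal start, body
        if body:
            records.append(f"T{start:06X}{len(body) // 2:02X}{body}")
        start, body = None, ""

    for address, code in items:
        if code is None:  # reserved storage: close the record, emit nothing
            flush()
            continue
        if start is not None and len(body) + len(code) > 2 * max_bytes:
            flush()       # the current record is full
        if start is None:
            start = address
        body += code
    flush()
    return records


# Illustrative input: two instructions, a reserved area, then one more instruction.
sample = [(0x1000, "141033"), (0x1003, "482039"), (0x1006, None), (0x2039, "041030")]
for record in pack_text_records(sample):
    print(record)
```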

Machine-Dependent Assembler Features

Machine-Dependent Assembler Features
• Many real machines have certain architectural features that are similar to those we consider in the SIC/XE machine.
  – Indirect addressing is indicated by the prefix @ on the operand.
  – Immediate operands are denoted with the prefix #.
  – The assembler directive BASE is used in conjunction with base relative addressing.
  – The extended instruction format (Format 4) is specified with the prefix + added to the operation code in the source statement.
• The main advantages of SIC/XE, compared to SIC:
  – Register-to-register instructions are available. E.g.,
    - COMPR A, S
    - TIXR T
  – Immediate and indirect addressing are supported.
• Programmers need to specify the addressing mode to be used.

[Figure: SIC/XE version of the example program, with columns Line number, Label, Instruction, Operand, and Object code.]

Advantage of SIC/XE Architecture
• Register-to-register instructions are faster than the corresponding register-to-memory operations.
  – Register-to-register instructions are shorter and do not require another memory reference.
  – E.g., changing COMP to COMPR results in an improvement in execution speed.
• With immediate addressing, the operand is already present in the instruction and need not be fetched from anywhere.
• Indirect addressing often avoids the need for another instruction.

[Figure: SIC/XE example program with object code, with columns Line number, Machine address, Label, and the remaining source and object code fields. Annotation: relocatable program.]

Instruction Formats and Addressing Modes
• The START statement specifies a beginning program address of 0.
  – This indicates a relocatable program. The program is translated as if it were really to be loaded at machine address 0.
• Register-to-register instructions such as CLEAR (line 125) and COMPR (line 150) only need to
  – Convert the mnemonic operation code (using OPTAB).
  – Change each register mnemonic to its numeric equivalent. (Pass 2)
    - The assembler often preloads register names and their values into SYMTAB.
• Register-to-memory instructions are assembled with either program-counter relative or base relative addressing.
  – The assembler must calculate a displacement to be assembled as part of the object instruction.
  – The displacement must be small enough to fit in the 12-bit field in the instruction.
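Assembling a register-to-register (Format 2) instruction amounts to two table lookups. In this sketch the function name is hypothetical; the register numbers and the three opcodes come from the standard SIC/XE architecture definition rather than from this slide.

```python
# SIC/XE register numbers (from the architecture definition, not from this slide).
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6, "PC": 8, "SW": 9}
# Hypothetical excerpt of OPTAB restricted to Format 2 instructions.
FORMAT2_OPTAB = {"CLEAR": 0xB4, "COMPR": 0xA0, "TIXR": 0xB8}


def assemble_format2(mnemonic: str, operands: str) -> str:
    """Format 2: opcode (8 bits), r1 (4 bits), r2 (4 bits); a missing r2 is encoded as 0."""
    names = [name.strip() for name in operands.split(",")]
    r1 = REGISTERS[names[0]]
    r2 = REGISTERS[names[1]] if len(names) > 1 else 0
    return f"{FORMAT2_OPTAB[mnemonic]:02X}{r1:X}{r2:X}"


print(assemble_format2("CLEAR", "X"))     # B410
print(assemble_format2("COMPR", "A, S"))  # A004
```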

Instruction Formats and Addressing Modes (Cont.)
• If the displacements are too large, the 4-byte extended instruction format (Format 4) must be used to contain the full memory address.

  Line  Loc   Label  Instruction  Operand  Object code
  15    0006  CLOOP  +JSUB        RDREC    4B101036

  opcode 48, n i x b p e = 1 1 0 0 0 1, addr 01036
• Translation precedence of our assembler:
  – If the extended format is not specified,
    - First, program-counter relative addressing is attempted.
    - Then base relative addressing is used if the required displacement is out of range for program-counter relative addressing.
  – The displacement for program-counter relative addressing is between -2048 and 2047.
  – The displacement for base relative addressing is between 0 and 4095.
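The precedence rule can be written as a small decision function. This is a sketch with an assumed name and return convention: it returns the chosen mode with the 12-bit displacement field, or `None` when neither mode fits and Format 4 would be required.

```python
def choose_displacement(target: int, pc: int, base=None):
    """Try PC relative first, then base relative; return (mode, 12-bit disp) or None."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF   # negative values become 12-bit 2's complement
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    return None                      # neither fits: Format 4 is needed


# J CLOOP at 0017: target 0006, PC already advanced to 001A.
mode, disp = choose_displacement(0x0006, 0x001A)
print(mode, f"{disp:03X}")  # pc FEC
```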

Program-Counter (PC) Relative Addressing
• The computation that the assembler needs to perform is the target address calculation in reverse.
• PC is advanced before the instruction is executed.

  Line  Loc   Label   Instruction  Operand  Object code
  10    0000  FIRST   STL          RETADR   17202D
  95    0030  RETADR  RESW         1
  15    0006  CLOOP   +JSUB        RDREC    4B101036
  40    0017          J            CLOOP    3F2FEC

  – STL RETADR: opcode 14, n i x b p e = 1 1 0 0 1 0, disp = 30 - 3 = 02D
  – J CLOOP: opcode 3C, n i x b p e = 1 1 0 0 1 0, disp = 0006 - (PC) = 0006 - 001A = -14 (hex) = FEC (2's complement in 12 bits)
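The object codes on this slide can be reproduced by packing the opcode, the six flag bits, and the displacement. The encoder below is a sketch with an assumed name and signature; it relies on the SIC/XE layout in which n and i occupy the low two bits of the first byte, followed by x, b, p, e and a 12-bit displacement (Format 3) or a 20-bit address (Format 4).

```python
def encode_format34(opcode: int, n: int, i: int, x: int, b: int, p: int, e: int, value: int) -> str:
    """Pack a Format 3 (e = 0) or Format 4 (e = 1) SIC/XE instruction into hex."""
    first_byte = (opcode & 0xFC) | (n << 1) | i
    flags = (x << 3) | (b << 2) | (p << 1) | e
    if e:
        word = (first_byte << 24) | (flags << 20) | (value & 0xFFFFF)
        return f"{word:08X}"
    word = (first_byte << 16) | (flags << 12) | (value & 0xFFF)  # mask gives 2's complement
    return f"{word:06X}"


# STL RETADR at 0000: disp = 0030 - 0003
print(encode_format34(0x14, 1, 1, 0, 0, 1, 0, 0x0030 - 0x0003))  # 17202D
# J CLOOP at 0017: disp = 0006 - 001A
print(encode_format34(0x3C, 1, 1, 0, 0, 1, 0, 0x0006 - 0x001A))  # 3F2FEC
```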

Base Relative Addressing
• Base relative addressing needs to reference the base register, but the programmer must tell the assembler what the base register will contain.
  – The statement BASE LENGTH (line 13) informs the assembler that the base register will contain the address of LENGTH.
  – The preceding instruction (LDB #LENGTH) loads this value into the register during program execution.
  – The assembler assumes for addressing purposes that register B contains this address until it encounters another BASE statement.
  – The programmer must use another assembler directive (perhaps NOBASE) to inform the assembler that the contents of the base register can no longer be relied upon for addressing.
  – The programmer must provide instructions that load the proper value into the base register during execution.
  – BASE and NOBASE are assembler directives and produce no executable code.

Base Relative Addressing (Cont.)
• Register B would contain 0033.

  Line  Loc   Label   Instruction  Operand    Object code
  12    0003          LDB          #LENGTH    69202D
  13                  BASE         LENGTH
  100   0033  LENGTH  RESW         1
  105   0036  BUFFER  RESB         4096
  160   104E          STCH         BUFFER, X  57C003
  175   1056  EXIT    STX          LENGTH     134000

  – LDB #LENGTH: opcode 68, n i x b p e = 0 1 0 0 1 0, disp = 33 - 06 = 02D
  – STCH BUFFER, X: opcode 54, n i x b p e = 1 1 1 1 0 0, disp = 36 - 33 = 003
• Base relative addressing is adopted for STCH because the displacement for PC relative addressing is out of range: PC = 1051 (104E + 3) and BUFFER = 0036, so the displacement 0036 - 1051 = -101B (hex) does not fit in the 12-bit PC relative range.

PC Relative vs. Base Relative Addressing
• Suppose we choose PC relative addressing before choosing base relative addressing.

  Line  Loc   Label   Instruction  Operand  Object code
  12    0003          LDB          #LENGTH  69202D
  13                  BASE         LENGTH
  20    000A          LDA          LENGTH   032026
  100   0033  LENGTH  RESW         1
  175   1056  EXIT    STX          LENGTH   134000

  – PC relative (LDA LENGTH): opcode 00, n i x b p e = 1 1 0 0 1 0, disp = 33 - 0D = 026
  – Base relative (STX LENGTH), reg. B = 33: opcode 10, n i x b p e = 1 1 0 1 0 0, disp = 33 - 33 = 000
• If we used PC relative addressing for STX LENGTH, the displacement would be too large to fit in the 12-bit disp field: its magnitude is 0x1059 - 0x33 = 0x1026 > 0x7FF (2047).

Immediate Addressing
• The assembly of an instruction that specifies immediate addressing is simpler because no memory reference is involved.
• The assembler converts the immediate operand to its internal representation and inserts it into the instruction.

  Line  Loc   Instruction  Operand  Object code
  55    0020  LDA          #3       010003
  133   103C  +LDT         #4096    75101000
  12    0003  LDB          #LENGTH  69202D

  – LDA #3: opcode 00, n i x b p e = 0 1 0 0 0 0, disp 003. Immediate addressing only; PC relative addressing is not combined because 3 is a value.
  – +LDT #4096: opcode 74, n i x b p e = 0 1 0 0 0 1, addr 01000. Format 4 is used because the operand (4096) is too large to fit in the 12-bit displacement field.
  – LDB #LENGTH: opcode 68, n i x b p e = 0 1 0 0 1 0, disp = 33 - 06 = 02D. The value of a symbol is the address assigned to it. Therefore, #LENGTH = 33, and immediate addressing is combined with PC relative addressing.

Indirect Addressing
• Jump to the place whose address is stored in the variable RETADR.

  Loc   Label   Instruction  Operand  Object code
  002A          J            @RETADR  3E2003
  0030  RETADR  RESW         1

  – Indirect addressing: opcode 3C, n i x b p e = 1 0 0 0 1 0, disp = 030 - 02D = 003
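The field breakdowns shown for immediate and indirect addressing can be checked by decoding the object codes. The decoder below is a sketch with an assumed name and result layout; it accepts a 6-digit (Format 3) or 8-digit (Format 4) hexadecimal string.

```python
def decode_format34(code: str) -> dict:
    """Split a Format 3 or Format 4 object code into opcode, n i x b p e, and value."""
    if len(code) not in (6, 8):
        raise ValueError("expected 6 (Format 3) or 8 (Format 4) hex digits")
    raw = int(code, 16)
    total_bits = 4 * len(code)
    first_byte = raw >> (total_bits - 8)
    flags = (raw >> (total_bits - 12)) & 0xF
    value_bits = total_bits - 12          # 12-bit disp or 20-bit address
    return {
        "opcode": first_byte & 0xFC,
        "n": (first_byte >> 1) & 1,
        "i": first_byte & 1,
        "x": (flags >> 3) & 1,
        "b": (flags >> 2) & 1,
        "p": (flags >> 1) & 1,
        "e": flags & 1,
        "value": raw & ((1 << value_bits) - 1),
    }


fields = decode_format34("3E2003")  # J @RETADR
print(f"opcode {fields['opcode']:02X}, n={fields['n']} i={fields['i']}, disp {fields['value']:03X}")
```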

Program Relocation
• We usually do not know exactly when jobs will be submitted or exactly how long they will run.
  – In such a situation, the actual starting address of the program is not known until load time.

  SIC:
  Line  Loc   Instruction  Operand  Object code
  55    101B  LDA          THREE    00102D

  opcode 00, x = 0, addr 102D. The operand refers specifically to address 102D, so the program cannot be relocated to other addresses.
• The assembler does not know the actual location where the program will be loaded, but it can identify for the loader those parts of the object program that need modification.
• An object program that contains the information necessary to perform this kind of modification is called a relocatable program.

Example of Program Relocation
[Figure: the example program loaded at different starting addresses. Annotation: no matter where the program is loaded, RDREC is always 1036 bytes past the starting address of the program.]

Program Relocation
• Some values in the object program depend on the starting address of the program.
  – In order to support program relocation, the values depending on the starting address of the program need to be relocated when the program is loaded.
• Solve the relocation problem as follows:
  – When the assembler generates the object code for the JSUB instruction, it inserts the address of RDREC relative to the start of the program.
    - This is the reason we initialized the location counter to 0 for the assembly.
  – The assembler produce
