Lexical and Syntax Analysis - Virginia Tech


Lexical and Syntax Analysis
In Text: Chapter 4
N. Meng, F. Poursardar

Lexical and Syntactic Analysis
Two steps to discover the syntactic structure of a program:
– Lexical analysis (scanner): reads the input characters and outputs a sequence of tokens
– Syntactic analysis (parser): reads the tokens, outputs a parse tree, and reports syntax errors, if any

Compilation Process (figure)

Interaction Between Lexical Analysis and Syntactic Analysis (figure)

Reasons to Separate Lexical and Syntax Analysis
– Simplicity: less complex approaches can be used for lexical analysis; separating the two simplifies the parser
– Efficiency: separation allows optimization of the lexical analyzer
– Portability: parts of the lexical analyzer may not be portable, but the parser is always portable

Scanner
A pattern matcher for character strings
– If a character sequence matches a pattern, it is identified as a token
Responsibilities:
– Tokenize the source and report lexical errors, if any
– Remove comments and whitespace
– Save the text of interesting tokens
– Save source locations
– (Optional) expand macros and implement preprocessor functions

Tokenizing Source
Given a program, identify all lexemes and their categories (tokens)

Lexeme, Token, & Pattern
Lexeme
– A sequence of characters in the source program with the lowest level of syntactic meaning (e.g., sum, +, -)
Token
– A category of lexemes; a lexeme is an instance of a token
– Tokens are the basic building blocks of programs

Token Examples

Token        Informal Description                       Sample Lexemes
keyword      all keywords defined in the language       if, else
comparison   <, >, <=, >=, ==, or !=                    <=, !=
id           letter followed by letters and digits      pi, score, D2
number       any numeric constant                       3.14159, 0, 6
literal      anything surrounded by "'s, except "       "core dumped"

Another Token Example
Consider the following example of an assignment statement:
result = oldsum - value / 100;
Following are the tokens and lexemes of this statement:
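Using the token codes that front.c defines later in these slides (SEMICOLON is an assumption; front.c does not list a code for it):

Lexeme    Token
result    IDENT
=         ASSIGN_OP
oldsum    IDENT
-         SUB_OP
value     IDENT
/         DIV_OP
100       INT_LIT
;         SEMICOLON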

Lexeme, Token, & Pattern
Pattern
– A description of the form that the lexemes of a token may take
– Specified with regular expressions

Motivating Example
Token set:
– assign → :=
– plus → +
– minus → -
– times → *
– div → /
– lparen → (
– rparen → )
– id → letter (letter | digit)*
– number → digit digit* | digit* (. digit | digit .) digit*

Motivating Example
What are the lexemes in the string "a var := b*3"?
What are the corresponding tokens?
How do you identify the tokens?
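One answer: scanning left to right and always matching the longest possible prefix of the remaining input against the token patterns (the maximal-munch rule) yields a (id), var (id), := (assign), b (id), * (times), and 3 (number).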

Lexical Analysis
Three approaches to build a lexical analyzer:
– Write a formal description of the tokens and use a software tool that constructs a table-driven lexical analyzer from that description
– Design a state diagram that describes the tokens and write a program that implements the state diagram
– Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram (a sketch follows this list)
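A minimal sketch of the third approach, recognizing only the id and (simplified, unsigned-integer) number tokens of the motivating example; the state names, classify helper, and scan driver are all invented for this sketch:

#include <ctype.h>
#include <stdio.h>

/* States of the diagram; DONE_ID and DONE_NUM are accepting. */
enum state { START, IN_ID, IN_NUM, DONE_ID, DONE_NUM, ERROR };
/* Character classes collapse the input alphabet to three columns. */
enum cclass { C_LETTER, C_DIGIT, C_OTHER };

/* Transition table: rows are states, columns are character classes. */
static const enum state delta[3][3] = {
    /*            LETTER   DIGIT    OTHER    */
    /* START  */ { IN_ID,  IN_NUM,  ERROR    },
    /* IN_ID  */ { IN_ID,  IN_ID,   DONE_ID  },
    /* IN_NUM */ { ERROR,  IN_NUM,  DONE_NUM },
};

static enum cclass classify(int c) {
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* Scan one lexeme from s by table lookup; returns the final state. */
static enum state scan(const char *s) {
    enum state st = START;
    while (st == START || st == IN_ID || st == IN_NUM)
        st = delta[st][classify((unsigned char)*s++)];
    return st;
}

int main(void) {
    printf("%s\n", scan("score2 ") == DONE_ID ? "id" : "not id");
    printf("%s\n", scan("314 ") == DONE_NUM ? "number" : "not number");
    return 0;
}

The table is the whole algorithm: adding a token means adding rows and columns, not rewriting control flow, which is why this style pairs well with generator tools.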

State Diagram
A state transition diagram, or just state diagram, is a directed graph. The nodes of a state diagram are labeled with state names. The arcs are labeled with the input characters that cause the transitions among the states. An arc may also include actions the lexical analyzer must perform when the transition is taken.

State Diagram
State diagrams of the form used for lexical analyzers are representations of a class of mathematical machines called finite automata. Finite automata can be designed to recognize members of a class of languages called regular languages. Regular grammars are generative devices for regular languages. The tokens of a programming language form a regular language, and a lexical analyzer is a finite automaton.

State Diagram Design
A naïve state diagram would have a transition from every state on every character in the source language; such a diagram would be very large!
Why? Because every node in the state diagram would need a transition for every character in the character set of the language being analyzed.
Solution: consider ways to simplify.

State Diagram Design - Example
Design a lexical analyzer that recognizes only arithmetic expressions, including variable names and integer literals as operands. Assume that the variable names consist of strings of uppercase letters, lowercase letters, and digits but must begin with a letter. Names have no length limitation.
How many transitions are needed from the initial state? How can we simplify the diagram?

Example (continued)
There are 52 different characters (any uppercase or lowercase letter) that can begin a name, which would require 52 transitions from the transition diagram's initial state. However, a lexical analyzer is interested only in determining that a lexeme is a name; it is not concerned with which specific name it happens to be. Therefore, we define a character class named LETTER for all 52 letters and use a single transition on the first letter of any name.

Example (continued)
Another opportunity for simplifying the transition diagram is with the integer literal tokens. There are 10 different characters that could begin an integer literal lexeme, which would require 10 transitions from the start state of the state diagram. Instead, define a character class named DIGIT for digits and use a single transition on any character in this class to a state that collects integer literals.

Lexical Analysis (continued)
In many cases, transitions can be combined to simplify the state diagram:
– When recognizing an identifier, all uppercase and lowercase letters are equivalent, so use a character class that includes all letters
– When recognizing an integer literal, all digits are equivalent, so use a digit class

Lexical Analysis (continued)
Reserved words and identifiers can be recognized together (rather than having a part of the diagram for each reserved word):
– Use a table lookup to determine whether a possible identifier is in fact a reserved word (see the sketch below)
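A minimal sketch of such a lookup; the reserved-word list and the KW_BASE keyword codes are assumptions (a real language's keywords and codes would go here), while IDENT matches front.c's token code:

#include <string.h>

#define IDENT 11     /* token code from front.c */
#define KW_BASE 30   /* assumed base code for keyword tokens */

static const char *reserved[] = { "if", "else", "while", "for" };

/* Return a keyword code if lexeme is reserved, else IDENT. */
int lookup(const char *lexeme) {
    int i;
    for (i = 0; i < (int)(sizeof reserved / sizeof reserved[0]); i++)
        if (strcmp(lexeme, reserved[i]) == 0)
            return KW_BASE + i;
    return IDENT;
}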

State Diagram (figure)

Lexical Analysis (continued)
Convenient utility subprograms:
– getChar: gets the next character of input, puts it in nextChar, determines its class, and puts the class in charClass
– addChar: puts the character from nextChar into the place where the lexeme is being accumulated, lexeme
– lookup: determines whether the string in lexeme is a reserved word (returns a code)

/* Function declarations */
void addChar();
void getChar();
void getNonBlank();
int lex();

/* Character classes */
#define LETTER 0
#define DIGIT 1
#define UNKNOWN 99

/* Token codes */
#define INT_LIT 10
#define IDENT 11
#define ASSIGN_OP 20
#define ADD_OP 21
#define SUB_OP 22
#define MULT_OP 23
#define DIV_OP 24
#define LEFT_PAREN 25
#define RIGHT_PAREN 26
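The bodies of these utilities might look like the following sketch, modeled on the textbook's front.c (the globals mirror the ones front.c uses, and the 98-character bound reflects its lexeme buffer):

#include <stdio.h>
#include <ctype.h>

static int charClass;
static char lexeme[100];
static char nextChar;
static int lexLen;
static FILE *in_fp;

/* getChar - get the next character of input and determine its
   character class */
void getChar() {
    if ((nextChar = getc(in_fp)) != EOF) {
        if (isalpha(nextChar)) charClass = LETTER;
        else if (isdigit(nextChar)) charClass = DIGIT;
        else charClass = UNKNOWN;
    } else
        charClass = EOF;
}

/* addChar - append nextChar to lexeme, keeping it null-terminated */
void addChar() {
    if (lexLen <= 98) {
        lexeme[lexLen++] = nextChar;
        lexeme[lexLen] = 0;
    } else
        printf("Error - lexeme is too long \n");
}

/* getNonBlank - call getChar until nextChar is not whitespace */
void getNonBlank() {
    while (isspace(nextChar)) getChar();
}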

Implementation Pseudo-code

static TOKEN nextToken;
static CHAR_CLASS charClass;

int lex() {
  switch (charClass) {
    case LETTER:
      /* add nextChar to lexeme */
      addChar();
      /* get the next character and determine its class */
      getChar();
      while (charClass == LETTER || charClass == DIGIT) {
        addChar();
        getChar();
      }
      nextToken = IDENT;
      break;

    case DIGIT:
      addChar();
      getChar();
      while (charClass == DIGIT) {
        addChar();
        getChar();
      }
      nextToken = INT_LIT;
      break;

    case EOF:
      nextToken = EOF;
      lexeme[0] = 'E';
      lexeme[1] = 'O';
      lexeme[2] = 'F';
      lexeme[3] = 0;
      break;
  }
  printf("Next token is: %d, Next lexeme is %s\n", nextToken, lexeme);
  return nextToken;
} /* End of function lex */
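A minimal driver in the spirit of front.c's main function (the input file name is illustrative):

int main(void) {
    /* open the input data file and process its contents */
    if ((in_fp = fopen("front.in", "r")) == NULL)
        printf("ERROR - cannot open front.in \n");
    else {
        getChar();
        do {
            lex();
        } while (nextToken != EOF);
    }
    return 0;
}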

Lexical Analyzer Implementation: front.c (pp. 166-170)
Following is the output of the lexical analyzer of front.c when used on (sum + 47) / total:

Next token is: 25   Next lexeme is (
Next token is: 11   Next lexeme is sum
Next token is: 21   Next lexeme is +
Next token is: 10   Next lexeme is 47
Next token is: 26   Next lexeme is )
Next token is: 24   Next lexeme is /
Next token is: 11   Next lexeme is total
Next token is: -1   Next lexeme is EOF

The Parsing Problem
Given an input program, the goals of the parser:
– Find all syntax errors; for each, produce an appropriate diagnostic message and recover quickly
– Produce the parse tree, or at least a trace of the parse tree, for the program

The Parsing Problem (continued)
The Complexity of Parsing
– Parsers that work for any unambiguous grammar are complex and inefficient: O(n³), where n is the length of the input
– Compilers use parsers that work for only a subset of all unambiguous grammars, but do it in linear time: O(n), where n is the length of the input

Two Classes of Grammars
– Left-to-right, Leftmost derivation (LL)
– Left-to-right, Rightmost derivation (LR)
We can build parsers for these grammars that run in linear time.

Grammar Comparison

LL grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → id

LR grammar:
E → E + T | T
T → T * F | F
F → id
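For example, both grammars generate id + id * id. With the LL grammar, a leftmost derivation expands the leftmost nonterminal at every step:

E ⇒ T E' ⇒ F T' E' ⇒ id T' E' ⇒ id E' ⇒ id + T E'
  ⇒ id + F T' E' ⇒ id + id T' E' ⇒ id + id * F T' E'
  ⇒ id + id * id T' E' ⇒ id + id * id E' ⇒ id + id * id

With the LR grammar, a rightmost derivation expands the rightmost nonterminal at every step:

E ⇒ E + T ⇒ E + T * F ⇒ E + T * id ⇒ E + F * id
  ⇒ E + id * id ⇒ T + id * id ⇒ F + id * id ⇒ id + id * id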

Two Categories of Parsers
LL(1) Parsers
– L: scanning the input from left to right
– L: producing a leftmost derivation
– 1: using one input symbol of lookahead at each step to make parsing action decisions
LR(1) Parsers
– L: scanning the input from left to right
– R: producing a rightmost derivation in reverse
– 1: the same as above

Two Categories of Parsers
LL(1) parsers (predictive parsers)
– Top down: build the parse tree from the root by finding a leftmost derivation for an input string
LR(1) parsers (shift-reduce parsers)
– Bottom up: build the parse tree from the leaves by reducing a string to the start symbol of the grammar

Top-down Parsers
Given a sentential form xAα, the parser must choose the correct A-rule to get the next sentential form in the leftmost derivation, using only the first token produced by A.
The most common top-down parsing algorithms:
– Recursive descent: a coded implementation
– LL parsers: a table-driven implementation

Bottom-up Parsers
Given a right sentential form α, determine what substring of α is the right-hand side of the rule in the grammar that must be reduced to produce the previous sentential form in the rightmost derivation.
The most common bottom-up parsing algorithms are in the LR family.

Recursive Descent Parsing
– Parsing is the process of tracing or constructing a parse tree for a given input string
– Parsers usually do not analyze lexemes; that is done by a lexical analyzer, which is called by the parser
– A recursive descent parser traces out a parse tree in top-down order; it is a top-down parser
– Each nonterminal has an associated subprogram; the subprogram parses all sentential forms that the nonterminal can generate
– The recursive descent parsing subprograms are built directly from the grammar rules
– Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars

Recursive Descent Example
For the grammar:
term → factor {(* | /) factor}

A simple recursive descent parsing subprogram:

void term() {
  factor();  /* parse the first factor */
  while (next_token == ast_code || next_token == slash_code) {
    lexical();  /* get next token */
    factor();   /* parse the next factor */
  }
}
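To round out the example, the remaining subprograms would follow the same pattern. The rules expr → term {(+ | -) term} and factor → id | ( expr ), and the token codes plus_code, minus_code, id_code, lparen_code, and rparen_code, are assumptions made for this sketch:

/* expr -> term {(+ | -) term} */
void expr() {
  term();  /* parse the first term */
  while (next_token == plus_code || next_token == minus_code) {
    lexical();  /* get next token */
    term();     /* parse the next term */
  }
}

/* factor -> id | ( expr ) */
void factor() {
  if (next_token == id_code)
    lexical();        /* consume the identifier */
  else if (next_token == lparen_code) {
    lexical();        /* consume '(' */
    expr();
    if (next_token == rparen_code)
      lexical();      /* consume ')' */
    else
      error();        /* missing right parenthesis */
  } else
    error();          /* neither an id nor '(' */
}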
