Learning CCG Parser - TAU

Learning CCG parser
Dor Muhlgay

Outline Introduction CCG The PCCG model Learning Results Conclusion

Introduction
Given a sentence in NL which represents some task, our goal is to recover enough information to complete the task.
- Converting questions to database queries
- Converting robot instructions to path planning
We will achieve this by mapping the sentence to its logical form:
list flights to boston → λx.flight(x) ∧ to(x, boston)

Introduction
We define a learning task: given a data set of sentences and their representations in logical form, we will learn a function from sentences to their logical forms.
f: Sentence → Logical form
list flights to boston → λx.flight(x) ∧ to(x, boston)

Outline Introduction CCG The PCCG model Learning Results Conclusion

Lambda calculus
The logical form of a sentence is represented in lambda calculus. Lambda calculus is a formal system in mathematical logic for expressing computation. The lambda-calculus expressions we use are formed from the following items:
- Constants: represent entities in the world, or predicates. TEXAS, NYC, boston(x)
- Logical connectives: conjunction (∧), disjunction (∨), negation (¬)
- Quantifiers
- Lambda expressions: lambda expressions represent functions. For example λx.borders(x, texas), λx.flight(x) ∧ to(x, boston)

Lambda calculus
Lambda calculus has three basic types:
- e: the type of entities
- t: the type of truth values
- r: the type of real numbers
Functional types ⟨t1, t2⟩: the type assigned to functions that map objects of type t1 to objects of type t2.
In specific domains, we will specify subtype hierarchies for e. For example, in a geography domain we might distinguish different entity subtypes such as cities, states, and rivers.
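As an illustration (added here, not from the original slides), the earlier flight example can be typed as follows:

```latex
% Illustrative typing of the earlier example: boston is an entity, flight and to
% are predicates, and the full lambda expression maps entities to truth values.
\[
  \textit{boston} : e \qquad
  \textit{flight} : \langle e, t\rangle \qquad
  \textit{to} : \langle e, \langle e, t\rangle\rangle \qquad
  \lambda x.\, \textit{flight}(x) \land \textit{to}(x, \textit{boston}) : \langle e, t\rangle
\]
```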

Combinatory Categorial Grammar
CCG is a categorial formalism which provides a transparent interface between syntax and semantics. We will use CCG to construct sentences in logical form from sentences in natural language.
CCG parsing tree example:

CCG categories
The CCG categories are the CCG building blocks. The categories represent the connection between the syntax of the sentence and its semantics.
CCG category examples:

CCG categories
The expression on the left side of the colon represents the syntactic part. It may consist of:
- Simple part-of-speech types such as N, NP, or S
- The syntactic combination operators “\” or “/”

CCG categories
The expression on the right side of the colon represents the semantic part of the category.
- The semantic part is simply a lambda expression
- The semantic part always matches the syntactic part

CCG lexicon
The CCG lexicon maps words in the natural language to the CCG categories that represent them:
- Each lexical entry consists of a natural-language word and its corresponding CCG category
- The CCG lexicon is used to map the words of the sentence to the leaves of the CCG parsing tree
- There are many lexical entries in the lexicon
CCG lexicon example:

CCG combinators
CCG makes use of a set of CCG combinators, which are used to combine categories to form larger pieces of syntactic and semantic structure.
- There is a small set of CCG combinators
- Each CCG combinator receives 1 or 2 CCG categories and outputs one “larger” CCG category
- The CCG combinators operate jointly on both the syntactic and semantic parts of the categories

CCG combinators - application
The functional application rules (reconstructed below):
- The first rule states that a category with syntactic type A/B can be combined with a category to its right of syntactic type B to create a new category of type A. It also states that the new semantics will be formed by applying the function f to the expression g.
- The second rule handles arguments to the left; the direction of the application is determined by the direction of the slash.
- The rule is equivalent to function application.
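The rule figures themselves are missing from this transcription; in their standard CCG form, the two application combinators can be written as:

```latex
% Forward and backward functional application, acting jointly on syntax and
% semantics (reconstructed standard form; the original slide figure is missing).
\[
  A/B : f \;\;\; B : g \;\;\Rightarrow\;\; A : f(g)
  \qquad\qquad
  B : g \;\;\; A\backslash B : f \;\;\Rightarrow\;\; A : f(g)
\]
```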

CCG combinators - composition
The functional composition rules (reconstructed below):
- The notion of functional application can be extended to functional composition.
- As before, the direction of the slash implies the direction of the application.
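As above, the composition rule figures are missing; the standard forward and backward composition combinators are:

```latex
% Forward and backward functional composition (reconstructed standard form).
\[
  A/B : f \;\;\; B/C : g \;\;\Rightarrow\;\; A/C : \lambda x.\, f(g(x))
  \qquad\qquad
  B\backslash C : g \;\;\; A\backslash B : f \;\;\Rightarrow\;\; A\backslash C : \lambda x.\, f(g(x))
\]
```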

CCG combinators - type shifting
Example of type shifting rules:
- Type shifting rules are unary operations that receive a category and return a modified category.
- The rules are used to change a category during parsing.
- The rules help keep the lexicon compact.

CCG combinators - coordination
The words “and” and “or” are mapped to special categories in the lexicon.
- The coordination rule is defined as follows on the syntactic part (a sketch of the common form appears below).
- The semantic part of the rule simply connects the two logical forms with the appropriate logical connective.
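The coordination rule figure is also missing from the transcription; a common way to write it (the exact semantic combination depends on the category being coordinated) is:

```latex
% Coordination sketch: two categories of the same type joined by a conjunction
% yield that type again, with the logical forms combined by the matching
% connective (common form, not the slide's exact figure).
\[
  A : f \;\;\; \textit{conj} \;\;\; A : g \;\;\Rightarrow\;\; A : f \star g
  \qquad \text{where } \star \in \{\land, \lor\}
\]
```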

CCG parsing example
Use the lexicon to map the words and phrases in the sentence to CCG categories

CCG parsing example
Apply the type shifting rule to change the adjectives' types

CCG parsing example
Apply the type shifting rule to change the adjectives' types

CCG parsing example
Apply the function composition rule:

CCG parsing example
Apply the coordination rule to combine the categories adjacent to “or”

CCG parsing example
Use the function application rule to apply the adjective to the noun:

Extended CCG
Zettlemoyer and Collins extended CCG and added the following combinators:
- A set of functional combinators that allow the parser to significantly relax constraints on word order.
- A set of type-raising rules which allow the parser to cope with missing function words.

Outline Introduction CCG The PCCG Model Learning Results Conclusion

CCG ambiguity
A sentence might have more than one valid parse tree and logical form. Ambiguity can be caused by:
- Lexical items having more than one entry in the lexicon, i.e. multiple lexical entries mapped to the same word/phrase. For example, New York might have the entries NP : new york city and NP : new york state.
- A single logical form being derivable from multiple parse trees (spurious ambiguity).

PCCG
A CCG receives a sentence S and outputs a pair (L,T), where L is the logical form and T is the parse tree. In CCG there might be multiple pairs (L,T) that match a sentence S. We resolve the ambiguity issue by generalizing CCG to probabilistic CCG (PCCG):
- A PCCG defines a conditional distribution P(L,T | S) over possible (L,T) pairs for a given sentence S.
- PCCGs deal with ambiguity by ranking alternative parses for a sentence in order of probability.

PCCG - Feature Vector
We assume a function f mapping (L, T, S) triples to feature vectors in R^d. f is defined by d individual features, so that f(L, T, S) = ⟨f1(L, T, S), ..., fd(L, T, S)⟩.
Each feature fj is typically the count of some sub-structure within (L, T, S). The features are restricted to be sums of local features, i.e. features that depend only on a specific node in the parse tree. Examples:
- The count of a specific lexical entry in the subtree
- The number of times a predicate is conjoined with itself

PCCG - Log-linear model
We choose to represent the probability P(L,T | S) with a log-linear model (reconstructed below). The model is parameterized by the vector θ.
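The formula is not included in the transcription; the standard log-linear PCCG model (Zettlemoyer and Collins 2005) has the form:

```latex
% Log-linear distribution over (logical form, parse tree) pairs for a sentence S,
% with parameter vector theta and feature function f (standard form, reconstructed).
\[
  P(L, T \mid S; \theta) \;=\;
  \frac{e^{\,\theta \cdot f(L, T, S)}}
       {\sum_{(L', T')} e^{\,\theta \cdot f(L', T', S)}}
\]
```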

PCCG - Parsing
In order to parse a sentence in the PCCG model, we would like to find the highest-scoring output y*. Given the parameter vector and a lexicon, we can use a variant of the CKY algorithm with beam search to find some approximation of y*:
- We keep a parse chart C, where each entry C[i,j] represents the sub-sentence from word i to word j.
- Each entry C[i,j] has a set of possible parse-tree roots for the sub-sentence it represents, and their scores. Each root holds a full CCG category.
- We prune the root sets to keep only the N highest-scoring roots in each entry.
- After filling the chart, we look for the highest-scoring root in C[1,n] and backtrack to recover the entire tree.
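Below is a minimal Python sketch of this beam-pruned, CKY-style chart parser. The helpers lexicon_lookup, combine (the CCG combinators), and score (a stand-in for the local feature score θ·f) are hypothetical placeholders, not the authors' implementation, and backpointers for tree recovery are omitted.

```python
# Sketch of the beam-pruned CKY-style parser described above.
# lexicon_lookup, combine, and score are assumed helpers, not part of the slides.

def parse(words, lexicon_lookup, combine, score, beam=50):
    n = len(words)
    chart = {}  # chart[(i, j)] holds candidate roots (category, score) for words[i:j]
    # Initialize spans of length 1 from the lexicon.
    for i in range(n):
        chart[(i, i + 1)] = [(cat, score(cat)) for cat in lexicon_lookup(words[i])]
    # Fill longer spans bottom-up.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            candidates = []
            for k in range(i + 1, j):
                for left, ls in chart.get((i, k), []):
                    for right, rs in chart.get((k, j), []):
                        for root in combine(left, right):  # try all CCG combinators
                            candidates.append((root, ls + rs + score(root)))
            # Beam pruning: keep only the N highest-scoring roots in each cell.
            candidates.sort(key=lambda c: c[1], reverse=True)
            chart[(i, j)] = candidates[:beam]
    # Return the highest-scoring root spanning the whole sentence.
    full = chart.get((0, n), [])
    return max(full, key=lambda c: c[1]) if full else None
```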

Outline Introduction CCG The PCCG Model Learning Results Conclusion

Learning
Our goal is to learn a CCG parser from our supervised data set.
- In particular, we would like to learn the parameter vector θ and the CCG lexicon.
- The combinator rules are predefined.
- What does the supervised data look like?

Learning - supervised data
A fully supervised data set includes the sentence, parse tree and logical form. However, usually the data set doesn't contain the parse tree, and we will treat it as a hidden variable.

Learning
We will introduce an algorithm that learns both the lexicon and the parameter vector θ. The algorithm iterates over the examples and performs the following steps for each example:
- Step 1 - Lexicon update: add new lexical entries according to the example.
- Step 2 - Parameters update: re-estimate the parameters according to the new lexicon. This can be done in the following ways:
  - Gradient ascent (Zettlemoyer and Collins 2005)
  - Hidden variable structured perceptron (Zettlemoyer and Collins 2007)

Gradient Ascent
We will define the log-likelihood function (reconstructed below), where P(L,T | S) is represented by the log-linear model. Our goal is to find the parameter vector θ that maximizes the log-likelihood function.
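Written out (the slide's formula is missing from the transcription), the objective sums the log-probability of each labeled logical form, marginalizing over the hidden parse trees:

```latex
% Log-likelihood of the training set {(S_i, L_i)}, with parse trees T marginalized
% out (reconstructed standard form).
\[
  O(\theta) \;=\; \sum_{i=1}^{n} \log P(L_i \mid S_i; \theta)
  \;=\; \sum_{i=1}^{n} \log \sum_{T} P(L_i, T \mid S_i; \theta)
\]
```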

Gradient Ascent
Gradient ascent is an iterative optimization algorithm for finding a local maximum.
- At each iteration the algorithm takes steps proportional to the gradient of the function at the current point.
- The gradient points in the direction of the greatest rate of increase of the function. At each step we increase the function value until we reach a local maximum.
- The size of each step is defined by the step-rate parameter. The steps should be large enough so that the algorithm converges quickly, but small enough so we won't overshoot (skip the local maximum).

Gradient Ascent
We will use gradient ascent to maximize the log-likelihood function. The j-th element of the gradient vector is defined as follows (reconstructed below):
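The gradient formula is also missing from the transcription; in its standard form, the j-th component is the difference of two feature expectations:

```latex
% Gradient of the log-likelihood: the expected value of feature f_j given the
% sentence and its logical form, minus its expectation given the sentence alone
% (reconstructed standard form).
\[
  \frac{\partial O}{\partial \theta_j} \;=\;
  \sum_{i=1}^{n} \Big(
    \mathbb{E}_{P(T \mid S_i, L_i; \theta)}\!\big[f_j(L_i, T, S_i)\big]
    \;-\;
    \mathbb{E}_{P(L, T \mid S_i; \theta)}\!\big[f_j(L, T, S_i)\big]
  \Big)
\]
```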

Gradient Ascent
A few notes:
- The j-th parameter is updated by taking the difference between the expected value of the j-th feature given the sentence and the label, and the expected value of the j-th feature when only the sentence is given. Intuitively, it is the difference between what the feature value should be and what we think it should be under our current parameters.
- The step rate is defined by the parameters α0 and c. The parameter t is the iteration count; we decrease the step size at each iteration.
- Gradient ascent is a batch method which requires a pass over the entire training set, which affects the algorithm's efficiency. We will introduce the perceptron algorithm, which is fully online.

Structured perceptron (Collins 2002)

Structured perceptron
The structured perceptron:
- A fully online algorithm, that is, learning is done example by example.
- The parameter update is done in an additive fashion, which is simple and easy to implement.
- Requires an efficient algorithm for argmax. In some applications of the perceptron algorithm, a set of candidates is generated and the highest-scoring output is found from the candidate set.
- If there is a linear separator that classifies all the examples correctly, the algorithm guarantees convergence; we will show a formal proof.
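A minimal Python sketch of the structured perceptron update loop, assuming placeholder helpers argmax_parse (the decoder) and feature_fn (the feature map f(x, y) returned as a list of length dim); this is an illustration of the algorithm above, not Collins' original code.

```python
# Sketch of the structured perceptron (Collins 2002) in the notation used above.
# argmax_parse and feature_fn are assumed helpers, not part of the slides.

def structured_perceptron(examples, feature_fn, argmax_parse, dim, epochs=5):
    w = [0.0] * dim                         # weight vector, initialized to zero
    for _ in range(epochs):
        for x, y in examples:               # fully online: one example at a time
            z = argmax_parse(x, w)          # highest-scoring output under current w
            if z != y:                      # mistake: additive update toward y, away from z
                fy, fz = feature_fn(x, y), feature_fn(x, z)
                w = [wi + a - b for wi, a, b in zip(w, fy, fz)]
    return w
```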

Structured perceptron convergence
Basic definitions:
- Let B(xi) be the set of incorrect labels for an example xi, i.e. B(xi) = {z : z ≠ yi}.
- A training set is "separable with margin δ > 0" if there exists some vector U with ‖U‖ = 1 (L2 norm), such that for every i and every z in B(xi): U·f(xi, yi) − U·f(xi, z) ≥ δ.
- Let R be a constant such that for every i and every z in B(xi): ‖f(xi, yi) − f(xi, z)‖ ≤ R.
We will prove that if a training set is "separable with margin δ", then the number of mistakes the algorithm makes is at most R²/δ².

Structured perceptron convergence
Proof outline:
- Let w^k be the parameter vector just before the algorithm made the k-th mistake. We assume that the weight vector is initialized to the zero vector.
- We will prove the lower bound: ‖w^(k+1)‖ ≥ kδ.
- We will prove the upper bound: ‖w^(k+1)‖² ≤ kR².
- The claim immediately follows: k²δ² ≤ ‖w^(k+1)‖² ≤ kR², so k ≤ R²/δ².

Structured perceptron convergence
Lower bound proof:
- Let z be the proposed label when the algorithm made the k-th mistake.
- According to the update rule: w^(k+1) = w^k + f(xi, yi) − f(xi, z).
- We take the inner product of both sides of the equality with the vector U, and apply the "separable with margin" property: U·w^(k+1) = U·w^k + U·(f(xi, yi) − f(xi, z)) ≥ U·w^k + δ.
- Because w^1 = 0, we get U·w^(k+1) ≥ kδ.
- On the other hand: U·w^(k+1) ≤ ‖U‖ ‖w^(k+1)‖ = ‖w^(k+1)‖, and hence: ‖w^(k+1)‖ ≥ kδ.

Structured perceptron convergence
Upper bound proof:
- Let z be the proposed label when the algorithm made the k-th mistake.
- According to the update rule: w^(k+1) = w^k + f(xi, yi) − f(xi, z).
- The following holds: ‖w^(k+1)‖² = ‖w^k‖² + 2 w^k·(f(xi, yi) − f(xi, z)) + ‖f(xi, yi) − f(xi, z)‖² ≤ ‖w^k‖² + R².
- The inequality holds because of the assumption ‖f(xi, yi) − f(xi, z)‖ ≤ R, and because w^k·(f(xi, yi) − f(xi, z)) ≤ 0 (z is the highest-scoring label with the parameters w^k).
- By induction we get the upper bound: ‖w^(k+1)‖² ≤ kR².

Hidden variable structured perceptron
- The structured perceptron doesn't work for our learning task as is, since we have latent data: the parse tree.
- The structured perceptron can be extended to support a hidden variable.

Hidden variable structured perceptron

Hidden variable structured perceptron
The hidden variable structured perceptron:
- Similar to the gradient ascent algorithm, with the expectations replaced by argmax.
- A fully online and efficient algorithm.
- No known convergence guarantees, but works well empirically.
We will now put all the parts together and introduce an online learning algorithm for a CCG parser (Zettlemoyer and Collins 2007), based on the hidden variable structured perceptron algorithm.

Online learning algorithm for CCG
The algorithm iterates over the examples T times. For each sample, the algorithm follows 3 steps:
- Step 1 (check correctness): we try to parse the sentence in the example with our current parameters and lexicon. If the logical form we obtain matches the label, we move to the next example.
- Step 2 (lexical generation): we add new lexical entries to our lexicon.
- Step 3 (update parameters): we try to parse the sentence again with the new lexicon. If we fail, we update the parameters so they fit the new lexicon.

Online learning algorithm for CCG
Λ - the current lexicon
w - the current parameter vector
f - the feature function
GEN(x; Λ) - the set of all parse trees for the sentence x
We parse the sentence xi according to our current lexicon and parameters. The parsing is done by finding the parse tree that maximizes w·f(x, y), and it can be computed efficiently by a CKY-style algorithm.

Online learning algorithm for CCG
L(y) - extract the logical form from the parse tree y
We check if the logical form output by the parse tree is correct. If the result equals the example label, the algorithm goes to the next example. Otherwise, the algorithm goes to the next step: fixing the lexicon.

Online learning algorithm for CCG
Λ - the current lexicon
λ - the new temporary lexicon
GENLEX(x, z) - given a sentence and its logical form, GENLEX generates a set of possible lexical entries.
The algorithm runs GENLEX to generate a noisy set of possible lexical entries. A temporary lexicon is created from our current lexicon and the new lexical entries.

Online learning algorithm for CCG
GEN(x, z; λ) - the set of all parse trees for sentence x, under the lexicon λ, whose logical form is z.
The algorithm parses the sentence xi according to the temporary lexicon and the current parameters, given that the logical form is zi. The parsing is done by finding the parse tree y* that maximizes w·f(x, y) among all the parse trees whose logical form is zi.
- We can use an algorithm similar to the one we used for parsing, where in the pruning step we eliminate trees that aren't consistent with the logical form zi.

Online learning algorithm for CCG
We obtain the lexical entries of y* and add these new lexical entries to the lexicon.

Online learning algorithm for CCG
We parse the sentence xi according to the new lexicon and the current parameters, and obtain the parse tree y'.

Online learning algorithm for CCG
If the logical form obtained from the parse tree y' doesn't match the label zi, we do an additive update of the parameter vector.

Online learning algorithm for CCG
After iterating over all the samples T times, the algorithm returns the lexicon and the parameter vector w.
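Putting the three steps together, here is a Python sketch of the full online loop described in the slides above. PARSE, PARSE_CONSTRAINED, GENLEX, logical_form, lexical_entries, and the feature function f are assumed placeholders (lexicons are treated as sets of entries); this follows the slide-level description, not the authors' actual implementation.

```python
# Sketch of the online lexicon + parameter learning loop (Zettlemoyer and Collins 2007)
# as described above. All helpers are assumed, not part of the original slides.

def learn(examples, lexicon, f, PARSE, PARSE_CONSTRAINED, GENLEX,
          logical_form, lexical_entries, w, T=5):
    for _ in range(T):
        for x, z in examples:                         # (sentence, logical form) pairs
            # Step 1 (check correctness): parse with the current lexicon and parameters.
            y = PARSE(x, lexicon, w)
            if y is not None and logical_form(y) == z:
                continue
            # Step 2 (lexical generation): add GENLEX's noisy candidate entries,
            # find the best parse y* that yields z, and keep only the entries it uses.
            temp_lexicon = lexicon | GENLEX(x, z)
            y_star = PARSE_CONSTRAINED(x, z, temp_lexicon, w)
            if y_star is None:
                continue
            lexicon = lexicon | lexical_entries(y_star)
            # Step 3 (update parameters): re-parse with the new lexicon; if the logical
            # form still doesn't match, make an additive perceptron-style update.
            y_prime = PARSE(x, lexicon, w)
            if y_prime is not None and logical_form(y_prime) != z:
                w = [wi + a - b for wi, a, b in zip(w, f(x, y_star), f(x, y_prime))]
    return lexicon, w
```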

GENLEX
GENLEX receives a pair of a sentence S and its logical form L, and returns a set of lexical entries. GENLEX generates a large set of lexical entries, some of which may be incorrect. Those items are eliminated from the lexicon at a later stage of the algorithm.
The set returned by GENLEX is defined as follows (written out below), where W(S) is the set of all sub-sentences of S and C(L) is a set of categories derived from L.
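Written out (following Zettlemoyer and Collins 2005), the returned set pairs every sub-sentence of S with every category derived from L:

```latex
% GENLEX: the cross product of word sub-sequences of S and categories derived from L.
\[
  \mathrm{GENLEX}(S, L) \;=\;
  \{\, x := y \;\mid\; x \in W(S),\; y \in C(L) \,\}
\]
```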

GENLEX - categories generation
The category generation function C(L) is defined through a set of rules that examine L and produce categories based on its structure:
- Each rule consists of a trigger that identifies some substructure within the logical form L.
- For each sub-structure in L that matches the trigger, a category is created and added to C(L).

GENLEX - categories generation
An example of the C(L) rules:

Outline Introduction CCG The PCCG Model Learning Results Conclusion

Results
Experiment setup:
- The training and test sets were extracted from the ATIS and Geo880 domains.
- Parameter settings:
  - The number of passes over the training set
  - The initial weight of the initial lexical entries' features
  - The initial weight of the learned lexical entries' features

Results
Results compared with He and Young 2006 on the ATIS data set:
Results compared with Zettlemoyer and Collins 2005 on the Geo880 data set:

Results
Results on the ATIS development set for the full algorithm and for restricted versions of it without the CCG extensions:

Outline Introduction CCG The PCCG Model Learning Results Conclusion

Conclusion
- We introduced the concepts of lambda calculus and combinatory categorial grammar (CCG), which allow us to map sentences to their logical form.
- We presented two algorithms for parameter estimation in a log-linear parsing model: gradient ascent and the hidden variable structured perceptron.
- We showed an online algorithm for learning a CCG parser, which combines the structured perceptron with lexical learning.
- We presented experimental results that show the system achieves significant accuracy improvements in both the ATIS and Geo880 domains, in comparison with previous works.

