Graph-based Algorithms in NLP - MIT OpenCourseWare


Graph-based Algorithms in NLP
Regina Barzilay
MIT
November, 2005

Graph-Based Algorithms in NLP
• In many NLP problems, entities are connected by a range of relations
• A graph is a natural way to capture connections between entities
• Applications of graph-based algorithms in NLP:
  – Find entities that satisfy certain structural properties defined with respect to other entities
  – Find globally optimal solutions given relations between entities

Graph-based Representation
• Let G(V, E) be a weighted undirected graph
  – V: set of nodes in the graph
  – E: set of weighted edges
• Edge weights w(u, v) define a measure of pairwise similarity between nodes u, v
[Figure: example graph with edge weights 0.2, 0.4, 0.3, 0.4, 0.7, 0.1]

Graph-based Representation
[Figure: a weighted graph and its corresponding weight matrix]

Examples of Graph-based Representations

Graph          Directed?  Nodes      Edges
Citation Net   yes        citations  reference relation
Text           no         sentences  semantic connectivity

Hubs and Authorities Algorithm (Kleinberg, 1998)
• Application context: information retrieval
• Task: retrieve documents relevant to a given query
• Naive solution: text-based search
  – Some relevant pages omit query terms
  – Some irrelevant pages do include query terms
• We need to take into account the authority of the page!

Analysis of the Link Structure
• Assumption: the creator of page p, by including a link to page q, has in some measure conferred authority on q
• Issues to consider:
  – some links are not indicative of authority (e.g., navigational links)
  – we need to find an appropriate balance between the criteria of relevance and popularity

Outline of the Algorithm
• Compute a focused subgraph given a query
• Iteratively compute hubs and authorities in the subgraph
[Figure: hubs pointing to authorities]

Focused Subgraph
• Subgraph G[W] over W ⊆ V, where edges correspond to all the links between pages in W
• How to construct G_σ for a query string σ?
  – G_σ has to be relatively small
  – G_σ has to be rich in relevant pages
  – G_σ must contain most of the strongest authorities

Constructing a Focused Subgraph: Notations
Subgraph(σ, Eng, t, d)
  σ: a query string
  Eng: a text-based search engine
  t, d: natural numbers
Let R_σ denote the top t results of Eng on σ

Constructing a Focused Subgraph: Algorithm
Set S := R_σ
For each page p ∈ R_σ
  Let Γ⁺(p) denote the set of all pages p points to
  Let Γ⁻(p) denote the set of all pages pointing to p
  Add all pages in Γ⁺(p) to S
  If |Γ⁻(p)| ≤ d then
    Add all pages in Γ⁻(p) to S
  Else
    Add an arbitrary set of d pages from Γ⁻(p) to S
  End
Return S
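The procedure above can be sketched in a few lines of Python. The search-engine and link-lookup helpers (top_results, out_links, in_links) are hypothetical stand-ins for whatever crawl or index is available, backed here by a toy four-page web:

```python
# A minimal sketch of the Subgraph(sigma, Eng, t, d) procedure above.
import random

def focused_subgraph(query, top_results, out_links, in_links, t=200, d=50):
    root = top_results(query, t)              # R_sigma: top t search results
    s = set(root)
    for p in root:
        s.update(out_links(p))                # all pages p points to
        parents = in_links(p)                 # all pages pointing to p
        if len(parents) <= d:
            s.update(parents)
        else:                                 # arbitrary d of the parents
            s.update(random.sample(sorted(parents), d))
    return s

# Toy web: "a" and "b" are the text-search hits; "c" and "d" enter the
# subgraph through out-links and in-links respectively.
web_out = {"a": ["c"], "b": [], "c": [], "d": ["a"]}
web_in = {"a": ["d"], "b": [], "c": ["a"], "d": []}
subgraph = focused_subgraph(
    "query",
    top_results=lambda q, t: ["a", "b"][:t],
    out_links=lambda p: web_out[p],
    in_links=lambda p: web_in[p],
    t=2, d=1)
```

The in-link bound d matters because popular pages can have enormous in-degree; out-links are added unconditionally since a page's own link list is small.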

Constructing a Focused Subgraph
[Figure: the root set expanded into the base set]

Computing Hubs and Authorities
• Authorities should have considerable overlap in terms of pages pointing to them
• Hubs are pages that have links to multiple authoritative pages
• Hubs and authorities exhibit a mutually reinforcing relationship
[Figure: hubs pointing to authorities]

An Iterative Algorithm
• For each page p, compute authority weight x(p) and hub weight y(p)
  – x(p) ≥ 0, y(p) ≥ 0
  – Σ_{p∈S} (x(p))² = 1, Σ_{p∈S} (y(p))² = 1
• Report top-ranking hubs and authorities

I operation
Given {y(p)}, compute:
  x(p) ← Σ_{q : (q,p) ∈ E} y(q)
[Figure: x[p] is the sum of y[q] for all pages q pointing to p]

O operation
Given {x(p)}, compute:
  y(p) ← Σ_{q : (p,q) ∈ E} x(q)
[Figure: y[p] is the sum of x[q] for all pages q pointed to by p]

Algorithm: Iterate
Iterate(G, k)
  G: a collection of n linked pages
  k: a natural number
Let z denote the vector (1, 1, 1, . . . , 1) ∈ Rⁿ
Set x₀ := z
Set y₀ := z
For i = 1, 2, . . . , k
  Apply the I operation to (x_{i−1}, y_{i−1}), obtaining new x-weights x′_i
  Apply the O operation to (x′_i, y_{i−1}), obtaining new y-weights y′_i
  Normalize x′_i, obtaining x_i
  Normalize y′_i, obtaining y_i
Return (x_k, y_k)

Algorithm: Filter
Filter(G, k, c)
  G: a collection of n linked pages
  k, c: natural numbers
(x_k, y_k) := Iterate(G, k)
Report the pages with the c largest coordinates in x_k as authorities
Report the pages with the c largest coordinates in y_k as hubs

Convergence
• Theorem: the sequences x₁, x₂, x₃, . . . and y₁, y₂, y₃, . . . converge
• Let A be the adjacency matrix of G
• Authorities are computed as the principal eigenvector of AᵀA
• Hubs are computed as the principal eigenvector of AAᵀ
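The Iterate and Filter procedures above can be sketched in plain Python. The four-page graph at the bottom is a made-up example, not from the lecture:

```python
# A toy implementation of Kleinberg's Iterate/Filter on a small directed graph.

def iterate(nodes, edges, k):
    """k rounds of the I and O operations, normalizing so squared weights sum to 1."""
    x = {p: 1.0 for p in nodes}  # authority weights, x0 = z
    y = {p: 1.0 for p in nodes}  # hub weights, y0 = z
    for _ in range(k):
        # I operation: x[p] <- sum of y[q] over all edges (q, p)
        x = {p: sum(y[q] for (q, r) in edges if r == p) for p in nodes}
        # O operation: y[p] <- sum of x[q] over all edges (p, q)
        y = {p: sum(x[r] for (q, r) in edges if q == p) for p in nodes}
        for w in (x, y):  # normalize each weight vector (L2 norm)
            norm = sum(v * v for v in w.values()) ** 0.5 or 1.0
            for p in w:
                w[p] /= norm
    return x, y

def filter_pages(nodes, edges, k, c):
    """Report the c pages with the largest authority and hub weights."""
    x, y = iterate(nodes, edges, k)
    authorities = sorted(nodes, key=lambda p: -x[p])[:c]
    hubs = sorted(nodes, key=lambda p: -y[p])[:c]
    return authorities, hubs

# h1 and h2 both point to a, so a should emerge as the top authority,
# and h1 (which links to both a and b) as the top hub.
nodes = ["h1", "h2", "a", "b"]
edges = [("h1", "a"), ("h2", "a"), ("h1", "b")]
authorities, hubs = filter_pages(nodes, edges, k=20, c=1)
```

Running the loop to convergence is exactly power iteration, which is why the fixed points coincide with the principal eigenvectors of AᵀA and AAᵀ.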

Subgraph obtained from www.ford.com
http://www.ford.com                 Ford Motor Company
http://www.eff.org/blueribbon.html  Campaign for Free Speech
http://www.mckinley.com             Welcome to Magellan!
http://www.netscape.com             Welcome to Netscape!
http://www.linkexchange.com         LinkExchange — Welcome
http://www.toyota.com               Welcome to Toyota

Authorities obtained
       http://www.honda.com         Honda
0.202  http://www.toyota.com        Welcome to Toyota
       http://www.ford.com          Ford Motor Company
0.173  http://www.bmwusa.com        BMW of North America
       http://www.saturncars.com    Saturn Web Site
0.155  http://www.nissanmotors.com  NISSAN

PageRank Algorithm (Brin & Page, 1998)
• Original Google ranking algorithm
• Similar idea to Hubs and Authorities
• Key differences:
  – Authority of each page is computed off-line
  – Query relevance is computed on-line
      Anchor text
      Text on the page
  – The prediction is based on the combination of authority and relevance

Intuitive Justification
From "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (Brin & Page, 1998):

PageRank can be thought of as a model of user behaviour. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back", but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page that the "random surfer" will get bored and request another random page.

Brin, S., and L. Page. "The Anatomy of a Large-Scale Hypertextual Web Search Engine." WWW7 / Computer Networks 30, no. 1-7 (1998): 107-117. Paper available at http://dbpubs.stanford.edu:8090/pub/1998-8.

PageRank Computation
Iterate the PR(p) computation:
• pages q₁, . . . , qₙ point to page p
• d is a damping factor (typically set to 0.85)
• C(p) is the out-degree of p

PR(p) = (1 − d) + d · (PR(q₁)/C(q₁) + · · · + PR(qₙ)/C(qₙ))
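A minimal sketch of the iteration, using the slide's (1 − d) + d·Σ PR(q)/C(q) form directly (so scores sum to the number of pages rather than to 1). The tiny link structure below is made up for illustration:

```python
# Iterative PageRank over a dict mapping each page to the pages it links to.

def pagerank(links, d=0.85, iters=50):
    """links: dict mapping each page to the list of pages it points to."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iters):
        pr = {
            p: (1 - d) + d * sum(pr[q] / len(links[q])   # PR(q)/C(q)
                                 for q in pages if p in links[q])
            for p in pages
        }
    return pr

# Toy web: every other page links to "hub", so "hub" should score highest.
links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
scores = pagerank(links)
```

Unlike HITS, nothing here depends on a query: the scores are a property of the link structure alone, which is what lets them be computed off-line.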

Notes on PageRank
• PageRank forms a probability distribution over web pages
• PageRank corresponds to the principal eigenvector of the normalized link matrix of the web

Extractive Text Summarization
Task: extract important information from a text
[Figure removed for copyright reasons: screenshots of several website text paragraphs]

Text as a Graph
[Figure: sentences S1–S6 as nodes of a graph]

Centrality-based Summarization (Radev)
• Assumption: the centrality of a node is an indication of its importance
• Representation: connectivity matrix based on intra-sentence cosine similarity
• Extraction mechanism:
  – Compute the PageRank score for every sentence u:

      PageRank(u) = (1 − d)/N + d · Σ_{v ∈ adj[u]} PageRank(v)/deg(v),

    where N is the number of nodes in the graph
  – Extract the k sentences with the highest PageRank scores
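The extraction mechanism can be sketched end to end: build a cosine-similarity graph over sentences, score each sentence with the PageRank variant above, and keep the top k. The similarity threshold and the example sentences are assumptions for illustration:

```python
# Bag-of-words cosine similarity plus sentence-level PageRank.
import math
from collections import Counter

def cosine(s1, s2):
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def summarize(sents, k=2, d=0.85, threshold=0.1, iters=30):
    n = len(sents)
    # Connect sentences whose cosine similarity passes the threshold.
    adj = [[j for j in range(n)
            if j != i and cosine(sents[i], sents[j]) > threshold]
           for i in range(n)]
    pr = [1.0 / n] * n
    for _ in range(iters):
        # PageRank(u) = (1 - d)/N + d * sum over v in adj[u] of PageRank(v)/deg(v)
        pr = [(1 - d) / n
              + d * sum(pr[v] / len(adj[v]) for v in adj[u] if adj[v])
              for u in range(n)]
    top = sorted(range(n), key=lambda u: -pr[u])[:k]
    return [sents[i] for i in sorted(top)]  # keep original sentence order

sents = ["the cat sat on the mat",
         "the cat is on the mat",
         "dogs bark loudly"]
summary = summarize(sents, k=2)
```

The isolated third sentence gets only the (1 − d)/N teleportation mass, so the two mutually similar sentences dominate the ranking.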

Does it work?
• Evaluation: comparison with a human-created summary
• Rouge measure: weighted n-gram overlap (similar to Bleu)
[Table of per-method Rouge scores; values not recoverable from the transcription]


Graph-Based Algorithms in NLP
• Applications of graph-based algorithms in NLP:
  – Find entities that satisfy certain structural properties defined with respect to other entities
  – Find globally optimal solutions given relations between entities

Min-Cut: Definitions
• Graph cut: partitioning of the graph into two disjoint sets of nodes A, B
• Graph cut weight: cut(A, B) = Σ_{u∈A, v∈B} w(u, v)
  – i.e., the sum of crossing edge weights
• Minimum cut: the cut that minimizes cross-partition similarity
[Figure: a weighted graph and its minimum cut]

Finding Min-Cut
• The problem is polynomial-time solvable for the 2-class min-cut when the weights are positive
  – Use a max-flow algorithm
• In the general case, the k-way cut is NP-complete
  – Use approximation algorithms (e.g., the randomized algorithm by Karger)
• Min-cut was first used for NLP applications by Pang & Lee (2004) (sentiment classification)
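The cut-weight definition can be illustrated by brute force on a toy graph (only feasible at toy sizes; the polynomial-time route is max-flow). The weighted graph below is an assumption for the example:

```python
# Brute-force 2-way min-cut: enumerate every partition with node[0] fixed
# on side A, and keep the partition with the smallest crossing weight.
from itertools import combinations

def cut_weight(A, B, w):
    """Sum of weights of edges crossing the (A, B) partition."""
    return sum(w.get((min(u, v), max(u, v)), 0.0) for u in A for v in B)

def min_cut(nodes, w):
    first, rest = nodes[0], nodes[1:]
    best = None
    for r in range(len(rest)):          # B must stay non-empty
        for extra in combinations(rest, r):
            A = {first, *extra}
            B = set(nodes) - A
            cw = cut_weight(A, B, w)
            if best is None or cw < best[0]:
                best = (cw, A, B)
    return best

# Two tightly linked pairs joined by one weak edge: the minimum cut
# should sever only the 0.1 edge.
w = {(1, 2): 0.7, (3, 4): 0.7, (2, 3): 0.1}
weight, A, B = min_cut([1, 2, 3, 4], w)
```

Fixing one node on side A halves the search space and avoids counting each partition twice; the exponential enumeration is exactly what max-flow lets us avoid in the 2-class case.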

Min-Cut for Content Selection
Task: determine a subset of database entries to be included in the generated document
[Table: team stat comparison and individual passing/rushing/receiving leaders for an Oakland Raiders - New England game; figure by MIT OCW]

Parallel Corpus for Text Generation
[Table: passing, rushing, and fumbles statistics for the game described below]

Suggs rushed for 82 yards and scored a touchdown in the fourth quarter, leading the Browns to a 17-13 win over the Washington Redskins on Sunday. Jeff Garcia went 14-of-21 for 195 yards and a TD for the Browns, who didn't secure the win until Coles fumbled with 2:08 left. The Redskins (1-3) can pin their third straight loss on going just 1-for-11 on third downs, mental mistakes and a costly fumble by Clinton Portis. "My fumble changed the momentum", Portis said. Brunell finished 17-of-38 for 192 yards, but was unable to get into any rhythm because Cleveland's defense shut down Portis. The Browns faked a field goal, but holder Derrick Frost was stopped short of a first down. Brunell then completed a 13-yard pass to Coles, who fumbled as he was being taken down and Browns safety Earl Little recovered.

Content Selection: Problem Formulation
• Input format: a set of entries from a relational database
  – "entry" = "row in a database"
• Training: n sets of database entries with associated selection labels
[Table: Oakland rushing entries for Jordan and Crockett; figure by MIT OCW]
• Testing: predict selection labels for a new set of entries

Simple Solution
Formulate content selection as a classification task:
• Prediction: {1, 0}
• Representation of the entry: its attribute values
Goal: learn a classification function P(Y|X) that can classify unseen examples
  X = ⟨Smith, 28, 9, 1⟩, Y = ?

Potential Shortcoming: Lack of Coherence
• Sentences are classified in isolation
• Generated sentences may not be connected in a meaningful way
Example: an output of a system that automatically generates scientific papers (Stribling et al., 2005):

Active networks and virtual machines have a long history of collaborating in this manner. The basic tenet of this solution is the refinement of Scheme. The disadvantage of this type of approach, however, is that public-private key pair and red-black trees are rarely incompatible.

Enforcing Output Coherence
Sentences in a text are connected:

The New England Patriots squandered a couple big leads. That was merely a setup for Tom Brady and Adam Vinatieri, who pulled out one of their typical last-minute wins.

Brady threw for 350 yards and three touchdowns before Vinatieri kicked a 29-yard field goal with 17 seconds left to lead injury-plagued New England past the Atlanta Falcons 31-28 on Sunday.

A simple classification approach cannot enforce coherence constraints

Constraints for Content Selection
Collective content selection: consider all the entries simultaneously
• Individual constraints:
  [Figure: an entry such as "Branch scores TD" with its own selection preference]
• Contextual constraints:
  [Figure: linked entries such as "Brady passes to Branch" and "Branch scores TD"]

Individual Preferences
[Figure: entries Y, M, N with individual selection preferences 0.8, 0.2, 0.5, 0.1, 0.5, 0.9]

Combining Individual and Contextual Preferences
[Figure: entries Y, M, N with individual preferences and link weights]

Collective Classification
• x ∈ C⁺: selected entries; x ∈ C⁻: omitted entries
• ind(x): preference to be selected
• link_L(xi, xj): xi and xj are connected by a link of type L
• Minimize the penalty:

  Σ_{x ∈ C⁺} ind⁻(x) + Σ_{x ∈ C⁻} ind⁺(x) + Σ_L Σ_{xi ∈ C⁺} Σ_{xj ∈ C⁻} link_L(xi, xj)

• Goal: find the globally optimal label assignment

Optimization Framework

  Σ_{x ∈ C⁺} ind⁻(x) + Σ_{x ∈ C⁻} ind⁺(x) + Σ_L Σ_{xi ∈ C⁺} Σ_{xj ∈ C⁻} link_L(xi, xj)

• Energy minimization framework (Besag, 1986; Pang & Lee, 2004)
• Seemingly intractable
• Can be solved exactly in polynomial time when the scores are positive (Greig et al., 1989)
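The penalty being minimized can be made concrete on a tiny instance. The preference and link scores below are hypothetical; exhaustive search over label assignments is only feasible at toy sizes, which is exactly why the min-cut reduction matters:

```python
# Brute-force minimization of the collective-selection penalty:
# selected entries pay their "omit" preference, omitted entries pay their
# "select" preference, and links pay when their endpoints are split.
from itertools import product

def penalty(labels, ind_pos, ind_neg, links):
    cost = 0.0
    for x, lab in labels.items():
        cost += ind_neg[x] if lab == 1 else ind_pos[x]
    for (xi, xj), w in links.items():
        if labels[xi] != labels[xj]:      # link crosses the partition
            cost += w
    return cost

def best_assignment(entries, ind_pos, ind_neg, links):
    best = None
    for bits in product([0, 1], repeat=len(entries)):
        labels = dict(zip(entries, bits))
        c = penalty(labels, ind_pos, ind_neg, links)
        if best is None or c < best[0]:
            best = (c, labels)
    return best

# "a" strongly prefers selection; "b" mildly prefers omission; the link
# between them pulls "b" into the selected set as well.
entries = ["a", "b"]
ind_pos = {"a": 0.9, "b": 0.4}   # preference to be selected
ind_neg = {"a": 0.1, "b": 0.6}   # preference to be omitted
links = {("a", "b"): 0.5}
best = best_assignment(entries, ind_pos, ind_neg, links)
```

Selecting both entries costs 0.1 + 0.6 = 0.7, beating every assignment that splits the linked pair.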

Graph-Based Formulation
Use max-flow to compute the minimal cut partition
[Figure: entries Y, M, N with individual preferences and link weights]

Learning Task
[Figure: entries Y, M, N with links]
• Learning individual preferences
• Learning link structure

Learning Individual Preferences
• Map attributes of a database entry to a feature vector
[Table: Oakland rushing entries for Jordan and Crockett; figure by MIT OCW]
  X = ⟨Jordan, 18, 17, 0, 14⟩, Y = 1
  X = ⟨Crockett, 3, 20, 8, 19⟩, Y = 0
• Train a classifier to learn P(Y|X)

Contextual Constraints: Learning Link Structure
• Build on rich structural information available in the database schema
  – Define entry links in terms of their database relatedness
    Example: players from the winning team that had touchdowns in the same quarter
• Discover links automatically
  – Generate-and-prune approach

Construction of Candidate Links
• Link space:
  – Links based on attribute sharing
• Link type template: create L_{i,j,k} for every pair of entry types Ei and Ej, and for every shared attribute k
  Example: Ei = Rushing, Ej = Passing, and k = Name
  Example: Ei = Rushing, Ej = Passing, and k = TD

Link Filtering
Ei = Rushing, Ej = Passing, and k = Name
Ei = Rushing, Ej = Passing, and k = TD
[Table: New England passing and rushing statistics (T. Brady, C. Dillon, K. Faulk), with shared attributes highlighted; figure by MIT OCW]

Link Filtering
Ei = Rushing, Ej = Passing, and k = Name
Ei = Rushing, Ej = Passing, and k = TD
• Measure similarity in label distribution using the χ² test
• Assume H0: the labels of linked entries are independent
• Consider the joint label distribution of entry pairs from the training set
• H0 is rejected if χ² exceeds the critical value
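The filtering test can be sketched as a 2x2 χ² statistic over label pairs. The contingency counts below are made up; a link type is kept when the statistic exceeds the critical value (3.84 for one degree of freedom at p = 0.05):

```python
# Chi-square test of independence on a 2x2 contingency table of
# (label of entry i, label of entry j) counts for pairs joined by a link type.

def chi_square_2x2(table):
    """table[i][j]: count of entry pairs whose selection labels are (i, j)."""
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = row[0] + row[1]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n   # count expected under H0
            if expected:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Labels co-occur strongly along the diagonal -> reject independence;
# a uniform table gives a statistic of zero -> keep H0, drop the link type.
correlated = chi_square_2x2([[40, 10], [10, 40]])
independent = chi_square_2x2([[25, 25], [25, 25]])
```

Only link types whose endpoint labels are demonstrably dependent are worth carrying into the collective model; the rest just add noise to the cut.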

Collective Content Selection
[Figure: entries Y, M, N with individual preferences and link weights]
• Learning
  – Individual preferences
  – Link structure
• Inference
  – Minimal cut partitioning

Data Domain: American Football
• Data source: the official site of the NFL
• Corpus: AP game recaps with corresponding databases for the 2003 and 2004 seasons
  – Size: 468 recaps (436,580 words)
  – Average recap length: 46.8 sentences

Data: Preprocessing
• Anchor-based alignment (Duboue & McKeown, 2001; Sripada et al., 2001)
  – 7,513 aligned pairs
  – 7.1% of database entries are verbalized
  – 31.7% of sentences had a database entry
• Overall: 105,792 entries
  – Training/Testing/Development: 83%, 15%, 2%

Results: Comparison with Human Extraction
• Precision (P): the percentage of extracted entries that appear in the text
• Recall (R): the percentage of entries appearing in the text that are extracted by the model
• F-measure: F = 2PR / (P + R)

Method                     P      R      F
Class Majority Baseline    29.4   68.19  40.09
Standard Classifier        44.88  62.23  49.75
Collective Model           52.71  76.50  60.15
Previous Methods

Summary
• Graph-based algorithms: Hubs and Authorities, Min-Cut
• Applications: information retrieval, summarization, generation

