Information Extraction - Open And Ontological


Information Extraction
Lecture 10 – Ontological and Open IE
CIS, LMU München
Winter Semester 2015-2016
Dr. Alexander Fraser, CIS

Administrivia
- Suggested exam (Klausur) date is in the last week of the lecture course (the week before Fasching)
- Exam: February 3rd
- There will be a review for the exam on Wed January 27th
- NEW: there is a conflict with a different course, I will look into this

Before I start on Ontological IE, two topics I wanted to briefly talk about today:
- Semantic Role Labeling
- Wikification

Syntactic Parsing and Relation Extraction
- We saw in the previous two lectures that syntactic features are useful for relation extraction (and event extraction)
- For instance:

Parse Features for Relation Extraction
Sentence: American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said
Marked in the sentence: Mention 1, Mention 2
- Base syntactic chunk sequence from one to the other: NP NP PP VP NP NP
- Constituent path through the tree from one to the other: NP NP S S NP
- Dependency path: Airlines matched Wagner said
Slide from D. Jurafsky

Semantic Role Labeling
- A generalization beyond syntactic parsing is Semantic Role Labeling (often abbreviated to SRL)
- Here the idea is to identify the arguments to a verb
  o So this can capture the same information as, e.g., a dependency parse
  o It should be clear that this will be useful in IE
- But the difference is that the arguments are captured in terms of their semantic function rather than their syntactic function

Subcategorization Frame
- Consider the sentences:
  o The man was bitten by the dog
  o The dog bit the man
- In terms of the verb and the subcategorized arguments, there is no difference here
- In Semantic Role Labeling, these will have the same representation!
- Consider also:
  o The man was bitten.
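The point about active and passive sentences sharing one representation can be illustrated with a minimal sketch. It is not an SRL system: the syntactic analysis is supplied by hand, and the role names `agent` and `patient`, the `Frame` class and the `to_frame` helper are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """Hypothetical semantic frame: a predicate with two labeled roles."""
    predicate: str           # lemma of the verb
    agent: Optional[str]     # who performs the action (may be unexpressed)
    patient: Optional[str]   # who undergoes the action


def to_frame(subject, verb_lemma, voice, obj=None, by_phrase=None):
    """Map a hand-annotated syntactic analysis to semantic roles."""
    if voice == "active":
        # Active voice: the subject is the agent, the object is the patient.
        return Frame(verb_lemma, agent=subject, patient=obj)
    if voice == "passive":
        # Passive voice: the subject is the patient, the by-phrase is the agent.
        return Frame(verb_lemma, agent=by_phrase, patient=subject)
    raise ValueError(f"unknown voice: {voice}")


active = to_frame("the dog", "bite", "active", obj="the man")
passive = to_frame("the man", "bite", "passive", by_phrase="the dog")
agentless = to_frame("the man", "bite", "passive")  # "The man was bitten."

print(active == passive)   # both sentences yield the same frame
print(agentless)           # the agent role is simply left unfilled
```

In the agentless passive the patient is still identified, which is the information a purely syntactic "subject" label would not give directly.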

Semantic Role Labeling
Example from Kozhevnikov and Titov
List of SRL tools (see also the rison-of-semantic-role-labelers.html

Last Word: Training Data
- The critical problem for statistical approaches is labeled training data
- There are two mature data sets for training semantic role labelers for English
  o FrameNet is the one that may be more useful for many IE purposes (but PropBank is also interesting)
- There has been some work on projecting these two resources to other languages using machine translation techniques
  o E.g., for German, the "Salsa" project at Uni SB

Wikification
- Wikification is the problem of automatically annotating entities in free text with their (English) Wikipedia page
- Let's start with motivation.

Wikification: The Reference Problem
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Slide from ACL 2014 Roth Tutorial

Wikification: Motivation
- Dealing with Ambiguity of Natural Language
  o Mentions of entities and concepts could have multiple meanings
- Dealing with Variability of Natural Language
  o A given concept could be expressed in many ways
- Wikification addresses these two issues in a specific way: The Reference Problem
  o What is meant by this concept? (WSD, Grounding)
  o More than just co-reference (within and across documents)
Slide from ACL 2014 Roth Tutorial

Ontological IE
- In the last two lectures, we discussed how to extract relations and events from text
  o We looked in detail at relations expressed in a single sentence
  o Event extraction captures relations which are often expressed at either the sentence or at the document level (i.e., in multiple sentences)
  o Consider the CMU Seminar task: the task is to extract events (seminars), with speaker, location, start time and end time
- Today we will discuss updating a knowledge base with the extracted relations or events
- This is called "Ontological IE"

Ontologies
An ontology is a consistent knowledge base without redundancy.

Counter-example (redundant names, contradictory values):
Person          Nationality
Angela Merkel   German
Merkel          Germany
A. Merkel       French

Example (an ontology):
Entity          Relation    Entity
Angela Merkel   citizenOf   Germany

- Every entity appears only with exactly the same name
- There are no semantic contradictions
Slide from Suchanek

Ontological IE
Ontological Information Extraction (IE) aims to create or extend an ontology.

Source sentences:
- Angela Merkel is the German chancellor.
- Merkel was born in Germany.
- A. Merkel has French nationality.

Person          Nationality
Angela Merkel   German
Merkel          Germany
A. Merkel       French

Entity          Relation    Entity
Angela Merkel   citizenOf   Germany
Slide from Suchanek

Ontological IE Challenges
Challenge 1: Map names to names that are already known

Entity          Relation    Entity
Angela Merkel   citizenOf   Germany

Names to map to "Angela Merkel": Merkel, Angie, A. Merkel
Slide from Suchanek

Ontological IE Challenges
Challenge 2: Be sure to map the names to the right known names

Entity          Relation    Entity
Angela Merkel   citizenOf   Germany
Una Merkel      citizenOf   USA

Which one is meant? "Merkel is great!"
Slide from Suchanek
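Challenges 1 and 2 can be sketched together as a tiny name-mapping step. This is only an illustration under strong assumptions: the alias lists and context words in `KNOWN_ENTITIES` are invented for the example, and the word-overlap score is a stand-in for whatever disambiguation model a real system would use.

```python
import re

# Hypothetical toy knowledge base: aliases and context words are assumptions.
KNOWN_ENTITIES = {
    "Angela Merkel": {
        "aliases": {"Merkel", "Angie", "A. Merkel"},
        "context": {"chancellor", "germany", "german", "politician"},
    },
    "Una Merkel": {
        "aliases": {"Merkel"},
        "context": {"actress", "film", "hollywood", "usa"},
    },
}


def candidates(mention):
    """Challenge 1: find all known entities the mention could refer to."""
    return sorted(
        name
        for name, info in KNOWN_ENTITIES.items()
        if mention == name or mention in info["aliases"]
    )


def link(mention, sentence):
    """Challenge 2: pick the right entity, or None if we cannot decide."""
    cands = candidates(mention)
    if not cands:
        return None
    words = set(re.findall(r"\w+", sentence.lower()))
    scored = [(len(KNOWN_ENTITIES[c]["context"] & words), c) for c in cands]
    best_score = max(score for score, _ in scored)
    best = [c for score, c in scored if score == best_score]
    # Refuse to guess when several candidates tie.
    return best[0] if len(best) == 1 else None


print(link("Angie", "Angie is great!"))
print(link("Merkel", "Merkel is the German chancellor."))
print(link("Merkel", "Merkel is great!"))
```

Returning None for "Merkel is great!" reflects the question mark on the slide: without context there is no basis for choosing between the two known entities.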

Ontological IE Challenges
Challenge 3: Map to known relationships

Entity          Relation    Entity
Angela Merkel   citizenOf   Germany

Phrases to map to "citizenOf": has nationality, has citizenship, is citizen of
Slide from Suchanek
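A minimal sketch of Challenge 3, assuming a hand-written lookup table from surface phrases to the known relation name. The table `RELATION_PATTERNS` and the helper `canonical_relation` are hypothetical; a real system would learn or curate far more patterns.

```python
# Hypothetical phrase table: the three phrases are taken from the slide,
# the mapping mechanism itself is an assumption for illustration.
RELATION_PATTERNS = {
    "has nationality": "citizenOf",
    "has citizenship": "citizenOf",
    "is citizen of": "citizenOf",
}


def canonical_relation(phrase):
    """Return the known relation for a surface phrase, or None if unknown."""
    key = " ".join(phrase.lower().split())  # normalize case and whitespace
    return RELATION_PATTERNS.get(key)


print(canonical_relation("Has  Nationality"))
print(canonical_relation("is friends with"))
```

Unknown phrases return None instead of inventing a new relation, which keeps the ontology restricted to relations it already knows.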

Ontological IE Challenges
Challenge 4: Take care of consistency

Entity          Relation    Entity
Angela Merkel   citizenOf   Germany

Conflicting statement: Angela Merkel is French
Slide from Suchanek

Triples
A triple (in the sense of ontologies) is a tuple of an entity, a relation name and another entity:

Entity          Relation    Entity
Angela Merkel   citizenOf   Germany

Most ontological IE approaches produce triples as output. This decreases the variance in any ...
Slide from Suchanek

Triples
A triple can be represented in multiple forms:
- As a table row:
  Entity          Relation    Entity
  Angela Merkel   citizenOf   Germany
- As a graph edge labeled citizenOf
- As a tuple: Angela Merkel, citizenOf, Germany
Slide from Suchanek
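The different forms of a triple can be sketched in code. The `Triple` class and the helper names below are assumptions chosen for this illustration, not part of any particular ontology toolkit.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one (entity, relation, entity) fact."""
    subject: str
    relation: str
    obj: str


def as_predicate(t):
    """Render the triple in predicate notation, e.g. rel(subject, object)."""
    return f"{t.relation}({t.subject}, {t.obj})"


def as_graph(triples):
    """Render triples as a labeled graph: node -> list of (edge label, node)."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.relation, t.obj))
    return dict(graph)


fact = Triple("Angela Merkel", "citizenOf", "Germany")

print(tuple(fact))
print(as_predicate(fact))
print(as_graph([fact]))
```

All three forms carry the same information, so a system can pick whichever is most convenient for storage or reasoning.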

YAGO
Example: Elvis in YAGO
Slide from Suchanek

- Let's talk about ontological IE using extraction from Wikipedia as an example
- Then we will go on to open IE, which uses similar ideas to extract from all the text on the web!

Wikipedia
Wikipedia is a free online encyclopedia
- 3.4 million articles in English
- 16 million articles in dozens of languages
Why is Wikipedia good for information extraction?
- It is a huge, but homogeneous resource (more homogeneous than the Web)
- It is considered authoritative (more authoritative than a random Web page)
- It is well-structured with infoboxes and categories
- It provides a wealth of meta information (inter-article links, inter-language links, user discussion, ...)
Slide from Suchanek

Ontological IE from Wikipedia
Wikipedia is a free online encyclopedia
- 3.4 million articles in English
- 16 million articles in dozens of languages
Every article is (should be) unique
- We get a set of unique entities that cover numerous areas of interest
Examples: Angela Merkel, Germany, Una Merkel, Theory of Relativity
Slide from Suchanek

Wikipedia Source
Example: Elvis on Wikipedia
| Birth name = Elvis Aaron Presley
| Born = {{Birth date|1935|1|8}}<br />[[Tupelo, Mississippi|Tupelo]]
Slide from Suchanek

IE from Wikipedia
Exploit Infoboxes (hello regexes!)
[Figure: mock Wikipedia article "Elvis Presley" with filler body text, an infobox entry "Born: 1935" and the category line "Categories: Rock singers"]
Extracted: Elvis Presley bornOnDate 1935 (graph edge: born, 1935)
Slide from Suchanek

IE from Wikipedia
- Exploit Infoboxes
- Exploit conceptual categories
[Figure: mock Wikipedia article "Elvis Presley" with filler body text, an infobox entry "Born: 1935" and the category line "Categories: Rock singers"]
Extracted graph edges: born, 1935; type, Rock Singer
Slide from Suchanek
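The "hello regexes!" idea can be sketched as follows. The wikitext snippet is an assumed reconstruction of what infobox and category markup look like, the relation names follow the slides, and the two regular expressions only cover this one example; they are not a general Wikipedia parser.

```python
import re

# Assumed sample markup (hypothetical, modeled on the Elvis example).
WIKITEXT = """{{Infobox person
| Birth name = Elvis Aaron Presley
| Born = {{Birth date|1935|1|8}}<br />[[Tupelo, Mississippi|Tupelo]]
}}
[[Category:Rock singers]]
"""

# {{Birth date|YYYY|M|D}} -> year, month, day
BIRTH_DATE = re.compile(r"\{\{Birth date\|(\d{4})\|(\d{1,2})\|(\d{1,2})\}\}")
# [[Category:Name]] -> Name
CATEGORY = re.compile(r"\[\[Category:([^\]|]+)")


def extract_triples(entity, wikitext):
    """Extract (entity, relation, value) triples from infobox and categories."""
    triples = []
    match = BIRTH_DATE.search(wikitext)
    if match:
        # Keep only the year, as on the slide.
        triples.append((entity, "bornOnDate", match.group(1)))
    for category in CATEGORY.findall(wikitext):
        # The raw category name is kept; mapping "Rock singers" to a
        # class such as "Rock Singer" is omitted in this sketch.
        triples.append((entity, "type", category.strip()))
    return triples


for triple in extract_triples("Elvis Presley", WIKITEXT):
    print(triple)
```

Because infoboxes and category links follow a fixed syntax, simple patterns like these get surprisingly far, which is part of why Wikipedia is attractive for ontological IE.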

Consistency
[Figure: graph with edges type, Rock Singer; born, 1935; diedInPlace, 1977]
- Check uniqueness of functional arguments
- Check domains and ranges of relations
- Check type coherence
Slide from Suchanek
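Two of these checks can be sketched in a few lines. The tables `FUNCTIONAL`, `RANGE` and `TYPES` are toy assumptions for this illustration (a real ontology would supply them), and type coherence checking is left out.

```python
# Hypothetical schema information for a toy ontology.
FUNCTIONAL = {"born", "diedInPlace"}            # at most one value per entity
RANGE = {"born": "year", "diedInPlace": "place", "type": "class"}
TYPES = {                                        # known type of each value
    "1935": "year",
    "1970": "year",
    "1977": "year",
    "Memphis": "place",
    "Rock Singer": "class",
}


def violations(facts):
    """Return human-readable descriptions of inconsistent facts."""
    problems = []
    seen = {}
    for subject, relation, obj in facts:
        # Range check: values of unknown type are flagged as well.
        expected = RANGE.get(relation)
        if expected is not None and TYPES.get(obj) != expected:
            problems.append(
                f"range: {relation}({subject}, {obj}) expects a {expected}"
            )
        # Functional check: a second, different value is a contradiction.
        if relation in FUNCTIONAL:
            previous = seen.setdefault((subject, relation), obj)
            if previous != obj:
                problems.append(
                    f"functional: {relation}({subject}) has both "
                    f"{previous} and {obj}"
                )
    return problems


facts = [
    ("Elvis", "type", "Rock Singer"),
    ("Elvis", "born", "1935"),
    ("Elvis", "diedInPlace", "1977"),   # a year where a place is expected
]
print(violations(facts))
```

The fact diedInPlace(Elvis, 1977) is rejected because 1977 is a year and the relation expects a place, which mirrors the inconsistent edge shown on the slide.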

Ontological IE from Wikipedia
YAGO
- 3m entities, 28m facts
- focus on precision: 95% (automatic checking of facts)
- http://yago-knowledge.org
DBpedia
- 3.4m entities
- 1b facts (also from non-English Wikipedia)
- large community
- http://dbpedia.org
Freebase
- Community project on top of Wikipedia (bought by Google, but still open)
- http://freebase.com
- Now integrated into Wikidata!!!
Slide modified from Suchanek

Ontological IE by Reasoning
Example: "Elvis was born in 1935" should yield the edge born, 1935
Recap: The challenges:
- deliver canonical relations (died in, was killed in)
- deliver canonical entities (Elvis, Elvis Presley, The King)
- deliver consistent facts (born (Elvis, 1970) vs. born (Elvis, 1935))
Idea: These problems are interleaved, solve all of them together.
Slide from Suchanek

Using Reasoning (SOFIE system)
Everything is expressed in First Order Logic:
- Ontology: type(Elvis ...
- Documents: "Elvis was born in 1935"
  appears(“Elvis”, ”was born in”, ”1935”)
  means(“Elvis”, Elvis Presley, 0.8)
  means(“Elvis”, Elvis Costello, 0.2)
- Consistency: birthdate < deathdate
  born(X,Y) & died(X,Z) => Y < Z
- Rules:
  appears(A,P,B) & R(A,B) => expresses(P,R)
  appears(A,P,B) & expresses(P,R) => R(A,B)
Result: the edge born, 1935
Slide from Suchanek
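The interplay of `means`, `expresses` and the consistency rule can be illustrated with a very small sketch. It is not the SOFIE implementation (which solves a weighted MAX-SAT problem): here the few hypotheses are simply enumerated, and the toy `ONTOLOGY`, the functionality assumption on born/died, and all helper names are assumptions for this example.

```python
# Toy ontology facts (hypothetical selection for the example).
ONTOLOGY = {
    ("died", "Elvis Presley", 1977),
    ("born", "Elvis Costello", 1954),
}
# means(mention, entity, weight) as on the slide.
MEANS = {"Elvis": {"Elvis Presley": 0.8, "Elvis Costello": 0.2}}
# expresses(pattern, relation), assumed to be known already.
EXPRESSES = {"was born in": "born"}


def consistent(facts):
    """Check born/died are functional and born(X,Y) & died(X,Z) => Y < Z."""
    born, died = {}, {}
    for relation, entity, year in facts:
        table = {"born": born, "died": died}.get(relation)
        if table is None:
            continue
        if table.setdefault(entity, year) != year:
            return False  # two different values for a functional relation
    return all(born[e] < died[e] for e in born if e in died)


def interpret(mention, pattern, value):
    """Pick the highest-weighted reading that keeps the ontology consistent."""
    relation = EXPRESSES[pattern]
    best = None
    for entity, weight in MEANS[mention].items():
        hypothesis = (relation, entity, value)
        if consistent(ONTOLOGY | {hypothesis}):
            if best is None or weight > best[1]:
                best = (hypothesis, weight)
    return best


# appears("Elvis", "was born in", "1935")
print(interpret("Elvis", "was born in", 1935))
```

Entity disambiguation and consistency checking are decided in one step here: the Elvis Costello reading is discarded because it contradicts a known birth year, not because of its lower weight alone.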

Ontological IE by Reasoning
Reasoning-based approaches use logical rules to extract knowledge from natural language documents.
Current approaches use either
- Weighted MAX SAT
- or Datalog
- or Markov Logic
Input:
- often an ontology
- manually designed rules
Condition:
- homogeneous corpus helps
Slide from Suchanek

Ontological IE Summary
Ontological Information Extraction (IE) tries to create or extend an ontology through information extraction.
Current hot approaches:
- extraction from Wikipedia
- reasoning-based approaches integrating uncertainty
Slide modified from Suchanek

Open Information Extraction
Open Information Extraction/Machine Reading aims at information extraction from the entire Web.
Vision of Open Information Extraction:
- the system runs perpetually, constantly gathering new information
- the system creates meaning on its own from the gathered data
- the system learns and becomes more intelligent, i.e. better at gathering information
Slide from Suchanek

Open Information Extraction
Open Information Extraction/Machine Reading aims at information extraction from the entire Web.
Rationale for Open Information Extraction:
- We do not need to care for every single sentence, but just for the ones we understand
- The size of the Web generates redundancy
- The size of the Web can generate synergies

KnowItAll & Co
KnowItAll, KnowItNow and TextRunner are projects at the University of Washington (in Seattle, WA).

Subject     Verb    Object     Count
Egyptians   built   pyramids   400
Americans   built   pyramids   20
...

Valuable common sense knowledge (if ...
trunner/
Slide from Suchanek
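The count column suggests how redundancy on the Web is used: the same triple extracted many times is more trustworthy than a rare one. A minimal sketch of that aggregation follows; the function name `aggregate`, the `min_count` threshold and the tiny input list are assumptions for illustration and do not describe how TextRunner itself scores extractions.

```python
from collections import Counter


def aggregate(extractions, min_count=2):
    """Count identical (subject, verb, object) extractions and keep frequent ones."""
    counts = Counter(
        (subject.lower(), verb.lower(), obj.lower())
        for subject, verb, obj in extractions
    )
    # most_common() sorts by descending count.
    return [(triple, n) for triple, n in counts.most_common() if n >= min_count]


# Hypothetical raw extractions from many sentences.
extractions = [
    ("Egyptians", "built", "pyramids"),
    ("egyptians", "built", "pyramids"),
    ("Egyptians", "built", "Pyramids"),
    ("Egyptians", "built", "pyramids"),
    ("Americans", "built", "pyramids"),
]

for triple, n in aggregate(extractions):
    print(triple, n)
```

With the default threshold only the frequently seen triple survives, so the system does not need to understand every single sentence correctly.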

KnowItAll
r/
Slide from Suchanek

Read the Web
“Read the Web” is a project at the Carnegie Mellon University in Pittsburgh, PA.
Components:
- Initial Ontology
- Natural Language Pattern Extractor: "Krzewski coaches the Blue Devils."
- Table Extractor:
  Krzewski   Blue Angels
  Miller     Red Angels
- Mutual exclusion: sports coach != scientist
- Type Check: If I coach, am I a coach?
http://rtw.ml.cmu.edu/rtw/
Slide from Suchanek
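The mutual exclusion component can be sketched as a filter on candidate type assignments. The type names, the `MUTUALLY_EXCLUSIVE` set and the `accept` helper are assumptions for this illustration and do not reflect the actual Read the Web code.

```python
# Hypothetical mutual-exclusion constraints between types.
MUTUALLY_EXCLUSIVE = {frozenset({"sportsCoach", "scientist"})}


def accept(candidate_type, entity, known_types):
    """Reject a candidate type that is mutually exclusive with a known type."""
    existing = known_types.get(entity, set())
    return not any(
        frozenset({candidate_type, t}) in MUTUALLY_EXCLUSIVE for t in existing
    )


# Types already believed by the system (toy data).
known_types = {"Krzewski": {"sportsCoach"}}

print(accept("scientist", "Krzewski", known_types))    # conflicts with sportsCoach
print(accept("sportsCoach", "Krzewski", known_types))  # already known, no conflict
print(accept("scientist", "Miller", known_types))      # nothing known, no conflict
```

Constraints of this kind let extractions from different components (patterns, tables) check each other, so an error in one extractor is less likely to enter the ontology.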

Open IE: Read the Web
http://rtw.ml.cmu.edu/rtw/
Slide from Suchanek

Open Information Extraction
Open Information Extraction/Machine Reading aims at information extraction from the entire Web.
Main hot projects:
- TextRunner (University of Washington)
- Read the Web (Carnegie Mellon)
- Prospera/SOFIE (Max-Planck Informatics Saarbrücken)
Input:
- The Web
- Read the Web: Manual rules
- Read the Web: initial ontology
Conditions:
- none
Slide modified from Suchanek

Slide sources
- Many of the slides today on Ontological IE and Open IE are from Fabian Suchanek (Télécom ParisTech)
- See the web page I mentioned for a list of semantic role labelers
- Some of the Wikification slides are from Dan Roth's tutorial, which is highly recommended

Thank you for your attention!
