Ontology Learning (from Text!) - Tilburg University


Outline: Ontology Learning (from text!)

- Part I: Definitions and description
- Part II: Machine Learning and Natural Language Processing for Ontology Learning
- Part III: Ontology Building Applications

Marie-Laure Reinberger (marielaure.reinberger@ua.ac.be), CNTS
April 28, 2005

Part I: Definitions and description

What's (an) ontology?

- A branch of philosophy which studies the nature and the organization of reality
- A structure that represents domain knowledge (the meaning of the terms and the relations between them) in order to provide a community of users with a common vocabulary on which they agree

What about thesauri, semantic lexicons, semantic networks?

- Thesauri: a standard set of relations between words or terms
- Semantic lexicons: lexical semantic relations between words or more complex lexical items
- Semantic networks: a broader set of relations between objects
- They differ in the type of objects and relations

Thesaurus: example

- Roget: thesaurus of English words and phrases; groups words into synonym categories or concepts
- Sample categorization for the concept "Feeling":
  AFFECTIONS IN GENERAL > Affections > Feeling:
  warmth, glow, unction, vehemence; fervor, fervency; heartiness, cordiality; earnestness, eagerness; empressment, gush, ardor, zeal, passion

Semantic lexicon: example

- WordNet: set of semantic classes (synsets), e.g. {board, plank} vs. {board, committee}
- Hypernym chain for "tree":
  tree > woody plant, ligneous plant > vascular plant, tracheophyte > plant, flora, plant life > life form, organism, being, living thing > entity, something
  (and, for the other sense: tree > tree diagram > abstraction)
- MeSH (Medical Subject Headings): provides for each term the variants that refer to the same concept
  MH = gene library: bank, gene; banks, gene; DNA libraries; gene banks; gene libraries; libraries, DNA; libraries, gene; library, DNA; library, gene

Semantic network: example

- UMLS: Unified Medical Language System
- Metathesaurus: groups term variants that correspond to the same concept, e.g. HIV, HTLV-III, Human Immunodeficiency Virus
- Semantic Network: organises all concepts of the Metathesaurus into semantic types and relations (two semantic types can be linked by several relations):
  pharmacologic substance affects pathologic function
  pharmacologic substance causes pathologic function
  pharmacologic substance prevents pathologic function
- CYC: contains common sense knowledge (trees are outdoors; people who died stop buying things), e.g.
  #$mother: (#$mother ANIM FEM); isa: #$FamilyRelationSlot #$BinaryPredicate

So, what's an ontology?

- "Ontologies are defined as a formal specification of a shared conceptualization" (Borst, 1997)
- "An ontology is a formal theory that constrains the possible conceptualizations of the world" (Guarino, 1998)
- See: ontoweb-lt.dfki.de
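For comparison with the chain above, WordNet can be walked programmatically. A minimal sketch using NLTK's WordNet interface (assuming nltk is installed and its wordnet corpus downloaded; current WordNet versions give a slightly different chain than the mid-1990s version shown on the slide):

    # Minimal sketch: walking the hypernym chain of "tree" in WordNet.
    # Assumes nltk is installed and the wordnet corpus has been fetched
    # via nltk.download("wordnet").
    from nltk.corpus import wordnet as wn

    # First synset of "tree" (the plant sense in current WordNet versions).
    synset = wn.synsets("tree")[0]

    # Follow the first hypernym at each level up to the root.
    while synset:
        print(synset.name(), "-", synset.definition())
        hypernyms = synset.hypernyms()
        synset = hypernyms[0] if hypernyms else None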

What an ontology is (maybe)

- Community agreement
- Relations between terms
- Pragmatic information
- Common sense knowledge
- Meaning of concepts vs. words: explore language more deeply

Why ontologies?

- Information retrieval
- Word sense disambiguation
- Automatic translation
- Topic detection
- Text summarization
- Indexing
- Question answering
- Query improvement
- Enhanced text mining

Problem: building an ontology

- Efficiency of the engineering
  - time
  - difficulty of the task: ambiguity, completeness
- Agreement of the community

What can be used?

- Texts
- Existing ontologies or core ontologies
- Dictionaries, encyclopediae
- Experts
- Machine Learning and Natural Language Processing tools

What kind of ontology?

- More or less domain specific
- Supervised/unsupervised
- Informal/formal
- For what purpose? This determines the granularity, the material, the resources

Supervised/unsupervised

- One extreme: building from scratch; the other extreme: fully manual building
- In between: using a core ontology or structured data
- Different strategies, different tools, each with advantages and inconveniences

Operations on ontologies

- Extraction: building of an ontology
- Pruning: removing what is out of focus; danger: keeping the ontology coherent
- Refinement: fine-tuning the target (e.g. considering user requirements)
- Merging: mixing two or more similar or overlapping source ontologies
- Alignment: establishing links between two source ontologies to allow them to share information
- Evaluation: task-based; requires a benchmark!

Components

- Classes of words and concepts
- Relations between concepts
- Axioms defining different kinds of constraints
- Instances that can represent specific elements

Relations: taxonomic and meronymic

- Taxonomic
  - hypernym (is a): car > vehicle
  - hyponym: fruit > lemon
  - events to superordinates: fly > travel
  - events to subtypes: walk > stroll
- Meronymic
  - from group to members: team - goalkeeper, crew - copilot
  - from parts to wholes: cover - book, wheels - car
  - from events to subevents: snore - sleep

Relations: thematic roles

- agent: causer of an event ("the burglar" broke the window)
- experiencer of an event ("the woman" suffers injuries from the car accident)
- force: non-voluntary causer of an event ("the earthquake" destroyed several buildings)
- theme: participant most directly affected by an event (the burglar broke "the door")
- instrument used in an event (I eventually forced the lock "with a screwdriver")
- source: origin of an object of a transfer event (he's coming "from Norway")
- beneficiary of an event (she's knitting socks "for her grandchildren")

Relations: selectional restrictions

- Thematic roles can be augmented by the notion of semantic restrictions
- Selectional restrictions: semantic constraints imposed by a lexeme on the concepts that can fill the various argument roles associated with it
  - "I wanna eat some place that's close to the cinema." / "I wanna eat some spicy food."
  - "Which airlines serve Denver?" / "Which airlines serve vegetarian meals?"

Part II: Text Mining and Natural Language Processing for ontology extraction from text

TM and NLP for ontology extraction from text

- lexical information extraction
- syntactic analysis
- semantic information extraction

Lexical acquisition

- collocations
- n-grams

Collocations

- A collocation is an expression consisting of two or more words that corresponds to some conventional way of saying things
- Technique: count occurrences and rely on frequencies (problem with sparse data)

Mutual information

I(x,y) = log [ f(x,y) / (f(x) * f(y)) ]

- extract multiword units
- group similar collocates or words to identify the different meanings of a word
  - bank - river
  - bank - investment
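A minimal sketch of this counting technique, with relative frequencies standing in for f() and a toy corpus (real collocation extraction needs far more text, precisely because of the sparse-data problem just mentioned):

    # Sketch: pointwise mutual information over bigrams, with f() taken
    # as relative frequencies. The eight-word corpus is a stand-in.
    import math
    from collections import Counter

    tokens = "strong tea and strong support but powerful car".split()

    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    def pmi(x, y):
        """I(x,y) = log[ f(x,y) / (f(x) * f(y)) ]."""
        f_xy = bigrams[(x, y)] / (n - 1)
        f_x = unigrams[x] / n
        f_y = unigrams[y] / n
        return math.log(f_xy / (f_x * f_y))

    print(pmi("strong", "tea"))  # a high value flags a collocation candidate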

High similarity?

- strong tea vs. *powerful tea
- Compare I(strong, tea) with I(powerful, tea), and I(strong, car) with I(powerful, car)
- So mutual information shows some dissimilarity between "strong" and "powerful", but how can we measure that dissimilarity? The t-test

T-test

- A measure of dissimilarity
- Used to differentiate close words x and y
- For a set of words, the t-test compares, for each word w in this set, the probability of having x followed by w with the probability of having y followed by w
- [Slide table, garbled in this transcription: frequency and t-score figures for "strong w" vs. "powerful w", with w among believer, currents, legacy, storms, minority, ...]

Statistical inference: n-grams

- Consists of taking some data and making inferences about their distribution: counting words in corpora
- Example: the n-gram model
- The assumption that the probability of a word depends only on the previous word is a Markov assumption
- Markov models are the class of probabilistic models that assume we can predict the probability of some future unit without looking too far into the past
  - A bigram is a first-order Markov model
  - A trigram is a second-order Markov model
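The slide does not spell out the statistic; in the reduced form used by Manning and Schütze (cited in the references) for exactly this strong/powerful comparison, it comes down to a difference of bigram counts. A sketch under that assumption, with made-up counts rather than the figures from the garbled table:

    # Sketch of the t-test for collocation dissimilarity, in the reduced form
    # t = (C(v1 w) - C(v2 w)) / sqrt(C(v1 w) + C(v2 w)).
    # The counts below are placeholders, not the slide's figures.
    import math

    def t_score(count_v1_w, count_v2_w):
        """Positive values favour v1 + w, negative values favour v2 + w."""
        return (count_v1_w - count_v2_w) / math.sqrt(count_v1_w + count_v2_w)

    # e.g. "strong support" vs. "powerful support", hypothetical counts:
    print(t_score(50, 10))   # clearly a "strong" collocate
    print(t_score(1, 38))    # clearly a "powerful" collocate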

Example

- "eat" is followed by: on, some, lunch, dinner, at, Indian, today, Thai, breakfast, in, Chinese, Mexican, tomorrow, dessert, British
- "restaurant" is preceded by: Chinese, Mexican, French, Thai, Indian, open, the, a
- Intersection: Chinese, Mexican, Thai, Indian

Problems

- Wordform vs. lemma
- Capitalized tokens
- Sparse data
- Dealing with huge collections of texts

TM and NLP for ontology extraction from text

- lexical information
- syntactic analysis
- semantic information extraction

Technique: parsing

- Part-of-speech tagging
- Chunking
- Specific relations
- Unsupervised? Shallow? Efficiency? (resources, processing time)

Example: shallow parser

Tokenizer output:
The patients followed a ' healthy ' diet and 20 % took a high level of physical exercise .

Tagger output:
The/DT patients/NNS followed/VBD a/DT '/" healthy/JJ '/" diet/NN and/CC 20/CD %/NN took/VBD a/DT high/JJ level/NN of/IN physical/JJ exercise/NN ./.

Chunker output:
[NP The/DT patients/NNS NP] [VP followed/VBD VP] [NP a/DT '/" healthy/JJ '/" diet/NN NP] and/CC [NP 20/CD %/NN NP] [VP took/VBD VP] [NP a/DT high/JJ level/NN NP] {PNP [Prep of/IN Prep] [NP physical/JJ exercise/NN NP] PNP} ./.
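A sketch of this neighbour-intersection idea over bigram counts (the three sentences are stand-ins for a corpus):

    # Sketch: collect the words that follow "eat" and those that precede
    # "restaurant", then intersect to find cuisine terms.
    from collections import defaultdict

    sentences = [
        "we eat Chinese food near a Chinese restaurant",
        "they eat Thai food at a Thai restaurant",
        "you eat Mexican food in a Mexican restaurant",
    ]

    follows = defaultdict(set)   # word -> words seen immediately after it
    precedes = defaultdict(set)  # word -> words seen immediately before it

    for sentence in sentences:
        tokens = sentence.split()
        for left, right in zip(tokens, tokens[1:]):
            follows[left].add(right)
            precedes[right].add(left)

    # Words that both follow "eat" and precede "restaurant":
    print(follows["eat"] & precedes["restaurant"])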

Techniques

- Selectional restrictions
- Semantic similarity
- Clustering
- Pattern matching

Selectional preferences or restrictions

- The syntactic structure of an expression provides relevant information about the semantic content of that expression
- Most verbs prefer arguments of a particular type:
  disease prevented by immunization
  infection prevented by vaccination
  hypothermia prevented by warm clothes

Semantic similarity

- Automatically acquiring a relative measure of how similar a new word is to known words (or how dissimilar) is much easier than determining its meaning
- Vector space measures: vector similarity
- Adding probabilistic measures: refinement

Statistical measures

- Frequency measure: F(c,v) = f(c,v) / [f(c) * f(v)]
- Standard probability measure: P(c|v) = f(c,v) / f(v)
- Hindle mutual information measure: H(c,v) = log { P(c,v) / [P(v) * P(c)] }
  (focus on the verb-object cooccurrence)

More statistical measures

- Resnik: R(c,v) = P(c|v) * SR(v), with SR(v) = Σ { P(c|v) * log [ P(c|v) / P(c) ] } the selectional preference strength (focus on the verb)
- Jaccard: J(c,v) = log2 P(c|v) * log2 f(c) / #c_ctx, with #c_ctx the number of contexts of appearance of the compound c (focus on the nominal string)
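These measures are straightforward to compute from a table of verb-object counts. A sketch of P(c|v), the Hindle measure and the Resnik measure as written above, with a made-up count table:

    # Sketch of the co-occurrence measures above, from raw counts.
    # f(c,v) = count of noun class c as object of verb v; the tiny table
    # is a placeholder.
    import math
    from collections import Counter

    pairs = Counter({("disease", "prevent"): 8, ("infection", "prevent"): 6,
                     ("car", "prevent"): 1, ("disease", "drive"): 1,
                     ("car", "drive"): 9})

    f_c = Counter()  # marginal counts per class
    f_v = Counter()  # marginal counts per verb
    total = sum(pairs.values())
    for (c, v), n in pairs.items():
        f_c[c] += n
        f_v[v] += n

    def p_c_given_v(c, v):
        """Standard probability measure P(c|v) = f(c,v) / f(v)."""
        return pairs[(c, v)] / f_v[v]

    def hindle(c, v):
        """Hindle measure H(c,v) = log[ P(c,v) / (P(v) * P(c)) ]."""
        return math.log((pairs[(c, v)] / total) /
                        ((f_v[v] / total) * (f_c[c] / total)))

    def resnik(c, v):
        """Resnik R(c,v) = P(c|v) * SR(v), SR(v) the preference strength."""
        sr = sum(p_c_given_v(ci, v) *
                 math.log(p_c_given_v(ci, v) / (f_c[ci] / total))
                 for ci in f_c if pairs[(ci, v)] > 0)
        return p_c_given_v(c, v) * sr

    print(hindle("disease", "prevent"), resnik("disease", "prevent"))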

Semantic dissimilarity: contrastive corpus

- Used to discard
  - general terms
  - unfocused domain terms
- e.g. Wall Street Journal vs. a medical corpus

Clustering

- An unsupervised method that consists of partitioning a set of objects into groups or clusters, depending on the similarity between those objects
- Clustering is a way of learning by generalizing
- Generalizing: the assumption that an environment that is correct for one member of the cluster is also correct for the other members
- Example: which preposition to use with "Friday"?
  1. Existence of a cluster "Monday, Sunday, Friday"
  2. Presence of the expression "on Monday"
  3. Choice of the preposition "on" for "Friday"

Types of clustering

- Hierarchical: each node stands for a subclass of its mother node; the leaves of the tree are the single objects of the clustered sets
- Non-hierarchical or flat: relations between clusters are often undetermined
- Hard assignment: each object is assigned to one and only one cluster
- Soft assignment: allows degrees of membership, and membership in multiple clusters (uncertainty)
- Disjunctive clustering: "true" multiple assignment

Hierarchical clustering

- Bottom-up (agglomerative): start with each object as a cluster and group the most similar ones
- Top-down (divisive): all objects start in one cluster, which is divided into smaller clusters (use of dissimilarity measures)

Example: bottom-up

Three of the 1,000 clusters found by Brown et al. (1992), using a bigram model and a clustering algorithm that decreases perplexity:
- plan, letter, request, memo, case, question, charge, statement, draft
- day, year, week, month, quarter, half
- evaluation, assessment, analysis, understanding, opinion, conversation, discussion
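A minimal sketch of the bottom-up procedure: single-link agglomerative clustering over made-up context-count vectors (stopping at two clusters is an arbitrary choice for the example):

    # Sketch: bottom-up (agglomerative) clustering of words by the
    # similarity of their context vectors. All data are stand-ins.
    def similarity(a, b):
        """Cosine similarity between two count vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: sum(x * x for x in v) ** 0.5
        return dot / (norm(a) * norm(b))

    words = {"Monday": [9, 1, 0], "Friday": [8, 2, 0],
             "letter": [0, 1, 7], "memo": [1, 0, 8]}

    # Start with one singleton cluster per word, then repeatedly merge
    # the most similar pair (single link) until two clusters remain.
    clusters = [[w] for w in words]
    while len(clusters) > 2:
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: max(similarity(words[a], words[b])
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)

    print(clusters)  # expected: the days together, the documents together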

Non-hierarchical clustering

- Often starts with a partition based on randomly selected seeds (one seed per cluster) and then refines this initial partition
- Several passes are often necessary. When to stop? You need a measure of goodness, and you go on as long as this measure increases enough

Examples

- AutoClass (Minimum Description Length): the measure of goodness captures both how well the objects fit into the clusters and how many clusters there are; a high number of clusters is penalized
- EM algorithm
- K-means

Pattern matching / association rules

- Pattern matching consists of finding patterns in texts that induce a relation between words, and generalizing these patterns to build relations between concepts

Srikant and Agrawal algorithm

- Computes association rules Xk => Yk such that the measures of support and confidence exceed user-defined thresholds
- Support of a rule Xk => Yk is the percentage of transactions that contain Xk ∪ Yk as a subset
- Confidence is the percentage of transactions in which Yk is seen when Xk appears

Example

- Finding associations that occur between items (e.g. supermarket products) in a set of transactions (e.g. customers' purchases)
- Generalization: "snacks are purchased with drinks" is a generalization of "chips are purchased with beer" or "peanuts are purchased with soda"

References

- Manning and Schütze, "Foundations of Statistical Natural Language Processing"
- Mitchell, "Machine Learning"
- Jurafsky and Martin, "Speech and Language Processing"
- Church et al., "Using Statistics in Lexical Analysis", in Lexical Acquisition (ed. Uri Zernik)
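Support and confidence as defined above are easy to state in code. A sketch over made-up market-basket transactions:

    # Sketch of support and confidence for a rule X => Y over toy
    # transactions (the purchases are placeholders).
    transactions = [
        {"chips", "beer"}, {"chips", "beer", "bread"},
        {"peanuts", "soda"}, {"bread", "milk"}, {"chips", "soda"},
    ]

    def support(x, y):
        """Fraction of transactions containing X union Y."""
        return sum(1 for t in transactions if x | y <= t) / len(transactions)

    def confidence(x, y):
        """Fraction of transactions with X that also contain Y."""
        with_x = [t for t in transactions if x <= t]
        return sum(1 for t in with_x if y <= t) / len(with_x)

    # Rule {chips} => {beer}: keep it if both measures exceed thresholds.
    print(support({"chips"}, {"beer"}), confidence({"chips"}, {"beer"}))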

Part III: Ontology Building Systems

1. TextToOnto (AIFB, Karlsruhe)
2. CORPORUM-OntoBuilder (OntoKnowledge project)
3. OntoLearn
4. MUMIS (European project)
5. OntoBasis (CNTS)

1. TextToOnto

- This system supports semi-automatic creation of ontologies by applying text mining algorithms

Semi-automatic ontology engineering

- Generic core ontology used as a top-level structure
- Domain-specific concepts acquired and classified from a dictionary
- Shallow text processing
- Term frequencies retrieved from texts
- Pattern matching
- Help from an expert to remove concepts unspecific to the domain

Learning and discovering algorithms

- The term extraction algorithm extracts from texts a set of terms that can potentially be included in the ontology as concepts
- The rules extraction algorithm extracts potential taxonomic and non-taxonomic relationships between existing ontology concepts. Two distinct algorithms:
  - the regular-expression-based pattern matching algorithm mines a concept taxonomy from a dictionary
  - the learning algorithm for discovering generalized association rules analyses the text for non-taxonomic relations
- The ontology pruning algorithm extracts from a set of texts the set of concepts that may potentially be removed from the ontology

Learning algorithm: example

- Text corpus of tourist information (in German) that describes locations, accommodation and administrative information
- Example: Alle Zimmer sind mit TV, Telefon, Modem und Minibar ausgestattet. (All rooms have TV, telephone, modem and minibar.)
- Dependency relation output for that sentence: Zimmer - TV (room - television)

Example: tourist information text corpus

- Concept pairs derived from the text, with support values (table garbled in this transcription; the figures included 0.38, 0.1, 0.39, 0.29, 0.34, 0.33):
  area - hotel
  hairdresser - hotel
  balcony - access
  room - television
- Discovered relations:
  (area, accommodation)
  (area, hotel)
  (room, furnishing)
  (room, television)
  (accommodation, address)
  (restaurant, ...)

Ontology: example (RDFS)

  <rdfs:Class rdf:about="test:cat">
    <rdfs:subClassOf rdf:resource="test:animal"/>
  </rdfs:Class>
  <rdfs:Class rdf:about="test:persian_cat">
    <rdfs:subClassOf rdf:resource="test:cat"/>
  </rdfs:Class>
  <!-- properties of cars and cats -->
  <rdf:Property rdf:about="test:color">
    <rdfs:domain rdf:resource="test:car"/>
    <rdfs:domain rdf:resource="test:cat"/>
  </rdf:Property>
  <!-- properties between cars and cats -->
  <rdf:Property rdf:about="test:runs_over">
    <rdfs:domain rdf:resource="test:car"/>
    <rdfs:range rdf:resource="test:cat"/>
  </rdf:Property>

2. OntoKnowledge

- On-To-Knowledge: Knowledge Management through Evolving Ontologies

The overall architecture and language

- [Architecture diagram, garbled in this transcription: OntoShare, OntoEdit, OIL-Core, OMM, RDF, Sesame, the OIL-Core ontology repository, the Annotated Data Repository, OntoWrapper and OntoExtract]
- OntoWrapper: structured documents (names, telephone numbers, ...)
- OntoExtract: unstructured documents
  - provides initial ontologies through semantic analysis of the content of web pages
  - refines existing ontologies (key words, clustering, ...)

OntoWrapper

- Deals with data in "regular" pages
- Uses personal "extraction rules"
- Outputs instantiated schemata

OntoExtract

- Takes a single text or document as input and retrieves a document-specific lightweight ontology from it
- Ontologies extracted by OntoExtract are basically taxonomies that represent classes, subclasses and instances

OntoExtract: how?

- Extraction technology based on:
  - tokeniser
  - morphologic analysis
  - lexical analysis
  - syntactic/semantic analysis
  - concept generation
  - relationships

OntoExtract: why?

- concept extraction
- relations extraction
- semantic discourse representation
- ontology generation
- as part of: document annotation, document retrieval, document summarising, ...

OntoExtract: output

- Classes, described in the text being analysed
- Subclasses: classes can also be defined as subclasses of other classes, if evidence is found that a class is indeed a subclass of another class
- Facts/instances: class definitions do not contain properties; as properties of classes are found, they are defined as properties of an instance of that particular class
- The representation is based on relations between classes, derived from the semantic information extracted

OntoExtract: roles

- learning initial ontologies: propose a networked structure
- refining ontologies: add concepts to existing ontologies; add relations "across" boundaries

Ontology: example (RDF)

  <rdfs:Class rdf:ID="news_service">
    <rdfs:subClassOf rdf:resource="#service"/>
  </rdfs:Class>
  <news_service rdf:ID="news_service_001">
    <hasSomeProperty>financial</hasSomeProperty>
  </news_service>

Query example: museum repository (Sesame)

Queries against a Sesame repository (demo URL garbled in this transcription: ...rator.nl/sesame/actionFrameset.jsp?repository=museum), e.g.:

  select X, $X, Y from {X : $X} cult:paints {Y}
  using namespace cult = http://www.icom.com/schema.rdf#

  select X, Z, Y from {X} rdf:type {Z}, {X} cult:paints {Y}
  using namespace rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#, cult = http://www.icom.com/schema.rdf#

  select X, Y from {X : cult:Cubist} cult:paints {Y}
  using namespace cult = http://www.icom.com/schema.rdf#

  select X, $X, Y from {X : $X} cult:last_name {Y}
  where ($X = cult:Painter and Y like "P*") or ($X = cult:Sculptor and not Y like "B*")
  using namespace cult = http://www.icom.com/schema.rdf#

Query example: results

  select PAINTER, PAINTING, TECH from {PAINTER} cult:paints {PAINTING} cult:technique {TECH}
  using namespace cult = http://www.icom.com/schema.rdf#

Query results (PAINTER, PAINTING, TECH; URLs partly garbled in this transcription):
- http://www.european-history.com/picasso.html, .../jpg/guernica03.jpg, "oil on canvas"
- http://www.museum.es/woman.qti, "oil on canvas"
- http://www.artchive.com/rembrandt/artist_at_his_easel.jpg, "oil on canvas"@en
- .../rembrandt/abraham.jpg, "oil on canvas"@en
- http://192.41.13.240/artchive/graphics/saturn_zoom1.jpg, "wall painting (oil)"@en
5 results found in 323 ms.

http://www.ontoknowledge.org

3. OntoLearn

- An infrastructure for automated ontology learning from domain text

Semantic interpretation

- Identifying the right senses (concepts) for complex domain term components, and the semantic relations between them
- Use of WordNet and SemCor
- Creation of semantic nets
- Use of a machine-learned rule base
- Domain concept forest

Ontology integration

- From a core domain ontology or from WordNet
- Applied to multiword term translation
- http://www.ontolearn.de

4. MUMIS

- Goal: to develop basic technology for automatic indexing of multimedia programme material
- Uses data from different media sources (documents, radio and television programmes) to build a specialised set of lexica and an ontology for the selected domain (soccer)
- Access to textual and especially acoustic material in three languages: English, Dutch, and German

MUMIS

- Natural Language Processing (Information Extraction)
- Domain: soccer
- Development of an ontology and multilingual lexica for this domain
- Query: "give me all goals Uwe Seeler shot by head during the last 5 minutes of a game" (formal query interface)
- Answer: a selection of events represented by keyframes

Information Extraction

- Analyse all available textual documents (newspapers, speech transcripts, tickers, formal texts, ...); identify and extract interesting entities, relations and events
- The relevant information is typically represented in the form of predefined "templates", which are filled by means of natural language analysis
- IE here combines pattern matching, shallow NLP and domain knowledge
- Cross-document co-reference resolution

IE data

Ticker:
"24 Scholes beats Jens Jeremies wonderfully, dragging the ball around and past the Bayern Munich man. He then finds Michael Owen on the right wing, but Owen's cross is poor."

Newspaper:
"Owen header pushed onto the post. Deisler brought the German supporters to their feet with a buccaneering run down the right. Moments later Dietmar Hamann managed the first shot on target but it was straight at David Seaman. Mehmet Scholl should have done better after getting goalside of Phil Neville inside the area from Jens Jeremies' astute pass but he scuffed his shot."

Formal text (Dutch match statistics):
  Schoten op doel (shots on target): 4 - 4
  Schoten naast doel (shots off target): 6 - 7
  Overtredingen (fouls): 23 - 15
  Gele kaarten (yellow cards): 1 - 1
  Rode kaarten (red cards): 0 - 1
  Hoekschoppen (corners): 3 - 5
  Buitenspel (offside): 4 - 1

IE techniques and resources

Applied in sequence to the ticker sentence above (the intermediate outputs are interleaved beyond recovery in this transcription):
- Tokenisation
- Lemmatisation
- POS tagging and morphology (e.g. 24/NUM, Scholes/PROP, beats/VERB, wonderfully/ADV)
- Named entity recognition (24 = time; Scholes, Jens Jeremies, Michael Owen = players)
- Shallow parsing (NP and VP chunks)
- Co-reference resolution (He = Scholes; the Bayern Munich man = Jeremies)
- Template filling

IE subtasks

- Named Entity task (NE): mark in the text each string that represents a person, organization, or location name, or a date or time, or a currency or percentage figure
- Template Element task (TE): extract basic information related to organization, person, and artifact entities, drawing evidence from everywhere in the text

Terms as descriptors and terms for the NE task

- Team: Titelverteidiger Brasilien (defending champions Brazil), den respektlosen Außenseiter Schottland (the irreverent outsiders Scotland)
- Trainer: Schottlands Trainer Brown (Scotland's coach Brown), Kapitän Hendry, seinen Keeper Leighton (captain Hendry, his keeper Leighton)
- Time: in der 73. Minute (in the 73rd minute), nach gerade einmal 3:50 Minuten (after just 3:50 minutes), von Roberto Carlos (16.), nach einer knappen halben Stunde (after barely half an hour)

IE subtasks

- Template Relation task (TR): extract relational information on employee_of, manufacture_of, location_of relations etc. (TR expresses domain-independent relationships)
  - Opponents: Brasilien besiegt Schottland (Brazil beats Scotland), feierte der Top-Favorit (the top favourite celebrated)
  - Trainer of: Schottlands Trainer Brown
- Scenario Template task (ST): extract pre-specified event information and relate it to particular organization, person, or artifact entities (ST identifies domain- and task-specific entities and relations)
  - Foul: als er den durchlaufenden Gallacher im Strafraum allzu energisch am Trikot zog (when he tugged far too energetically at the shirt of Gallacher running through the penalty area)
  - Substitution: und mußte in der 59. Minute für Crespo Platz machen (and had to make way for Crespo in the 59th minute)
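The template-filling step, and the merging of annotations described on the next slides, lend themselves to a short sketch. A minimal version, assuming event templates are plain dicts whose field names mirror the freekick example below, and assuming a merge policy where every non-missing value is kept (later sources refining earlier ones):

    # Sketch: partial event templates from two sources, then merged.
    # Field names (event, type, player, dist, time, score) follow the
    # freekick example below; the merge policy is an assumption.
    def merge(*templates):
        """Combine partial event templates, keeping every non-None field."""
        merged = {}
        for template in templates:
            for field, value in template.items():
                if value is not None:
                    merged[field] = value  # later sources refine earlier ones
        return merged

    ticker = {"event": "goal", "type": "freekick", "player": "Basler",
              "dist": "25 m", "time": 17, "score": None}
    tv = {"event": "goal", "player": "Basler", "dist": "25 m",
          "time": 18, "score": "1:0"}

    print(merge(ticker, tv))
    # {'event': 'goal', 'type': 'freekick', 'player': 'Basler',
    #  'dist': '25 m', 'time': 18, 'score': '1:0'}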

IE subtasks

- Co-reference task (CO): capture information on co-referring expressions, i.e. all mentions of a given entity, including those marked in NE and TE

Off-line task

- Texts (newspapers, radio and TV commentary) in 3 languages -> multilingual IE -> event tables -> merging of annotations -> events indexed in the video recording
- Example of two partial event tables for the same event:
  Event: goal; Type: freekick; Player: Basler; Dist.: 25 m; Time: 17; Score: leading
  Event: goal; Player: Basler; Dist.: 25 m; Time: 18; Score: 1:0
- Merged result:
  Event: goal; Type: freekick; Player: Basler; Team: Germany; Time: 18; Score: 1:0; Final score: 1:0; Distance: 25 m
- [Timeline visualization, garbled in this transcription: Freekick 17 min Basler 25 m; Goal 18 min 1:0; Pass 24 min Matthäus 60 m; Defense 28 min Campbell; Dribbling Scholl; Foul Neville]

On-line task

- Knowledge-guided user interface and search engine
- Search for interesting events with formal queries:
  "Give me all goals Overmars shot with his head in the first half"
  Event = Goal; Player = Overmars; Time <= 45; Previous-Event = Headball
- Indicate hits by thumbnails and let the user select a scene (e.g. München - Ajax 1998, München - Porto 1996, Deutschland - Brasilien 1998)
- Play the movie fragment of that game via the Internet and allow scrolling etc.
- User guidance (lexica and ontology)
- Prototype demo

5. OntoBasis

- Elaboration and adaptation of semantic knowledge extraction tools for building a specific domain ontology

Unsupervised learning

raw text -> shallow parser -> parsed text -> pattern matching -> relations -> statistics -> relevant relations -> evaluation -> initiation of an ontology

- e.g. [NP1Subject The/DT Sarsen/NNS Circle/NNP NP1Subject] [VP1 is/VBZ VP1]
- e.g. mutation in gene; catalytic subunit of DNA polymerase

Material

- Stonehenge corpus, 4K words, rewritten
- Description of the megalithic ruin
- Extraction of semantic relations using pattern matching and statistical measures
- Focus on "part of" and spatial relations, dimensions, positions, ...

Stonehenge corpus

The trilithons are ten upright stones.
The Sarsen heel stone is 16 feet high.
The bluestones are arranged into a horseshoe shape inside the trilithon horseshoe.

Syntactic analysis

The Sarsen Circle is about 108 feet in diameter .

The/DT Sarsen/NNS Circle/NNP is/VBZ about/IN 108/DT feet/NNS in/IN diameter/NN ./.

[NP The/DT Sarsen/NNS Circle/NNP NP] [VP is/VBZ VP] [NP about/IN 108/DT feet/NNS NP] [PP in/IN PP] [NP diameter/NN NP] ./.

[NP1Subject The/DT Sarsen/NNS Circle/NNP NP1Subject] [VP1 is/VBZ VP1] [NP about/IN 108/DT feet/NNS NP] {PNP [PP in/IN PP] [NP diameter/NN NP] PNP} ./.

Pattern matching: prepositional structures

- Selection of the syntactic structures Nominal String - Preposition - Nominal String (Ns-Prep-Ns)
  [a Ns is a string of adjectives and nouns, ending with the head noun of the noun phrase]
- Examples:
  Edman degradation of intact protein
  beta-oxidation of fatty acid
  56 Aubrey hole inside circle

Selection

- Nominal strings are filtered using a statistical measure (formula garbled in this transcription): the measure is high when the prepositional structure is coherent
- We select the N most relevant structures

Pattern matching: subject-verb-object

- Syntactic structures Subject-Verb-Direct Object, or "lexons":
  amino acid sequence show Bacillus subtilis
  nucleotide sequencing reveal heterozygosity
  Aubrey Holes are inside ...
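A minimal sketch of the Ns-Prep-Ns pattern over chunker output of the kind shown above (the chunk notation and the head-extraction regex are simplifications; a real pipeline would work on the parser's data structures rather than on strings):

    # Sketch: an NP chunk immediately followed by a PNP (preposition + NP)
    # yields a candidate Ns-Prep-Ns relation.
    import re

    chunked = ("[NP The/DT Sarsen/NNS Circle/NNP NP] [VP is/VBZ VP] "
               "[NP about/IN 108/DT feet/NNS NP] "
               "{PNP [PP in/IN PP] [NP diameter/NN NP] PNP} ./.")

    def head(np_text):
        """Last noun token of an NP chunk = head of the nominal string."""
        nouns = re.findall(r"(\w+)/NN\w*", np_text)
        return nouns[-1] if nouns else None

    # NP chunk directly followed by a {PNP [PP prep] [NP ...]} group
    pattern = re.compile(
        r"\[NP ([^\]]+) NP\]\s*\{PNP \[PP (\w+)/IN PP\] \[NP ([^\]]+) NP\] PNP\}")

    for np1, prep, np2 in pattern.findall(chunked):
        print(head(np1), prep, head(np2))   # e.g. feet in diameter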

Combination

- We consider the N prepositional structures with the highest scores, selected previously
- We select the Subject-Verb-Object structures in which the subject and the object both appear among those N structures

Examples

"part of" basic relations:
  bottom of stone
  shape of stone
  block of sandstone
  ring of bluestones
  center of circle

Spatial relations:
  sandstone on Marlborough Downs
  Preseli Mountain in Pembrokeshire
  Bluestone circle outside Trilithon horseshoe
  Bluestone circle inside Sarsen Circle

Disposition of the stones:
  Bluestone circle is added outside Trilithon horseshoe
  Slaughter Stone is made of sarsen
  100 foot diameter circle of 30 sarsen stone

Wrong relations:
  Altar Stone is in front
  Heel stone leans of vertical
  Sarsen block are 1.4 metre
  Stonehenge is of 35 foot

Correct relations we didn't use (truncated in this transcription):
  heel ston
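A sketch of this combination step, with tiny stand-in data for the Stonehenge output (the triples and the exact-match containment test are assumptions; the slides do not give the selection code):

    # Sketch: keep only SVO triples whose subject and object both occur
    # among the top-N prepositional (Ns-Prep-Ns) structures.
    top_n_prep = [
        ("Bluestone circle", "outside", "Trilithon horseshoe"),
        ("Bluestone circle", "inside", "Sarsen Circle"),
        ("ring", "of", "bluestones"),
    ]
    svo_triples = [
        ("Bluestone circle", "is added outside", "Trilithon horseshoe"),
        ("Slaughter Stone", "is made of", "sarsen"),
    ]

    # Nominal strings attested in the selected prepositional structures
    attested = {ns for s, _, o in top_n_prep for ns in (s, o)}

    kept = [(s, v, o) for s, v, o in svo_triples
            if s in attested and o in attested]
    print(kept)  # only the Bluestone circle triple survives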

