Optimized Ontology-Driven Query Expansion Using Map-Reduce Framework to Facilitate Federated Queries
Technical Report UTDCS-30-11
The University of Texas at Dallas
Department of Computer Science
November 2011
Neda Alipanah, Latifur Khan and Bhavani Thuraisingham

Neda Alipanah, Latifur Khan, Bhavani Thuraisingham
Department of Computer Science, University of Texas at Dallas
800 West Campbell Road, Richardson, TX 75080, USA
Email: neda.alipanah, lkhan, bhavani.thuraisingham@utdallas.edu

Abstract—In view of the need for a highly distributed and federated architecture, a robust query expansion has a great impact on the performance of information retrieval in a specific domain. We aim to determine ontology-driven query expansion terms using different weighting techniques and to select the top-k most relevant terms. For this, we first consider each individual ontology and the user query keywords to determine the Basic Expansion Terms (BET). Second, we specify New Expansion Terms (NET) by Ontology Alignment (OA). Third, we use a Map-Reduce distributed algorithm to calculate all the shortest paths in the ontology graph as metadata for weighting the terms in BET ∪ NET. Fourth, we weight the expanded terms using a combination of semantic metrics, namely the Density Measure (DM), Betweenness Measure (BM), and Semantic Similarity Measure (SSM). The Map/Reduce algorithm improves the efficiency of the BET calculation, especially of the BM and SSM calculations, by exploiting parallel processing. Finally, we use a Specific Interval (SI) to determine a set of Robust Expansion Terms (RET) and compare the results of our novel weighting approach with existing expansion approaches. We also show the effectiveness of our robust expansion in a federated architecture.

Index Terms—Federated Query; Map-Reduce; Query Expansion; Ontology Matching; Expansion Terms.

1 INTRODUCTION

The internet continues to point out the benefits of a highly distributed, federated architecture. Since web applications in such an architecture use distributed data sources, they face an accuracy problem in information retrieval. That is, different data sources (ontologies) have different designs, so the original user query must be enriched by query expansion to cover the gap between the user query and the required information. The goal of many researchers is to discriminate between different expansion terms [?] and improve the robustness of query expansion. The query expansion calculation can be an expensive operation, especially on large-scale data sources (ontologies). Parallel and distributed processing is the best choice to improve the performance and efficiency of query expansion by breaking the operation into parts that can be calculated concurrently [?]. For example, consider several bibliography ontologies that describe publications in terms of Author, Organization, Product, Event and Topic. They include several classes and the relationships between them. When a user searches for Publication and Academic Staff, it is desirable to retrieve the URIs of the different kinds of publications (i.e. Article, Book, Proceedings) and the URIs of all Academic Staff, even though this knowledge is stored in different ontologies. Different query expansion approaches use dependent or independent knowledge models [4] for their expansion purpose. In the dependent knowledge model, the system uses term co-occurrence, concept node structure or distribution analysis of terms in collections. In the independent knowledge model, the system uses structured knowledge such as domain ontologies or general ontologies (i.e. WordNet) [?], [?]. In our bibliography example, several expansion terms can result, such as Thesis, Tech Report, Soft Copy, Association, Collection and Writer.
The major problem with the available expansion methods is that they may worsen the query result in some cases [?]. Thus, the goal of many researchers is to discriminate between different expansion terms and improve the robustness of query expansion. In our previous paper [?], we proposed a novel weighting mechanism for ontology-driven query expansion producing the Basic Expansion Terms (BET) and New Expansion Terms (NET). For each user query, BET is calculated in each ontology based on several metrics, namely semantic similarity, density and betweenness. NET is determined by aligning ontologies to find robust expansion terms across different ontologies. Similar entities in NET are determined by their name and structural similarity in each ontology.

BET metrics are defined using shortest-path calculations in ontology graphs. Finding the shortest paths that go through every entity of the ontology graph is an expensive operation. Therefore, in this paper, we concentrate on a Map-Reduce algorithm for the BET calculation in each ontology [?]. We calculate all possible shortest paths in the ontologies using an iterative Map-Reduce algorithm and use the result of the Map-Reduce operation in our BET calculation.

At the end of the expansion, we generate a set of terms along with their weights as a query vector. We use a Specific Interval (SI) to find the set of Robust Expansion Terms (RET) with the largest weights. Finally, we use Blackbook [?] to facilitate federated queries over a number of data sources. Blackbook exploits Lucene indexing to match the query vector based on cosine similarities.

The contributions of our paper are as follows:
- Determining BET ∪ NET using the structure of individual ontologies and the Common Set of Entities (CSE) across multiple ontologies.
- Calculating all possible shortest paths in the ontology by an iterative Map-Reduce algorithm using the adjacency matrix of the ontologies.
- Determining the BET metrics based on the Map-Reduce result for all expansion terms in BET ∪ NET.
- Calculating different weighting measurements for extracting the Robust Expansion Terms (RET) by a predetermined interval.
- Analyzing the improvement of our query expansion algorithm when serial processing is replaced by parallel computation.

The rest of this paper is organized as follows. In Section 2, we present a survey of related work. In Section 3, we present the problem statement regarding robust query expansion. In Section 4, we propose our weighting measurements to find the BET in each individual ontology using the Map-Reduce framework and then align them using the Common Set of Entities (CSE) algorithm. Further, we explain the algorithm to find the robust expansion terms. In Section 5, we present the experimental results of our expansion. Finally, in Section 6, we summarize our conclusions and potential future work.

2 RELATED WORK

Query expansion is one of the major concerns of the information retrieval community, and various approaches have been proposed to conduct it. Some approaches focus on determining expansion terms from unstructured data (text documents), while others determine expansions from structured data (ontologies). Perez-Aguera et al. [?] compare and combine different methods for query expansion in unstructured documents. They consider the co-occurrence of terms in different documents, using Tanimoto, Dice and Cosine coefficients to weight expansion terms. They also analyze the distribution of expansion terms in the top-ranked documents and in the whole collection using Kullback-Leibler divergence. Bhogal et al. [?] review ontology-based query expansion and explain various query expansion approaches, including corpus-dependent and corpus-independent knowledge models. Finally, they present different case studies on query expansion using domain-specific and general ontologies. They focus on a single ontology for query expansion and do not consider more than one ontology or the related ontology alignment issues. In our work, we benefit from the methodologies for unstructured expansion and extend them to structured models (ontologies); namely, we use a combination of the co-occurrence method in ontologies. That is, we define BET in each individual ontology using the user query keywords and then align each ontology's BET to other ontologies' entities.
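As a concrete illustration of the co-occurrence weighting schemes just mentioned, the following minimal sketch computes Tanimoto, Dice and cosine coefficients between a query term and a candidate expansion term from the sets of documents in which each term occurs. The toy collection and term names are invented for illustration and are not taken from the cited work.

# Illustrative sketch: co-occurrence coefficients over document-occurrence sets.
def tanimoto(a, b):
    # |A ∩ B| / (|A| + |B| - |A ∩ B|)
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter) if (a or b) else 0.0

def dice(a, b):
    # 2|A ∩ B| / (|A| + |B|)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

def cosine(a, b):
    # |A ∩ B| / sqrt(|A| * |B|)
    return len(a & b) / ((len(a) * len(b)) ** 0.5) if (a and b) else 0.0

# Document ids in which each term occurs (hypothetical toy collection).
occurs = {
    "publication": {1, 2, 3, 5},
    "article": {1, 2, 5},
    "lecturer": {4},
}
for term in ("article", "lecturer"):
    a, b = occurs["publication"], occurs[term]
    print(term, round(tanimoto(a, b), 2), round(dice(a, b), 2), round(cosine(a, b), 2))

A highly co-occurring term such as "article" receives coefficients close to 1, while an unrelated term receives 0, which is the intuition these weighting schemes exploit.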
We propose a novel weighting approach based on the combination of entity structure analysis and the weight of the semantic path between expansion terms. Alani et al. [?] rank ontologies based on their relevance to query keywords, using the importance of their relevant vocabulary. Relevant vocabularies are defined by calculating the centricity and density of the vocabularies. Centricity is the distance of a vocabulary to the central level of its hierarchy, and density is a weighted summation of the number of its subclasses, siblings, and relations. We use the idea of centricity and density of entities as an important factor in our weighting mechanism for entity structure analysis. We also calculate the weight of the semantic path using an information-theoretic approach to rank each entity in the ontology.

Lee et al. [?] propose an effective semantic search using ontologies. They consider the number of meaningful and distinguishable semantic relationships between keywords and resources to find the top-k most relevant resources. In our work, we use the idea of calculating the top-k relevant entities in each ontology as proposed by [?], but only as one part of our weighting mechanism for expansion candidates. That is, we use several weighting criteria for expansion candidates, including betweenness, density, semantic similarity and the weight of the semantic path.

Our approach for aligning ontologies relies on an efficient algorithm for determining the Common Subset of Entities (CSE) in the semantic network of the ontologies. Wang et al. [?] produce matching pairs for all nodes of two input graphs and create sorted lists for every node. They determine the similarity of two nodes based on the degree differences of the nodes and their direct neighbors, node attribute similarity and edge attribute similarity. However, no query expansion is addressed in that method. We, on the other hand, use the same structure for matching classes from the semantics of two ontologies.

We compute the edge attribute similarity by finding the cosine similarity of edge types and first neighbors. Finally, we apply a specific threshold to the average of all similarities and determine the common subset of entities.

Our previous works [?], [?], [?] present ontology ranking, query expansion, and the use of ontology ranks to determine robust query expansion, respectively. In [?], we calculate the similarity of ontologies by an Entropy Based Distribution (EBD) measurement based on the commonality of overlapping entities. We use the CSE algorithm to find the similar entities between ontologies and rank them using an authority score. In [?], we determine robust expansion terms by a number of semantic measures. We consider each individual ontology and the user query keywords to determine the Basic Expansion Terms (BET) based on the structure of the ontology, using the Density Measure (DM), Betweenness Measure (BM), Semantic Similarity Measure (SSM) and Weight of Semantic Path (WSP). Then, we specify New Expansion Terms (NET) using ontology alignment (OA). Further, in [?], we determine the Robust Expansion Terms (RET) using a dynamic threshold, where the ontology rank is used as a heuristic to determine the dynamic threshold and the top-k relevant terms for each user query. In our current work, we recast our previous query expansion method in the Map-Reduce framework. This work differs from [?] in that we do not cover any ontology ranking in this paper. It also differs from [?] because we use an iterative Map-Reduce algorithm to benefit from parallel processing; that is, we calculate our expansion candidates concurrently. Some metrics of query expansion use shortest paths in ontology graphs, and we use the iterative Map-Reduce algorithm to calculate these metrics. In [?], we concentrate on determining the dynamic threshold to find RET in each ontology and therefore use the ontology ranks, while in this work we concentrate only on optimizing the RET calculation using the Map-Reduce paradigm. In the next section, the details of our approach are explained.

3 ROBUST QUERY EXPANSION

In this section, we explain our robust query expansion method in the Map-Reduce framework based on the structure of ontologies and the alignment of entities between ontologies.

Definition 1 (Ontology Graph): An ontology graph is a directed, cyclic graph G = (V, E), where V includes all the entities of an ontology and E is the set of all properties between entities.

Definition 2 (Ontology Vocabulary): The ontology vocabulary of an RDF graph consists of all subjects and objects that are RDF URI references and are defined in RDF sentences. It covers all entities of an ontology, including classes, object properties, data properties and individuals, but does not include the built-ins provided by ontology languages.

In our approach, we first determine the basic expansion terms on the ontology graph. Second, we calculate the new expansion terms, and finally we weight the robust expansion terms.
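To make Definitions 1 and 2 concrete, the following sketch builds a directed ontology graph as an adjacency list from RDF-style (subject, property, object) triples. The triples are hand-written for illustration and are not taken from any particular ontology.

from collections import defaultdict

# Sketch of an ontology graph G = (V, E): V is the ontology vocabulary,
# E the set of properties between entities. The triples are illustrative only.
triples = [
    ("Article", "subClassOf", "Publication"),
    ("Book", "subClassOf", "Publication"),
    ("FacultyMember", "subClassOf", "AcademicStaff"),
    ("AcademicStaff", "publishes", "Publication"),
]

def build_graph(triples):
    graph = defaultdict(list)   # entity -> list of (property, neighbor) edges
    vocabulary = set()          # V: all entities appearing in the triples
    for s, p, o in triples:
        graph[s].append((p, o))
        vocabulary.update((s, o))
    return graph, vocabulary

graph, vocabulary = build_graph(triples)
print(sorted(vocabulary))
print(graph["Article"])         # [('subClassOf', 'Publication')]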
3.1 Problem Statement

Let Q be a set of federated query keywords and {O_1, ..., O_n} be external ontologies describing semantic information on a particular domain with different levels of coverage. The goal is to exploit parallel processing to calculate robust expansion terms efficiently from the vocabulary of large ontologies. That is, we use the Map-Reduce framework to calculate a set of weights (w_1, w_2, ..., w_n), w_i ∈ [0, 1], corresponding to the entities (e_1, e_2, ..., e_n) from the vocabulary V of several ontologies. We need a distributed algorithm for calculating each w_i as an indicator to include or exclude a term from the expansion set, so as to determine the k most semantically relevant terms that improve query retrieval. For instance, consider two ontologies in the bibliography domain, the UMBC and Karlsruhe ontologies [?], and a user searching for the individuals and corresponding web URIs of Academic Staff and Publication in a federated architecture. As depicted in Figures 1 and 2, a possible query expansion in the Karlsruhe ontology is Publication → {Article, Book, Booklet, InBook, InCollection, InProceedings, Manual, Misc, Proceedings, Report, Technical Report, Project Report, Thesis, Master Thesis, PhD Thesis, Unpublished} and Academic Staff → {Faculty Member, Lecturer}, while in the UMBC ontology a possible expansion is Publication → {Article, Book, InBook, InCollection, InProceedings, Master Thesis, PhD Thesis, Misc, Proceedings, Tech Reports}.

Fig. 1. Karlsruhe Bibliography Ontology.
Fig. 2. UMBC Bibliography Ontology.

Thus, weighting criteria are required that discriminate between the related terms and improve query retrieval using the ontology graph. For large ontologies, traversing between entities in the ontology graph requires storing the whole structure of the graph in memory, which is computationally expensive. Therefore, we concentrate on proposing a Map-Reduce algorithm that calculates BET using the structure of each individual ontology, as described next.
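Before turning to the proposed solution, the Karlsruhe and UMBC expansion sets above can be read as a transitive walk over subclass links from the matched query class. A minimal sketch, using a hand-written fragment of such a hierarchy rather than the real ontology files:

# Hand-written hierarchy fragment (illustrative, not the real Karlsruhe/UMBC data).
subclasses = {
    "Publication": ["Article", "Book", "Proceedings", "Report", "Thesis"],
    "Report": ["TechnicalReport", "ProjectReport"],
    "Thesis": ["MasterThesis", "PhDThesis"],
    "AcademicStaff": ["FacultyMember", "Lecturer"],
}

def expand(cls):
    # Collect all direct and transitive subclasses of a matched query class.
    terms, stack = [], list(subclasses.get(cls, []))
    while stack:
        c = stack.pop()
        terms.append(c)
        stack.extend(subclasses.get(c, []))
    return sorted(terms)

print(expand("Publication"))    # Article, Book, MasterThesis, PhDThesis, ...
print(expand("AcademicStaff"))  # FacultyMember, Lecturer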

4 PROPOSED SOLUTION

Our strategy for calculating the Robust Expansion Terms (RET) relies on extracting a top-k candidate list of alternative terms to the user query terms from different ontologies. RET covers both the Basic Expansion Terms (BET) and the New Expansion Terms (NET). BET is determined by a Map-Reduce algorithm based on the structure of each ontology, while NET is determined by ontology alignment between different ontologies.

4.1 Basic Query Expansion using the Map-Reduce Framework

In this section, we propose our novel Map-Reduce algorithm for calculating the BET in an individual ontology. For this, we first find the classes matching the query terms within each ontology. Second, we determine the central class using the map-reduce Betweenness Measure (BM). Third, we expand the matching terms using the Semantic Similarity Measure (SSM) and Density Measure (DM), as discussed below.

Density Measure (DM)
DM captures the detail available for each query keyword in an ontology graph, including its properties and neighbors, and is limited to first-level relations [?]. Suppose q_j ∈ Q, c ∈ O_i, and c ∈ NameMatch(q_j) in O_i. BET includes all classes {c_1, c_2, ..., c_m} that have a relation with c. DM(c) is the Density Measure of class c, which represents the number of possible expansions for c. Let RT be the set of all relation types (properties) of c, including subclasses, superclasses, and other relations. Then

DM(c) = \sum_{k \in RT} w_k \cdot RT_k \qquad (1)

where RT_k is the number of relations of type k for c and w_k is the weight factor for that relation type. The weight factor enables us to emphasize special relation types such as subclasses.

Betweenness Measure (BM) and Central Entity
The Betweenness Measure (BM) counts the number of shortest paths that pass through each expansion term in the ontology graph. The node that occurs on many shortest paths for the expanded user terms is considered the central keyword of the ontology [?]. Let e_i, e_j ∈ O_k. BM(e) is the Betweenness Measure of entity e:

BM(e) = \sum_{e_i \neq e_j \neq e} \frac{|\mathrm{shortestpath}(e_i, e_j)\ \mathrm{passing}\ e|}{|\mathrm{shortestpath}(e_i, e_j)|} \qquad (2)

BM determines the central keyword used in the SSM calculation: the central keyword is the entity with the highest BM value.

Semantic Similarity Measure (SSM)
The ontology graph is used as a semantic representation of the domain to calculate SSM weights for each expansion term. The shortest distance from an expansion term to the central entity represents the semantic similarity of that expansion term; an entity positioned relatively far from the central entity receives a smaller weight. Let e_j, c ∈ O_i and suppose there is a path between c (the central entity) and e_j. Then

SSM(c, e_j) = \begin{cases} \frac{1}{\mathrm{length}(\mathrm{minpath}(c, e_j))} & c \neq e_j \\ 1 & c = e_j \end{cases} \qquad (3)

In this method, BET consists of all entities on the shortest path from the central keyword to e_j.
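The sketch below computes the three measures on a small, hand-written ontology graph. It treats the graph as unweighted and undirected for path finding, uses brute-force all-pairs BFS, and the relation-type weights are assumed values; it is an illustration of Equations (1)-(3), not the implementation evaluated later in the report.

from collections import defaultdict, deque

# Toy ontology graph: (entity, relation_type, entity). Illustrative only.
edges = [
    ("Article", "subclass", "Publication"),
    ("Book", "subclass", "Publication"),
    ("Thesis", "subclass", "Publication"),
    ("PhDThesis", "subclass", "Thesis"),
    ("FacultyMember", "subclass", "AcademicStaff"),
    ("AcademicStaff", "relation", "Publication"),
]
REL_WEIGHT = {"subclass": 1.0, "relation": 0.5}   # assumed weight factors w_k

adj, rels = defaultdict(set), defaultdict(list)
for a, r, b in edges:
    adj[a].add(b); adj[b].add(a)                  # undirected view for path finding
    rels[a].append(r); rels[b].append(r)
nodes = sorted(adj)

def density(c):
    # Eq. (1): weighted count of the relations attached to c
    return sum(REL_WEIGHT.get(r, 1.0) for r in rels[c])

def bfs(src):
    # Distances and number of shortest paths (sigma) from src
    dist, sigma, q = {src: 0}, {src: 1}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                q.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

D = {v: bfs(v) for v in nodes}                    # all-pairs BFS (serial baseline)

def betweenness(e):
    # Eq. (2): fraction of e_i-e_j shortest paths that pass through e
    total = 0.0
    for s in nodes:
        for t in nodes:
            if len({s, t, e}) < 3 or t not in D[s][0] or e not in D[s][0]:
                continue
            if D[s][0][e] + D[e][0].get(t, float("inf")) == D[s][0][t]:
                total += D[s][1][e] * D[e][1][t] / D[s][1][t]
    return total / 2                              # each undirected pair counted twice

central = max(nodes, key=betweenness)             # entity with the highest BM

def ssm(e):
    # Eq. (3): inverse length of the shortest path to the central entity
    return 1.0 if e == central else 1.0 / D[central][0].get(e, float("inf"))

for v in nodes:
    print(v, density(v), round(betweenness(v), 2), round(ssm(v), 2))

On this toy graph the hub class Publication receives the highest BM value and is selected as the central entity, and SSM then decays with distance from it, which matches the intent of the measures above.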
4.1.1 Shortest Path Calculation Algorithms

To find the shortest paths needed for BM and SSM in the BET calculation, different shortest path algorithms can be used, including Dijkstra, Floyd-Warshall and BFS. Dijkstra calculates the shortest paths from one start node to all target nodes, while Floyd-Warshall determines the shortest paths from all start nodes to all target nodes of the ontology graph. The time complexity of calculating shortest paths from all nodes to all other nodes equals that of calculating the shortest paths from one start node to all target nodes, repeated |V| times, where |V| is the number of nodes. The complexity of the shortest paths from a single start node to all target nodes is O((|V| + |E|) log |V|) for |V| nodes and |E| edges in Dijkstra's algorithm with optimal data structures (i.e. an adjacency list and a binary heap). Therefore, the time complexity of calculating the shortest paths from all nodes to all nodes is O((|V|^2 + |V||E|) log |V|). In Breadth-First Search (BFS), the time complexity of the shortest path calculation from a single start node to all target destinations is O(|V| + |E|), where |V| is the number of nodes and |E| the number of edges in the graph. Using BFS, the all-sources to all-destinations distance calculation is O(|V|^2 + |V||E|). Since BFS is the optimal shortest path calculation for unweighted graphs, we use BFS within the Map-Reduce framework and parallel processing to calculate the BM and SSM values. In the next section we therefore explain an iterative map-reduce algorithm that explores all the paths in parallel and is well suited to large ontologies.
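For a rough sense of the difference (illustrative numbers, not measurements from this report): for an ontology graph with |V| = 10^4 entities and |E| = 5 × 10^4 properties, all-pairs Dijkstra costs on the order of (|V|^2 + |V||E|) log |V| ≈ (10^8 + 5 × 10^8) × 13.3 ≈ 8 × 10^9 basic operations, while all-pairs BFS costs |V|^2 + |V||E| ≈ 6 × 10^8, i.e. roughly a factor of log |V| less work before any parallelization.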

4.1.2 Shortest Path Calculation using an Iterative Map/Reduce Algorithm

The map/reduce programming model is a powerful interface for the automatic parallelization and distribution of large-scale computations [?], [?]. In this model, we use the Map and Reduce functions, which are defined as follows:

Map(⟨in_key, in_value⟩) → list(⟨out_key, intermediate_value⟩)
Reduce(⟨out_key, list(intermediate_value)⟩) → list(out_value)    (4)

Data from the data sources are fed into the Map function as ⟨in_key, in_value⟩ pairs. The Map function produces one or more intermediate values, along with an output key, from its input. After the map phase, all intermediate values for a given out_key are combined into a list. The Reduce function then merges these intermediate values to produce the output values for that key.
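The report's own implementation is not reproduced here, so the following is a minimal single-process sketch of the iterative map/reduce BFS idea behind the shortest-path computation: each map call emits tentative distances to a node's neighbors, each reduce call keeps the minimum per node, and the job is repeated until no distance changes. All names are assumptions, and the adjacency list is simply shared in memory, whereas a real Hadoop job would carry the graph structure inside the node records.

from collections import defaultdict

INF = float("inf")

def map_step(node, dist, adjacency):
    # Emit the node's current distance and tentative distances to its neighbors.
    yield node, dist
    if dist < INF:
        for nbr in adjacency.get(node, []):
            yield nbr, dist + 1

def reduce_step(values):
    # The shortest tentative distance wins.
    return min(values)

def mapreduce_bfs(adjacency, source):
    dist = {n: INF for n in adjacency}
    dist[source] = 0
    changed = True
    while changed:                                    # one map/reduce job per BFS frontier
        grouped = defaultdict(list)
        for node, d in dist.items():                  # map phase
            for key, value in map_step(node, d, adjacency):
                grouped[key].append(value)
        new_dist = {n: reduce_step(vs) for n, vs in grouped.items()}  # reduce phase
        changed = new_dist != dist
        dist = new_dist
    return dist

# Illustrative unweighted graph; undirected edges are listed in both directions.
adjacency = {
    "Publication": ["Article", "Book", "Thesis"],
    "Article": ["Publication"],
    "Book": ["Publication"],
    "Thesis": ["Publication", "PhDThesis"],
    "PhDThesis": ["Thesis"],
}
print(mapreduce_bfs(adjacency, "Publication"))
# Running this job once per source node yields the all-pairs distances used by BM and SSM.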
