Integrating Semantic Knowledge with Web Usage Mining for Personalization

Honghua Dai and Bamshad Mobasher
School of Computer Science, Telecommunication, and Information Systems
DePaul University
243 S. Wabash Ave.
Chicago, Illinois, 60604
Phone: 312-362-5174
Fax: 312-362-6116
Email: mobasher@cs.depaul.edu

Keywords: Web usage mining, personalization, data mining, domain knowledge, ontologies, semantic Web mining

Abstract

Web usage mining has been used effectively as an approach to automatic personalization and as a way to overcome deficiencies of traditional approaches such as collaborative filtering. Despite their success, such systems, like more traditional ones, do not take into account semantic knowledge about the underlying domain. Without such semantic knowledge, personalization systems cannot recommend different types of complex objects based on their underlying properties and attributes. Nor do these systems have the ability to automatically explain or reason about the user models or the resulting recommendations. The integration of semantic knowledge is, in fact, the primary challenge for the next generation of personalization systems. In this chapter we provide an overview of approaches for incorporating semantic knowledge into Web usage mining and personalization processes. In particular, we discuss the issues and requirements for successfully integrating semantic knowledge from different sources, such as the content and the structure of Web sites, for personalization. Finally, we present a general framework for fully integrating domain ontologies with Web usage mining and personalization processes at different stages, including the preprocessing and pattern discovery phases, as well as the final stage, where the discovered patterns are used for personalization.

INTRODUCTION

With the continued growth and proliferation of e-commerce, Web services, and Web-based information systems, personalization has emerged as a critical application that is essential to the success of a Web site. It is now common for Web users to encounter sites that provide dynamic recommendations for products and services, targeted banner advertising, and individualized link selections. Indeed, nowhere is this phenomenon more apparent than in the business-to-consumer e-commerce arena. The reason is that, in today's highly competitive e-commerce environment, the success of a site often depends on the site's ability to retain visitors and turn casual browsers into potential customers. Automatic personalization and recommender system technologies have become critical tools precisely because they help engage visitors at a deeper and more intimate level by tailoring the site's interaction with a visitor to her needs and interests.

Web personalization can be defined as any action that tailors the Web experience to a particular user, or set of users (Mobasher, Cooley, & Srivastava, 2000a). The experience can be something as casual as browsing a Web site or as (economically) significant as trading stocks or purchasing a car. Principal elements of Web personalization include modeling of Web objects (pages, etc.) and subjects (users), categorization of objects and subjects, matching between and across objects and/or subjects, and determination of the set of actions to be recommended for personalization. The actions can range from simply making the presentation more pleasing to anticipating the needs of a user and providing customized information.

Traditional approaches to personalization have included both content-based and user-based techniques. Content-based techniques use personal profiles of users and recommend other items or pages based on their content similarity to the items or pages that are in the user’s profile. The underlying mechanism in these systems is usually the comparison of sets of keywords representing pages or item descriptions. Examples of such systems include Letizia (Lieberman, 1995) and WebWatcher (Joachims, Freitag, & Mitchell, 1997).
While these systems perform well from the perspective of the end user who is searching the Web for information, they are less useful in e-commerce applications. This is partly due to the lack of server-side control by site owners, and partly because techniques based on content similarity alone may miss other types of semantic relationships among objects (for example, the associations among products or services that are semantically different but are often used together).

User-based techniques for personalization, on the other hand, primarily focus on the similarities among users rather than item-based similarities. The most widely used technology for user-based personalization is collaborative filtering (CF) (Herlocker, Konstan, Borchers, & Riedl, 1999). Given a target user’s record of activity or preferences, CF-based techniques compare that record with the historical records of other users in order to find the users with similar interests. This is the so-called neighborhood of the current user. The mapping of a visitor record to its neighborhood could be based on similarity in ratings of items, access to similar content or pages, or purchase of similar items. The identified neighborhood is then used to recommend items not already accessed or purchased by the active user. Purely content-based approaches rely on content similarity in item-to-item comparisons. The advantage of CF over these approaches is that it can capture “pragmatic” relationships among items based on their intended use or on similar tastes of the users.

The CF-based techniques, however, suffer from some well-known limitations (Sarwar, Karypis, Konstan, & Riedl, 2000). For the most part, these limitations are related to the scalability and efficiency of the underlying algorithms, which require real-time computation in both the neighborhood formation and the recommendation phases. The effectiveness and scalability of collaborative filtering can be dramatically enhanced by the application of Web usage mining techniques.

In general, Web mining can be characterized as the application of data mining to the content, structure, and usage of Web resources (Cooley, Mobasher, & Srivastava, 1997; Srivastava, Cooley, Deshpande, & Tan, 2000). The goal of Web mining is to automatically discover local as well as global models and patterns within and between Web pages or other Web resources. The goal of Web usage mining, in particular, is to capture and model Web user behavioral patterns. The discovery of such patterns from the enormous amount of data generated by Web and application servers has found a number of important applications. Among these applications are:

- systems to evaluate the effectiveness of a site in meeting user expectations (Spiliopoulou, 2000);
- techniques for dynamic load balancing and optimization of Web servers for better and more efficient user access (Pitkow & Pirolli, 1999; Palpanas & Mendelzon, 1999);
- applications for dynamically restructuring or customizing a site based on users’ predicted needs and interests (Perkowitz & Etzioni, 1998).
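
The neighborhood-based CF process described above can be sketched in a few lines. It compares the active user's record with other users' records, keeps the most similar users as the neighborhood, and then recommends items the active user has not yet accessed. This is a minimal illustration under assumed choices, not the algorithm of any system cited here. The assumptions are: records are binary page-visit sets, similarity is cosine, and the neighborhood size `k` is fixed. The function names `cosine` and `recommend` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary (visited / not visited) item sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(active: set[str], others: dict[str, set[str]], k: int = 2, n: int = 3) -> list[str]:
    """Hypothetical user-based CF: form a top-k neighborhood, then score unseen items."""
    # Neighborhood formation: rank every other user by similarity to the active user.
    neighbors = sorted(
        ((cosine(active, items), user) for user, items in others.items()),
        reverse=True,
    )[:k]
    # Recommendation: each neighbor adds its similarity to items the active user has not accessed.
    scores: dict[str, float] = defaultdict(float)
    for sim, user in neighbors:
        if sim <= 0.0:
            continue  # dissimilar users contribute nothing
        for item in others[user] - active:
            scores[item] += sim
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Toy page-visit histories (hypothetical).
history = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3", "p4"},
    "u3": {"p5", "p6"},
}
print(recommend({"p2", "p3"}, history))  # ['p1', 'p4']
```

Because `recommend` scans every stored user on each request, the sketch also makes visible the online cost behind the scalability limitation noted above.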

More recently, Web usage mining techniques have been proposed as another user-based approach to personalization that alleviates some of the problems associated with collaborative filtering (Mobasher et al., 2000a). In particular, Web usage mining has been used to improve the scalability of personalization systems based on traditional CF-based techniques (Mobasher, Dai, Luo, & Nakagawa, 2001; Mobasher, Dai, Luo, & Nakagawa, 2002).

However, the pure usage-based approach to personalization has an important drawback. The recommendation process relies on existing user transaction data, so items or pages recently added to a site cannot be recommended. This is generally referred to as the “new item problem”. A common approach to resolving this problem in collaborative filtering has been to integrate content characteristics of pages with the user ratings or judgments (Claypool et al., 1999; Pazzani, 1999). Generally, in these approaches, keywords are extracted from the content on the Web site and are used either to index pages by content or to classify pages into various content categories. In the context of personalization, this approach would allow the system to recommend pages to a user based not only on similar users, but also (or alternatively) on the content similarity of these pages to the pages the user has already visited.

Keyword-based approaches, however, are incapable of capturing more complex relationships among objects at a deeper semantic level based on the inherent properties associated with these objects. For example, potentially valuable relational structures among objects may be missed if one can only rely on the description of these entities using sets of keywords. Such structures include relationships between movies, directors, and actors, or between students, courses, and instructors. To recommend different types of complex objects using their underlying properties and attributes, the system must characterize user segments and objects at a deeper semantic level than keywords, using the domain ontologies for the objects. For instance, a traditional personalization system on a university Web site might recommend courses in Java to a student simply because that student has previously taken or shown interest in Java courses. On the other hand, a system that has knowledge of the underlying domain ontology might recognize that the student should first satisfy the prerequisite requirements for a recommended course. It might also be able to recommend the best instructors for a Java course, and so on.
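
The university example can be made concrete with a toy comparison of two recommenders. One uses keywords only; the other also consults a small course ontology that holds a prerequisite relation. The course codes, keywords, the `prerequisites` relation, and both function names are hypothetical illustrations assumed for this sketch. They are not part of any system described in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    code: str
    keywords: frozenset[str]
    # Ontology relation (hypothetical): courses that must be completed first.
    prerequisites: frozenset[str] = frozenset()


# Toy course catalog; codes and keywords are invented for illustration.
CATALOG = [
    Course("CS211", frozenset({"java", "programming"})),
    Course("CS212", frozenset({"java", "data structures"}), frozenset({"CS211"})),
    Course("CS350", frozenset({"java", "networks"}), frozenset({"CS212"})),
]


def keyword_recommend(interests: set[str], taken: set[str]) -> list[str]:
    """Keyword-only: any course not yet taken that shares a keyword with the interests."""
    return [c.code for c in CATALOG if c.code not in taken and c.keywords & interests]


def ontology_recommend(interests: set[str], taken: set[str]) -> list[str]:
    """Ontology-aware: same keyword match, but all prerequisites must already be taken."""
    return [
        c.code
        for c in CATALOG
        if c.code not in taken and c.keywords & interests and c.prerequisites <= taken
    ]


# A student who has taken CS211 and is interested in Java.
print(keyword_recommend({"java"}, {"CS211"}))   # ['CS212', 'CS350']
print(ontology_recommend({"java"}, {"CS211"}))  # ['CS212']
```

The keyword-only version suggests CS350 even though the student has not completed its prerequisite. The ontology-aware version filters it out, which is the kind of inference the text attributes to a system with knowledge of the domain ontology.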

An ontology provides a set of well-founded constructs that define significant concepts and their semantic relationships. An example of an ontology is a relational schema for a database involving multiple tables and foreign keys semantically connecting these relations. Such constructs can be leveraged to build meaningful higher-level knowledge in a particular domain. Domain ontologies for a Web site usually include concepts, subsumption relations between concepts (concept hierarchies), and other relations among concepts that exist in the domain that the Web site represents. For example, the domain ontologies of a movie Web site usually include concepts such as “movie,” “actor,” “director,” and “theater.” The genre hierarchy can be used to represent different categories of movie concepts. Typical relations in this domain may include “Starring” (between actors and movies), “Directing,” “Playing” (between theaters and movies), etc.

The ontology of a Web site can be constructed by extracting relevant concepts and relations from the content and structure of the site, through machine learning and Web mining techniques. But, in addition to concepts and relations that can be acquired from Web content and structure information, we are also interested in usage-related concepts and relations in a Web site. For instance, in an e-commerce Web site, we may be interested in the relations between users and objects that define different types of online activity, such as browsing, searching, registering, buying, and bidding. Such usage-based relations can be integrated with ontological information representing the underlying concepts and attributes embedded in a site. This integration allows for more effective knowledge discovery, as well as better characterization and interpretation of the discovered patterns.

In the context of Web personalization and recommender systems, the use of semantic knowledge can lead to deeper interaction of visitors or customers with the site. Integration of domain knowledge allows such systems to infer additional useful recommendations for users based on more fine-grained characteristics of the objects being recommended. It also provides the capability to explain and reason about user actions.

In this chapter we present an overview of the issues related to, and requirements for, successfully integrating semantic knowledge in the Web usage mining and personalization processes. We begin by providing some general background on the use of semantic knowledge and ontologies in Web mining, as well as an overview of personalization based on Web usage mining. We then discuss how the content and the structure of the site can be leveraged to transform raw usage data into semantically-enhanced transactions that can be used for semantic Web usage mining and personalization. Finally, we present a framework for more systematically integrating full-fledged domain ontologies in the personalization process.

BACKGROUND

Semantic Web Mining

Web mining is the process of discovering and extracting useful knowledge from the content, usage, and structure of one or more Web sites. Semantic Web mining (Berendt, Hotho, & Stumme, 2002) involves the integration of domain knowledge into the Web mining process.

For the most part, research in Semantic Web mining has focused on application areas such as Web content and structure mining; few studies have focused on the use of domain knowledge in Web usage mining. In this section, we provide a brief overview and some examples of related work in this area. Our goal in this chapter is to provide a road map for the integration of semantic and ontological knowledge into the process of Web usage mining, and particularly into its application to Web personalization and recommender systems.

Domain knowledge can be integrated into the Web mining process in many ways. These include leveraging explicit domain ontologies or implicit domain semantics extracted from the content or the structure of documents or Web sites. In general, however, this process may involve one or more of three critical activities: domain ontology acquisition, knowledge base construction, and knowledge-enhanced pattern discovery.
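
To illustrate the movie-site ontology described earlier, the sketch below hard-codes a tiny ontology and knowledge base. It includes concepts such as movie, actor, director, and theater; a genre hierarchy; and relations such as Starring, Directing, and Playing. The sketch uses them to turn a raw sequence of page views into a bag of ontology concepts, which is one simple reading of a semantically-enhanced transaction. The instance data, page URLs, and the `enrich` and `ancestors` functions are hypothetical. The crediting scheme is an assumption made for illustration, not the framework presented in this chapter: it counts the viewed object, its genre ancestors, and every entity related to it.

```python
from collections import Counter

# Toy genre hierarchy (subsumption / is-a edges): child genre -> parent genre.
GENRE_PARENT = {"Romantic Comedy": "Comedy"}

# Toy knowledge base: relation instances as (subject, relation, object) triples.
TRIPLES = [
    ("Actor:A1", "Starring", "Movie:M1"),
    ("Actor:A1", "Starring", "Movie:M2"),
    ("Director:D1", "Directing", "Movie:M1"),
    ("Theater:T1", "Playing", "Movie:M2"),
]
MOVIE_GENRE = {"Movie:M1": "Romantic Comedy", "Movie:M2": "Thriller"}

# Hypothetical mapping from page URLs to the ontology object each page describes.
PAGE_OBJECT = {"/movies/m1": "Movie:M1", "/movies/m2": "Movie:M2", "/actors/a1": "Actor:A1"}


def ancestors(genre: str) -> list[str]:
    """Return the genre followed by its ancestors in the is-a hierarchy."""
    chain = [genre]
    while chain[-1] in GENRE_PARENT:
        chain.append(GENRE_PARENT[chain[-1]])
    return chain


def enrich(pageviews: list[str]) -> Counter:
    """Map raw page views to a bag of ontology concepts (one possible enriched transaction)."""
    concepts: Counter = Counter()
    for page in pageviews:
        obj = PAGE_OBJECT.get(page)
        if obj is None:
            continue  # page not mapped to any ontology object; ignore it
        concepts[obj] += 1
        # Credit the object's genre and every more general genre above it.
        genre = MOVIE_GENRE.get(obj)
        if genre is not None:
            for g in ancestors(genre):
                concepts[g] += 1
        # Credit every entity linked to the viewed object by a relation
        # (actors starring in it, its director, theaters playing it).
        for subject, _relation, target in TRIPLES:
            if target == obj:
                concepts[subject] += 1
    return concepts


txn = enrich(["/movies/m1", "/actors/a1", "/movies/m2"])
print(txn["Actor:A1"], txn["Comedy"])  # 3 1
```

Even though the user viewed the actor's page only once, the enriched transaction shows a strong interest in Actor:A1 and in comedy. This is the kind of pattern that page-level usage data alone would not expose.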

Domain Ontology Acquisition

The process of acquiring, maintaining, and enriching the domain ontologies is referred to as “ontology engineering”. For small Web sites with only static Web pages, it is feasible to construct a domain knowledge base manually or semi-manually. In Loh, Wives, & de Oliveira (2000), a semi-manual approach is adopted for defining each domain concept as a vector of terms with the help of existing vocabulary and natural language processing tools.

However, manual construction and maintenance of domain ontologies require a great deal of effort on the part of knowledge engineers, particularly for large-scale Web sites or Web sites with dynamically generated content. In dynamically generated Web sites, page templates are usually populated based on structured queries performed against back-end databases. In such cases, the database schema can be used directly to acquire ontological information. Some Web servers send structured data files (e.g., XML files) to users and let client-side formatting mechanisms (e.g., CSS files) work out the final Web representation on client agents. In this case, it is generally possible to infer the schema from the structured data files.

When there is no direct source for acquiring domain ontologies, machine learning and text mining techniques must be employed to extract domain knowledge from the content or hyperlink structure of the Web pages. In Clerkin, Cunningham, & Hayes (2001), a hierarchical clustering algorithm is applied to terms in order to create concept hierarchies. In Stumme, Taouil, Bastide, Pasquier, & Lakhal (2000), a Formal Concept Analysis framework is proposed to derive a concept lattice (a variation of association rule mining). The approach proposed in Maedche & Staab (2000) learns generalized conceptual relations by applying association rule mining. All these efforts aim to automatically generate machine-understandable ontologies for Web site domains.

The outcome of this phase is a set of formally defined domain ontologies that precisely represent the Web site. A good representation should provide machine understandability, reasoning power, and computational efficiency. The choice of ontology representation language has a direct effect on the flexibility of the data mining phase. Common representation approaches are:

- the vector-space model (Loh et al., 2000);
- description logics such as DAML+OIL (Giugno & Lukasiewicz, 2002; Horrocks & Sattler, 2001);
- first-order logic (Craven et al., 2000);
- relational models (Dai & Mobasher, 2002);
- probabilistic relational models (Getoor, Friedman, Koller, & Taskar, 2001);
- probabilistic Markov models (Anderson, Domingos, & Weld, 2002).

Knowledge Base Construction

The first phase generates the formal representation of concepts and relations among them. The second phase, knowledge base construction, can be viewed as building mappings between concepts or relations, on the one hand, and objects on the Web, on the other. The goal of this phase is to find the instances of the concepts and relations from the Web site’s domain, so that they can be exploited to perform further data mining tasks. Learning algorithms play an important role in this phase.

In Ghani & Fano (2002), a text classifier is learned for each “semantic feature” (roughly equivalent to the notion of a concept) based on a small manually labeled data set. First, Web pages are extracted from different Web sites that belong to a similar domain, and then the semantic features are manually labeled. This small labeled data set is fed into a learning algorithm as training data to learn the mappings between Web objects and the concept labels. In fact, this approach treats the process of assigning concept labels as filling in “missing” data. Craven et al. (2000) adopt a combined approach of statistical text classification and first-order text classification in recognizing concept instances. In that study, the learning process is based on both page content and linkage information.

Knowledge-Enhanced Web Data Mining

Domain knowledge enables analysts to perform more powerful Web data mining tasks. The applications include content mining, information retrieval and extraction, Web usage mining, and personalization.
Conversely, data mining tasks can also help to enhance the process of domain knowledge discovery.

Domain knowledge can improve the accuracy of document clustering and classification and induce more powerful content patterns. For example, in Horrocks (2002), domain ontologies are employed in selecting textual features. The selection is based on lexical analysis tools that map terms into concepts within the ontology. The approach also aggregates concepts by merging the concepts that have low support in the documents. After preprocessing, only the necessary concepts are selected for the content clustering step. In McCallum, Rosenfeld, Mitchell, & Ng (1998), a concept hierarchy is used to improve the accuracy and the scalability of text classification.

Traditional approaches to content mining and information retrieval treat every document as a set or a bag of terms. Without domain semantics, we would treat “human” and “mankind” as different terms, or “brake” and “car” as unrelated terms. In Loh et al. (2000), a concept is defined as a group of terms that are semantically relevant, e.g., synonyms. With such concept definitions, the distribution of concepts among documents is analyzed to find interesting concept patterns. For example, one can discover dominant themes in a document collection or in a single document, or find associations among concepts.

Ontologies and domain semantics have been applied extensively in the context of Web information retrieval and extraction. For example, the ARCH system (Parent, Mobasher, & Lytinen, 2001) adopts concept hierarchies because they allow users to formulate more expressive and less ambiguous queries than simple keyword-based queries. In ARCH, an initial user query is used to find matching concepts within a portion of the concept hierarchy. The concept hierarchy is stored in an aggregated form, with each node represented as a term vector. The user can select or unselect nodes in the presented portion of the hierarchy, and relevance feedback techniques are used to modify the
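
The concept-based representation described in this subsection defines a concept as a group of semantically relevant terms (Loh et al., 2000). It can be sketched by mapping each term to its concept before comparing documents. The concept table is hypothetical, and so are the helper names `term_vector`, `concept_vector`, and `cosine`. The choice of cosine similarity and the rule that unmapped terms stand for themselves are further assumptions made for this illustration.

```python
import math
from collections import Counter

# Hypothetical concept definitions: each concept groups semantically related terms.
CONCEPTS = {
    "HUMAN": {"human", "mankind", "people"},
    "VEHICLE": {"car", "brake", "engine"},
}
# Inverted index from term to concept.
TERM_TO_CONCEPT = {term: concept for concept, terms in CONCEPTS.items() for term in terms}


def term_vector(text: str) -> Counter:
    """Bag-of-terms representation: each distinct word is its own dimension."""
    return Counter(text.lower().split())


def concept_vector(text: str) -> Counter:
    """Bag-of-concepts representation; unmapped terms are kept as themselves (an assumption)."""
    return Counter(TERM_TO_CONCEPT.get(t, t) for t in text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


print(cosine(term_vector("car"), term_vector("brake")))        # 0.0: no shared terms
print(cosine(concept_vector("car"), concept_vector("brake")))  # 1.0: both map to VEHICLE
```

At the term level, “car” and “brake” look unrelated. Once both are mapped to the same concept, the documents become comparable, which is the effect the concept-based approaches above rely on.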
