Brief Introduction To Data & Web Mining


Olfa Nasraoui
CECS 694: Web mining for e-commerce and information retrieval
9/14/2005

Outline
• Knowledge Discovery in DB & Data Mining
  – Motivation & Definition of KDD
  – DM Tasks
• Web Mining
  – Motivation & Differences from DM
  – Types of Web Data to be Mined
  – Web Personalization & Profiling

Knowledge Discovery in DB & Data Mining: Motivation
• Explosion in electronically stored data
• Huge DBs contain a wealth of info that is still not fully exploited (valuable info (gold!) may be lurking within the data)
• Accessing useful info is more and more difficult (info retrieval in various data repositories: image DBs, WWW, etc.)

Knowledge Discovery in DB: Definition
• KDD: discovering useful info and knowledge from huge data repositories (patterns, associations, etc.)

Knowledge Discovery in DB: Process
1. Data preprocessing: cleaning, integration, transformation
2. Data mining: intelligent methods for extracting knowledge (digging for gold)
3. Pattern evaluation and presentation
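The three steps above can be sketched as a tiny pipeline. This is only an illustrative toy, not a method from the slides: the function names (`preprocess`, `mine`, `evaluate`), the sample records, and the choice of frequency counting as the "mining" step are all assumptions made for the example.

```python
from collections import Counter

def preprocess(raw_records):
    """Step 1: cleaning and transformation (drop empty values, normalize case)."""
    cleaned = []
    for record in raw_records:
        if record is None:
            continue
        value = record.strip().lower()
        if value:
            cleaned.append(value)
    return cleaned

def mine(records):
    """Step 2: a stand-in for data mining; here, simple frequency counting."""
    return Counter(records)

def evaluate(patterns, min_count=2):
    """Step 3: keep only patterns frequent enough to be worth presenting."""
    kept = [(item, n) for item, n in patterns.items() if n >= min_count]
    # Most frequent first; ties broken alphabetically for a stable report.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical raw data with noise: stray whitespace, mixed case, missing values.
raw = [" Books", "books", None, "", "Music ", "BOOKS", "music", "games"]
report = evaluate(mine(preprocess(raw)))
print(report)
```

In a real KDD setting each step would be far richer (integration across sources, a proper mining algorithm, interestingness measures), but the staged structure is the same.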

Data Mining Tasks
• Class description: summarization/characterization of a data collection
• Mining associations: discovering association relationships/correlations among a set of items in the form of rules X → Y (DB tuples satisfying X are likely to satisfy Y)
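As a rough illustration of a rule X → Y, the sketch below computes the two standard measures, support and confidence, over a handful of transactions. The transaction data and the helper names `support` and `confidence` are made up for this example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, x, y):
    """Estimated likelihood that a transaction satisfying X also satisfies Y."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / support_x

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is usually reported only when both measures exceed user-chosen thresholds, which is one way of making "likely to satisfy Y" precise.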

Data Mining Tasks
• Classification: construct a model for each class of labeled training data based on its features and use it to classify future data
• Prediction: predict the possible values of some missing data/attributes based on similar objects
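Both tasks can be illustrated with very small examples. The sketch below uses a nearest-centroid classifier (one centroid per class as the "model") and a k-nearest-neighbor average for filling in a missing numeric attribute. These particular techniques, the function names, and the data are assumptions chosen for brevity; the slides do not prescribe a specific algorithm.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train(labeled):
    """Build one model (a centroid) per class from labeled training data."""
    by_class = {}
    for features, label in labeled:
        by_class.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_class.items()}

def classify(model, features):
    """Assign the class whose centroid is closest to the new object."""
    return min(model, key=lambda label: math.dist(model[label], features))

def predict_missing(known, query, k=2):
    """Estimate a missing numeric attribute as the mean over the k most similar objects."""
    nearest = sorted(known, key=lambda row: math.dist(row[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

# Hypothetical labeled training data: (features, class label).
training = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"), ((9.0, 9.5), "high"),
]
model = train(training)
print(classify(model, (2.0, 2.0)))
print(classify(model, (7.0, 9.0)))

# Hypothetical objects with a known attribute value: (features, value).
known = [((1.0,), 10.0), ((2.0,), 20.0), ((10.0,), 100.0)]
print(predict_missing(known, (1.5,), k=2))
```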

Data Mining Tasks
• Clustering: dividing unlabeled data into groups/clusters such that data in the same cluster are as similar as possible while data from distinct clusters are dissimilar
• Time-series analysis: discover regularities & interesting characteristics, search for similar sequences or subsequences, mining sequential patterns, trends/deviations
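One common way to realize the clustering task is k-means, sketched below with fixed starting centroids so the result is reproducible. k-means is picked here only as a familiar example; the points, the initial centroids, and the `kmeans` function name are assumptions.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled 2-D data forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, clusters = kmeans(points, centroids=[(1, 1), (9, 8)])
for center, members in zip(final_centroids, clusters):
    print(center, members)
```

Points within each resulting cluster are close to one another and far from the other cluster, which is the similarity criterion stated in the slide.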

Web Mining
• WWW: vital, popular source of information
• Searching for info:
  – One of the most common tasks (71% of users)
  – Can be frustrating
• Navigation (self-guided, sometimes aimless search)
• Design of good Web sites is important

Applications of Web Mining
• Automatic personalization: adaptive sites can facilitate navigation, search
• E-commerce Web sites can be made more user friendly
• Optimized marketing efforts for trading products, services, information
• Improved search engines

Differences from Regular DM
• Huge, semi-structured/unstructured, highly dynamic data
  – Content: 8 billion pages
  – Usage: daily visitors to popular sites number in the millions
• WWW data is corrupted with noise (unintentional access, incorrect logging, imperfect crawling)
• Data is dynamic (expired links, changing user interests/activities, changing Web content & structure, etc.)

Types of Web Data / Types of Web Mining
• Content: Web pages' HTML content, snippets, multimedia data (Web content mining)
• Usage: Web access log files/clickstream data (Web usage mining)
• Structure: link topology of the Web (Web structure mining)
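To make the usage category concrete, the sketch below parses access-log lines and derives page hit counts and per-visitor clickstreams. It assumes the logs are in the widely used Common Log Format; the sample lines, the regular expression, and the function names are hypothetical, and requests that do not return status 200 are dropped as a crude form of noise filtering.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def successful_requests(lines):
    """Yield parsed entries, skipping malformed lines and non-200 responses."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["status"] == "200":
            yield entry

def page_hits(lines):
    """Count successful requests per page."""
    return Counter(entry["path"] for entry in successful_requests(lines))

def clickstreams(lines):
    """Group successfully requested pages by visitor host, in log order."""
    streams = {}
    for entry in successful_requests(lines):
        streams.setdefault(entry["host"], []).append(entry["path"])
    return streams

# Hypothetical log excerpt, including one failed request and one corrupted line.
sample_log = [
    '10.0.0.1 - - [14/Sep/2005:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [14/Sep/2005:10:00:05 -0400] "GET /products.html HTTP/1.0" 200 5120',
    '10.0.0.2 - - [14/Sep/2005:10:01:00 -0400] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [14/Sep/2005:10:01:02 -0400] "GET /missing.html HTTP/1.0" 404 512',
    'corrupted line with no structure',
]
hits = page_hits(sample_log)
streams = clickstreams(sample_log)
print(hits)
print(streams)
```

Identifying visitors by host alone is a simplification; real usage mining typically also needs session boundaries and handling of proxies and crawlers.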




Web Content for file
<html>
<head>
<title>Windows to the Universe</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<!-- Fireworks MX Dreamweaver MX target. Created Wed Jan 28 11:46:21 GMT-0800 (Pacific Standard Time) 2004 -->
<style type="text/css">
BODY,TD { background-color: black; color: #ffd782; font-family: Arial, Helvetica, sans-serif; font-size: 10pt }
A { color:#66ccff }}
</style>
<script type="text/javascript" language="JavaScript" src="JavaScript/news.js"></script>
<script type="text/javascript" language="JavaScript" src="JavaScript/win_home.js"></script>
</head>

Web Content for file
<body bgcolor="#000000" background="/images_home/orion_star_field_1.gif" onLoad="showNews();MM_preloadImages('/images_home/jupiterB.jpg','/images_home/memberB.jpg','/images_home/storeB.jpg','/images_home/newsB.jpg','/images_home/toursB.jpg','/images_home/gamesB.jpg','/image
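Raw page source like the excerpt above is the input to Web content mining. As a minimal sketch, Python's standard `html.parser` module can pull out the page title and the referenced script files. The `ContentExtractor` class is a hypothetical helper, and the page string is an abbreviated stand-in modeled on the slide, with a made-up body paragraph.

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Collects the <title> text and the src attribute of each <script> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.scripts.append(src)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept.
        if self.in_title:
            self.title += data

# Abbreviated stand-in for a page; the body text is invented for the example.
page = (
    '<html><head><title>Windows to the Universe</title>'
    '<script type="text/javascript" src="JavaScript/news.js"></script>'
    '</head><body><p>Hello</p></body></html>'
)

extractor = ContentExtractor()
extractor.feed(page)
print(extractor.title)
print(extractor.scripts)
```

The same pattern extends to collecting `href` attributes of links, which would feed structure mining rather than content mining.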
