DATA MINING - University Of Rajshahi


DATA MINING
Concepts, Models, Methods, and Algorithms

IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Lajos Hanzo, Editor in Chief
R. Abhari, M. El-Hawary, O. P. Malik, J. Anderson, B-M. Haemmerli, S. Nahavandi, G. W. Arnold, M. Lanzerotti, T. Samad, F. Canavero, D. Jacobson, G. Zobrist

Kenneth Moore, Director of IEEE Book and Information Services (BIS)

Technical Reviewers
Mariofanna Milanova, Professor, Computer Science Department, University of Arkansas at Little Rock, Little Rock, Arkansas, USA
Jozef Zurada, Ph.D., Professor of Computer Information Systems, College of Business, University of Louisville, Louisville, Kentucky, USA
Witold Pedrycz, Department of ECE, University of Alberta, Edmonton, Alberta, Canada

DATA MINING
Concepts, Models, Methods, and Algorithms
SECOND EDITION

Mehmed Kantardzic
University of Louisville

IEEE PRESS
A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2011 by Institute of Electrical and Electronics Engineers. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Kantardzic, Mehmed.
Data mining : concepts, models, methods, and algorithms / Mehmed Kantardzic. – 2nd ed.
p. cm.
ISBN 978-0-470-89045-5 (cloth)
1. Data mining. I. Title.
QA76.9.D343K36 2011
006.3'12–dc22
2011002190

oBook ISBN: 978-1-118-02914-5
ePDF ISBN: 978-1-118-02912-1
ePub ISBN: 978-1-118-02913-8

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

To Belma and Nermin

CONTENTS

Preface to the Second Edition
Preface to the First Edition

1 DATA-MINING CONCEPTS
1.1 Introduction
1.2 Data-Mining Roots
1.3 Data-Mining Process
1.4 Large Data Sets
1.5 Data Warehouses for Data Mining
1.6 Business Aspects of Data Mining: Why a Data-Mining Project Fails
1.7 Organization of This Book
1.8 Review Questions and Problems
1.9 References for Further Study

2 PREPARING THE DATA
2.1 Representation of Raw Data
2.2 Characteristics of Raw Data
2.3 Transformation of Raw Data
2.4 Missing Data
2.5 Time-Dependent Data
2.6 Outlier Analysis
2.7 Review Questions and Problems
2.8 References for Further Study

3 DATA REDUCTION
3.1 Dimensions of Large Data Sets
3.2 Feature Reduction
3.3 Relief Algorithm
3.4 Entropy Measure for Ranking Features
3.5 PCA
3.6 Value Reduction
3.7 Feature Discretization: ChiMerge Technique
3.8 Case Reduction
3.9 Review Questions and Problems
3.10 References for Further Study

4 LEARNING FROM DATA
4.1 Learning Machine
4.2 SLT
4.3 Types of Learning Methods
4.4 Common Learning Tasks
4.5 SVMs
4.6 kNN: Nearest Neighbor Classifier
4.7 Model Selection versus Generalization
4.8 Model Estimation
4.9 90% Accuracy: Now What?
4.10 Review Questions and Problems
4.11 References for Further Study

5 STATISTICAL METHODS
5.1 Statistical Inference
5.2 Assessing Differences in Data Sets
5.3 Bayesian Inference
5.4 Predictive Regression
5.5 ANOVA
5.6 Logistic Regression
5.7 Log-Linear Models
5.8 LDA
5.9 Review Questions and Problems
5.10 References for Further Study

6 DECISION TREES AND DECISION RULES
6.1 Decision Trees
6.2 C4.5 Algorithm: Generating a Decision Tree
6.3 Unknown Attribute Values
6.4 Pruning Decision Trees
6.5 C4.5 Algorithm: Generating Decision Rules
6.6 CART Algorithm & Gini Index
6.7 Limitations of Decision Trees and Decision Rules
6.8 Review Questions and Problems
6.9 References for Further Study

7 ARTIFICIAL NEURAL NETWORKS
7.1 Model of an Artificial Neuron
7.2 Architectures of ANNs
7.3 Learning Process
7.4 Learning Tasks Using ANNs
7.5 Multilayer Perceptrons (MLPs)
7.6 Competitive Networks and Competitive Learning
7.7 SOMs
7.8 Review Questions and Problems
7.9 References for Further Study

8 ENSEMBLE LEARNING
8.1 Ensemble-Learning Methodologies
8.2 Combination Schemes for Multiple Learners
8.3 Bagging and Boosting
8.4 AdaBoost
8.5 Review Questions and Problems
8.6 References for Further Study

9 CLUSTER ANALYSIS
9.1 Clustering Concepts
9.2 Similarity Measures
9.3 Agglomerative Hierarchical Clustering
9.4 Partitional Clustering
9.5 Incremental Clustering
9.6 DBSCAN Algorithm
9.7 BIRCH Algorithm
9.8 Clustering Validation
9.9 Review Questions and Problems
9.10 References for Further Study

10 ASSOCIATION RULES
10.1 Market-Basket Analysis
10.2 Algorithm Apriori
10.3 From Frequent Itemsets to Association Rules
10.4 Improving the Efficiency of the Apriori Algorithm
10.5 FP Growth Method
10.6 Associative-Classification Method
10.7 Multidimensional Association–Rules Mining
10.8 Review Questions and Problems
10.9 References for Further Study

11 WEB MINING AND TEXT MINING
11.1 Web Mining
11.2 Web Content, Structure, and Usage Mining
11.3 HITS and LOGSOM Algorithms
11.4 Mining Path–Traversal Patterns
11.5 PageRank Algorithm
11.6 Text Mining
11.7 Latent Semantic Analysis (LSA)
11.8 Review Questions and Problems
11.9 References for Further Study

12 ADVANCES IN DATA MINING
12.1 Graph Mining
12.2 Temporal Data Mining
12.3 Spatial Data Mining (SDM)
12.4 Distributed Data Mining (DDM)
12.5 Correlation Does Not Imply Causality
12.6 Privacy, Security, and Legal Aspects of Data Mining
12.7 Review Questions and Problems
12.8 References for Further Study

13 GENETIC ALGORITHMS
13.1 Fundamentals of GAs
13.2 Optimization Using GAs
13.3 A Simple Illustration of a GA
13.4 Schemata
13.5 TSP
13.6 Machine Learning Using GAs
13.7 GAs for Clustering
13.8 Review Questions and Problems
13.9 References for Further Study

14 FUZZY SETS AND FUZZY LOGIC
14.1 Fuzzy Sets
14.2 Fuzzy-Set Operations
14.3 Extension Principle and Fuzzy Relations
14.4 Fuzzy Logic and Fuzzy Inference Systems
14.5 Multifactorial Evaluation
14.6 Extracting Fuzzy Models from Data
14.7 Data Mining and Fuzzy Sets
14.8 Review Questions and Problems
14.9 References for Further Study

15 VISUALIZATION METHODS
15.1 Perception and Visualization
15.2 Scientific Visualization and Information Visualization
15.3 Parallel Coordinates
15.4 Radial Visualization
15.5 Visualization Using Self-Organizing Maps (SOMs)
15.6 Visualization Systems for Data Mining
15.7 Review Questions and Problems
15.8 References for Further Study

Appendix A
A.1 Data-Mining Journals
A.2 Data-Mining Conferences
A.3 Data-Mining Forums/Blogs
A.4 Data Sets
A.5 Commercially and Publicly Available Tools
A.6 Web Site Links

Appendix B: Data-Mining Applications
B.1 Data Mining for Financial Data Analysis
B.2 Data Mining for the Telecommunications Industry
B.3 Data Mining for the Retail Industry
B.4 Data Mining in Health Care and Biomedical Research
B.5 Data Mining in Science and Engineering
B.6 Pitfalls of Data Mining

Bibliography
Index

PREFACE TO THE SECOND EDITION

In the seven years that have passed since the publication of the first edition of this book, the field of data mining has made good progress both in developing new methodologies and in extending the spectrum of new applications. These changes in data mining motivated me to update my data-mining book with a second edition. Although the core of material in this edition remains the same, the new version of the book attempts to summarize recent developments in our fast-changing field, presenting the state of the art in data mining, both in academic research and in deployment in commercial applications. The most notable changes from the first edition are the addition of

• new topics such as ensemble learning, graph mining, temporal, spatial, distributed, and privacy-preserving data mining;
• new algorithms such as Classification and Regression Trees (CART), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Balanced and Iterative Reducing and Clustering Using Hierarchies (BIRCH), PageRank, AdaBoost, support vector machines (SVM), Kohonen self-organizing maps (SOM), and latent semantic indexing (LSI);
• more details on practical aspects and business understanding of a data-mining process, discussing important problems of validation, deployment, data understanding, causality, security, and privacy; and
• some quantitative measures and methods for comparison of data-mining models such as ROC curve, lift chart, ROI chart, McNemar's test, and K-fold cross validation paired t-test.

Keeping in mind the educational aspect of the book, many new exercises have been added.
The bibliography and appendices have been updated to include work that has appeared in the last few years, as well as to reflect the change in emphasis when a new topic gained importance.

I would like to thank all my colleagues all over the world who used the first edition of the book for their classes and who sent me support, encouragement, and suggestions to put together this revised version. My sincere thanks are due to all my colleagues and students in the Data Mining Lab and Computer Science Department for their reviews of this edition and numerous helpful suggestions. Special thanks go to graduate students Brent Wenerstrom, Chamila Walgampaya, and Wael Emara for patience in proofreading this new edition and for useful discussions about the content of new chapters, numerous corrections, and additions. To Dr. Joung Woo Ryu, who helped me enormously in the preparation of the final version of the text and all additional figures and tables, I would like to express my deepest gratitude.

I believe this book can serve as a valuable guide to the field for undergraduate and graduate students, researchers, and practitioners. I hope that the wide range of topics covered will allow readers to appreciate the extent of the impact of data mining on modern business, science, and even the entire society.

Mehmed Kantardzic
Louisville
July 2011

PREFACE TO THE FIRST EDITION

The modern technologies of computers, networks, and sensors have made data collection and organization an almost effortless task. However, the captured data need to be converted into information and knowledge to become useful. Traditionally, the task of extracting useful information from recorded data has been performed by analysts; however, the increasing volume of data in modern businesses and sciences calls for computer-based methods for this task. As data sets have grown in size and complexity, there has been an inevitable shift away from direct hands-on data analysis toward indirect, automatic data analysis in which the analyst works via more complex and sophisticated tools. The entire process of applying computer-based methodology, including new techniques for knowledge discovery from data, is often called data mining.

The importance of data mining arises from the fact that the modern world is a data-driven world. We are surrounded by data, numerical and otherwise, which must be analyzed and processed to convert it into information that informs, instructs, answers, or otherwise aids understanding and decision making. In the age of the Internet, intranets, data warehouses, and data marts, the fundamental paradigms of classical data analysis are ripe for changes. Very large collections of data (millions or even hundreds of millions of individual records) are now being stored in centralized data warehouses, allowing analysts to make use of powerful data-mining methods to examine data more comprehensively. The quantity of such data is huge and growing, the number of sources is effectively unlimited, and the range of areas covered is vast: industrial, commercial, financial, and scientific activities are all generating such data.

The new discipline of data mining has developed especially to extract valuable information from such huge data sets.
In recent years there has been an explosive growth of methods for discovering new knowledge from raw data. This is not surprising given the proliferation of low-cost computers (for implementing such methods in software), low-cost sensors, communications, and database technology (for collecting and storing data), and highly computer-literate application experts who can pose "interesting" and "useful" application problems.

Data-mining technology is currently a hot favorite in the hands of decision makers as it can provide valuable hidden business and scientific "intelligence" from large amounts of historical data. It should be remembered, however, that fundamentally, data mining is not a new technology. The concept of extracting information and knowledge discovery from recorded data is a well-established concept in scientific and medical studies. What is new is the convergence of several disciplines and corresponding technologies that have created a unique opportunity for data mining in the scientific and corporate worlds.

The origin of this book was a wish to have a single introductory source to which we could direct students, rather than having to direct them to multiple sources. However, it soon became apparent that a wide interest existed, and potential readers other than our students would appreciate a compilation of some of the most important methods, tools, and algorithms in data mining. Such readers include people from a wide variety of backgrounds and positions, who find themselves confronted by the need to make sense of large amounts of raw data. This book can be used by a wide range of readers, from students wishing to learn about basic processes and techniques in data mining to analysts and programmers who will be engaged directly in interdisciplinary teams for selected data-mining applications. This book reviews state-of-the-art techniques for analyzing enormous quantities of raw data in high-dimensional data spaces to extract new information useful in decision-making processes. Most of the definitions, classifications, and explanations of the techniques covered in this book are not new, and they are presented in references at the end of the book. One of the author's main goals was to concentrate on a systematic and balanced approach to all phases of a data-mining process, and present them with sufficient illustrative examples. We expect that carefully prepared examples should give the reader additional arguments and guidelines in the selection and structuring of techniques and tools for his or her own data-mining applications.
A better understanding of the implementational details for most of the introduced techniques will help challenge the reader to build his or her own tools or to improve applied methods and techniques.

Teaching data mining has to emphasize the concepts and properties of the applied methods, rather than the mechanical details of how to apply different data-mining tools. Despite all of their attractive "bells and whistles," computer-based tools alone will never provide the entire solution. There will always be the need for the practitioner to make important decisions regarding how the whole process will be designed, and how and which tools will be employed. Obtaining a deeper understanding of the methods and models, how they behave, and why they behave the way they do is a prerequisite for efficient and successful application of data-mining technology. The premise of this book is that there are just a handful of important principles and issues in the field of data mining. Any researcher or practitioner in this field needs to be aware of these issues in order to successfully apply a particular methodology, to understand a method's limitations, or to develop new techniques. This book is an attempt to present and discuss such issues and principles and then describe representative and popular methods originating from statistics, machine learning, computer graphics, databases, information retrieval, neural networks, fuzzy logic, and evolutionary computation.

In this book, we describe how best to prepare environments for performing data mining and discuss approaches that have proven to be critical in revealing important patterns, trends, and models in large data sets. It is our expectation that once a reader has completed this text, he or she will be able to initiate and perform basic activities in all phases of a data-mining process successfully and effectively. Although it is easy to focus on the technologies, as you read through the book keep in mind that technology alone does not provide the entire solution. One of our goals in writing this book was to minimize the hype associated with data mining. Rather than making false promises that overstep the bounds of what can reasonably be expected from data mining, we have tried to take a more objective approach. We describe in sufficient detail the processes and algorithms that are necessary to produce reliable and useful results in data-mining applications. We do not advocate the use of any particular product or technique over another; the designer of a data-mining process has to have enough background for selection of appropriate methodologies and software tools.

Mehmed Kantardzic
Louisville
August 2002

1 DATA-MINING CONCEPTS

Data Mining: Concepts, Models, Methods, and Algorithms, Second Edition. Mehmed Kantardzic. © 2011 by Institute of Electrical and Electronics Engineers. Published 2011 by John Wiley & Sons, Inc.

Chapter Objectives
• Understand the need for analyses of large, complex, information-rich data sets.
• Identify the goals and primary tasks of the data-mining process.
• Describe the roots of data-mining technology.
• Recognize the iterative character of a data-mining process and specify its basic steps.
• Explain the influence of data quality on a data-mining process.
• Establish the relation between data warehousing and data mining.

1.1 INTRODUCTION

Modern science and engineering are based on using first-principle models to describe physical, biological, and social systems. Such an approach starts with a basic scientific model, such as Newton's laws of motion or Maxwell's equations in electromagnetism, and then builds upon them various applications in mechanical engineering or electrical engineering. In this approach, experimental data are used to verify the underlying first-principle models and to estimate some of the parameters that are difficult or sometimes impossible to measure directly. However, in many domains the underlying first principles are unknown, or the systems under study are too complex to be mathematically formalized. With the growing use of computers, there is a great amount of data being generated by such systems. In the absence of first-principle models, such readily available data can be used to derive models by estimating useful relationships between a system's variables (i.e., unknown input–output dependencies). Thus there is currently a paradigm shift from classical modeling and analyses based on first principles to developing models and the corresponding analyses directly from data.

We have gradually grown accustomed to the fact that there are tremendous volumes of data filling our computers, networks, and lives. Government agencies, scientific institutions, and businesses have all dedicated enormous resources to collecting and storing data. In reality, only a small amount of these data will ever be used because, in many cases, the volumes are simply too large to manage, or the data structures themselves are too complicated to be analyzed effectively. How could this happen? The primary reason is that the original effort to create a data set is often focused on issues such as storage efficiency; it does not include a plan for how the data will eventually be used and analyzed.

The need to understand large, complex, information-rich data sets is common to virtually all fields of business, science, and engineering. In the business world, corporate and customer data are becoming recognized as a strategic asset. The ability to extract useful knowledge hidden in these data and to act on that knowledge is becoming increasingly important in today's competitive world.
The entire process of applying a computer-based methodology, including new techniques, for discovering knowledge from data is called data mining.

Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers.

In practice, the two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns describing the data that can be interpreted by humans. Therefore, it is possible to put data-mining activities into one of two categories:

1. predictive data mining, which produces the model of the system described by the given data set, or
2. descriptive data mining, which produces new, nontrivial information based on the available data set.

On the predictive end of the spectrum, the goal of data mining is to produce a model, expressed as executable code, which can be used to perform classification, prediction, estimation, or other similar tasks. On the descriptive end of the spectrum, the goal is to gain an understanding of the analyzed system by uncovering patterns and relationships in large data sets. The relative importance of prediction and description for particular data-mining applications can vary considerably. The goals of prediction and description are achieved by using data-mining techniques, explained later in this book, for the following primary data-mining tasks:

1. Classification. Discovery of a predictive learning function that classifies a data item into one of several predefined classes.
2. Regression. Discovery of a predictive learning function that maps a data item to a real-value prediction variable.
3. Clustering. A common descriptive task in which one seeks to identify a finite set of categories or clusters to describe the data.
4. Summarization. An additional descriptive task that involves methods for finding a compact description for a set (or subset) of data.
5. Dependency Modeling. Finding a local model that describes significant dependencies between variables or between the values of a feature in a data set or in a part of a data set.
6. Change and Deviation Detection. Discovering the most significant changes in the data set.

The more formal approach, with graphical interpretation of data-mining tasks for complex and large data sets and illustrative examples, is given in Chapter 4. Current introductory classifications and definitions are given here only to give the reader a feeling for the wide spectrum of problems and tasks that may be solved using data-mining technology.

The success of a data-mining engagement depends largely on the amount of energy, knowledge, and creativity that the designer puts into it. In essence, data mining is like solving a puzzle. The individual pieces of the puzzle are not complex structures in and of themselves. Taken as a collective whole, however, they can constitute very elaborate systems.
As you try to unravel these systems, you will probably get frustrated, start forcing parts together, and generally become annoyed at the entire process, but once you know how to work with the pieces, you realize that it was not really that hard in the first place. The same analogy can be applied to data mining. In the beginning, the designers of the data-mining process probably did not know much about the data sources; if they did, they would most likely not be interested in performing data mining. Individually, the data seem simple, complete, and explainable. But collectively, they take on a whole new appearance that is intimidating and difficult to comprehend, like the puzzle. Therefore, being an analyst and designer in a data-mining process requires, besides thorough professional knowledge, creative thinking and a willingness to see problems in a different light.

Data mining is one of the fastest growing fields in the computer industry. Once a small interest area within computer science and statistics, it has quickly expanded into a field of its own. One of the greatest strengths of data mining is reflected in its wide range of methodologies and techniques that can be applied to a host of problem sets. Since data mining is a natural activity to be performed on large data sets, one of the largest target markets is the entire data-warehousing, data-mart, and decision-support community, encompassing professionals from such industries as retail, manufacturing, telecommunications, health care, insurance, and transportation. In the business community, data mining can be used to discover new purchasing trends, plan investment strategies, and detect unauthorized expenditures in the accounting system. It can improve marketing campaigns and the outcomes can be used to provide customers with more focused support and attention. Data-mining techniques can be applied to problems of business process reengineering, in which the goal is to understand interactions and relationships among business practices and organizations.

Many law enforcement and special investigative units, whose mission is to identify fraudulent activities and discover crime trends, have also used data mining successfully. For example, these methodologies can aid analysts in the identification of critical behavior patterns, in the communication interactions of narcotics organizations, the monetary transactions of money laundering and insider trading operations, the movements of serial killers, and the targeting of smugglers at border crossings. Data-mining techniques have also been employed by people in the intelligence community who maintain many large data sources as a part of the activities relating to matters of national security. Appendix B of the book gives a brief overview of the typical commercial applications of data-mining technology today.
Despite a considerable level of overhype and strategic misuse, data mining has not only persevered but matured and adapted for practical use in the business world.

1.2 DATA-MINING ROOTS

Looking at how different authors describe data mining, it is clear that we are far from a universal agreement on the definition of data mining or even what constitutes data mining. Is data mining a form of statistics enriched with learning theory or is it a revolutionary new concept? In our view, most data-mining problems and corresponding solutions have roots in classical data analysis. Data mining has its origins in various disciplines, of which the two most important are statistics and machine learning. Statistics has its roots in mathematics; therefore, there has been an emphasis on mathematical rigor, a desire to establish that something is sensible on theoretical grounds before testing it in practice. In contrast, the machine-learning community has its origins very much in computer practice. This has led to a practical orientation, a willingness to test something out to see how well it performs, without waiting for a formal proof of effectiveness.

If the place given to mathematics and formalizations is one of the major differences between statistical and machine-learning approaches to data mining, another is the relative emphasis they give to models and algorithms. Modern statistics is almost entirely driven by the notion of a model. This is a postulated structure, or an approximation to a structure, which could have led to the data. In place of the statistical emphasis on models, machine learning tends to emphasize algorithms. This is hardly surprising; the very word "learning" contains the notion of a

