Introduction To Spatial Data Mining

7.1 Pattern Discovery
7.2 Motivation
7.3 Classification Techniques
7.4 Association Rule Discovery Techniques
7.5 Clustering
7.6 Outlier Detection

Introduction: a classic example of spatial analysis
- Disease cluster
- Dr. John Snow
- Deaths from the cholera epidemic, London, September 1854
- Infected water pump?
- A good representation is the key to solving a problem

A good representation because it:
- Represents the spatial relation of objects of the same type
- Represents the spatial relation of objects to other objects
- Shows only the relevant aspects and hides the irrelevant ones
- It is important not only where a cluster is but also what else is there (e.g. a water pump)!

Other examples of Spatial Patterns
- Historic examples (section 7.1.5, p. 186)
  - Fluoride and healthy gums near the Colorado River
  - Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle
- Modern examples
  - Cancer clusters to investigate environmental health hazards
  - Crime hotspots for planning police patrol routes
  - Bald eagles nest on tall trees near open water
  - Nile virus spreading from the northeast USA to the south and west
  - Unusual warming of the Pacific Ocean (El Nino) affects weather in the USA

Goals of Spatial Data Mining
- Identifying spatial patterns
- Identifying spatial objects that are potential generators of patterns
- Identifying information relevant for explaining the spatial pattern (and hiding irrelevant information)
- Presenting the information in a way that is intuitive and supports further analysis

What is a Spatial Pattern?
- What is not a pattern?
  - Random, haphazard, chance, stray, accidental, unexpected
  - Without definite direction, trend, rule, method, design, aim, purpose
  - Accidental: without design, outside the regular course of things
  - Casual: absence of pre-arrangement, relatively unimportant
  - Fortuitous: what occurs without known cause
- What is a pattern?
  - A frequent arrangement, configuration, composition, regularity
  - A rule, law, method, design, description
  - A major direction, trend, prediction
  - A significant surface irregularity or unevenness

What is Spatial Data Mining?
- Metaphors
  - Mining nuggets of information embedded in large databases
    - Nuggets = interesting, useful, unexpected spatial patterns
    - Mining = looking for nuggets
  - Needle in a haystack
- Defining spatial data mining
  - Search for spatial patterns
  - Non-trivial search: as "automated" as possible, to reduce human effort
  - Interesting, useful and unexpected spatial patterns

What is Spatial Data Mining? - 2
- Non-trivial search for interesting and unexpected spatial patterns
- Non-trivial search
  - Large (e.g. exponential) search space of plausible hypotheses
  - Ex. Asiatic cholera: causes: water, food, air, insects, ...; water delivery mechanisms: numerous pumps, rivers, ponds, wells, pipes, ...
- Interesting
  - Useful in a certain application domain
  - Ex. Shutting off the identified water pump saved human lives
- Unexpected
  - Pattern is not common knowledge
  - May provide a new understanding of the world
  - Ex. The water pump - cholera connection led to the "germ" theory

What is NOT Spatial Data Mining?
- Simple querying of spatial data
  - Find neighbors of Canada given names and boundaries of all countries
  - Find the shortest path from Boston to Houston in a freeway map
  - Search space is not large (not exponential)
- Testing a hypothesis via a primary data analysis
  - Ex. Female chimpanzee territories are smaller than male territories
  - Search space is not large!
  - SDM: secondary data analysis to generate multiple plausible hypotheses
- Uninteresting or obvious patterns in spatial data
  - Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart
  - Common knowledge: nearby places have similar rainfall
- Mining of non-spatial data
  - Diaper sales and beer sales are correlated in the evenings
  - GPS product buyers are of 3 kinds: outdoors enthusiasts, farmers, technology enthusiasts

Why Learn about Spatial Data Mining?
- Two basic reasons for new work
  - Consideration of use in certain application domains
  - Provide fundamental new understanding
- Application domains
  - Scale up secondary spatial (statistical) analysis to very large datasets
    - Describe/explain locations of human settlements in the last 5000 years
    - Find cancer clusters to locate hazardous environments
    - Prepare land-use maps from satellite imagery
    - Predict habitat suitable for endangered species
  - Find new spatial patterns
    - Find groups of co-located geographic features

Why Learn about Spatial Data Mining? - 2
- New understanding of geographic processes for critical questions
  - Ex. How is the health of planet Earth?
  - Ex. Characterize effects of human activity on environment and ecology
  - Ex. Predict the effect of El Nino on weather and economy
- Traditional approach: manually generate and test hypotheses
  - But spatial data is growing too fast to analyze manually
    - Satellite imagery, GPS tracks, sensors on highways, ...
  - The number of possible geographic hypotheses is too large to explore manually
    - Large number of geographic features and locations
    - The number of interacting subsets of features grows exponentially
    - Ex. Find teleconnections between weather events across ocean and land areas
- SDM may reduce the set of plausible hypotheses
  - Identify hypotheses supported by the data
  - For further exploration using traditional statistical methods

Interactive Exploratory Analysis
- Choropleth maps showing the distribution of variable(s) in space
- Parallel coordinate plot
- Scatter plot
- Combining spatial and non-spatial displays
- Displays dynamically linked
- Variables selected and manipulated by the user
- Powerful for low-dimensional dependencies (3-4)

Data Mining: A KDD Process
- Selection: obtain data from various sources.
- Preprocessing: cleanse data.
- Transformation: convert to a common format; transform to a new format.
- Data mining: obtain desired results.
- Interpretation/Evaluation: present results to the user in a meaningful manner.

Data Mining: Confluence of Multiple Disciplines
[Figure: data mining shown as a confluence of disciplines, including statistics, visualization, algorithms, and other disciplines]

Primary Data Mining Tasks
- Descriptive modeling
  - Finding a compact description for a large dataset
  - Clustering: group objects into groups based on their attributes
  - Association rules: correlate what events are likely to occur together
  - Sequential rules: correlate events ordered in time
  - Trend detection: discovering the most significant changes
- Predictive modeling
  - Classification: assign objects into groups by recognizing patterns
  - Regression: forecasting what may happen in the future by mapping a data item to a real-valued prediction variable

What is Cluster Analysis?
- Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
- Intra-cluster distances are minimized
- Inter-cluster distances are maximized

Clustering
- Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters
- Clustering
  - Grouping a set of data objects into clusters based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity
- Example
  - Land use: identification of areas of similar land use in an earth observation database
  - City planning: identifying groups of houses according to their house type, value, and geographical location
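
The grouping principle above can be sketched with a small k-means loop. The slides do not name a specific clustering algorithm, so k-means is an assumption chosen for illustration, and the house coordinates, the starting centroids and the function name `kmeans` are all hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid closest to p (Euclidean distance).
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical house locations (x, y) forming two obvious groups.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(houses, [(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

With these inputs the three houses near the origin end up in one cluster and the three houses near (8, 8) in the other, so distances within each group are small and the distance between the groups is large.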

Association rule
- Association (correlation and causality)
  - age(X, "20..29") AND income(X, "20..29K") => buys(X, "PC") [support = 2%, confidence = 60%]
- Association rule mining
  - Finding frequent patterns, associations, correlations among sets of items or objects in transaction databases, relational databases, and other information repositories
  - Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database
- Motivation: finding regularities in data
  - What products were often purchased together?

Example: Association rule

Transaction-id   Items bought
10               a1, a2, a3
20               a1, a3
30               a1, a4
40               a2, a5, a6

- Itemsets A1, A2 drawn from the items {a1, ..., ak}
- Find all the rules A1 -> A2 with minimum confidence and support
  - support, s: probability that a transaction contains A1 ∪ A2
  - confidence, c: conditional probability that a transaction having A1 also contains A2
- Let min support = 50%, min conf = 50%:
  - a1 -> a3 (50%, 66.7%)
  - a3 -> a1 (50%, 100%)
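
As a check on the numbers above, support and confidence can be computed directly from the four transactions. This is a minimal sketch; the function names `support` and `confidence` are illustrative and not part of the slides.

```python
# The four transactions from the example, keyed by transaction id.
transactions = {
    10: {"a1", "a2", "a3"},
    20: {"a1", "a3"},
    30: {"a1", "a4"},
    40: {"a2", "a5", "a6"},
}

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """Conditional probability that a transaction containing lhs also contains rhs."""
    return support(lhs | rhs) / support(lhs)

print(support({"a1", "a3"}))        # support of a1 -> a3 and of a3 -> a1
print(confidence({"a1"}, {"a3"}))   # confidence of a1 -> a3
print(confidence({"a3"}, {"a1"}))   # confidence of a3 -> a1
```

Both rules share the same support because support depends only on the union of the two sides; confidence differs because a1 appears in three transactions while a3 appears in two.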

Deviation Detection
- Outlier: a data object that does not comply with the general behavior of the data
  - It can be considered as noise or an exception but is quite useful in fraud detection and rare events analysis
- Trend and evolution analysis
  - Trend and deviation: regression analysis
  - Periodicity analysis
  - Similarity-based analysis
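
One simple way to make "does not comply with the general behavior" concrete is a z-score test. The slides do not prescribe a method, so the z-score approach, the threshold of 2 standard deviations, the sample values and the name `find_outliers` are all assumptions for illustration.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts with one suspicious entry.
daily_amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(daily_amounts)
print(outliers)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a robust alternative (for example one based on the median) may be preferable.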

Classification and Regression
- Classification:
  - Constructs a model (classifier) based on the training set and uses it in classifying new data
  - Example: climate classification, ...
- Regression:
  - Models continuous-valued functions, i.e., predicts unknown or missing values
  - Example: stock trends prediction, ...

Classification (1): Model Construction

Training data:

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Training data -> classification algorithms -> classifier (model):

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification (2): Prediction Using the Model

Testing data:

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen data: (Jeff, Professor, 4) -> classifier -> Tenured?
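
The prediction step can be sketched by applying the rule from the model-construction slide to the testing data and to the unseen record. The comparison `years > 6` is an assumption: the operator is missing in the transcription, and "greater than" is the reading consistent with the training table (Dave, 6 years, not tenured). The function name `predict_tenured` is illustrative.

```python
def predict_tenured(rank, years):
    """Classifier from the slides: rank is professor OR years > 6 (assumed operator)."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data: (name, rank, years, actual tenured label).
testing_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

correct = sum(
    predict_tenured(rank, years) == label
    for _, rank, years, label in testing_data
)
accuracy = correct / len(testing_data)

# Unseen record from the slide: (Jeff, Professor, 4).
jeff = predict_tenured("Professor", 4)
print(accuracy, jeff)
```

Under this reading the rule misclassifies Merlisa (7 years but not tenured), which illustrates why a model is evaluated on testing data before it is applied to unseen data.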

Classification Techniques
- Decision tree induction
- Bayesian classification
- Neural networks
- Genetic algorithms
- Fuzzy set and logic

Regression
- Regression is similar to classification
  - First, construct a model
  - Second, use the model to predict unknown values
- Methods
  - Linear and multiple regression
  - Non-linear regression
- Regression is different from classification
  - Classification predicts categorical class labels
  - Regression models continuous-valued functions
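
The two steps (construct a model, then predict) can be sketched for simple linear regression using the ordinary least-squares formulas. The data points and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Step 1: construct the model from (hypothetical) observations.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Step 2: use the model to predict an unknown, continuous value.
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```

Unlike the classifier, which returns a categorical label, the fitted line returns a real number for any input.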

Spatial Data Mining
- Spatial patterns
  - Hotspots, clustering, trends, ...
  - Spatial outliers
  - Location prediction
  - Associations, co-locations
- Primary tasks
  - Spatial data clustering analysis
  - Spatial outlier analysis
  - Mining spatial association rules
  - Spatial classification and prediction
- Example: unusual warming of the Pacific Ocean (El Nino) affects weather in the USA

Spatial Data Mining
- Spatial data mining follows the same functions as data mining, with the end objective of finding patterns in geography, meteorology, etc.
- The main difference: spatial autocorrelation
  - The neighbors of a spatial object may have an influence on it and therefore have to be considered as well
- Spatial attributes
  - Topological: adjacency or inclusion information
  - Geometric: position (longitude/latitude), area, perimeter, boundary polygon
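
Spatial autocorrelation can be quantified in several ways; a common one is Moran's I, which is not named in the slides and is used here as an assumption for illustration. The sketch below uses binary adjacency weights on four cells arranged in a line; the values and the name `morans_i` are hypothetical.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights; `neighbors` lists directed (i, j) adjacent pairs."""
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    # Sum of products of deviations over all neighboring pairs.
    cross = sum(dev[i] * dev[j] for i, j in neighbors)
    total_weight = len(neighbors)  # each listed pair has weight 1
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)

# Four cells in a row: 0-1, 1-2 and 2-3 are adjacent (listed in both directions).
line_neighbors = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]

smooth = morans_i([1, 2, 3, 4], line_neighbors)       # neighbors have similar values
alternating = morans_i([1, 4, 1, 4], line_neighbors)  # neighbors have opposite values
print(smooth, alternating)
```

A positive value indicates that neighboring cells tend to carry similar values, a negative value that they tend to differ; a value near zero is what a spatially random arrangement would give.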

Example: What Kind of Houses Are Highly Valued? (Associative Classification)

Example: Location Prediction
- Questions addressed
  - Where will a phenomenon occur?
  - Which spatial events are predictable?
  - How can a spatial event be predicted from other spatial events?
    - Equations, rules, other methods, ...
- Examples
  - Where will an endangered bird nest?
  - Which areas are prone to fire given maps of vegetation, drought, etc.?
  - What should be recommended to a traveler in a given location?

Example: Spatial Interactions
- Questions addressed
  - Which spatial events are related to each other?
  - Which spatial phenomena depend on other phenomena?
- Examples
  - Exercise: list two interaction patterns.

Example: Hot spots
- Questions addressed
  - Is a phenomenon spatially clustered?
  - Which spatial entities or clusters are unusual?
  - Which spatial entities share common characteristics?
- Examples
  - Cancer clusters [CDC] to launch investigations
  - Crime hot spots to plan police patrols
- Defining unusual
  - Comparison group: neighborhood, entire population
  - Significance: probability of being unusual is high
