Geo-spatial Big Data Mining Techniques


International Journal of Computer Applications (0975 – 8887), Volume 135 – No. 11, February 2016

Mazin Alkathiri, Jhummarwala Abdul, M.B. Potdar, PhD
Bhaskaracharya Institute for Space Applications and Geo-Informatics, Gandhinagar 382007, India

ABSTRACT
As stated by several authors in the literature, there has been literally a big-bang explosion in the data acquired in recent times. This is especially so for geographical, or geospatial, data. The huge volume of data acquired in different formats (structured and unstructured), its large complexity, and its non-stop generation have posed an enormous challenge to the scientific and business worlds alike. The conventional tools, techniques and hardware existing about a decade ago met their limits in handling such data; hence, such data are termed big data. This has necessitated the invention of new software tools and techniques, as well as parallel computing hardware architectures, to meet the requirement of timely and efficient handling of big data. The field of data mining has benefitted from these developments as well. This article reviews the evolution of data mining techniques over the last two decades and the efforts made in developing big data analytics, especially as applied to geospatial big data. This is still a very actively evolving field, and it will be no surprise if new techniques are published before this article appears in print.

Keywords
Data mining, Distributed Computing, Hadoop, Big Data, Geospatial, Radoop, SpatialHadoop, Hadoop-GIS, Pigeon

1. INTRODUCTION
At the end of the last century, data were commonly stored in relational databases, and Structured Query Language (SQL) was used for information extraction from such databases. These were used extensively for the development of decision support systems for managing businesses profitably, as well as by governments in the planning and execution of people-friendly developmental programs. The data stored in databases and data warehouses have grown fast with the increase in the size of storage media over the last decade or so, resulting in the requirement of new techniques which can surmount the limitations of the traditional analysis techniques. Consequently, this has led to the development of new data warehousing and data mining techniques for the analysis of large volumes of data. These techniques have enabled the retrieval of interesting and useful knowledge.

With the scaling up of storage and processing capabilities, the time was ripe for the emergence of the concept of parallel computing. Several architectures were proposed and frameworks were developed to take advantage of recent developments in hardware. Among them, the most noteworthy are network computing, the Hadoop framework for distributed computing, cloud computing platforms, CUDA computing using the arrays of processors in GPUs manufactured by NVIDIA, and the more recently developed OpenMP programming model based on Intel Xeon Phi co-processors. They use a distributed data model wherein it is possible to access many storage and processing units.

Wang et al. (1996) reported on the development of a software system for data mining based on an RDBMS, named DBMiner, integrating three parts: the database, OLAP and data mining technologies. This system incorporated several interesting data mining techniques and interactively performed data mining at multiple levels of abstraction on any user-specified set of data in a database or a data warehouse (Fig. 1). Efficient implementations of the techniques were explored using different data structures, including multi-dimensional data cubes holding generalized relations. The data mining processes utilized user- or expert-defined set-grouping as well as schema-level concept hierarchies. DBMiner thus tightly integrated a relational database system with a concept hierarchy module.

Fig. 1: Schematic layout of DBMiner (Wang et al. 1996)

2. SPATIAL DATA
The geo-spatial data generated are in general multi-dimensional in the spatial, spectral and temporal domains. The information content in these dimensions represents real-world features. The data represent radiance, reflectance or any other physical quantity associated with a ground resolution element, called a pixel. Such data are known as raster data (fig. 2). The theme-based data files generated from the raster data, or from any other collateral data, are known as vector data. Spatial data are referenced by their location co-ordinates (e.g. latitude and longitude) in a Geographic Information System (GIS) for geo-spatial analysis. The common applications of spatial data are:
• Proximity assessment.
• Entity identification and estimation of likeness or similarity.
• Geometric computation and geo-spatial relationships.
• Digital representation of elevation data.
• Topological matching and pattern analysis.
• Multidimensional data representation.
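Proximity assessment and geometric computation, the first and third items in this list, reduce at their simplest to great-circle distances between co-ordinate pairs. A minimal haversine sketch (my own illustration, not code from any system surveyed here):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))
```

One degree of longitude at the equator comes out to roughly 111 km, which is a convenient sanity check for any proximity query built on such a function.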

Fig. 2: Representation of raster and vector data, and vectorization of raster data.

3. SPATIAL DATA ANALYSIS AND MINING
Standard image processing tools and techniques are applied to extract spatial and spectral information from raster data, and to characterize physical, chemical, biological, geological and geophysical processes using multi-temporal images. There are old examples showing that it was possible to visualize geospatial data in a helpful way, and to subject it to data mining (DM) techniques, before computers came into existence. The map analysis of Napoleon's Russian campaign (Burch and Grudnitski, 1989) and the work of Dr John Snow during the great cholera epidemic in London in 1854 are well-known early examples of information extraction from maps, and of cause-effect analysis based on them, without the aid of a computer.

Geographic Information Science (GISc) deals with the relationships among spatial patterns and processes and their temporal dynamics. With the massive increase in data volume due to global coverage by sensors on board satellite systems since the late nineteen-seventies, GISc has come of age. With the massive growth of geographical data due to the long legacy of space technology and advancements in satellite-based sensor technology and satellite telemetry on the one hand, and data storage and retrieval technologies and the handling and analysis of large volumes of data through networking of computers and parallel processing software on the other, the issues of geographic science can now be addressed. One such software technology field, Data Mining, offers a good solution to help convert geographical data into information and extract knowledge out of it.

Geospatial Data Mining should be understood as a special type of DM that seeks to carry out generic functions similar to those of conventional DM, thoroughly modified to safeguard the spatial aspects of geo-information. There are many definitions of spatial data analysis. Anselin's (1993) definition is "the statistical study of phenomena that manifest themselves in space". Another definition, by Bailey (1994), is "a general ability to manipulate spatial data into different forms and extract additional meaning as a result". From these definitions we get the recognition of space as a source of explanation for the patterns presented by the different phenomena within it. Geo-spatial data mining is based on the foundation laid by the First Law of Geography: "everything is related to everything else, but near things are more related than distant things" (Anselin, 1993). Therefore, to a great extent, it is not possible to use the traditional methods of classical statistics to analyze spatial data, given the important role that location plays in understanding phenomena observed in space. Han et al. (1997) described spatial data mining as that aspect of data mining which deals with the extraction of knowledge, spatial relationships, or any hidden patterns not explicitly stored in spatial databases. The DBMiner package described above did not support this feature of spatial data mining. To deal with the huge amount of geo-spatial data, a need arose for advanced, special-purpose data mining systems which can extract important knowledge from both spatial and non-spatial objects in large databases. Also, there is no unique set of data mining algorithms that can be used in all application domains, but different types of data mining algorithms can be applied, as an integrated architecture or as hybrid models, to increase the robustness of the mining system.

GeoMiner, a spatial data mining system prototype, was developed on top of the DBMiner system (Han et al., 1997). In this system, the non-spatial data were handled by the DBMiner system, while the functions for mining spatial data, and the relationships between spatial and non-spatial data, were handled by GeoMiner. A query language, Geo-Mining Query Language (GMQL), was designed as an extension to Spatial SQL. It considered knowledge discovery from only a single thematic map. Due to these obvious limitations, the need was felt to enhance its capability to deal with very large amounts of data, and to handle streaming data as well.

The next stage was the development of the Spatial Data Analysis and Modelling (SDAM) software system (fig. 3), which executed programs developed in different environments (C, C++, MATLAB) through a unified control and a simple Graphical User Interface (Lazarevic, Fiez, & Obradovic, 2000). SDAM could be run on a local machine and also remotely; data security was ensured by using passwords. Users could run learning algorithms remotely to build prediction models for each remote data set. The authors also presented a more advanced distributed SDAM, including a model for data management over distributed sites and more complex methods for combining classifiers.

Fig. 3: SDAM under a unified GUI (Lazarevic et al. 2000)

Bação (2006), in his thesis, proposed to divide the spatial Data Mining (DM) models used in science into three fundamental types:
• Deterministic models
• Parametric models
• Non-parametric models

For the development of a robust theoretical framework to serve as a basis for a new GISc, there are two perspectives:
A. Regard DM as a "black box" within GISc, and
B. Use DM, but assign an important role to the pre-processing stage.
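The First Law of Geography quoted earlier in this section is usually quantified as spatial autocorrelation. As a concrete illustration (mine, not taken from the systems surveyed), Moran's I measures it for a set of values under a chosen spatial weight matrix:

```python
def morans_i(values, weights):
    """Moran's I: spatial autocorrelation of values under a weight matrix.

    I = (n / W) * sum_ij w_ij * z_i * z_j / sum_i z_i^2,
    where z_i = x_i - mean(x) and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / w_total) * (num / den)

# Rook-style adjacency on a 4-cell chain: cells i-1 and i+1 are neighbours.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
```

On this chain, perfectly alternating values such as [1, -1, 1, -1] give I = -1 (strong negative autocorrelation), while two homogeneous blocks [1, 1, -1, -1] give a positive I, which is exactly the "near things are more related" effect the First Law describes.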

Bação (2006) also introduced the issues of Geospatial Data Mining (GDM), fitted it into the broader setting of GISc, and provided a framework for Geo-Spatial Data Mining (GSDM). Three generations of GIS data models exist, viz. (i) the CAD data model, (ii) the geo-relational data model, and (iii) the object-relational data model. The last one abstracts geographical entities as objects of classes with attributes, behavior and associated rules, together with relationships between objects. Some of the benefits of the geo-database model are that it is close to human understanding and expresses real-world objects. It also has a better capacity for expansion, and the relationships between spatial data are fully expressed. All spatial data, together with attribute data, can be stored and centrally managed in a DBMS, with provision for geographic data to be edited simultaneously by many users. Yin and Su (2006) implemented a model for a geospatial database of National Fundamental Geographic Information which used the DBMS Oracle9i to store geospatial objects; the geo-database model was applied to describe and organize the geospatial entities and their relations.

The spatial features of a particular location can be stored in geographic databases, in which each feature is usually located or stored in a different relation. The process of data preparation, storage and analysis is a time-consuming task, which is an issue of concern in spatial data mining systems. One of the obvious solutions is to automate this process. Bogorny et al. (2007) presented a package named "Weka-GDPM", an extension of Weka to support the automation of spatial data preparation, storage and analysis for mining geographic data. Fig. 4 shows the different geographic data storage methods under geographical database management systems (GDBMS) following the Open Geospatial Consortium (OGC) specifications. The framework also contains a knowledge repository for storing well-known geographic associations extracted from geographic database schemas, geo-ontologies, and those provided by users. It can be seen that the first part of the diagram contains the different data mining algorithms for extracting the needed knowledge out of the database. The second part of the diagram is a new one, which takes care of spatial data preparation; it is located in the center to bridge the gap between the GDBMS and the data mining tools. JDBC is used to reach the data. The data is retrieved, preprocessed, and transformed into a single-table format according to the user specifications. On top of this framework there are:
1. The Metadata Retrieval module, for retrieving all relevant information from a database,
2. The Dependence Elimination module, for verifying all associations between the target feature type and all relevant feature types,
3. The Spatial Join module, for computing relationships, and
4. The Transformation module, for transposing as well as discretizing the Spatial Join module output into the single-table representation understandable by data mining algorithms.

Fig. 4: Integrated spatial data mining framework (Bogorny et al. 2007)

4. GEO-SPATIAL DATA AS BIG DATA
According to some authors, there has been a big-bang explosion of data in recent years. Data acquisition at rates of terabytes per day is quite common. Spatial data on mobile phone users, rail and air travelers, marketing, consumers, goods production and a wide range of daily activities are producing large volumes of data which are of interest for data mining. Remote sensing data are no exception: a large number of meteorological, land and ocean observation sensors on board satellites are continuously pouring down data from space. The analysis of such large data presents its own challenges, even with the highly powerful processors and high-speed data access available presently.

One characteristic of the volume of data generated is its formats: the data may be in structured, semi-structured or un-structured formats, which further complicates analysis. As for the history of the term "Big Data", it was used for the first time by Mashey (1997). Diebold (2000) presented the first academic paper with the words "Big Data" in the title, and Doug Laney was the first to introduce the 3 Vs characterizing big data. Big data are data sets whose size (volume), complexity (variability), and rate of growth (velocity) make them difficult to collect, manage, process or analyze with current technologies and tools (Bhosale and Gadekar, 2014). Big data analysis requires ingenious approaches, characterized by the three main components: variety, velocity and volume (Sagiroglu & Sinanc, 2013).

Sagiroglu and Sinanc (2013) compared two of the existing big data analysis platforms, viz. the Hadoop framework and High Performance Computing Cluster (HPCC) systems. Hadoop is a well-known open-source Java-based framework which includes, firstly, a distributed file system and, secondly, a system that manages the workflow and the distribution of processes across the cluster. Apache Hadoop is an open-source implementation of Google's MapReduce framework; it comes with its own distributed file system (HDFS), which was derived from the Google File System (GFS). HDFS has many benefits, such as a low cost of storing data, data redundancy by replication, huge storage capacity, balanced utilization of storage, high fault-tolerance, high throughput and scalability. These two main parts of Apache Hadoop, HDFS and MapReduce, are shown in Fig. 5. The Hadoop framework also provides many other sub-projects, viz. HBase, Pig, Hive, Sqoop, Avro, Oozie, Chukwa, Flume, and ZooKeeper, each with its specific focus.
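The MapReduce model mentioned above can be sketched as a toy word count in plain Python. This shows only the map, shuffle and reduce flow; a real Hadoop job implements Mapper and Reducer classes against the Hadoop Java API and lets the framework distribute the work across HDFS blocks:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # map: read one input record and emit intermediate (key, value) pairs
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # shuffle: group all intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # reduce: collapse each group of values to a single result per key
    return key, sum(values)

def mapreduce(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Here mapreduce(["big data", "big spatial data"]) yields {'big': 2, 'data': 2, 'spatial': 1}; in a cluster, the map and reduce calls run in parallel on different nodes and the shuffle is performed by the framework.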

Fig. 5: Comparison of the Hadoop and HPCC frameworks (Sagiroglu and Sinanc, 2013)

HPCC Systems is also an open-source, distributed, data-intensive computing platform. It provides big data workflow management services, and its data model is defined by the user. The three main HPCC components are the HPCC Data Refinery (Thor), the HPCC Data Delivery Engine (Roxie), and the Enterprise Control Language (ECL). In this paper, the Hadoop framework is dealt with in more detail, especially in the context of geo-spatial data analysis.

5. HADOOP FRAMEWORK FOR SPATIAL DATA ANALYSIS
The framework is segregated into two main parts, viz. the Hadoop Distributed File System (HDFS) for storing data and the MapReduce programming model for processing the data (fig. 6). It is a distributed data processing framework, first developed by Google and later adopted by the Apache Foundation. It is capable of managing large volumes of data efficiently, using a large number of commodity hardware nodes forming a Hadoop cluster. The MapReduce model of data analysis is composed of map and reduce functions. The concept is based on the divide-and-conquer rule, in which large-volume data are divided into small chunks and processed in parallel, in SIMD or MIMD fashion. The map function reads data and converts them into key-value pairs; after shuffling, the reduce function converts the key-value pairs into key-multiple-values pairs. A job is divided into the following five tasks: (1) iteration over the input data, (2) computation of key/value pairs from each item of input data, (3) grouping of all intermediate values by key, (4) iteration over the resulting groups, and (5) reduction of each group.

Fig. 6: Architecture of the Hadoop Distributed File System (HDFS) (Source: http://hadoop.apache.org)

Economides et al. (2013) presented a new spatial database management system (SDBMS) named MIGIS, a Hadoop-based framework for handling complex spatial queries efficiently and with high performance. This Hadoop framework is further extended by YSmart and RESQUE. A typical database needs an index in order to achieve fast query processing. R-Tree based indices group nearby entities and represent them by their minimum bounding rectangles. The R-Tree can be built in three phases: in the first phase, the input is partitioned according to size properties; in the second phase, Hadoop processes produce lower-level R-Trees out of the partitioned data; and in the last phase, the lower-level R-Trees are combined to form the complete R-Tree index of the data.

Cary et al. (2009) showed how the MapReduce model solved the following two typical and representative spatial data processing problems fast and efficiently:
• The large-scale construction of R-Trees, a popular indexing mechanism for spatial search query processing, and
• Processing digital aerial imagery, and computing and storing image quality characteristics as metadata.
The key contribution of this work was the presentation of a technique for large-scale building of R-Trees based on the MapReduce model, and a demonstration of how MapReduce can be applied to massively parallel processing of raster data. They also evaluated the algorithms' performance using different metrics. The algorithm has the following three phases:
(1) Computation of the partitioning function (f): the inputs for this phase are the data set and a positive number R, which represents the number of partitions.
(2) R-Tree construction: the partitioning function (f) calculated in the first phase is used by Mappers to divide the data set into R partitions, and R-Tree indices are built for each of the input partitions.
(3) R-Tree consolidation: this phase combines the R individual R-Trees built in the second phase under a single root node to form the final R-Tree index of the data set.

For supporting spatial queries on Hadoop, many new techniques have surfaced, but most of them require internal modifications of the framework. Lee et al. (2014) have come up with a spatial index especially for big data, based on a hierarchical spatial data structure stored in distributed file storage systems. This spatial index has several advantages: it can be implemented without changing the internal implementation of the existing storage systems, it offers simple and efficient filtering, and it supports updates of spatial objects. Hadoop can be used for many applications involving large volumes of geospatial as well as spatio-temporal data analysis, biomedical imagery analysis, and even for the simulation of various physical, chemical and computationally intensive biological processes. However, in this paper, we emphasize the spatial usage of the Hadoop framework.

For this spatial indexing, a hierarchical spatial data structure called geo-hash is used. It is a geo-coding system for latitudes and longitudes: the longer the geo-hash code, the smaller the bounding box. There are two categories of existing spatial query processing techniques, viz:
• In the first, small representative subsets of the spatial objects are used, as in k-Nearest Neighbour (kNN) selection queries.

• In the second, the whole data set is used for querying purposes. This type of query is called a low-selectivity query.

The other basic queries, mainly for spatial data, are:
• Containing: returns all spatial objects containing the given search geometry.
• ContainedIn: returns all spatial objects contained by the given search geometry.
• Intersects: returns all spatial objects that intersect a given search geometry.
• WithinDistance (also called a range query): returns all spatial objects within a given distance from a given search geometry.

Spatio-temporal data is one type of spatial data that can also reach very large sizes, which is a problem when analyzing it. The very large volumes of spatio-temporal data generated by different social media networks are important and useful in various domains, such as the commercial domain or, more importantly, disaster mapping and national security applications. Much of the huge amount of data that Google generates is also spatio-temporal. To process such large volumes of data, there is a need for research in developing efficient management techniques and analytical infrastructure for massive to big spatial data.

One of the main requirements of any spatial data mining algorithm is that it should take into account the spatial and temporal autocorrelations existing in the data, if any. The explicit modeling of spatial dependencies increases computational complexity. The data mining primitives that explicitly model spatial dependencies are (Vatsavai et al., 2012):
• The Spatial Autoregressive (SAR) model, in which spatial dependencies for prediction are often modeled using a regression technique,
• The Markov Random Field (MRF) model, in which spatial dependencies in classification are often modeled as an extension of the a priori probabilities in a Bayesian classification framework, and
• Gaussian process learning and mixture models for modeling spatial heterogeneity in the classification of large geographic regions, also used in change detection studies.

There are many examples of applications dealing with big geospatial data. For example, biomass monitoring requires high temporal resolution satellite remote sensing imagery. The MODIS instrument on board NASA's Terra satellite provides an opportunity for continuous monitoring of biomass over large geographic regions. Since data at the global scale are difficult to handle due to their large volume and format complexity, the data from the MODIS sensor are organized into tiles of 10° × 10° of latitude and longitude (4800 × 4800 MODIS pixels). Also, the computational complexity of change detection algorithms is very high due to varying atmospheric contributions, varying sensor parameters, sensor look and scan angles and, in addition, season-dependent solar illumination levels.

A second example in this category is searching for complex patterns in geospatial data. Most pattern recognition and machine learning algorithms are per-pixel. Such methods work well for thematic classification of moderate- to high- to very-high-resolution (pixel size of 5 meters and less) images. Very High Resolution (VHR) images allow the recognition of structures in the images. More example applications are:
• Recognizing complex spatial patterns in urban areas to map informal and formal settlements,
• Recognizing critical infrastructure establishments (such as nuclear, thermal and chemical plants, airports, and shopping and sports complexes), and
• Image-based content search and retrieval built on classification and clustering techniques.
The above tasks require feature extraction and selection, indexing, machine learning, pattern matching, etc. Most of these tasks deal with segments and objects instead of pixels. Computation of similarity parameters between image patches (e.g. the Hausdorff distance) is computationally expensive. Scaling these algorithms up to global applications requires efficient and novel algorithmic solutions, and also exascale computing infrastructure to support large-scale to global spatio-temporal applications. In such situations, distributed and parallel computing techniques, such as the Apache Hadoop framework, the CUDA computing architecture and MPI-based architectures, come in handy.

There are many data mining packages in vogue and quite popular among data miners; the most popular include Open Office, WEKA, KNIME, and RapidMiner. RapidMiner has a clear and user-friendly interface that makes it easy to learn; it enables the user to design the data mining workflow visually and to investigate results during execution. Though the above packages are highly rich in functionality and visualization tools, they have limitations in terms of the size of data that can be handled and the efficiency of large-scale data processing.

6. RADOOP
Prekopcsak et al. (2011) added a Hadoop extension to RapidMiner, calling the extended version Radoop (fig. 7). The extension was designed to achieve a close integration of Hadoop and RapidMiner, and to provide the Hadoop counterparts of functionality commonly used in memory-based RapidMiner processes, while keeping convenient RapidMiner features such as metadata transformations and breakpoints.

The user interface is an important part of any platform or framework; it should be user-friendly and make the user's interaction with the platform easy and fast. In the wake of the Hadoop revolution, many distributed analytic systems have been developed that are strong, fault-tolerant and support many other features.
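The basic spatial predicate queries listed earlier (Containing, ContainedIn, Intersects, WithinDistance) can be sketched for the simplest case of points and axis-aligned boxes. This is an assumption made for brevity, and the class and function names are my own; a real SDBMS evaluates these predicates over arbitrary geometries with R-Tree support:

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Box:
    """Axis-aligned bounding box standing in for a search geometry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains_point(self, x, y):
        # Containing: does this object contain the search point?
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

    def contained_in(self, other):
        # ContainedIn: is this object inside the search geometry?
        return (other.xmin <= self.xmin and self.xmax <= other.xmax
                and other.ymin <= self.ymin and self.ymax <= other.ymax)

    def intersects(self, other):
        # Intersects: do the two boxes overlap at all?
        return not (self.xmax < other.xmin or other.xmax < self.xmin
                    or self.ymax < other.ymin or other.ymax < self.ymin)

def within_distance(px, py, qx, qy, d):
    # WithinDistance (range query): is point p within distance d of point q?
    return hypot(px - qx, py - qy) <= d
```

A low-selectivity query in the terminology above would evaluate one of these predicates against every object in the data set, which is exactly the workload that spatial indices such as R-Trees and geo-hash prefixes are designed to prune.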
However, these systems are usually very hard to use and interact with. Radoop creates a process by adding the RadoopNest meta-operator, which contains general settings for the cluster (such as the IP address of the Hadoop master node); all other Radoop operators are used inside this meta-operator. Prekopcsak et al. (2011) describe data handling and the several possibilities for uploading data to a cluster for processing, as well as the data preprocessing and modeling operators of Radoop. For data handling in RapidMiner, data tables are ExampleSet objects, normally stored in memory. In Radoop, the data tables are stored in Apache Hive, and a HadoopExampleSet object describes the stored data. The HadoopExampleSet not only stores several pointers and

settings, but all data is stored in Hive in the distributed file system, resulting in no significant memory consumption during Radoop-based data processing. Reader and Writer operators are implemented to enable the transfer of large data files from right inside RapidMiner. These operators parse every row of the dataset, which carries an overhead, so big CSV (Comma Separated Values) files can instead be uploaded to HDFS and loaded into Hive using the command line interface. Store and Retrieve are powerful operators for writing and reading back intermediate results.

Fig. 7: Radoop architecture (Prekopcsak et al. 2011)

Radoop has a built-in powerful mechanism for data transformations using views in the Hive Query Language (HiveQL); writing results tables to the distributed file system is expensive and is done only when needed. Radoop supports many data transformations, such as selecting attributes, generating new attributes, filtering examples, sorting, renaming, …

3. Complex spatial queries, such as spatial cross-matching or overlay (large-scale spatial join) and nearest-neighbour queries,
4. Integrated spatial and feature queries, for example feature aggregation queries in selected spatial regions, and
5. Global spatial pattern queries, including queries for finding high-density regions or directional patterns of spatial objects.

Traditional methods for spatial queries have the following limitations:
1. Managing and querying spatial data on a massive scale,
2. Reducing the I/O bottleneck by partitioning data across the disks of multiple parallel SDBMSs,
3. Optimizing computationally intensive operations such as geometric computations,
4. Lack of effective spatial partitioning mechanism(s) to balance data loads and task loads across database partitions, and
5. High data-loading overheads.

7.1 Resque

