Spatial Big Data Analytics For Urban Informatics


Spatial Big Data Analytics for Urban Informatics

A DISSERTATION
SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL
OF THE UNIVERSITY OF MINNESOTA
BY

Michael Robert Evans

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
Doctor of Philosophy

Advisor: Professor Shashi Shekhar

August, 2013

© Michael Robert Evans 2013
ALL RIGHTS RESERVED

Acknowledgements

I want to thank my advisor, Professor Shashi Shekhar, for his incredible support throughout my Ph.D. One phone call from him out of the blue back in 2008 truly changed my life. His dedication and patience have been invaluable over the past five years. I also want to thank all of the professors who helped me over the years in classes, and especially those who agreed to serve on my committee: Dr. Vipin Kumar, Dr. Mohamed Mokbel, and Dr. Francis Harvey. Each of your unique insights helped shape and craft this work and my overall research interests. Thank you.

I extend thanks to Prof. Shekhar’s spatial research group, members both past and present, for all the meetings, proposals and late nights we spent together. I will miss all of you and thank you for all the help you have given me over the years. Lastly, I want to thank my lovely wife and parents for all their support.

Abstract

Urban Informatics is the practice of using computer technology to support core city functions: planning, governance and operations. This technology consists of hardware, software, databases, sensors, and communication devices used to develop and sustain more livable and healthy cities. Urban Informatics provides governments with the tools to make data-driven decisions regarding long-term plans, predict and respond to current and upcoming situations, and even help with day-to-day tasks such as monitoring water use and waste processing. New and immense location-aware datasets, formally defined in this thesis as Spatial Big Data, are emerging from a variety of sources and can be used to find novel and interesting patterns for use in urban informatics. Spatial big data is the key component driving the emerging field of Urban Informatics at the intersection of people, places, and technology. However, spatial big data presents challenges for existing spatial computing systems to store, process, and analyze such large datasets. With these challenges come new opportunities in many fields of computer science research, such as spatial data mining and spatial database systems. This thesis contains original research on two types of spatial big data, each study focusing on a different aspect of handling spatial big data (storage, processing, and analysis). Below we describe each data type through a real-world problem with challenges, related work, novel algorithmic solutions, and experimental analysis.

To address the challenge of analysis of spatial big data, we studied the problem of finding primary corridors in bicycle GPS datasets. Given a set of GPS trajectories on a road network, the goal of the All-Pair Network Trajectory Similarity (APNTS) problem is to calculate the similarity between all trajectories using the Network Hausdorff Distance. This problem is important for a variety of societal applications, such as facilitating greener travel via bicycle corridor identification. The APNTS problem is challenging due to the high cost of computing the exact Network Hausdorff Distance between trajectories in spatial big datasets. Previous work on the APNTS problem takes over 16 hours of computation time on a real-world dataset of bicycle GPS trajectories in Minneapolis, MN. In contrast, this work focuses on a scalable method for the APNTS problem using the idea of row-wise computation, resulting in a computation time of less than 6 minutes on the same dataset. We provide a case study for transportation services using a data-driven approach to identify primary bicycle corridors for public transportation by leveraging emerging GPS trajectory datasets. Experimental results on real-world and synthetic data show an improvement of two orders of magnitude over previous work.

To address the challenge of storage of spatial big data, we studied the problem of storing spatio-temporal networks in spatial database systems. Given a spatio-temporal network and a set of database query operators, the goal of the Storing Spatio-Temporal Networks (SSTN) problem is to produce an efficient data storage method that minimizes disk I/O access costs. Storing and accessing spatio-temporal networks is increasingly important in many societal applications such as transportation management and emergency planning. This problem is challenging because the sizable increase in the length of the time-series strains traditional adjacency-list representations when storing temporal attribute values. Current approaches for the SSTN problem focus on orthogonal partitioning (e.g., snapshot, longitudinal, etc.), which may produce excessive I/O costs when performing traversal-based spatio-temporal network queries (e.g., route evaluation, arrival time prediction, etc.) due to the desired nodes not being allocated to a common page. We propose a Lagrangian-Connectivity Partitioning (LCP) technique to efficiently store and access spatio-temporal networks that utilizes the interaction between nodes and edges in a network. Experimental evaluation using the Minneapolis, MN road network showed that LCP outperforms traditional orthogonal approaches.

The work in this thesis is the first step toward understanding the immense challenges and novel applications of Spatial Big Data Analytics for Urban Informatics. In this thesis, we define spatial big data and propose novel approaches for storing and analyzing two popular spatial big data types: GPS trajectories and spatio-temporal networks. We conclude the thesis by exploring future work in the processing of spatial big data.

Contents

Acknowledgements
Abstract
List of Tables
List of Figures

1 Introduction
  1.1 Urban Informatics
  1.2 Spatial Big Data
    1.2.1 Analysis of GPS Trajectories
    1.2.2 Storage of Spatio-Temporal Networks
  1.3 Thesis Contributions
  1.4 Thesis Overview
  1.5 Outline

2 Enabling Urban Informatics with Spatial Big Data
  2.1 Defining Spatial Big Data
  2.2 Spatial Big Data Opportunities
    2.2.1 Estimating Spatial Neighbor Relationships
    2.2.2 Supporting Place-based Ensemble Models
    2.2.3 Simplifying Spatial Models
    2.2.4 On-line Spatio-Temporal Data Analytics
  2.3 Spatial Big Data Infrastructure
    2.3.1 Parallelization of Spatial Big Data
    2.3.2 Difficulties of Parallelization
    2.3.3 Problems with Current Techniques
  2.4 Conclusion

3 Analysis of GPS Trajectories for Bicycle Corridor Identification
  3.1 Introduction
  3.2 Problem Formulation
    3.2.1 Basic Concepts
    3.2.2 Problem Statement
  3.3 Computational Structure
    3.3.1 Graph-Node Track Similarity Baseline (GNTS-B)
    3.3.2 Graph-Node Track Similarity with Precomputed Distances (GNTS-P)
  3.4 Proposed Approach
    3.4.1 Matrix-Element Track Similarity (METS)
    3.4.2 Row-Wise Track Similarity (ROW-TS)
  3.5 Case Study: k-Primary Corridors for Commuter Bicyclists
  3.6 Analytical Analysis
    3.6.1 Cost Analysis
  3.7 Experimental Evaluation
    3.7.1 Experimental Goals
    3.7.2 Experimental Design
    3.7.3 Experimental Results
  3.8 Conclusion

4 Storage of Spatio-Temporal Networks for Advanced Routing
  4.1 Introduction
    4.1.1 Motivation
    4.1.2 Spatio-Temporal Networks (STN)
    4.1.3 Problem Statement
    4.1.4 Related Work and Limitations
    4.1.5 Contribution
    4.1.6 Scope and Outline
  4.2 Proposed Approach
    4.2.1 Lagrangian-Connectivity Partitioning
    4.2.2 Cost Model
  4.3 Experimental Evaluation
    4.3.1 Experiment Setup
    4.3.2 LCP Approximation: ATSS
    4.3.3 Experimental Results
  4.4 Related Work
  4.5 Conclusions and Future Work

5 Conclusion and Future Work
  5.1 Key Results
  5.2 Future Directions
    5.2.1 Short-term Directions
    5.2.2 Long-term Directions

References

List of Tables

1.1 Examples of Current Urban Informatics Projects
1.2 Thesis Framework: Spatial Big Data Analytics for Urban Informatics
2.1 Spatial Auto-Regression and the W-matrix
3.1 Output for the All-Pair Network Trajectory Similarity problem: a Trajectory Similarity Matrix for the input data in Figure 3.1 using Network Hausdorff Distance.
3.2 Network distance between node pairs; required for NHD(t_B, t_A) (Input: Figure 3.1; full trajectory similarity matrix shown in Table 3.1).
3.3 Descriptive statistics about the case study dataset from [1].
3.4 CPU Execution Time on Bicycle GPS trajectories in Minneapolis, MN
3.5 Notation used in this chapter.
3.6 Asymptotic Complexity of Track Similarity Algorithms.
4.1 Access Operators for Spatio-Temporal Networks from [2]
4.2 Related work for Spatio-Temporal Network
5.1 Thesis Contributions: Spatial Big Data Analytics for Urban Informatics

List of Figures

1.1 A commuter’s GPS tracks over three months reveal preferred routes. (Best viewed in color)
1.2 Traffic speed measurements averaged over 30 days by time of day. Courtesy: [3]
2.1 Hurricane Rita and Evacuation Traffic. Source: National Weather Services and FEMA.
2.2 Eco-routing supports sustainability and energy independence. (Best in color)
2.3 Engine measurement data improve understanding of fuel consumption [4]. (Best in color)
2.4 Spatial Big Data on Historical Speed Profiles. (Best viewed in color)
3.1 Road network represented as an undirected graph with four trajectories illustrated with bold dashed lines.
3.2 Classifications of Hausdorff Trajectory Similarity Algorithms.
3.3 Inserting a virtual node (A_virtual) to represent Track A for efficient Network Hausdorff Distance computation.
3.4 Example input and output of the k-Primary Corridor problem.
3.5 Set of 8-primary corridors identified from bicycle GPS trajectories and candidate corridors with varying restrictions on number of street traversals.
3.6 Experiment Design
3.7 Experimental results on synthetic data. Note the y-axis is in logarithmic scale.
4.1 Airline travel information as a spatio-temporal network.
4.2 The U.S. natural gas pipeline network. [5]
4.3 Traffic speed measurements over 30 days on a portion of highway. Courtesy: [3]
4.4 Snapshot model of a spatio-temporal network
4.5 Snapshot storage of a STN
4.6 Longitudinal storage of a STN
4.7 STN as a time-expanded network
4.8 Orthogonal partitioning of Spatio-Temporal Networks.
4.9 Lagrangian-Connectivity Partitioning.
4.10 Minneapolis, MN road network [6]
4.11 Experimental Setup
4.12 Aggregated Time-Stamped Snapshot
4.13 Experiment 1 - The effect of the route length. Note that Snapshot and Longitudinal are overlapping.
4.14 Experiment 2 - Effects of varying the size of data pages
4.15 Experiment 3 - Accuracy of the cost model in a Lagrangian path evaluation
4.16 Experiment 4 - Comparison of Non-Orthogonal Methods
4.17 Experiment 5 - Changing the Spatio-Temporal Network
4.18 Related work in Record Formats for Time Series Storage
5.1 Many potential route solutions will require merging and grouping of routes, similar to trajectory similarity.

Chapter 1

Introduction

Urban Informatics is a set of ideas and technologies that will transform the lives of everyday citizens by helping understand the urbanized world, knowing and communicating our relation to people and places in that world, and generally enabling more livable and healthier cities. Recent examples of urban informatics in use across the world are listed in Table 1.1. The city of Santander, Spain uses over “12,000 electronic sensors that track everything from traffic and noise to surfing conditions at local beaches [7]”. A recurring theme in the example cities given is a centralized computer system to collect data and respond to events in real-time.

New and immense location-aware datasets, formally defined in this thesis as Spatial Big Data, are emerging from a variety of sources and can be used to find novel and interesting patterns for use in urban informatics. This large variety of disparate data sources is fostering the new field of Urban Informatics, which has the potential to transform metropolitan services such as public health, transportation, and urban utilities. In public health, spatial aspects, e.g., neighborhood context [8], are critical in understanding many contributors to disease process, including environmental toxicant exposure as well as human behavior and lifestyle choices. This exposome [9], a characterization of a person’s lifetime exposures, is becoming an increasingly popular subject of research for public health [10].

Urban Informatics requires the ability to store, process, and analyze spatial big data, something that challenges traditional spatial computing systems [11]. With these challenges come new opportunities in many fields of computer science research, such as spatial data mining and spatial database systems. This thesis contains original research on two types of spatial big data, each study focusing on a different aspect of handling spatial big data (storage, processing, and analysis).

Table 1.1: Examples of Current Urban Informatics Projects

City | Country | Urban Informatics and Use-Cases
Songdo | South Korea | “Computers will be built into the houses, streets and offices as part of a “ubiquitous” network linking everyone in a sort of digital commune [12].”
SmartSantander | Spain | “Buried under the streets of Santander, Spain or discreetly affixed to buses, utility poles, and dumpsters - are some 12,000 electronic sensors that track everything from traffic to noise to surfing conditions at local beaches [7].”
PlanIT Valley | Portugal | “using a centralized computer brain to control functions like water use, waste processing. from a network of sensors much like a nervous system to collect data and control the city [13].”
Masdar City | Abu Dhabi | “Everything is connected through a cloud to an Urban Operating System, which acts as the city’s brain [13].”

1.1 Urban Informatics

Urban Informatics is the practice of using computer technology to support core city functions: planning, governance and operations. This technology consists of hardware, software, databases, sensors and communication devices used to develop and sustain more livable and healthy cities [14]. Urban Informatics provides governments with the tools to make data-driven decisions regarding long-term plans, predict and respond to current and upcoming situations, and even help with day-to-day tasks such as monitoring water use and waste processing.

Urban Informatics is an interdisciplinary field that spans many related disciplines (e.g., Citizen Science [15], Urban Computing [16], Ubiquitous Computing [17]) and draws on a variety of academic fields. The Handbook of Urban Informatics [18] defines urban informatics as:

“the study, design, and practice of urban experiences across different urban contexts that are created by new opportunities of real-time, ubiquitous technology and the augmentation that mediates the physical and digital layers of people, networks, and urban infrastructures.” (Foth, Choi, & Satchell, 2011, p.4).

Urban informatics and related technologies are beginning to quantify these toxicant exposures through the use of wide-spread sensing devices. For example, Accra, Ghana used air quality measuring devices attached to smartphones to record spatio-temporal air quality information and compare it to the static daily reports given out by the government [15]. This data-driven approach found a huge variation of air quality across a single city and a single day, calling into question single-source measurement for quantifying public health risks [15]. These case-studies provide ample motivation to push the development of Urban Informatics and the technologies behind it, such as Spatial Big Data.

1.2 Spatial Big Data

Increasingly, the size, variety, and update rate of spatial datasets exceed the capacity of commonly used spatial computing technologies to learn, manage, and process the data with reasonable effort. We refer to these datasets as Spatial Big Data (SBD). A 2011 McKinsey Global Institute report defines traditional big data as data featuring one or more of the 3 “V’s”: Volume, Velocity, and Variety [19]. Examples of emerging SBD include temporally detailed roadmaps that provide traffic speed values every minute for every road in a city, GPS trajectory data from cell-phones, and engine measurements of fuel consumption, greenhouse gas emissions, etc. Temporally-detailed roadmaps are providing more accurate travel time estimates for commuters depending on the time of day. Location-based services are allowing cities to examine usage patterns of bike lanes and park trails and make data-driven decisions about placing new corridors.
Spatial big data is the key component driving the emerging field of Urban Informatics at the intersection of people, places, and technology. However, spatial big data presents challenges for existing spatial computing systems to store, process, and analyze such large datasets. With these challenges come new opportunities in many fields of computer science research, such as spatial data mining and spatial database systems. Below we describe the societal applications, related work, and challenges of working with these two example spatial big datasets.

1.2.1 Analysis of GPS Trajectories

GPS trajectories are quickly becoming available for a larger collection of people due to rapid proliferation of cell-phones, in-vehicle navigation devices, and other GPS data logging devices [20] such as those distributed by insurance companies [21]. Such GPS traces allow analysis of people's movements for a number of urban informatics use-cases. For example, indirect estimation of fuel efficiency and GHG emissions is possible via estimation of vehicle-speed, idling and congestion. They also make it possible to make personalized route suggestions to users to reduce fuel consumption and GHG emissions. For example, Figure 1.1 shows 3 months of GPS trace data from a commuter, with each point representing a GPS record taken at 1 minute intervals, 24 hours a day, 7 days a week. As can be seen, 3 alternative commute routes are identified between home and work from this dataset. These routes may be compared for idling, which is represented by darker (red) circles. Assuming the availability of a model to estimate fuel consumption from speed profile, one may even rank alternative routes for fuel efficiency. In recent years, consumer GPS products [20, 22] have been evaluating the potential of this approach. A key hurdle is the dataset size, which can reach 10^13 items per year given constant minute-resolution measurements for all 100 million US vehicles.

Trajectory pattern mining is a popular field with a number of interesting problems both in geometric (Euclidean) spaces [24] and networks (graphs) [25]. A key component of traditional data mining in these domains is the notion of a similarity metric, the measure of sameness or closeness between a pair of objects. A variety of trajectory similarity metrics, both geometric and network, have been proposed in the literature [26]. One popular metric is Hausdorff distance, a commonly used measure to compare similarity between two geometric objects (e.g., polygons, lines, sets of points) [27]. A number of methods have focused on applying Hausdorff distance to trajectories in geometric space [28, 29, 30, 31].
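As a rough illustration of the geometric Hausdorff distance mentioned above, the following is a minimal sketch that treats each trajectory as a finite set of 2D points and uses Euclidean distance. The function names and the sample coordinates are assumptions made for this example; this is the standard point-set definition, not the Network Hausdorff Distance developed in this thesis, which replaces Euclidean distance with distances on the road network.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from any point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical trajectories, given as sampled GPS points in a planar coordinate system.
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]

print(directed_hausdorff(track_a, track_b))  # every point of track_a is close to track_b
print(directed_hausdorff(track_b, track_a))  # the point (2, 3) is far from track_a
print(hausdorff(track_a, track_b))
```

The directed distance is not symmetric, which is why the two directions are computed separately here. Each directed computation compares every point of one trajectory with every point of the other, which hints at why exact all-pair computation becomes expensive on large trajectory sets.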

Figure 1.1: A commuter’s GPS tracks over three months reveal preferred routes. (a) GPS Trace Data. Color indicates speed. (b) Routes 1, 2, 3, & 4 [23]. (Best viewed in color)

Hausdorff distance has been shown to be a useful tool in geometric space for measuring similarity between trajectories, but applying Hausdorff distance to network-based trajectories is non-trivial. A number of papers have proposed approximation heuristics to compute Hausdorff distance on networks [32, 33, 34, 35, 36]. This is due to the expensive graph-distance computations needed when dealing with trajectories on networks. These approximations allow for interesting and useful pattern discovery, but do not compute exact similarities between trajectories and may alter results. We propose fast and correct algorithms for computing these similarities for a variety of use cases. We go on to provide a case study on real-world GPS data from cyclists in Minneapolis, MN.

1.2.2 Storage of Spatio-Temporal Networks

A Spatio-Temporal Network (STN) can be defined as a graph G(N, E, T), where N is a set of nodes, E is a set of edges, each connecting two nodes, and every node and edge is associated with temporal information T (e.g., travel time). STN datasets are becoming indispensable in societal applications, such as surface and air transportation management systems. One of the most appealing properties of these datasets is their ability to capture network attributes that vary over time. Consequently, STN datasets are usually massive in size and are accessed based on spatio-temporal movement patterns, making I/O-efficient storage and access methods a significant challenge. Analyzing movement in spatio-temporal networks is important in many societal applications such as transportation, distribution of electricity and gas, and evacuation route planning. The ability to efficiently store, process and analyze spatio-temporal networks with large time series data would benefit a wide variety of applications.

The Federal Highway Administration [3] is recording traffic data of major roads and highways across the United States using sensors such as loop detectors, among others. Depending on the type of sensor, traffic levels are recorded every minute throughout a day, as shown in Figure 1.2. The Mobility Monitoring Program (MMP), started in 2000 by the Texas Transportation Institute, aimed to evaluate the use of sensors for traffic information around the United States. By 2003, MMP was receiving traffic sensor data from over 30 cities and 3,000 miles of highway, with sensor readings occurring roughly every 30 seconds. This data is recorded 24 hours a day, 365 days a year, resulting in over a million time steps per year for each sensor. MMP published a report citing the need for processing and storage of historical traffic data, and how it may benefit traffic management [37].

Spatio-temporal networks (STN) are used to represent temporally-detailed road networks, where analysis can be done on fine-grained traffic speed measurements. However, these datasets are significantly larger than their atemporal brethren, and pose a number of challenges for existing analytical, processing, and underlying infrastructure technology.
Over the last decade, considerable work on STNs has focused on pre-computation techniques and speed-up algorithms for time-dependent route planning [38, 39, 40, 41, 42, 43]. By comparison, there has been relatively little work on the design and evaluation of storage and access methods for STNs. Early work employed geometric space indexes for both space and space-time [44]. Orthogonal partitioning methods, such as the longitudinal or snapshot method [43], are able to capture network connectivity based on either space or time orthogonally. The longitudinal method stores temporally consecutive properties of a node (or edge) in the same data page, whereas the snapshot method stores a topologically connected sub-graph for a given time instance in the same data page. Current related work on storing and accessing STN data has relied on these orthogonal approaches [43]. In contrast with these methods, our approach focuses on non-orthogonal partitioning based on movement connectivity [45, 46].

Figure 1.2: Traffic speed measurements averaged over 30 days by time of day. Courtesy: [3]

1.3 Thesis Contributions

The main contributions of this thesis address the challenges of defining, storing, processing, and analyzing spatial big data. First, we define spatial big data and discuss the challenges and opportunities SBD brings to spatial computing, elaborating on work we did in [11]. Second, we address the challenge of analyzing spatial big data using GPS trajectories as a case-study, motivated by the societal application of finding bicycle corridors from urban cyclists [47, 48]. Third, we address the challenge of storing spatial big data in spatial database systems, specifically storing spatio-temporal networks for use in advanced routing services [45, 46]. Lastly, we present our preliminary work on the challenge of processing spatial big data in the future work section of the thesis.
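To make the storage discussion in Section 1.2.2 more concrete, the following is a minimal sketch of how the two orthogonal schemes behave under a route evaluation query. It rests on strong simplifying assumptions that are not from the thesis: a toy STN with made-up travel times, one data page per edge for the longitudinal scheme, and one data page per time instant for the snapshot scheme. All names (`travel_time`, `longitudinal_page`, `snapshot_page`, `pages_touched`) are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Edge = Tuple[str, str]

# Toy STN: travel_time[edge][t] is the travel time (in time steps) when
# entering the edge at time step t. Values are invented for illustration.
travel_time: Dict[Edge, List[int]] = {
    ("A", "B"): [1, 1, 2, 2],
    ("B", "C"): [2, 1, 1, 1],
    ("C", "D"): [1, 1, 1, 1],
}

def longitudinal_page(edge: Edge, t: int) -> Hashable:
    """Longitudinal scheme (simplified): an edge's whole time series shares a page."""
    return ("edge", edge)

def snapshot_page(edge: Edge, t: int) -> Hashable:
    """Snapshot scheme (simplified): all edges at one time instant share a page."""
    return ("time", t)

def pages_touched(route: List[Edge], start: int,
                  page_of: Callable[[Edge, int], Hashable]) -> int:
    """Count distinct pages read while evaluating a route that departs at `start`."""
    t = start
    pages = set()
    for edge in route:
        pages.add(page_of(edge, t))   # read the record (edge, t)
        t += travel_time[edge][t]     # arrival time becomes the next entry time
    return len(pages)

route = [("A", "B"), ("B", "C"), ("C", "D")]
print(pages_touched(route, 0, longitudinal_page))
print(pages_touched(route, 0, snapshot_page))
```

In this toy setting the route moves through both space and time, so each record it needs lies on a different page under either scheme. That is the intuition behind grouping records by movement connectivity instead; the sketch does not attempt to model that non-orthogonal approach.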

Table 1.2: Thesis Framework: Spatial Big Data Analytics for Urban Informatics

Columns (Urban Informatics): Strategic | Tactical | Operational
Rows (Spatial Big Data): Long-term Forecasts (climate, demographics, economy) | Location Traces (GPS) | Spatio-Temporal Networks (STN)
Cell entries: SBD for Urban Informatics (Ch. 2) | Bicycle Corridor Selection (Ch. 3) | Commuter Information Systems (Ch. 4)

1.4 Thesis Overview

Each layer of urban management (planning, governance, and operations) can be mapped to a temporal range consisting of strategic, tactical, and operational (STO) plans. This STO model is traditionally used to describe short and long term planning and management in businesses and government. Strategic plans provide long-term directions for a city, such as 20-year traffic and demographic projections. Tactical plans are on the time scale of 2-5 years, corresponding to the term of a mayor who may fund new bike lanes and other eco-friendly transportation options. Finally, operational tools provide day-to-day tools and information for city employees to help keep the city functioning smoothly and efficiently.

Table 1.2 illustrates how spatial big data may facilitate different scales of management and analysis for urban informatics. The left column, labeled Spatial Big Data, contains three different example datasets. The right three columns, strategic, tactical, and operational, labeled under the umbrella of Urban Informatics, correspond to different temporal scales of urban informatics. Each cell intersecting an urban informatics scale and example spatial big d
