Project Report Exploring Spatial Data On Crime Analysis


Project Report
Exploring Spatial Data on Crime Analysis

Matheus Paes de Souza
mpaes.souza292@gmail.com
Supervision: Jorge Poco
Escola de Matemática Aplicada
December 22, 2021

Contents

1 Datasets
  1.1 Crime occurrences dataset
  1.2 Amenities dataset
  1.3 Discretization and aggregation of the datasets
2 Methodology
  2.1 The model
  2.2 Data transformations
    2.2.1 Treatment of outlier values in crime levels
    2.2.2 Treatment of multicollinearity on the input data
3 Evaluation
  3.1 Resolution level 8
  3.2 Resolution level 9
4 Discussion
  4.1 Case analysis
    4.1.1 Case 1 - Hotspot region analysis
    4.1.2 Case 2 - Low crime region analysis
  4.2 Visualization
5 Conclusion

Abstract

This research project aims to analyse the spatial relation between the distribution of crime and the presence of amenities in the city of São Paulo. To that end, we employ a spatially aware regression model, Geographically Weighted Regression (GWR). This model takes into account the spatial distribution of the input data and describes how the importance of each feature for the prediction of a variable varies in space.

1 Datasets

We used two datasets for this task, describing crime occurrences and amenities throughout São Paulo. Both datasets, and a processed one relating crime to amenities, are available in the Google Drive folder as SPdataEstabelecimentos&Crime.zip.

1.1 Crime occurrences dataset

This dataset is a list of crime occurrences that were reported in São Paulo from 2006 through 2017. The occurrences were sourced from official police reports. Each entry lists the date and time of the occurrence, whether it was against passersby, vehicles or stores, and the geographical coordinates of the occurrence. For this work, only the occurrences reported in 2017 were considered.

1.2 Amenities dataset

This dataset provides information on amenities located throughout São Paulo. Each entry indicates the amenity's name, category, and geographical coordinates. The data distinguish 108 categories (explained in the codebook available in the Google Drive folder, in SPdataEstabelecimentos&Crime.zip). This dataset was sourced from Google Maps.

1.3 Discretization and aggregation of the datasets

Both datasets give individual geographical coordinates for each data point. In order to identify spatial patterns in the distribution of crime and amenities in São Paulo, we need to discretize the city into small, preferably nearly identical regions. To achieve this discretization, we used Uber's H3 hexagonal spatial discretization system¹. H3 provides us with small, nearly identical hexagonal regions of a controllable size covering the entire city's area.
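As a rough illustration of how point records can be binned into such regions, the sketch below counts records per cell and per category. The `cell_of` argument stands for whichever function maps a coordinate to a cell identifier; with the h3 Python bindings (version 4) this would be `h3.latlng_to_cell(lat, lng, resolution)`. To stay dependency-free, the example uses a square-grid stand-in, `grid_cell`, which is a hypothetical helper and not H3. All names and sample records here are assumptions made for illustration, not the report's code or data.

```python
import math
from collections import Counter


def grid_cell(lat, lng, size_deg=0.01):
    """Stand-in for an H3 index: a square grid cell id (NOT hexagonal).

    With h3-py v4 one would instead call h3.latlng_to_cell(lat, lng, 8).
    """
    return (math.floor(lat / size_deg), math.floor(lng / size_deg))


def count_by_cell(records, cell_of=grid_cell):
    """Count records per (cell, category); each record is (lat, lng, category)."""
    counts = Counter()
    for lat, lng, category in records:
        counts[(cell_of(lat, lng), category)] += 1
    return counts


# Hypothetical sample records (coordinates chosen for illustration only).
crimes = [
    (-23.5505, -46.6333, "passerby"),
    (-23.5507, -46.6331, "passerby"),
    (-23.6050, -46.7050, "vehicle"),
]
per_cell = count_by_cell(crimes)
```

Passing the cell function as a parameter keeps the counting logic independent of the discretization, so the same code would serve for both resolution levels.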
The two datasets were then aggregated in these regions. H3 provides a parameter to control the resolution of the discretization, i.e. the size of the regions. We chose to work with resolution levels 8 and 9, as lower resolutions were not fine enough and higher resolutions were not computationally efficient.

¹ https://h3geo.org/

The resulting dataset for each resolution was a list of hexagonal regions covering São Paulo, in which each entry indicates the number of crimes reported inside the region in 2017, for each type of crime, and the number of amenities present inside the region, for each type of amenity. We also created a dataset aggregating only the downtown area.

2 Methodology

The spatial analysis was performed by training a prediction model for the number of crime occurrences in each region. The code for this methodology is available in the Google Drive folder² as SpatialCrime.zip.

² …jx4DdHJw2OzilAp25yvv-78kcaL3 yR

2.1 The model

The model used for predicting the number of crimes was Geographically Weighted Regression (GWR) [1]. GWR takes into account the spatial structure of the data when divided into discrete regions. In contrast to simpler prediction models such as Linear Regression, this class of models comprises a separate prediction model for each region. In GWR, each local model takes into account not only the features of its local region, but also the features of the surrounding regions. The contribution of each region to a local model depends on its distance to the local region, and is weighted by a kernel function. The kernel function has a bandwidth parameter that controls the radius of influence and the weight decay for surrounding regions. In a related work, Silva et al. [2] used Geographically Weighted Regression to model homicide rates in the state of Pernambuco, Brazil.

In this work, we used the Gamma kernel for GWR, with bandwidth parameters varying from 900 to 6000, depending on the resolution level of the discretization.

We used the implementation of Geographically Weighted Regression provided by the Python package mgwr [3].

2.2 Data transformations

Some further transformations were applied to the dataset described above.

2.2.1 Treatment of outlier values in crime levels

Figure 1: Distribution of the number of crimes

Figure 1 shows the distribution of the number of crimes across the regions. Small values are very frequent, and the frequency decreases as the values rise. Still, a few regions have extremely high values. These outliers cannot be ignored, since they are hotspots. However, their presence in the data has a degrading effect on the performance of the prediction model. We explore two solutions to this problem, both involving the application of a monotonic transformation to the data.

The first solution is simply to add 1 and apply the natural logarithm to the data values. This transformation is continuous, and it compresses large values far more strongly than small ones, bringing about the desired effect.

Figure 2: Distribution of the number of crimes, logarithm transformation
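The logarithm transformation above can be sketched in a few lines. The function names and the sample counts are illustrative assumptions; the inverse is included because predictions made on the transformed scale must be mapped back to counts.

```python
import math


def log_transform(counts):
    """Map each count c to ln(1 + c); zero counts stay at zero."""
    return [math.log1p(c) for c in counts]


def inverse_log_transform(values):
    """Map transformed values back to the original count scale."""
    return [math.expm1(v) for v in values]


# Arbitrary example counts, including an outlier-like value.
counts = [0, 1, 10, 1000]
transformed = log_transform(counts)
restored = inverse_log_transform(transformed)
```

Adding 1 before taking the logarithm keeps regions with no recorded crimes well defined, since ln(0) does not exist.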

The second method is to apply the inverse quantile function of the data distribution (that is, its empirical cumulative distribution function). This replaces each data point by its quantile, producing values between 0 and 1. As this transformation depends on the data, we calculated the inverse quantile function using only the training data in order to avoid data leakage. The same function was then used to transform the test data.

Figure 3: Distribution of the number of crimes, inverse quantile transformation

2.2.2 Treatment of multicollinearity on the input data

As the input data has many variables, a possible problem is the presence of multicollinearity. Furthermore, the high number of variables also makes overfitting more likely. To mitigate this, we treated the input data by removing variables according to correlation measures. We performed hierarchical clustering of the variables using the Spearman correlation coefficients and Ward's linkage criterion [4]. This method requires a parameter (threshold) for the generation of the clusters.

3 Evaluation

We now describe the evaluation process, including the choice of the transformation, the multicollinearity treatment threshold parameter and the kernel bandwidth.

3.1 Resolution level 8

We trained and evaluated predictors for passerby crimes, for both the logarithm and the inverse quantile transformation, and for several threshold and bandwidth parameters. The full list of parameters can be found in the codebook available in the Google Drive folder, in SPdataEstabelecimentos&Crime.zip. We then calculated the equivalent of the R² measure for the test data. The results of the experiments are shown in the figures below. Figure 4 shows the results for the inverse quantile transformation, and Figure 5 the results for the logarithm transformation.

Figure 4: Results for inverse quantile transformation.

Figure 5: Results for logarithm transformation.

The best result for the inverse quantile transformation was 0.83, while the logarithm transformation reached 0.88. The best results for both transformations were achieved using threshold 0 (equivalent to no multicollinearity treatment) and bandwidth 6000.

The remainder of the experiments were performed using the logarithm transformation.

Figure 6 shows the results obtained for predicting the number of crimes against vehicles. The best score was 0.83, with a threshold of 1.2 and bandwidth 1500.
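The inverse quantile transformation compared above is fitted on the training data only (Section 2.2.1). A minimal sketch of that idea follows, assuming a plain empirical distribution function without interpolation; the class name, the tie-handling convention and the sample values are assumptions, since the report does not state them.

```python
from bisect import bisect_right


class QuantileTransform:
    """Empirical distribution function fitted on training values only."""

    def fit(self, train_values):
        # Store the sorted training values; test data never enters here.
        self.sorted_ = sorted(train_values)
        return self

    def transform(self, values):
        # Fraction of training values less than or equal to each value.
        n = len(self.sorted_)
        return [bisect_right(self.sorted_, v) / n for v in values]


# Arbitrary example counts for the training and test regions.
train = [0, 0, 1, 2, 3, 5, 8, 40]
test = [0, 4, 100]

qt = QuantileTransform().fit(train)
train_q = qt.transform(train)
test_q = qt.transform(test)
```

Test values larger than anything seen in training are mapped to 1, so an unseen hotspot cannot exceed the range the model was trained on.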

Figure 6: Results using the R² metric for crimes against vehicles.

Figure 7 shows the results obtained for predicting the number of crimes against stores. We achieved a best score of 0.64, with a threshold of 0.1 and bandwidth 6000.

Figure 7: Results using the R² metric for crimes against stores.

3.2 Resolution level 9

We performed similar experiments with resolution level 9. The full list of parameters can be found in the codebook available in the Google Drive folder, in SPdataEstabelecimentos&Crime.zip.

Figure 8 shows the results obtained for predicting the number of crimes against passersby. The best result was a score of 0.76, for a threshold of 0 and bandwidth 2700.
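The report does not spell out its "equivalent of the R² measure for the test data". A common out-of-sample definition, given here only as an assumption about what the scores in this section represent, is:

```latex
R^2_{\text{test}} = 1 - \frac{\sum_{i \in \text{test}} \left(y_i - \hat{y}_i\right)^2}
                             {\sum_{i \in \text{test}} \left(y_i - \bar{y}\right)^2}
```

Here y_i is the observed (transformed) number of crimes in test region i, ŷ_i is the model's prediction, and ȳ is the mean of the observed test values. Under this definition a score of 1 means perfect prediction, and a score of 0 means the model does no better than predicting the mean everywhere.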

Figure 8: Results using the R² metric for crimes against passersby.

Figure 9 shows the results obtained for predicting the number of crimes against vehicles. The best result was a score of 0.57, for a threshold of 0 and bandwidth 2700.

Figure 9: Results using the R² metric for crimes against vehicles.

Finally, Figure 10 shows the results obtained for predicting the number of crimes against stores. The best result was a score of 0.29, for a threshold of 0.5 and bandwidth 3300.
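The threshold values reported throughout this section refer to the multicollinearity treatment of Section 2.2.2. The sketch below shows one way such a treatment can work, using NumPy and SciPy. The distance 1 - |ρ| between variables and the rule of keeping the first variable of each cluster are assumptions; the report only states that Spearman correlations, Ward's linkage and a threshold were used.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def select_features(X, threshold):
    """Return the column indices kept after correlation-based clustering.

    X needs at least three columns so that spearmanr returns a matrix.
    """
    corr, _ = spearmanr(X)
    corr = (corr + corr.T) / 2.0                   # enforce exact symmetry
    np.fill_diagonal(corr, 1.0)
    # Assumed distance: strongly correlated variables are close together.
    distance = np.clip(1.0 - np.abs(corr), 0.0, None)
    linkage = hierarchy.ward(squareform(distance, checks=False))
    labels = hierarchy.fcluster(linkage, t=threshold, criterion="distance")
    # Assumed rule: keep the first variable encountered in each cluster.
    first_in_cluster = {}
    for column, label in enumerate(labels):
        first_in_cluster.setdefault(label, column)
    return sorted(first_in_cluster.values())


# Toy data: column 1 is a monotone function of column 0, column 2 is unrelated.
a = np.arange(50.0)
X = np.column_stack([a, 2.0 * a + 1.0, a % 2])
kept_moderate = select_features(X, threshold=0.5)
kept_coarse = select_features(X, threshold=2.0)
```

A larger threshold merges more variables into each cluster and therefore removes more of them, while a very small threshold leaves the input essentially untouched.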

Figure 10: Results using the R² metric for crimes against stores.

4 Discussion

The experiments showed that the logarithm transformation for the number of crimes leads to the best results in the regression. The best results were also generally observed with little to no multicollinearity treatment. Finally, we observed that the discretization with resolution level 8 led to better results than with resolution level 9. We summarize the best result for each regression variable in Table 1:

                 Resolution level 8          Resolution level 9
Crimes against   R²     threshold   bw       R²     threshold   bw
Passersby        0.88   0.0         6000     0.76   0.0         2700
Vehicles         0.83   1.2         1500     0.57   0.0         2700
Stores           0.64   0.1         6000     0.29   0.5         3300

Table 1: Summary of results using the R² metric

One of the simplest models available to predict the number of crimes in a given region is the Linear Regression model. While the goal of the experiment is to identify spatial patterns in the crime distribution using Geographically Weighted Regression, the predictive power of the models is also important. Thus, we provide a comparison of the performance of both models for predicting the number of crimes against passersby in resolution level 8.

Both models achieved good results, though the Geographically Weighted Regression model had slightly better performance. The latter achieved a score of 0.88 for the equivalent of the R² coefficient on the test dataset, while the Linear Regression achieved around 0.83.
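To make the difference between the two models concrete, the following sketch fits one global least-squares line and two local, kernel-weighted lines on toy data. It is a simplification under stated assumptions: a single feature, a Gaussian kernel with a fixed bandwidth, and hand-made data in which the effect of the feature differs between the west and the east. It is not the report's code, which relies on mgwr with many features.

```python
import math


def weighted_line_fit(xs, ys, weights):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    s_xy = sum(w * (x - x_bar) * (y - y_bar)
               for w, x, y in zip(weights, xs, ys))
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def gaussian_weights(coords, center, bandwidth):
    """Gaussian kernel weights from the Euclidean distance to `center`."""
    return [math.exp(-0.5 * (math.dist(c, center) / bandwidth) ** 2)
            for c in coords]


# Toy regions on a line: the effect of x on y is 1.0 in the west, 3.0 in the east.
coords = [(float(i), 0.0) for i in range(10)]
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x * (1.0 if i < 5 else 3.0) for i, x in enumerate(xs)]

# Global model: every region has the same weight.
_, global_slope = weighted_line_fit(xs, ys, [1.0] * len(xs))
# Local models: regions near the chosen center dominate the fit.
_, west_slope = weighted_line_fit(xs, ys, gaussian_weights(coords, coords[0], 1.5))
_, east_slope = weighted_line_fit(xs, ys, gaussian_weights(coords, coords[9], 1.5))
```

The global fit averages the two regimes into a single coefficient, whereas each local fit recovers a coefficient close to the one that holds around its center. A larger bandwidth would pull the local coefficients towards the global one.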

4.1 Case analysis

We now present an analysis of the regression for the number of crimes against passersby in resolution level 8, in two cases.

4.1.1 Case 1 - Hotspot region analysis

We analyse the results obtained by using a Geographically Weighted Regression model to predict the number of passerby crimes in the four adjacent regions with the highest number of recorded incidents (5047 incidents), located downtown. The figure below shows the 10 features identified as having the greatest importance for the prediction in each of the regions:

Figure 11: Positive (in red) and negative (in green) importance of the variables to occurrence of crimes.

The feature transit station is the most important predictor for all four regions, with an increasing effect on the crime level predictions. In fact, this feature was frequently found to be the most important predictor. Furthermore, the feature subway station also has high importance and an increasing effect on the predictions for two of these four regions. This could be interpreted as bus stops and subway stations being possible hotspots for crimes against passersby (i.e., muggings).

The importance of the other features is similar across the regions. Schools, churches, parking structures, travel agencies and takeaway restaurants seem to have a positive correlation with the number of crimes reported in the area, while the presence of accounting offices, electronics stores, drugstores and convenience stores was found to have the opposite effect. These are perhaps not immediately interpretable, and can serve as a starting point for investigation or for refining the model.

4.1.2 Case 2 - Low crime region analysis

In contrast, we now discuss the results of the same regression for four additional adjacent regions with much lower crime rates, recording only 39 cases. The figure below shows the calculated importance for the main features:

Figure 12: Positive (in red) and negative (in green) importance of the variables to occurrence of crimes.

In this part of the city, the feature with the most influence on the increase of the prediction is the presence of car repair shops, though it is closely followed by the transit station feature already seen in Case 1. Again, there is a certain commonality in the predictors across these regions. Beauty salons, schools, banks, offices for local government and travel agencies have a positive correlation with the increase in crime there. In this case, the increase in muggings near banks is easily explained. Meanwhile, the presence of spas, insurance agencies, accounting offices and dentists' offices was found to have a negative effect on the prediction, though this behaviour also lacks a simple explanation.

4.2 Visualization

In Figures 13 and 14, we show a visualization of the predicted values for passerby crimes in the whole city and limited to downtown. The values have been scaled for the training. We can observe that the predicted values agree with the actual data.

Figure 13: Heatmap of number of crimes in the whole city of São Paulo with threshold 0.0, bw 6000.

Figure 14: Heatmap of number of crimes in the whole city of São Paulo with threshold 0.0, bw 3300.

We can observe that the predicted values (on the right) closely resemble the original values (on the left) for the whole city (Figure 13), preserving the scale and thus making a good prediction. For the data focused on downtown (Figure 14), the scale is not preserved, so the prediction is less accurate, but the patterns where the data highlight criminal activity are preserved.

5 Conclusion

In this report we studied models that take into account only the regions near the observed one. With our dataset, we investigated the impact of certain amenities on crime. Since we create a model for each region, we can observe which variables are most important in each region, and thus which amenities have the strongest impact in each part of the city. Our experiments show a small increase in performance when using Geographically Weighted Regression. As a further gain from using that model, we could observe that the presence of amenities has a different impact in each region.

References

[1] C. Brunsdon, S. Fotheringham, and M. Charlton, "Geographically weighted regression," Journal of the Royal Statistical Society: Series D (The Statistician), vol. 47, no. 3, pp. 431–443, 1998.

[2] C. Silva, S. Melo, A. Santos, P. A. Junior, S. Sato, K. Santiago, and L. Sá, "Spatial modeling for homicide rates estimation in Pernambuco State-Brazil," ISPRS International Journal of Geo-Information, vol. 9, no. 12, p. 740, 2020.

[3] T. M. Oshan, Z. Li, W. Kang, L. J. Wolf, and A. S. Fotheringham, "mgwr: A Python implementation of multiscale geographically weighted regression for investigating process spatial heterogeneity and scale," ISPRS International Journal of Geo-Information, vol. 8, no. 6, p. 269, 2019.

[4] J. H. Ward Jr, "Hierarchical grouping to optimize an objective function," Journal of the American Statistical Association, vol. 58, no. 301, pp. 236–244, 1963.
