A Comparative Study on Crime in Denver City Based on Machine Learning and Data Mining
arXiv - CS - Computers and Society Pub Date : 2020-01-09 , DOI: arxiv-2001.02802
Md. Aminur Rab Ratul

Crime prevention is one of the highest priorities for any government seeking to ensure public safety. An accurate crime prediction model can help governments and law enforcement prevent violence, detect criminals in advance, allocate resources, and recognize the problems that cause crime. Building any such forward-looking tool requires examining and understanding crime patterns as early as possible. In this paper, I analyze a real-world crime and accident dataset from Denver County, USA, covering January 2014 to May 2019 and containing 478,578 incidents. The project aims to predict and highlight trends in occurrence, which in turn supports law enforcement agencies and government in deriving preventive measures from the predicted rates. First, I apply several statistical analyses supported by data visualization. Then, I implement various classification algorithms (Random Forest, Decision Tree, AdaBoost Classifier, Extra Tree Classifier, Linear Discriminant Analysis, K-Neighbors Classifier, and 4 Ensemble Models) to classify 15 different classes of crime. Results are obtained with two popular test methods: train-test split and k-fold cross-validation. To evaluate performance thoroughly, I also use precision, recall, F1-score, Mean Squared Error (MSE), ROC curves, and a paired t-test. Except for the AdaBoost classifier, most of the algorithms exhibit satisfactory accuracy. Random Forest, Decision Tree, and Ensemble Models 1, 3, and 4 each achieve more than 90% accuracy. Among all the approaches, Ensemble Model 4 gives the best results on every evaluation criterion. This study could help raise public awareness of where incidents occur and assist security agencies in predicting future outbreaks of violence in a specific area within a particular time.
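The evaluation protocol named in the abstract (k-fold cross-validation followed by a paired t-test on per-fold scores) can be illustrated with a minimal, standard-library-only sketch. This is not the author's code: the toy three-class data, the two stand-in classifiers (a majority-class baseline and a 1-nearest-neighbour rule), and every function name below are assumptions made purely for illustration.

```python
import math
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def make_toy_data(n=200, seed=0):
    """Hypothetical stand-in data: three noisy 2-D clusters labelled 0, 1, 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randrange(3)
        point = (label + rng.gauss(0, 0.4), -label + rng.gauss(0, 0.4))
        data.append((point, label))
    return data


def majority_predict(train, x):
    """Baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def nearest_neighbour_predict(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def cross_validate(data, predict, folds):
    """Return the accuracy on each held-out fold, training on the rest."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        correct = sum(predict(train, data[i][0]) == data[i][1] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two models scored on the same folds."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    data = make_toy_data()
    folds = k_fold_indices(len(data), k=5)
    nn_scores = cross_validate(data, nearest_neighbour_predict, folds)
    base_scores = cross_validate(data, majority_predict, folds)
    print("1-NN mean accuracy:    ", round(mean(nn_scores), 3))
    print("baseline mean accuracy:", round(mean(base_scores), 3))
    print("paired t statistic:    ", round(paired_t_statistic(nn_scores, base_scores), 3))
```

Both models are scored on identical folds, which is what makes the per-fold differences paired. The resulting statistic would be compared against a t distribution with k - 1 degrees of freedom. In practice a library such as scikit-learn or SciPy would normally replace these hand-written helpers.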

Updated: 2020-01-10