Taming Near Repeat Calculation for Crime Analysis via Cohesive Subgraph Computing
arXiv - CS - Data Structures and Algorithms. Pub Date: 2017-05-18, DOI: arxiv-1705.07746
Zhaoming Yin, Xuan Shi

Near repeat (NR) is a well-known phenomenon in crime analysis: crime events are assumed to exhibit correlations within a given time and space frame. Traditional NR calculation generates 2-event pairs whenever 2 events happen within a given space and time limit. When the number of events is large, however, NR calculation is time consuming, and how these pairs are organized has not yet been explored. In this paper, we design a new approach to calculate clusters of NR events efficiently. First, an R-tree is used to index crime events; each event is represented by a vertex, edges are constructed by range querying each vertex in the R-tree, and a graph is thus formed. Cohesive subgraph approaches are then applied to identify event chains. The k-clique, k-truss, and k-core algorithms, plus DBSCAN, are implemented in sequence with respect to their varied ability to find cohesive subgraphs. Real-world crime data from Chicago, New York, and Washington DC are used in the experiments. MapReduce-powered Knox tests confirm that near repeat is a solid effect in real, large-scale crime data. The performance of the 4 algorithms is validated, while their quality is gauged by the distribution of the number of cohesive subgraphs and by their clustering coefficients. The proposed framework is the first to process real crime data at million-record scale, and the first to detect NR events of size greater than 2.
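
The pipeline in the abstract (events become vertices, space-time proximity becomes edges, cohesive subgraphs become NR clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes events are plain `(x, y, t)` tuples, and it swaps the R-tree range query for a brute-force pairwise comparison to stay dependency-free. Only the k-core step is shown. The names `build_nr_graph`, `k_core`, `max_dist`, and `max_days` are hypothetical.

```python
from itertools import combinations


def build_nr_graph(events, max_dist, max_days):
    """Link two events when they are close in both space and time.

    events: list of (x, y, t) tuples (an assumed layout, not from the paper).
    Brute force O(n^2); the paper uses R-tree range queries for this step.
    Returns an adjacency dict {vertex index: set of neighbour indices}.
    """
    adj = {i: set() for i in range(len(events))}
    for i, j in combinations(range(len(events)), 2):
        xi, yi, ti = events[i]
        xj, yj, tj = events[j]
        close_in_space = (xi - xj) ** 2 + (yi - yj) ** 2 <= max_dist ** 2
        close_in_time = abs(ti - tj) <= max_days
        if close_in_space and close_in_time:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def k_core(adj, k):
    """Return the vertices of the k-core by repeatedly peeling low-degree vertices."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled; a vertex can be pushed more than once
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


if __name__ == "__main__":
    # Toy data: three mutually close events, one loosely attached event,
    # one event at the same place but much later, and one far away.
    events = [(0, 0, 0), (1, 0, 1), (0, 1, 2), (2, 0, 2), (0, 0, 30), (100, 100, 1)]
    graph = build_nr_graph(events, max_dist=1.5, max_days=3)
    print(sorted(k_core(graph, 2)))
```

In this toy data the 2-core keeps only the three mutually linked events, which is the kind of NR group of size greater than 2 that pairwise NR calculation alone does not surface.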

Updated: 2020-03-27