Generation of overlapping clusters constructing suitable graph for crime report analysis
Future Generation Computer Systems (IF 6.2), Pub Date: 2021-01-21, DOI: 10.1016/j.future.2021.01.027
Ankur Das, Janmenjoy Nayak, Bighnaraj Naik, Uttam Ghosh

Cybercrime is criminal activity generally committed by cybercriminals or hackers. Crime is growing explosively all over the world, which motivates law enforcement agencies to analyze crimes systematically. In many cases, crime information is stored as unstructured online text reports, and a single report may describe several different criminal activities. Analyzing these reports to identify patterns and trends in crime, and to devise crime detection and prevention strategies, is a very challenging task. In this paper, the crime reports are preprocessed and relations among named entity pairs are extracted to give the reports a structured form. Each extracted relation is converted to an n-dimensional real-valued vector based on the Word2Vec model from Natural Language Processing. A novel agglomerative graph partitioning algorithm that uses various graph centrality measures is then applied to partition the extracted relations. All extracted relations of a report that fall in a single partition are replaced by the representative of that partition, so each report is described by a set of distinct relation types. Next, a graph is constructed for the set of reports in which the nodes correspond to the relations in the tuples that describe the reports, and an edge is drawn between a pair of nodes only if the corresponding relations are of a similar type and belong to two different reports. The constructed graph is disconnected, and each of its connected components is a clique. These cliques are easily identified in time linear in the number of edges of the graph, and each clique yields a cluster of reports. Because each report is described by a set of relations of different types, the resulting clusters overlap. The degree of membership of a report in a cluster is also determined in the paper.
The proposed method is evaluated experimentally and compared with several state-of-the-art partition-based and overlapping clustering algorithms to demonstrate its effectiveness on crime corpora.
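
The graph construction and cluster extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the partitioning step has already run, so each report is given as a list of partition labels, one per extracted relation. The labels (`fraud`, `hacking`, `theft`), the function names, and the membership formula (the fraction of a report's relations that fall in the cluster's partition) are all assumptions made for illustration; the abstract does not state how the degree of membership is computed.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations


def build_graph(report_relations):
    """Nodes are (report_id, relation_type) pairs; an edge joins two nodes
    that share a relation type but come from different reports."""
    nodes_by_type = defaultdict(list)
    for report_id, labels in report_relations.items():
        for relation_type in set(labels):  # one node per distinct type
            nodes_by_type[relation_type].append((report_id, relation_type))
    adjacency = {n: set() for nodes in nodes_by_type.values() for n in nodes}
    for nodes in nodes_by_type.values():
        for u, v in combinations(nodes, 2):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def connected_components(adjacency):
    """Breadth-first search; runs in time linear in nodes plus edges."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def overlapping_clusters(report_relations):
    """Each component (a clique by construction) gives one cluster of reports."""
    clusters = {}
    for component in connected_components(build_graph(report_relations)):
        relation_type = component[0][1]  # all nodes in a component share it
        clusters[relation_type] = {report_id for report_id, _ in component}
    return clusters


def membership(report_relations, clusters):
    """ASSUMED definition: share of a report's relations in the cluster's type."""
    degrees = {}
    for relation_type, members in clusters.items():
        for report_id in members:
            counts = Counter(report_relations[report_id])
            degrees[(report_id, relation_type)] = (
                counts[relation_type] / len(report_relations[report_id])
            )
    return degrees


# Hypothetical toy input: partition labels of the relations in three reports.
reports = {
    "r1": ["fraud", "fraud", "hacking"],
    "r2": ["fraud"],
    "r3": ["hacking", "theft"],
}
clusters = overlapping_clusters(reports)
degrees = membership(reports, clusters)
```

In this toy input, report `r1` lands in both the `fraud` and the `hacking` cluster, which is the overlap the abstract refers to. A relation type that occurs in only one report produces an isolated node, treated here as a cluster with a single report.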




Updated: 2021-01-29