A novel random forest approach for imbalance problem in crime linkage
Knowledge-Based Systems (IF 8.8), Pub Date: 2020-03-09, DOI: 10.1016/j.knosys.2020.105738
Yu-Sheng Li, Hong Chi, Xue-Yan Shao, Ming-Liang Qi, Bao-Guang Xu

Crime linkage, the task of identifying serial crimes committed by the same offenders, is a challenging problem in crime analysis. It can be framed as a binary classification task that detects serial case pairs. However, most case pairs in the real world are nonserial, so crime linkage suffers from severe class imbalance. In this paper, we propose a novel random forest based on the information granule. Rather than resampling the minority or the majority class, the approach concentrates on case pairs that are indistinguishable at the classification boundary. The information granule is used to identify case pairs that are difficult to distinguish in the dataset and to construct a nearly balanced dataset in this uncertainty region, which addresses the imbalance problem. In the proposed approach, the random trees are built from both the original dataset and this nearly balanced dataset. A real-world robbery dataset and several public imbalanced datasets are used to evaluate the performance of the approach. The results show that the proposed approach is effective in dealing with class imbalance and that it can be extended to work in combination with other methods for handling class imbalance.
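
The abstract only outlines the method, so the following is a minimal, hypothetical sketch of that outline, not the authors' implementation. It assumes that the information granule of a case pair is its δ-neighborhood (as in neighborhood rough sets), that a pair lies in the uncertainty region when its granule contains both serial and nonserial pairs, and it replaces full random trees with randomized decision stumps to stay short. The toy pair features, the function names, and the parameters `delta` and `share_boundary` are all illustrative assumptions.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: each case pair is described by two similarity
# features (e.g. modus-operandi similarity, spatial proximity) in [0, 1];
# label 1 = serial pair (minority), 0 = nonserial pair (majority).
pairs = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0), ((0.2, 0.4), 0),
         ((0.4, 0.2), 0), ((0.6, 0.6), 0), ((0.7, 0.5), 0), ((0.1, 0.1), 0),
         ((0.65, 0.7), 1), ((0.9, 0.8), 1)]


def granule(data, i, delta):
    """Assumed delta-neighborhood granule: indices of all pairs within
    Euclidean distance delta of pair i (including i itself)."""
    xi = data[i][0]
    return [j for j, (xj, _) in enumerate(data) if math.dist(xi, xj) <= delta]


def boundary_region(data, delta):
    """Indices of pairs whose granule mixes serial and nonserial labels,
    i.e. pairs that are hard to distinguish near the class boundary."""
    return [i for i in range(len(data))
            if len({data[j][1] for j in granule(data, i, delta)}) > 1]


def majority(labels, default):
    """Most frequent label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(sample, rng):
    """Depth-1 tree on one randomly chosen feature (a stand-in for a full
    random tree); the threshold minimizes misclassifications."""
    f = rng.randrange(len(sample[0][0]))
    default = majority([y for _, y in sample], 0)
    best = None
    for t in sorted({x[f] for x, _ in sample}):
        left = [y for x, y in sample if x[f] <= t]
        right = [y for x, y in sample if x[f] > t]
        lo, hi = majority(left, default), majority(right, default)
        err = sum(y != lo for y in left) + sum(y != hi for y in right)
        if best is None or err < best[0]:
            best = (err, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x[f] <= t else hi


def hybrid_forest(data, delta, n_trees=20, share_boundary=0.3, seed=0):
    """Grow some trees on bootstraps of the boundary region and the rest on
    bootstraps of the original data; predict by majority vote."""
    rng = random.Random(seed)
    region = [data[i] for i in boundary_region(data, delta)]
    n_boundary = round(n_trees * share_boundary) if region else 0
    trees = []
    for k in range(n_trees):
        pool = region if k < n_boundary else data
        boot = [rng.choice(pool) for _ in pool]  # bootstrap sample
        trees.append(train_stump(boot, rng))
    return lambda x: majority([tree(x) for tree in trees], 0)


region = boundary_region(pairs, delta=0.15)
print("boundary pairs:", region)
print("labels in region:", Counter(pairs[i][1] for i in region))
forest = hybrid_forest(pairs, delta=0.15)
print("prediction for (0.85, 0.75):", forest((0.85, 0.75)))
```

With `delta=0.15`, the toy data has an 8:2 nonserial-to-serial ratio overall, while its boundary region holds only pairs 5 and 8, one of each class, which illustrates how focusing on the uncertainty region yields a nearly balanced subset without resampling. How the paper actually apportions trees between the two datasets is not stated in the abstract; `share_boundary` is only a placeholder for that choice.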

Updated: 2020-03-09