Credit Card Fraud Detection under Extreme Imbalanced Data: A Comparative Study of Data-level Algorithms
Journal of Experimental & Theoretical Artificial Intelligence (IF 1.7), Pub Date: 2021-04-03, DOI: 10.1080/0952813x.2021.1907795
Amit Singh, Ranjeet Kumar Ranjan, Abhishek Tiwari

ABSTRACT

Credit card fraud is one of the biggest cybercrimes faced by users. Machine-learning-based fraudulent transaction detection systems are highly effective in real-world scenarios. However, machine learning approaches used to build these systems suffer from imbalanced data: the class distribution is heavily skewed, with fraudulent transactions forming a small minority. Balancing the dataset is therefore an essential sub-task. A review of state-of-the-art approaches shows the need for a systematic study of class imbalance handling strategies, in order to design an intelligent and capable system for detecting fraudulent transactions. This work provides a comparative study of different class imbalance handling methods. We performed an extensive experimental study to compare the effectiveness and efficiency of these methods in conjunction with state-of-the-art classification approaches. The methods were evaluated on several performance indicators: precision, recall, k-fold cross-validation scores, the AUC-ROC curve and execution time. We found that methods applying oversampling followed by undersampling perform well with ensemble classification models such as AdaBoost, XGBoost and Random Forest.
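The abstract does not say which oversampling and undersampling algorithms were combined. The sketch below therefore uses plain random duplication of minority (fraud) rows and random deletion of majority (legitimate) rows as stand-ins. The helper names `random_oversample`, `random_undersample` and `precision_recall`, the label convention (1 = fraud), the toy data and the ratios are all illustrative assumptions, not the authors' implementation. The sketch shows the two-stage idea: first oversample the minority class up to an intermediate ratio, then undersample the majority class to the final ratio. It also computes precision and recall for the fraud class.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, minority, target_ratio, rng):
    """Duplicate randomly chosen minority rows until the minority count is at
    least target_ratio times the majority count (binary labels assumed)."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_count = len(y) - len(minority_idx)
    needed = math.ceil(target_ratio * majority_count) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(max(0, needed))]
    X_out = list(X) + [X[i] for i in extra]
    y_out = list(y) + [y[i] for i in extra]
    return X_out, y_out


def random_undersample(X, y, minority, target_ratio, rng):
    """Keep every minority row plus a random subset of majority rows, sized so
    that minority / majority is at least target_ratio."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min(len(majority_idx), int(len(minority_idx) / target_ratio))
    kept = sorted(minority_idx + rng.sample(majority_idx, keep))
    return [X[i] for i in kept], [y[i] for i in kept]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (hypothetical, not from the paper): 990 legitimate (0), 10 fraud (1).
    X = [[rng.random()] for _ in range(1000)]
    y = [1 if i < 10 else 0 for i in range(1000)]
    # Stage 1: oversample fraud rows up to a 1:4 ratio against legitimate rows.
    X_os, y_os = random_oversample(X, y, minority=1, target_ratio=0.25, rng=rng)
    # Stage 2: undersample legitimate rows down to a 1:1 ratio.
    X_bal, y_bal = random_undersample(X_os, y_os, minority=1, target_ratio=1.0, rng=rng)
    print(Counter(y), Counter(y_os), Counter(y_bal))
```

Resampling of this kind would normally be applied only to the training folds, for example inside each k-fold cross-validation split. That way precision, recall and AUC-ROC are measured on data with the original class distribution. Splitting the rebalancing into two stages has a practical benefit in this toy case. It duplicates 238 fraud rows instead of 980 with oversampling alone, and discards 742 legitimate rows instead of 980 with undersampling alone.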



Updated: 2021-04-03