Customs fraud detection
Pattern Analysis and Applications (IF 3.7) Pub Date: 2019-10-30, DOI: 10.1007/s10044-019-00852-w
Jellis Vanhoeyveld, David Martens, Bruno Peeters

In this customs fraud detection application, we analyse a unique data set of 9,624,124 records resulting from a collaboration with the Belgian customs administration. The administration faces increasing levels of international trade, which puts pressure on regulatory control. Governments therefore rely on data mining to focus their limited resources on the most likely fraud cases. The literature on data mining for customs fraud detection falls short in two main respects, both of which are addressed in this paper: (1) Behavioural and high-cardinality data types are neglected due to a lack of methodology to include them. We demonstrate that such fine-grained features (e.g. the specific entities such as consignee, consignor and declarant, and the commodities involved in a declaration) are very predictive. (2) Studies in the tax domain most often apply standard learning algorithms to their fraud detection applications. However, customs data are highly imbalanced, and this poses challenges for many inducers. We present a new EasyEnsemble method that integrates a support vector machine base learner in a confidence-rated boosting algorithm. This results in a fast and scalable learner that drastically improves predictive performance over the base application of a support vector machine. The results of our proposed framework reveal high AUC and lift values that translate into an immediate impact on the customs fraud detection domain through improved retrieval of tax losses and enhanced deterrence.
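
To make the imbalance argument concrete, here is a minimal sketch of EasyEnsemble-style undersampling with a linear support vector machine as base learner. It is not the authors' method: the confidence-rated boosting step is left out and the base learners' raw outputs are simply averaged. The SVM is trained with a Pegasos-style subgradient update, and the data, the function names and every parameter value (`lam`, `epochs`, the number of subsets) are illustrative assumptions rather than anything taken from the paper.

```python
import random

def balanced_subsets(majority, minority, n_subsets, rng):
    """EasyEnsemble-style sampling: each subset holds all minority rows plus
    an equally sized random draw (without replacement) from the majority."""
    k = len(minority)
    return [rng.sample(majority, k) + list(minority) for _ in range(n_subsets)]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(rows, rng, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the regularised hinge loss.
    rows: list of (features, label) pairs with label in {-1, +1}."""
    rows = [(list(x) + [1.0], y) for x, y in rows]  # constant feature acts as bias
    w, t = [0.0] * len(rows[0][0]), 0
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            violated = y * dot(w, x) < 1.0     # margin check with the current weights
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def ensemble_score(models, x):
    """Average the raw SVM outputs of all base learners (no boosting weights)."""
    xa = list(x) + [1.0]
    return sum(dot(w, xa) for w in models) / len(models)

def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic, made-up data: 400 legitimate rows (-1) and 20 fraud rows (+1).
rng = random.Random(0)
majority = [((rng.gauss(0, 1), rng.gauss(0, 1)), -1) for _ in range(400)]
minority = [((rng.gauss(2.5, 1), rng.gauss(2.5, 1)), 1) for _ in range(20)]

subsets = balanced_subsets(majority, minority, n_subsets=10, rng=rng)
models = [train_linear_svm(s, rng) for s in subsets]

data = majority + minority
scores = [ensemble_score(models, x) for x, _ in data]
ensemble_auc = auc(scores, [y for _, y in data])
print(f"in-sample AUC of the averaged ensemble: {ensemble_auc:.3f}")
```

Each base learner sees a balanced subset, so no single model is swamped by the majority class, while the ensemble as a whole still draws on many different majority rows. The AUC printed here is computed on the training rows themselves and is only a sanity check, not an estimate of out-of-sample performance.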

Updated: 2019-10-30