Data quality issues leading to sub optimal machine learning for money laundering models
Journal of Money Laundering Control (IF 1.3), Pub Date: 2021-07-28, DOI: 10.1108/jmlc-05-2021-0049
Abhishek Gupta, Dwijendra Nath Dwivedi, Jigar Shah, Ashish Jain

Purpose

Good-quality input data is critical to developing a robust machine learning model for identifying possible money laundering transactions. McKinsey, speaking at one of the ACAMS conferences, cited data quality as one of the reasons artificial intelligence use cases in compliance are struggling. Concerns are often raised about the quality of predictor variables, such as incorrect transaction codes or industry classifications. However, there has been little discussion of the most critical variable in machine learning: the definition of the event, i.e. the date on which the suspicious activity report (SAR) is filed.
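To make the role of the event date concrete, the sketch below shows how the choice of event definition fixes the look-back window from which a model's features are built. It assumes a hypothetical `SarCase` record with `alert_date` and `sar_filed_date` fields and a 90-day look-back; neither the field names nor the window length come from the paper.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SarCase:
    customer_id: str
    alert_date: date       # hypothetical field: date the case was first alerted
    sar_filed_date: date   # hypothetical field: date the SAR was filed


def feature_window(case: SarCase, event: str = "sar_filed",
                   lookback_days: int = 90) -> tuple[date, date]:
    """Return the [start, end) transaction window used to build features.

    The window ends at the chosen event date, so the event definition
    decides which customer activity the model learns from.
    """
    if event == "sar_filed":
        end = case.sar_filed_date
    elif event == "alert":
        end = case.alert_date
    else:
        raise ValueError(f"unknown event definition: {event!r}")
    return end - timedelta(days=lookback_days), end


case = SarCase("C00001", alert_date=date(2021, 1, 10), sar_filed_date=date(2021, 6, 10))
print(feature_window(case, "sar_filed"))  # window ending at the SAR filing date
print(feature_window(case, "alert"))      # window ending at the alert date
```

In this made-up case the SAR is filed five months after the alert, so a 90-day window ending on the filing date starts after the alert and contains none of the activity that triggered it.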

Design/methodology/approach

The team analyzed the transaction behavior of four major banks across Asia and Europe. Based on the findings, the team created a synthetic database of 2,000 SAR customers that mimics the timing of investigation and case closure. In this paper, the authors focus on one very specific area of data quality: the definition of an event, i.e. the SAR/suspicious transaction report (STR).
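A synthetic SAR population of this kind might be generated along the following lines. The paper does not describe its generator, so this is a hypothetical illustration: alert dates are drawn uniformly over one year, and the alert-to-filing lag is drawn uniformly between 15 and 300 days (roughly the 15-day to 10-month range quoted under Research limitations/implications). The uniform distributions, the start date and the field names are all assumptions.

```python
import random
from datetime import date, timedelta


def make_synthetic_sars(n: int = 2000, seed: int = 0,
                        min_lag_days: int = 15, max_lag_days: int = 300,
                        start: date = date(2020, 1, 1)) -> list[dict]:
    """Hypothetical generator of SAR cases with an investigation lag.

    Each case gets an alert date within one year of `start` and a SAR
    filing date that trails the alert by a uniformly drawn lag.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    cases = []
    for i in range(n):
        alert = start + timedelta(days=rng.randrange(365))
        lag = rng.randint(min_lag_days, max_lag_days)  # inclusive on both ends
        cases.append({
            "customer_id": f"C{i:05d}",
            "alert_date": alert,
            "sar_filed_date": alert + timedelta(days=lag),
        })
    return cases


cases = make_synthetic_sars()
print(len(cases), cases[0])
```

Keeping the lag as an explicit parameter makes it easy to re-run the same population under shorter investigation times, which is the scenario the authors expect to change the picture of SAR behavior.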

Findings

Analysis of a few of the banks in Asia and Europe suggests that correcting the event definition alone can improve the effectiveness of the model and reduce the prediction span, i.e. the time lag between the money laundering transaction and the moment it is flagged as an alert for investigation.
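The prediction span can be measured per case as the number of days between the suspicious transaction and the alert raised for it, and summarized across cases. The sketch below assumes each case is reduced to a hypothetical `(transaction_date, alert_date)` pair; the paper does not say how it aggregates spans, so the median here is an illustrative choice.

```python
from datetime import date
from statistics import median


def prediction_span_days(transaction_date: date, alert_date: date) -> int:
    """Days from the suspicious transaction to the alert raised for it."""
    if alert_date < transaction_date:
        raise ValueError("an alert cannot precede the transaction it flags")
    return (alert_date - transaction_date).days


def median_prediction_span(pairs: list[tuple[date, date]]) -> float:
    """Median prediction span over (transaction_date, alert_date) pairs."""
    return median(prediction_span_days(t, a) for t, a in pairs)


pairs = [
    (date(2021, 1, 1), date(2021, 1, 31)),   # 30 days
    (date(2021, 2, 1), date(2021, 2, 11)),   # 10 days
    (date(2021, 3, 1), date(2021, 5, 30)),   # 90 days
]
print(median_prediction_span(pairs))
```

Comparing this median for models trained under different event definitions is one way to check whether a change in labeling actually shortens the prediction span.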

Research limitations/implications

The analysis draws on existing experience with cases in which the time between alert and case closure is long (anywhere from 15 days to 10 months). The team could not quantify the impact of this finding because no such actual cases have been observed so far.

Originality/value

The key finding of the paper is that, under the current event definition, the data suggest money launderers typically either increase or reduce their activity in the most recent quarter. This does not reflect their real behavior: they typically show a spike in activity, through various means, while laundering money. The mismatch in turn degrades the quality of the patterns the model is trained on. The authors believe that once financial institutions start speeding up investigations of high-risk cases, the scatter plot of SAR behavior will change significantly, leading to better capture of money laundering behavior and a faster, more precise "catch" rate.
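The toy example below illustrates how a lagging event date can turn a spike into an apparent increase or decrease in "recent quarter" activity. The monthly figures are invented for illustration (a baseline of 10 with a spike in months 6 to 8) and are not data from the paper; `quarter_change` is a hypothetical feature comparing the three months before the event with the three months before those.

```python
def quarter_change(monthly_activity: list[float], event_month: int) -> float:
    """Ratio of activity in the quarter before the event to the quarter before that.

    event_month is exclusive: the recent quarter covers months
    event_month-3 .. event_month-1, the prior quarter the three months before.
    """
    if event_month < 6 or event_month > len(monthly_activity):
        raise ValueError("need six full months of history before the event")
    recent = sum(monthly_activity[event_month - 3:event_month])
    prior = sum(monthly_activity[event_month - 6:event_month - 3])
    return recent / prior if prior else float("inf")


# Hypothetical customer: baseline of 10 per month, laundering spike in months 6-8.
activity = [10.0] * 6 + [50.0, 60.0, 55.0] + [10.0] * 6   # months 0..14

for label, month in [("alert", 9), ("SAR, short lag", 10), ("SAR, long lag", 13)]:
    print(f"{label:>15}: recent/prior = {quarter_change(activity, month):.2f}")
```

With the event at the alert (month 9) the spike dominates the recent quarter (ratio 5.50); with the SAR filed one month later the same customer shows a much milder increase (1.79); with the SAR filed at month 13 the spike has already passed and the customer appears to be reducing activity (0.24).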


