Monitoring high-frequency data streams in FinTech: FADO versus K-means, IEEE Intelligent Systems

Monitoring high-frequency data streams in FinTech: FADO versus K-means
IEEE Intelligent Systems (IF 6.4), Pub Date: 2020-01-01, DOI: 10.1109/mis.2020.2977012
Kristiaan Pelckmans

Modern FinTech applications are challenged by enormous volumes of financial data. One way to handle these volumes is to adopt a streaming setting, where each data point is available to the algorithm only for a very short time. When a new data point (a financial transaction) is generated, it must be processed immediately and forgotten right after. In particular, ongoing globalization efforts in FinTech require modern fault detection methods to work efficiently through more than 10 000 financial transactions per second if they are to be deployed as a first line of defence. This article investigates two algorithms able to perform well in this demanding setting: $K$-means and FADO. Specifically, it provides support for the claim that “the use of multiple clusters does not necessarily translate into increased detection performance.” To support this claim, results are reported for a quasi-realistic case study of Anti-Money Laundering (AML) detection in real-time payment systems. We focus on two prototypical algorithms: the passive-aggressive FADO, which assumes a single cluster, and the well-known $K$-means algorithm working with $K>1$ clusters. We find that, in this case, the use of $K$-means with multiple clusters is unfavorable because 1) both tuning $K$ and the additional complexity of the $K$-means algorithm challenge the computational constraints; 2) $K$-means necessarily introduces added variability (unreliability) in the results; 3) it requires dimensionality reduction, compromising the interpretability of the detections; and 4) the prevalence of singleton clusters adds unreliability to the outcome. In the presented case, this makes FADO favorable over $K$-means (with $K>1$).
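To make the streaming setting concrete, the sketch below shows a generic sequential (MacQueen-style) $K$-means detector that scores each incoming point by its distance to the nearest centroid, updates that centroid, and then discards the point. It is an illustration under stated assumptions, not the implementation evaluated in the article: the class name `OnlineKMeans`, the helper `run_demo`, the synthetic Gaussian "transactions", and the distance-based score are all hypothetical choices. The $K=1$ case is only a running-mean stand-in for a single-cluster detector and does not reproduce FADO's passive-aggressive update rule, which the abstract does not specify.

```python
import math
import random


class OnlineKMeans:
    """Sequential k-means: one pass over the stream, O(K * dim) memory."""

    def __init__(self, k, dim):
        self.k = k
        self.dim = dim
        self.centroids = []  # filled by the first k points of the stream
        self.counts = []     # number of points assigned to each centroid

    def score_and_update(self, x):
        """Return the distance of x to its nearest centroid, then update it."""
        if len(self.centroids) < self.k:
            # Assumed initialisation: the first k points become centroids.
            self.centroids.append(list(x))
            self.counts.append(1)
            return 0.0
        dists = [math.dist(x, c) for c in self.centroids]
        j = min(range(self.k), key=dists.__getitem__)
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # running-mean step size
        c = self.centroids[j]
        for i in range(self.dim):
            c[i] += eta * (x[i] - c[i])
        return dists[j]  # x itself is not stored anywhere


def run_demo(k, n=2000, seed=0):
    """Stream n synthetic 2-D points, then score a typical and an extreme point."""
    rng = random.Random(seed)
    model = OnlineKMeans(k, dim=2)
    for _ in range(n):
        model.score_and_update((rng.gauss(0, 1), rng.gauss(0, 1)))
    normal_score = model.score_and_update((0.1, -0.2))
    outlier_score = model.score_and_update((8.0, 8.0))
    return normal_score, outlier_score, model.counts


if __name__ == "__main__":
    for k in (1, 5):
        normal, outlier, counts = run_demo(k)
        print(f"K={k}: normal={normal:.2f} outlier={outlier:.2f} counts={counts}")
```

Each call costs $K$ distance computations, so the per-transaction work grows with $K$, and the final centroids depend on which points happen to arrive first. Both effects are in the spirit of points 1) and 2) above, although this toy example says nothing about the detection performance reported in the article.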

Updated: 2020-01-01
