An Integrated Cluster Detection, Optimization, and Interpretation Approach for Financial Data.,IEEE Transactions on Cybernetics

IEEE Transactions on Cybernetics (IF 11.8), Pub Date: 2022-11-18, DOI: 10.1109/tcyb.2021.3109066
Tie Li¹, Gang Kou², Yi Peng¹, Philip S. Yu³

In many financial applications, such as fraud detection, reject inference, and credit evaluation, detecting clusters automatically is critical because it reveals subpatterns in the data that can be used to infer users' behaviors and identify potential risks. Owing to the complexity of human behavior and changing social environments, financial data usually have complex distributions, which makes it challenging to find clusters and to give reasonable interpretations of them. The goal of this study is to develop an integrated approach that detects clusters in financial data and optimizes the scope of the clusters so that they can be easily interpreted. Specifically, we first propose a new cluster quality evaluation criterion that does not require large-scale computation and can guide base clustering algorithms such as k-means to detect hyperellipsoidal clusters adaptively. Then, we design a new solver for a revised support vector data description (SVDD) model, which efficiently refines the centroids and scopes of the detected clusters to make them tighter, so that the data within each cluster share greater similarity and the clusters can be easily interpreted with eigenvectors. Experiments on ten financial datasets showed that the proposed algorithm can efficiently find a reasonable number of clusters. The proposed approach is suitable for large-scale financial datasets whose features are meaningful, and is also applicable to financial mining tasks such as data distribution interpretation and anomaly detection.
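The abstract does not specify the cluster quality criterion or the solver, so the following is only a minimal sketch of the refine-then-interpret idea it describes, under stated assumptions. Two standard techniques are swapped in and are not the authors' method: the minimum enclosing ball (the hard-margin SVDD with a linear kernel), approximated with the Badoiu-Clarkson iteration, stands in for the revised SVDD solver that refines a cluster's centroid and scope, and power iteration on the cluster's covariance matrix stands in for the eigenvector-based interpretation. All function names are hypothetical, and the criterion that guides k-means is not modeled.

```python
import math

# Hypothetical illustration only: a minimum enclosing ball (hard-margin,
# linear-kernel SVDD) stands in for the paper's revised SVDD solver, and
# power iteration on the covariance stands in for its eigenvector step.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_enclosing_ball(points, iters=1000):
    """Badoiu-Clarkson iteration for the smallest ball containing all points.

    Each step moves the center a fraction 1/(t + 1) toward the current
    farthest point; after t steps the center lies within r*/sqrt(t) of the
    optimal center, where r* is the optimal radius.
    """
    center = list(points[0])
    for t in range(1, iters + 1):
        far = max(points, key=lambda p: sq_dist(p, center))
        center = [c + (f - c) / (t + 1) for c, f in zip(center, far)]
    radius = math.sqrt(max(sq_dist(p, center) for p in points))
    return center, radius


def covariance(points):
    """Sample covariance matrix (d x d) of the points, dividing by n - 1."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def leading_eigenvector(matrix, iters=1000, tol=1e-12):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix.

    Returns the unit eigenvector and its eigenvalue (Rayleigh quotient).
    """
    d = len(matrix)
    v = [1.0 + 0.5 * i for i in range(d)]  # asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: no preferred direction
            break
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    eigval = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                 for i in range(d))
    return v, eigval


def describe_cluster(points, feature_names):
    """Summarize one cluster: ball center and radius, plus the feature that
    loads most heavily on the principal axis (hypothetical helper)."""
    center, radius = min_enclosing_ball(points)
    axis, variance = leading_eigenvector(covariance(points))
    top = max(range(len(axis)), key=lambda j: abs(axis[j]))
    return {"center": center, "radius": radius,
            "principal_axis": axis, "axis_variance": variance,
            "dominant_feature": feature_names[top]}
```

For example, with a made-up cluster `[[1000.0, 0.20], [3000.0, 0.21], [5000.0, 0.19], [7000.0, 0.20]]` and hypothetical feature names `["loan_amount", "debt_ratio"]`, `describe_cluster` reports `loan_amount` as the dominant feature, because almost all of the cluster's spread lies along that axis. Since the covariance here is computed on raw values, features on larger scales dominate the principal axis, so features would usually be standardized before this step.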

Updated: 2021-09-22
