An outlier detection algorithm for categorical matrix-object data
Applied Soft Computing (IF 8.7), Pub Date: 2021-02-15, DOI: 10.1016/j.asoc.2021.107182
Fuyuan Cao, Xiaolin Wu, Liqin Yu, Jiye Liang

Outlier detection is a significant problem in data mining and machine learning; it aims to discover objects in a data set that do not conform to well-defined notions of expected behavior. Existing outlier detection algorithms generally take as input a collection of n objects, each described by a single feature vector. However, in many real-world applications, an object is described not by one feature vector but by several. In this paper, we define an object described by more than one feature vector as a matrix-object. Inspired by the concepts of cohesion and coupling in software engineering, we define the coupling of a matrix-object based on the average distance between it and the other matrix-objects, and define its cohesion based on information entropy and mutual information. On this basis, we give the outlier factor of a matrix-object and propose an outlier detection algorithm for categorical matrix-object data. Experimental results on real and synthetic data sets show that, in comparison with other algorithms, the proposed algorithm detects outliers in matrix-object data sets effectively.
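As a rough illustration of the two ingredients named in the abstract, the following Python sketch scores categorical matrix-objects by an average-distance coupling term and an entropy term. It is not the paper's method: the abstract gives no formulas, so the distance (`set_distance`, a per-attribute Jaccard distance), the entropy summary (`mean_entropy`), and all function names are assumptions made for illustration. The mutual-information part of cohesion and the final outlier factor are deliberately left out, since their definitions cannot be recovered from the abstract.

```python
from collections import Counter
from math import log2

# A matrix-object: several feature vectors, each a tuple of categorical values.
MatrixObject = list[tuple[str, ...]]


def attribute_entropy(obj: MatrixObject, j: int) -> float:
    """Shannon entropy (in bits) of attribute j over the object's feature vectors."""
    counts = Counter(row[j] for row in obj)
    total = len(obj)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mean_entropy(obj: MatrixObject) -> float:
    """Assumed stand-in for the entropy part of cohesion: mean entropy over attributes."""
    m = len(obj[0])
    return sum(attribute_entropy(obj, j) for j in range(m)) / m


def set_distance(a: MatrixObject, b: MatrixObject) -> float:
    """Assumed distance: mean Jaccard distance between per-attribute value sets."""
    m = len(a[0])
    total = 0.0
    for j in range(m):
        values_a = {row[j] for row in a}
        values_b = {row[j] for row in b}
        total += 1 - len(values_a & values_b) / len(values_a | values_b)
    return total / m


def coupling(data: list[MatrixObject], i: int) -> float:
    """Average distance between matrix-object i and every other matrix-object."""
    distances = [set_distance(data[i], data[k]) for k in range(len(data)) if k != i]
    return sum(distances) / len(distances)


# Toy data: three matrix-objects, each with two feature vectors over two attributes.
data = [
    [("a", "x"), ("a", "x")],
    [("a", "x"), ("a", "y")],
    [("b", "z"), ("c", "w")],
]
for i, obj in enumerate(data):
    print(i, coupling(data, i), mean_entropy(obj))
```

In this toy data the third matrix-object shares no attribute values with the other two, so it has the largest average distance to the rest. How the paper combines such terms into a single outlier factor is not stated in the abstract.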




Updated: 2021-02-17