Distributed outlier detection in hierarchically structured datasets with mixed attributes
Quality Technology and Quantitative Management (IF 2.8), Pub Date: 2019-06-21, DOI: 10.1080/16843703.2019.1629679
Qiao Liang, Kaibo Wang

Anomaly detection has been studied extensively over the past decades, yet the complex structure of real-world datasets still poses challenges. First, few methods in the literature handle datasets that have both categorical and continuous attributes, and fewer still are sensitive to the dependencies between the two types of attributes. Second, real-world datasets tend to have a complex structure: their categorical attributes are usually hierarchically correlated, a property that existing outlier detection approaches have largely ignored. To address both gaps, we propose a distributed outlier detection method for mixed-attribute datasets, in particular those with hierarchical categorical attributes. The proposed method accounts for the dependencies between categorical and continuous attributes rather than treating them as two separate parts, and it captures the hierarchical structure among the categorical attributes. Experimental results on a real-world dataset and a simulation study show its superior performance in terms of both detection accuracy and time efficiency.




Updated: 2019-06-21