Information Sciences (IF 8.1), Pub Date: 2021-02-26, DOI: 10.1016/j.ins.2021.02.045. Yu Wang, Yupeng Li
Outlier detection is of great importance in industry, as unexpected errors or faults, abnormal behaviours or phenomena, etc. can occur for a variety of human, system, and environmental reasons. Identifying and analysing these rare items, events, or observations can reveal either anomalies or novelties and, as a result, can help avoid potential unexpected consequences or improve industrial system performance. In contrast to the data considered in previous studies, the operating data collected from industrial systems in the Industry 4.0 era are multi-attribute (e.g., both numerical and categorical). Therefore, a new outlier detection method for mixed-valued datasets based on a weighted network model is proposed in this paper. Concretely, a weighted neighbourhood information network (WNIN) is constructed by considering the neighbourhood relations and similarities among objects to represent a dataset with mixed-valued attributes (DMA). A tailored Markov random walk method is employed to detect outliers on the predefined network model. After the walk reaches equilibrium, the inlier score is defined according to the out-degree of nodes in the WNIN to represent the inlier degree of objects. Experiments on two real datasets and a case study illustrate the effectiveness and adaptability of the proposed method.
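The pipeline described in the abstract (similarity over mixed attributes, a weighted neighbourhood network, a Markov random walk run to equilibrium) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the paper's actual formulas. The Gower-style similarity, the k-nearest-neighbour edge rule, the damping term, and every function and parameter name below are assumptions made for illustration. The sketch also scores objects by their stationary visiting probability, a simpler stand-in for the out-degree-based inlier score that the paper defines.

```python
import numpy as np

def mixed_similarity(num, cat):
    """Gower-style similarity in [0, 1] over mixed attributes (assumed measure)."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    span = num.max(axis=0) - num.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    # Numerical attributes: 1 - range-normalised absolute difference.
    num_sim = 1.0 - np.abs(num[:, None, :] - num[None, :, :]) / span
    # Categorical attributes: 1 if the values match, else 0.
    cat_sim = (cat[:, None, :] == cat[None, :, :]).astype(float)
    return np.concatenate([num_sim, cat_sim], axis=2).mean(axis=2)

def knn_network(sim, k):
    """Directed weighted network: each node points to its k most similar nodes."""
    n = sim.shape[0]
    ranked = sim.copy()
    np.fill_diagonal(ranked, -np.inf)  # exclude self-loops from the ranking
    W = np.zeros_like(sim)
    for i in range(n):
        nbrs = np.argsort(ranked[i])[-k:]
        W[i, nbrs] = sim[i, nbrs]  # edge weight = similarity
    return W

def random_walk_scores(W, damping=0.9, tol=1e-10, max_iter=1000):
    """Stationary distribution of a damped random walk (assumed variant)."""
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    # Row-normalise weights; rows with no outgoing weight jump uniformly.
    P = np.where(row_sums > 0, W / safe, 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = damping * (pi @ P) + (1.0 - damping) / n
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

# Toy data: five similar objects and one object (index 5) that differs
# in both its numerical and its categorical attribute.
num = np.array([[1.0], [1.1], [1.3], [1.7], [2.0], [9.0]])
cat = np.array([["a"], ["a"], ["a"], ["a"], ["a"], ["b"]])

W = knn_network(mixed_similarity(num, cat), k=2)
scores = random_walk_scores(W)
outlier = int(np.argmin(scores))  # lowest score = least "inlier-like"
print(outlier, scores.round(4))
```

In this toy network no object selects index 5 as a neighbour, so the walk reaches it only through the uniform restart and it receives the lowest score. The damping term is included solely so that the power iteration converges on any network; the paper's tailored walk may handle this differently.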
Title: Outlier detection for mixed-valued datasets based on weighted neighbourhood information network