A Two-Level Approach based on Integration of Bagging and Voting for Outlier Detection
Journal of Data and Information Science (IF 1.5), Pub Date: 2020-05-20, DOI: 10.2478/jdis-2020-0014
Alican Dogan, Derya Birant

Abstract

Purpose: The main aim of this study is to build a robust, novel approach that accurately detects outliers in datasets. To this end, a novel approach is introduced to estimate the likelihood that an object deviates strongly from the general behavior of the entire dataset.

Design/methodology/approach: This paper proposes a novel two-level approach for anomaly detection problems, based on the integration of bagging and voting techniques. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), uses the Local Outlier Factor (LOF) as its base algorithm and improves its detection rate through ensemble methods.

Findings: Experiments were performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, on average, BV-LOF significantly outperformed LOF on 9 of the 10 datasets.

Research limitations: In BV-LOF, the base algorithm is applied to each data subset multiple times, with a different neighborhood size (k) each time, and with different ensemble sizes (T). In this study, the ranges of both k and T were set to [1–100]; however, these ranges can be adjusted to suit the dataset and the problem at hand.

Practical implications: The proposed method can be applied to datasets from different domains (e.g., health, finance, manufacturing) without requiring any prior information. Because BV-LOF performs two levels of ensemble operations, it may require more computation time than single-level ensemble methods; however, this drawback can be overcome through parallelization and by using a suitable data structure such as an R*-tree or a KD-tree.

Originality/value: The proposed approach (BV-LOF) explores multiple neighborhood sizes (k), which allows it to find instances with different local densities and thereby increases the chance of detecting outliers that LOF alone may miss. It also offers benefits such as easy implementation, improved detection capability, broader applicability, and interpretability.
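To make the two-level idea concrete, here is a minimal pure-Python sketch under stated assumptions. The `lof_scores` function follows the standard LOF definition (k-distance, reachability distance, local reachability density), with neighbourhood ties broken by index so every neighbourhood has exactly k points. The ensemble around it is an assumption, not the authors' published procedure: we assume each bag is a random feature subset (as in classic feature bagging), and that every (bag, k) run casts one vote for each of its `top_m` highest-scoring points. The names `bv_lof_sketch`, `k_values`, `n_bags` and `top_m` are hypothetical. Neighbour search is brute force; the KD-tree or R*-tree suggested in the abstract would replace it on larger data.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def lof_scores(data: List[Point], k: int) -> List[float]:
    """Standard Local Outlier Factor score for every point in `data`.

    Neighbour search is brute force (O(n^2) distances); on large datasets a
    KD-tree or R*-tree index would replace it.
    """
    n = len(data)
    k = min(k, n - 1)  # cannot have more neighbours than other points
    dist = [[math.dist(p, q) for q in data] for p in data]

    # k nearest neighbours of each point, excluding the point itself;
    # ties at the k-distance are broken by index (the original definition
    # would keep all tied points)
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance to the k-th nearest neighbour
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    # local reachability density = 1 / mean reachability distance,
    # where reach-dist_k(p, o) = max(k-distance(o), d(p, o))
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_dist[o], dist[i][o]) for o in neigh[i]) / k
        lrd.append(1.0 / max(mean_reach, 1e-12))  # guard against duplicate points

    # LOF = mean ratio of the neighbours' densities to the point's own density
    return [sum(lrd[o] for o in neigh[i]) / (k * lrd[i]) for i in range(n)]


def bv_lof_sketch(data: List[Point], k_values: List[int], n_bags: int,
                  top_m: int, seed: int = 0) -> List[int]:
    """Hypothetical two-level bagging + voting ensemble around LOF.

    Level 1: draw `n_bags` random feature subsets (feature bagging).
    Level 2: on each subset, run LOF once per k in `k_values`; every run
    casts one vote for each of its `top_m` highest-scoring points.
    Returns the number of votes each point received.
    """
    rng = random.Random(seed)
    d = len(data[0])
    votes = [0] * len(data)
    for _ in range(n_bags):
        # subset size drawn between d // 2 and d features (an assumption)
        feats = rng.sample(range(d), rng.randint(max(1, d // 2), d))
        projected = [[row[f] for f in feats] for row in data]
        for k in k_values:
            scores = lof_scores(projected, k)
            ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
            for i in ranked[:top_m]:
                votes[i] += 1
    return votes


if __name__ == "__main__":
    # 5 x 5 grid of inliers plus one far-away point at index 25
    data = [[float(x), float(y)] for x in range(5) for y in range(5)]
    data.append([10.0, 10.0])
    votes = bv_lof_sketch(data, k_values=[3, 5, 8], n_bags=5, top_m=1)
    print(votes.index(max(votes)))  # 25: the isolated point wins every vote
```

With `top_m=1`, each of the 15 runs (5 bags × 3 values of k) gives its single vote to the isolated point. On real data one would rank points by vote count; the k grid, the number of bags and the vote rule are all tuning assumptions in this sketch.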

Updated: 2020-05-20