Uncertain distance-based outlier detection with arbitrarily shaped data objects
Journal of Intelligent Information Systems (IF 3.4), Pub Date: 2020-10-15, DOI: 10.1007/s10844-020-00624-7
Fabrizio Angiulli, Fabio Fassetti

Enabling information systems to face anomalies in the presence of uncertainty is a compelling and challenging task. This work considers the problem of unsupervised outlier detection in large collections of data objects modeled by means of arbitrary multidimensional probability density functions. We present a novel definition of uncertain distance-based outlier under the attribute-level uncertainty model, in which an uncertain object is an object that always exists but whose actual value is modeled by a multivariate pdf. According to this definition, an uncertain object is declared to be an outlier on the basis of the expected number of its neighbors in the dataset. To the best of our knowledge, this is the first work that considers the unsupervised outlier detection problem on data objects modeled by means of arbitrarily shaped multidimensional distribution functions. We present the UDBOD algorithm, which efficiently detects the outliers in an input uncertain dataset by taking advantage of three optimized phases: parameter estimation, candidate selection, and candidate filtering. An experimental campaign is presented, including a sensitivity analysis, a study of the effectiveness of the technique, a comparison with related algorithms (including on high-dimensional data), and a discussion of the behavior of our technique in real-world scenarios.
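
The idea of scoring an uncertain object by the expected number of its neighbors can be illustrated with a naive Monte Carlo sketch. This is not the UDBOD algorithm and does not reproduce its three phases; it only estimates, for each other object, the probability that it falls within a radius of the object under test, and sums those probabilities. The sampler representation, the names `gaussian`, `uniform_box`, `expected_neighbors`, and `is_outlier`, the parameters `radius` and `k`, and the decision rule "expected neighbors at most k" are all assumptions made for illustration; the abstract does not state the exact threshold rule.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
# Hypothetical representation: an uncertain object is a function that draws
# one realization from its pdf using the supplied random generator.
Sampler = Callable[[random.Random], Point]


def gaussian(center: Point, sigma: float) -> Sampler:
    """Uncertain object with an isotropic Gaussian pdf (illustrative choice)."""
    return lambda rng: tuple(rng.gauss(c, sigma) for c in center)


def uniform_box(low: Point, high: Point) -> Sampler:
    """Uncertain object with a uniform pdf over an axis-aligned box."""
    return lambda rng: tuple(rng.uniform(a, b) for a, b in zip(low, high))


def expected_neighbors(objects: List[Sampler], i: int, radius: float,
                       n_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of neighbors of object i.

    By linearity of expectation, this is the sum over the other objects j of
    Pr(dist(x_i, x_j) <= radius), each estimated from independent draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for j, other in enumerate(objects):
        if j == i:
            continue
        hits = sum(
            math.dist(objects[i](rng), other(rng)) <= radius
            for _ in range(n_samples)
        )
        total += hits / n_samples
    return total


def is_outlier(objects: List[Sampler], i: int, radius: float, k: float) -> bool:
    """Assumed decision rule: outlier if the expected neighbor count is at most k."""
    return expected_neighbors(objects, i, radius) <= k


# Toy dataset: five overlapping objects near the origin and one far away.
objects: List[Sampler] = [
    gaussian((0.0, 0.0), 0.2),
    gaussian((0.5, 0.0), 0.2),
    gaussian((0.0, 0.5), 0.2),
    uniform_box((0.0, 0.0), (0.5, 0.5)),
    gaussian((0.25, 0.25), 0.2),
    gaussian((10.0, 10.0), 0.2),  # isolated object
]

for idx in range(len(objects)):
    score = expected_neighbors(objects, idx, radius=2.0)
    print(idx, round(score, 3), is_outlier(objects, idx, radius=2.0, k=1))
```

The cost of this sketch grows quadratically with the number of objects and linearly with the number of samples, which is the kind of brute-force cost that phases such as candidate selection and candidate filtering would presumably be meant to avoid.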

Updated: 2020-10-15