Generating Artificial Outliers in the Absence of Genuine Ones — A Survey, ACM Transactions on Knowledge Discovery from Data


ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2021-03-27, DOI: 10.1145/3447822
Georg Steinbuss, Klemens Böhm


By definition, outliers are rarely observed in reality, making them difficult to detect or analyze. Artificial outliers approximate such genuine outliers and can, for instance, help with the detection of genuine outliers or with benchmarking outlier-detection algorithms. The literature features different approaches to generate artificial outliers. However, a systematic comparison of these approaches remains absent. This article surveys and compares these approaches. We start by clarifying the terminology in the field, which varies from publication to publication, and we propose a general problem formulation. We frame the field of artificial outliers by describing how outlier generation connects to other research fields such as experimental design and generative models. Along with a concise description of each approach, we group the approaches by their general concepts and by how they make use of genuine instances. An extensive experimental study reveals the differences between the generation approaches when they are ultimately used for outlier detection. This survey shows that the existing approaches already cover a wide range of concepts underlying the generation, but also that the field still has potential for further development. Our experimental study confirms the expectation that the quality of the generation approaches varies widely, for example, depending on the dataset they are used on. Ultimately, to guide the choice of the generation approach in a specific context, we propose an appropriate general decision process. In summary, this survey comprises, describes, and connects all relevant work regarding the generation of artificial outliers and may serve as a basis to guide further research in the field.


Updated: 2021-03-27
