Discovering Anomalies by Incorporating Feedback from an Expert
ACM Transactions on Knowledge Discovery from Data (IF 4.0), Pub Date: 2020-06-22, DOI: 10.1145/3396608
Shubhomoy Das, Weng-Keen Wong, Thomas Dietterich, Alan Fern, Andrew Emmott

Unsupervised anomaly detection algorithms search for outliers and then predict that these outliers are the anomalies. When deployed, however, these algorithms are often criticized for high false-positive and high false-negative rates. One main cause of this poor performance is that not all outliers are anomalies and not all anomalies are outliers. In this article, we describe the Active Anomaly Discovery (AAD) algorithm, which incorporates feedback from an expert user who labels a queried data instance as either an anomaly or a nominal point. This feedback is used to adjust the anomaly detector so that the outliers it discovers are more in tune with the expert user's semantic understanding of the anomalies. The AAD algorithm is based on a weighted ensemble of anomaly detectors. When it receives a label from the user, it adjusts the weight on each individual ensemble member so that the anomalies receive higher anomaly scores, and therefore rank higher, than the nominal outliers. The AAD approach is designed to operate in an interactive data exploration loop. In each iteration of this loop, the algorithm first selects a data instance to present to the expert as a potential anomaly, and the expert then labels the instance as an anomaly or as a nominal data point. On receiving the label, the algorithm updates its internal model, and the loop continues until a budget of B queries is spent. The goal of our approach is to maximize the total number of true anomalies among the B instances presented to the expert. We show that the AAD method performs well and, in some cases, finds twice as many true anomalies as previous methods. In addition, we present approximations that make the AAD algorithm much more computationally efficient while maintaining a desirable level of performance.
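To make the select, label, update loop concrete, here is a minimal Python sketch under several assumptions that do not come from the article: each ensemble member's anomaly scores are precomputed into a matrix, the expert is simulated by a hypothetical `expert` callable, and the weight update is a simple exponentiated-gradient-style rule chosen for brevity. It is not the optimization AAD actually solves; it only illustrates how labels can shift weight toward detectors that agree with the expert. The names `active_anomaly_discovery`, `ensemble_score`, and `learning_rate` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def ensemble_score(weights: Sequence[float], member_scores: Sequence[float]) -> float:
    """Combined anomaly score: weighted sum of the individual detectors' scores."""
    return sum(w * s for w, s in zip(weights, member_scores))


def active_anomaly_discovery(
    member_scores: List[List[float]],  # member_scores[i][m]: score of instance i under detector m
    expert: Callable[[int], bool],     # hypothetical oracle: True if instance i is an anomaly
    budget: int,                       # B, the number of queries the expert will answer
    learning_rate: float = 1.0,        # hypothetical step size for the simplified update
) -> Tuple[List[int], List[int]]:
    """Return (instances queried in order, queried instances labeled as anomalies)."""
    n_members = len(member_scores[0])
    weights = [1.0 / n_members] * n_members  # start from a uniform ensemble
    unlabeled = list(range(len(member_scores)))
    queried: List[int] = []
    found: List[int] = []

    for _ in range(min(budget, len(unlabeled))):
        # Select: present the unlabeled instance with the highest current ensemble score
        # (ties go to the lowest index, since max() keeps the first maximum it sees).
        query = max(unlabeled, key=lambda i: ensemble_score(weights, member_scores[i]))
        unlabeled.remove(query)
        queried.append(query)

        # Label: the expert marks the instance as an anomaly or a nominal point.
        is_anomaly = expert(query)
        if is_anomaly:
            found.append(query)

        # Update (simplified stand-in, not the paper's method): multiplicatively boost
        # detectors that scored a confirmed anomaly highly, shrink detectors that scored
        # a nominal point highly, then renormalize so the weights sum to 1.
        sign = 1.0 if is_anomaly else -1.0
        weights = [w * math.exp(sign * learning_rate * s)
                   for w, s in zip(weights, member_scores[query])]
        total = sum(weights)
        weights = [w / total for w in weights]

    return queried, found


# Toy data: detector 0 flags the true anomalies (0, 1); detector 1 flags two
# nominal outliers (2, 3); instances 4-9 are ordinary nominal points.
scores = [[0.9, 0.1], [0.9, 0.1], [0.1, 1.0], [0.1, 1.0]] + [[0.05, 0.05]] * 6
queried, found = active_anomaly_discovery(scores, expert=lambda i: i in {0, 1}, budget=3)
print(queried, found)  # [2, 0, 1] [0, 1]
```

In this toy run the uniform ensemble first presents a nominal outlier (instance 2). That single "nominal" label shifts weight toward detector 0, so both true anomalies are presented within a budget of 3, whereas the fixed uniform ranking (2, 3, 0) would surface only one.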

Updated: 2020-06-22