An approach for outlier and novelty detection for text data based on classifier confidence

AI Communications (IF 1.4), Pub Date: 2020-12-18, DOI: 10.3233/aic-200649
Nikola Pižurica, Savo Tomović

In this paper we present an approach for novelty detection in text data. The approach can also be regarded as semi-supervised anomaly detection, because the training dataset contains labelled instances of the known classes only. A classification model is learned during the training phase, under the assumption that the training dataset contains at least two known classes. In the testing phase, instances are classified as normal or anomalous based on the classifier's confidence: if the classifier cannot assign any of the known class labels to a given instance with sufficiently high confidence (probability), the instance is declared a novelty (anomaly). We propose two procedures to objectively measure the classifier confidence. Experimental results show that the proposed approach is comparable to methods known in the literature.
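The abstract does not say which classifier is used or what the two confidence procedures are, so the following is only a minimal sketch of the general idea, under stated assumptions: a small multinomial naive Bayes model stands in for the classifier, confidence is taken to be the largest posterior probability, and the cut-off of 0.8 is an arbitrary illustrative value. The function names (`train_nb`, `predict_proba`, `classify_or_flag`) and the toy documents are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on tokenised documents."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for c in classes for w in word_counts[c]}
    return {
        "classes": classes,
        "word_counts": word_counts,
        "doc_counts": Counter(labels),
        "n_docs": len(docs),
        "vocab": vocab,
    }


def predict_proba(model, tokens):
    """Return {class: posterior probability} for one tokenised document."""
    vocab_size = len(model["vocab"])
    log_scores = {}
    for c in model["classes"]:
        total = sum(model["word_counts"][c].values())
        score = math.log(model["doc_counts"][c] / model["n_docs"])  # log prior
        for w in tokens:
            if w in model["vocab"]:  # assumption: unseen words are skipped
                count = model["word_counts"][c][w]
                score += math.log((count + 1) / (total + vocab_size))  # Laplace
        log_scores[c] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def classify_or_flag(model, tokens, threshold=0.8):
    """Return (label, confidence); the label is 'novelty' below the threshold."""
    probs = predict_proba(model, tokens)
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return label, confidence
    return "novelty", confidence


# Toy training set with two known classes (the abstract requires at least two).
docs = [
    "the team won the match".split(),
    "the striker scored a goal".split(),
    "the bank raised interest rates".split(),
    "the market fell as stocks dropped".split(),
]
labels = ["sport", "sport", "finance", "finance"]
model = train_nb(docs, labels)

print(classify_or_flag(model, "striker scored the goal".split()))
print(classify_or_flag(model, "quantum entanglement experiment".split()))
```

In this toy setup the first document is assigned to a known class, while the second contains no known words, so the posterior falls back to the class priors and the document is flagged. A fixed threshold on the maximum posterior is only one simple way to operationalise "sufficiently high confidence"; the procedures proposed in the paper may differ.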

Updated: 2020-12-23
