Estimating a one-class naive Bayes text classifier, Intelligent Data Analysis

Estimating a one-class naive Bayes text classifier
Intelligent Data Analysis (IF 0.9), Pub Date: 2020-05-21, DOI: 10.3233/ida-194669
Yihong Zhang, Adam Jatowt

Nowadays, more and more information extraction projects need to classify large amounts of text data. The common way to classify text is to build a supervised classifier trained on human-labeled positive and negative examples. In many cases, however, positive examples are easy to label while negative examples are hard to label. In this paper, we address the problem of building a one-class classifier when only the positive examples are labeled. Previous work on building one-class classifiers has mostly used positive examples together with unlabeled data. We show that a configurable one-class classifier, such as one-class naive Bayes, can be optimized by examining the clustering quality of its classification on the target data. We propose to use both existing and new quality scores to determine the clustering quality of the classification. Experimental analysis on real-world data shows that our approach generally achieves high classification accuracy and, in some cases, improves accuracy by more than 10% compared to state-of-the-art baselines.
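The optimization idea in the abstract (tune a configurable one-class classifier by scoring how well its output clusters the target data) can be illustrated with a minimal sketch. It is not the authors' method. It assumes that the configurable parameter of the one-class naive Bayes model is a decision threshold on a length-normalized log-likelihood, and it uses the silhouette coefficient over word-count vectors (cosine distance) as a stand-in for an existing clustering quality score; the paper's own parameterization and new quality scores are not described here. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization (assumption; real pipelines do more).
    return text.lower().split()


def train_one_class_nb(positive_docs, vocab, alpha=1.0):
    """Multinomial word log-probabilities estimated from positive docs only."""
    counts = Counter(w for d in positive_docs for w in tokenize(d))
    total = sum(counts.values()) + alpha * len(vocab)
    # Laplace smoothing so words never seen in positives still get nonzero mass.
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def nb_score(doc, log_probs):
    """Length-normalized log-likelihood of doc under the positive-class model."""
    tokens = [w for w in tokenize(doc) if w in log_probs]
    if not tokens:
        return float("-inf")
    return sum(log_probs[w] for w in tokens) / len(tokens)


def cosine_distance(a, b):
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def silhouette(vectors, labels):
    """Mean silhouette coefficient of a two-way partition under cosine distance."""
    n = len(vectors)
    total = 0.0
    for i in range(n):
        same = [cosine_distance(vectors[i], vectors[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        other = [cosine_distance(vectors[i], vectors[j])
                 for j in range(n) if labels[j] != labels[i]]
        if not same or not other:
            continue  # a singleton-cluster member scores 0 by convention
        a, b = sum(same) / len(same), sum(other) / len(other)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def fit_threshold(positive_docs, target_docs):
    """Choose the score threshold whose split of target_docs clusters best.

    Returns (threshold, labels, quality); labels[i] is True for predicted positives.
    """
    vocab = {w for d in positive_docs + target_docs for w in tokenize(d)}
    log_probs = train_one_class_nb(positive_docs, vocab)
    scores = [nb_score(d, log_probs) for d in target_docs]
    vectors = [Counter(tokenize(d)) for d in target_docs]
    candidates = sorted(set(scores))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct scores to form two clusters")
    best = None
    # Skipping the lowest score keeps both the positive and negative sides non-empty.
    for t in candidates[1:]:
        labels = [s >= t for s in scores]
        q = silhouette(vectors, labels)
        if best is None or q > best[2]:
            best = (t, labels, q)
    return best


if __name__ == "__main__":
    positive = ["neural network training", "deep neural network model", "training a deep model"]
    target = ["neural network model training", "deep network training",
              "football match score", "football team match"]
    threshold, labels, quality = fit_threshold(positive, target)
    print(labels)  # [True, True, False, False]
```

Using the observed target scores as the candidate thresholds keeps the search finite and guarantees that every candidate produces a genuine two-way split; a model with more configurable parameters would need a larger search over the same quality score.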

Updated: 2020-06-30