A literature review on one-class classification and its potential applications in big data
Journal of Big Data (IF 8.6), Pub Date: 2021-09-10, DOI: 10.1186/s40537-021-00514-x
Naeem Seliya, Azadeh Abdollah Zadeh, Taghi M. Khoshgoftaar

In severely imbalanced datasets, traditional binary or multi-class classification is typically biased towards the class(es) with far more instances, which makes modeling and detecting instances of the minority class very difficult. One-class classification (OCC) detects abnormal data points by comparing them with instances of the known class, and it can help address the problems posed by severely imbalanced datasets, which are especially common in big data. We present a detailed survey of OCC-related works published over approximately the last decade. We group these works into three categories: outlier detection, novelty detection, and deep learning and OCC. We closely examine and evaluate selected works so that the survey represents a good cross section of approaches, methods, and application domains, and we discuss the techniques commonly used in OCC for outlier detection and for novelty detection, respectively. One area we observed to be largely omitted from the OCC-related literature is its application in the context of big data and its inherently associated problems, such as severe class imbalance, class rarity, noisy data, feature selection, and data reduction. We expect the survey to be useful to researchers working in these areas of big data.




Updated: 2021-09-10