S2OSC: A Holistic Semi-Supervised Approach for Open Set Classification
ACM Transactions on Knowledge Discovery from Data (IF 3.6). Pub Date: 2021-09-04, DOI: 10.1145/3468675
Yang Yang, Hongchen Wei, Zhen-Qiang Sun, Guang-Yu Li, Yuanchun Zhou, Hui Xiong, Jian Yang

Open set classification (OSC) tackles the problem of determining whether data are in-class or out-of-class at inference time, when only a set of in-class examples is available during training. Traditional OSC methods usually train discriminative or generative models on the available in-class data and then use the pre-trained models to classify test data directly. However, these methods often suffer from the embedding confusion problem: some out-of-class instances are mixed with in-class ones of similar semantics, making them difficult to classify. To solve this problem, we unify semi-supervised learning to develop a novel OSC algorithm, S2OSC, which incorporates out-of-class instance filtering and model re-training in a transductive manner. In detail, given a pool of newly arriving test data, S2OSC first filters out the most distinct out-of-class instances using the pre-trained model and annotates them with a super-class label. S2OSC then trains a holistic classification model by combining the in-class and out-of-class labeled data with the remaining unlabeled test data in a semi-supervised paradigm. Furthermore, considering that data usually arrive in streaming form in real applications, we extend S2OSC into an incremental update framework (I-S2OSC) and adopt a knowledge memory regularization to mitigate the catastrophic forgetting problem during incremental updates. Despite the simplicity of the proposed models, experimental results show that S2OSC achieves state-of-the-art performance across a variety of OSC tasks, including 85.4% F1 on CIFAR-10 with only 300 pseudo-labels. We also demonstrate how S2OSC can be effectively extended to the incremental OSC setting with streaming data.
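The filtering-and-pseudo-labeling stage described in the abstract can be illustrated with a minimal sketch. The Python snippet below is not the authors' implementation: the confidence-based filtering rule, the quantile thresholds, and names such as pretrained_scores and tau are assumptions made purely for illustration. It only shows how a pre-trained K-way model's confidence could be used to pull the most distinct out-of-class instances from a test pool, tag them with a single super-class label, and leave the rest unlabeled for a (K+1)-way semi-supervised re-training step.

# Illustrative sketch of confidence-based filtering and super-class pseudo-labeling.
# NOT the S2OSC implementation; thresholds and variable names are assumptions.
import numpy as np

rng = np.random.default_rng(0)

K = 5                      # number of known in-class categories
N = 1000                   # size of the incoming test pool

# Stand-in for softmax scores of the pre-trained in-class model on the test pool.
pretrained_scores = rng.dirichlet(alpha=np.ones(K), size=N)

# Step 1: filter the most distinct out-of-class instances.
# A low maximum class probability means the model is unsure -> likely out-of-class.
confidence = pretrained_scores.max(axis=1)
tau = np.quantile(confidence, 0.3)          # assumed filtering threshold
out_mask = confidence < tau

# Step 2: annotate the filtered instances with a single super-class label K,
# keep highly confident instances as pseudo-labeled in-class data, and leave
# the rest unlabeled for the semi-supervised re-training stage.
pseudo_labels = np.full(N, fill_value=-1)   # -1 marks "unlabeled"
pseudo_labels[out_mask] = K                 # super-class for out-of-class data
confident_in = confidence > np.quantile(confidence, 0.9)
pseudo_labels[confident_in] = pretrained_scores[confident_in].argmax(axis=1)

labeled_idx = np.where(pseudo_labels >= 0)[0]
unlabeled_idx = np.where(pseudo_labels < 0)[0]
print(f"{labeled_idx.size} pseudo-labeled, {unlabeled_idx.size} left unlabeled "
      f"for (K+1)-way semi-supervised re-training")

In the paper's transductive setting, the pseudo-labeled pool and the remaining unlabeled test data would then jointly train the holistic (K+1)-way classifier; the sketch stops at the labeling step.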

Updated: 2021-09-04