Collaborative Multilabel Classification
Journal of the American Statistical Association (IF 3.7), Pub Date: 2021-09-01, DOI: 10.1080/01621459.2021.1961783
Yunzhang Zhu, Xiaotong Shen, Hui Jiang, Wing Hung Wong

Abstract

In multilabel classification, strong label dependence is available to exploit, particularly word-to-word dependence defined by semantic labels. In such a situation, we develop a collaborative-learning framework to predict class labels based on label-predictor pairs and label-only data. For example, in image categorization and recognition, language expressions describe the content of an image, and a large number of words and phrases are available without associated images. This article proposes a new loss quantifying partial correctness for false positive and false negative misclassifications due to label similarities. Given this loss, we derive the Bayes rule to capture label dependence by nonlinear classification. On this basis, we introduce a weighted random forest classifier for complete data and a stacking scheme for leveraging additional labels to enhance the performance of supervised learning based on label-predictor pairs. Importantly, we decompose multilabel classification into a sequence of independent learning tasks, so that the computational complexity of our classifier becomes linear in the number of labels. Compared to existing classifiers without label-only data, the proposed classifier enjoys this computational benefit while enabling the detection of novel labels absent from training by exploring label dependence and leveraging label-only data for higher accuracy. Theoretically, we show that the proposed method reconstructs the Bayes performance consistently, achieving the desired learning accuracy. Numerically, we demonstrate that the proposed method compares favorably in terms of the proposed and Hamming losses against binary relevance and a regularized Ising classifier modeling conditional label dependence. Indeed, leveraging additional labels tends to improve the supervised performance, especially when the training sample is not very large, as in semisupervised learning. Finally, we demonstrate the utility of the proposed approach on the Microsoft COCO object detection challenge, PASCAL visual object classes challenge 2007, and Mediamill benchmark.
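
To make two of the terms above concrete, the following is a minimal sketch of the binary relevance baseline and the Hamming loss that the abstract compares against. It is not the authors' weighted random forest, stacking scheme, or proposed loss, none of which are specified in the abstract. The nearest-centroid base learner, the class name `BinaryRelevance`, and the toy data are assumptions chosen only to keep the example self-contained.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Labels = Sequence[int]  # one 0/1 indicator per label


def hamming_loss(y_true: Sequence[Labels], y_pred: Sequence[Labels]) -> float:
    """Fraction of (sample, label) slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def _centroid(rows: List[Vector]) -> Optional[List[float]]:
    """Coordinate-wise mean, or None when there are no rows."""
    if not rows:
        return None
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class BinaryRelevance:
    """Fits one independent binary classifier per label.

    The base learner here is a nearest-centroid rule (an illustrative
    assumption); labels are treated independently, so label dependence
    is ignored, which is the limitation the abstract aims to address.
    """

    def fit(self, X: Sequence[Vector], Y: Sequence[Labels]) -> "BinaryRelevance":
        n_labels = len(Y[0])
        self.centroids: List[Tuple[Optional[List[float]], Optional[List[float]]]] = []
        for k in range(n_labels):  # one task per label: cost grows linearly in n_labels
            neg = [x for x, y in zip(X, Y) if y[k] == 0]
            pos = [x for x, y in zip(X, Y) if y[k] == 1]
            self.centroids.append((_centroid(neg), _centroid(pos)))
        return self

    def predict(self, X: Sequence[Vector]) -> List[List[int]]:
        out = []
        for x in X:
            row = []
            for neg, pos in self.centroids:
                if pos is None:      # label never seen as positive in training
                    row.append(0)
                elif neg is None:    # label always positive in training
                    row.append(1)
                else:
                    row.append(int(_sq_dist(x, pos) < _sq_dist(x, neg)))
            out.append(row)
        return out


# Toy data (hypothetical): two features, two labels.
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
Y_train = [[1, 0], [1, 0], [0, 1], [0, 1]]

model = BinaryRelevance().fit(X_train, Y_train)
print(hamming_loss(Y_train, model.predict(X_train)))
```

Because each label gets its own fit, a label that never appears as positive in training can only ever be predicted as absent in this sketch, which illustrates why the abstract turns to label dependence and label-only data for detecting novel labels.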




Updated: 2021-09-01