Multi-label image recognition with two-stream dynamic graph convolution networks, Image and Vision Computing


Multi-label image recognition with two-stream dynamic graph convolution networks
Image and Vision Computing (IF 4.2), Pub Date: 2021-06-24, DOI: 10.1016/j.imavis.2021.104238
Pingping Cao, Pengpeng Chen, Qiang Niu

Recent studies use Graph Convolution Networks (GCNs) to model label correlation in multi-label images because GCNs perform well on relational modeling tasks. However, the traditional GCN generalizes poorly, and the accuracy of current state-of-the-art methods remains low. We therefore propose a Two-Stream Dynamic Graph Convolution Network (2S-DGCN) to improve the performance of multi-label image recognition. In the upstream of 2S-DGCN, a Semantic Attention Module (SAM) and a Dynamic Graph Convolution Network (DGCN) first produce the Up Confidence Score of the predicted categories (UCS), the content-aware category and the label discriminant vector. In the downstream, we reconstruct new graph feature nodes by laterally embedding the content-aware category and the label discriminant vector, then feed these nodes into a DGCN to produce the Down Confidence Score of the predicted categories (DCS). Finally, the Final Confidence Score of the predicted categories (FCS) for multi-label image recognition is obtained by fusing the UCS and DCS. In extensive experiments on public multi-label benchmarks, our method achieves mAPs of 85.6% on MS-COCO and 95.4% on VOC 2007. Comparative experiments and visualizations demonstrate that our method outperforms current state-of-the-art methods.
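The data flow described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the internals of SAM or DGCN, how the lateral embedding is computed, or how the UCS and DCS are fused. The sketch assumes a softmax attention over spatial positions for SAM, a static graph followed by a feature-dependent (dynamic) adjacency for DGCN, element-wise addition for the lateral embedding, and a plain average for the fusion. All function names, parameter names, and sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def semantic_attention(features, w_attn):
    # Assumed SAM stand-in. features: (N, D) spatial vectors; w_attn: (D, C).
    logits = features @ w_attn                            # (N, C)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)         # softmax over positions
    return attn.T @ features                              # (C, D): one vector per category

def dynamic_gcn(nodes, static_adj, w_static, w_pair, w_dynamic):
    # Assumed DGCN stand-in: a static graph shared by all images, followed by
    # a dynamic graph whose adjacency is recomputed from the node features.
    hidden = np.maximum(static_adj @ nodes @ w_static, 0.0)    # (C, D)
    dynamic_adj = sigmoid(hidden @ w_pair @ hidden.T)          # (C, C)
    return np.maximum(dynamic_adj @ hidden @ w_dynamic, 0.0)   # (C, D)

def category_scores(nodes, w_cls):
    # One logit per category: node c dotted with classifier row c.
    return np.sum(nodes * w_cls, axis=1)                  # (C,)

def init_stream(rng, num_classes, dim):
    # Hypothetical random parameters for one stream (untrained).
    scale = 1.0 / np.sqrt(dim)
    return {
        "static_adj": np.full((num_classes, num_classes), 1.0 / num_classes),
        "w_static": rng.normal(0.0, scale, (dim, dim)),
        "w_pair": rng.normal(0.0, scale, (dim, dim)),
        "w_dynamic": rng.normal(0.0, scale, (dim, dim)),
        "w_cls": rng.normal(0.0, scale, (num_classes, dim)),
    }

def run_stream(nodes, p):
    out = dynamic_gcn(nodes, p["static_adj"], p["w_static"],
                      p["w_pair"], p["w_dynamic"])
    return out, category_scores(out, p["w_cls"])

def two_stream_scores(features, w_attn, up, down):
    content_aware = semantic_attention(features, w_attn)   # SAM output, (C, D)
    discriminant, ucs = run_stream(content_aware, up)      # upstream DGCN -> UCS
    rebuilt_nodes = content_aware + discriminant           # assumed lateral embedding (sum)
    _, dcs = run_stream(rebuilt_nodes, down)               # downstream DGCN -> DCS
    fcs = 0.5 * (ucs + dcs)                                # assumed fusion (average)
    return ucs, dcs, fcs

rng = np.random.default_rng(0)
num_positions, dim, num_classes = 49, 16, 5                # illustrative sizes only
features = rng.normal(size=(num_positions, dim))           # stand-in for backbone features
w_attn = rng.normal(0.0, 1.0 / np.sqrt(dim), (dim, num_classes))
up = init_stream(rng, num_classes, dim)
down = init_stream(rng, num_classes, dim)

ucs, dcs, fcs = two_stream_scores(features, w_attn, up, down)
probabilities = sigmoid(fcs)                               # per-category confidence in [0, 1]
print(probabilities.shape)
```

In a multi-label setting each category is scored independently, which is why the sketch applies a sigmoid per category rather than a softmax across categories.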


Updated: 2021-06-28
