Retaining Diverse Information in Contrastive Learning Through Multiple Projectors
IEEE Signal Processing Letters (IF 3.2) Pub Date: 8-10-2022, DOI: 10.1109/lsp.2022.3198015
He Zhu, Shan Yu

Contrastive Learning (CL) has achieved great success in learning visual representations by comparing two augmented views of the same image. However, this very design removes transformation-dependent visual information during pre-training, which leads to incomplete representations and harms downstream tasks. How to retain such information in CL pre-training remains an open question. In this letter, we propose Multi-Projector Contrastive Learning (MPCL) to address this issue, which produces multi-view contrastive candidates to retain more comprehensive visual characteristics. In addition, we introduce a contrast regularization that makes the multiple projectors as different from one another as possible, thereby increasing the diversity of the preserved information. Finally, to promote a consistent learning process across projectors, we design a projector training balance strategy that adjusts the learning preference of each projector. MPCL can be applied to various CL frameworks to effectively preserve visual characteristics. Experimental results show that the method performs well on downstream tasks such as linear and semi-supervised image classification, object detection, and semantic segmentation. Notably, a vision transformer trained with MPCL improves linear-evaluation accuracy by 2 absolute percentage points over MoCo-v3 on the ImageNet-100 dataset.
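The letter itself does not include code, but the core idea described above — attaching several projectors to one backbone, computing a contrastive loss per projector, and adding a regularizer that pushes the projectors apart — can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the projector count, the InfoNCE temperature, and in particular the form of the diversity penalty (mean pairwise cosine similarity between flattened projector weights) are my assumptions, since the exact "contrast regularization" and balance strategy are only named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce(za, zb, tau=0.2):
    # za, zb: (N, P) projected embeddings of two augmented views.
    # Positives are matching rows; all other rows act as negatives.
    za, zb = l2_normalize(za), l2_normalize(zb)
    logits = za @ zb.T / tau
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def projector_diversity_penalty(weights):
    # Hypothetical stand-in for the letter's contrast regularization:
    # penalize mean off-diagonal cosine similarity between the
    # flattened weight matrices, encouraging distinct projectors.
    flat = l2_normalize(np.stack([w.ravel() for w in weights]))
    sim = flat @ flat.T
    k = len(weights)
    return (sim.sum() - k) / (k * (k - 1))

# Toy setup: N samples, backbone dim D, K projectors mapping to dim P.
N, D, P, K = 8, 16, 4, 3
ha = rng.normal(size=(N, D))                  # backbone features, view A
hb = ha + 0.05 * rng.normal(size=(N, D))      # view B (perturbed copy)
weights = [rng.normal(size=(D, P)) for _ in range(K)]

# One contrastive loss per projector, plus the diversity regularizer.
nce_losses = [info_nce(ha @ w, hb @ w) for w in weights]
total = float(np.mean(nce_losses) + 0.1 * projector_diversity_penalty(weights))
print(total)
```

In a real training loop the per-projector losses would be reweighted by the training balance strategy the letter mentions (here they are simply averaged), and the projectors would be small MLPs rather than random linear maps.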

Updated: 2024-08-28