On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals
arXiv - CS - Computer Vision and Pattern Recognition. Pub Date: 2021-07-30, DOI: arXiv-2107.14762
Haizhou Shi, Youcai Zhang, Siliang Tang, Wenjie Zhu, Yaqian Li, Yandong Guo, Yueting Zhuang

It is widely accepted that small models perform poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model and transfer its knowledge to the small one via knowledge distillation. Despite their effectiveness, distillation-based methods may be unsuitable for some resource-restricted scenarios because of the substantial computational expense of deploying a large model. In this paper, we study the problem of training self-supervised small models without distillation signals. We first evaluate the representation spaces of the small models and make two non-negligible observations: (i) small models can complete the pretext task without overfitting despite their limited capacity; (ii) small models universally suffer from over-clustering. We then verify several assumptions considered to alleviate the over-clustering phenomenon. Finally, we combine the validated techniques and improve the baselines of five small architectures by considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals.
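The abstract does not state the exact training objective; as a point of reference, the sketch below shows a standard SimCLR-style NT-Xent contrastive loss of the kind such models are typically pretrained with, written in PyTorch. The function name, temperature value, and the encoder/projector placeholders in the usage comment are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: projected embeddings of two augmented views of the same batch, shape (N, D).
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-norm features
    sim = z @ z.t() / temperature                         # (2N, 2N) scaled cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))            # exclude self-similarity
    # Row i's positive is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage (encoder/projector stand in for any small backbone and projection head):
#   z1 = projector(encoder(augment(images)))
#   z2 = projector(encoder(augment(images)))
#   loss = nt_xent_loss(z1, z2)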

Updated: 2021-08-02