Toward Better Accuracy-Efficiency Trade-Offs: Divide and Co-Training
IEEE Transactions on Image Processing (IF 10.8), Pub Date: 9-5-2022, DOI: 10.1109/tip.2022.3201602
Shuai Zhao, Liguang Zhou, Wenxiao Wang, Deng Cai, Tin Lun Lam, Yangsheng Xu

The width of a neural network matters because increasing the width necessarily increases model capacity. However, the performance of a network does not improve linearly with its width and soon saturates. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To prove this, one large network is divided into several small ones with respect to its parameters and regularization components. Each of these small networks has a fraction of the original network's parameters. We then train these small networks together and make them see various views of the same data to increase their diversity. During this co-training process, the networks can also learn from each other. As a result, the small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs, i.e., better accuracy-efficiency trade-offs. The small networks can also achieve faster inference than the large one by running concurrently. All of the above shows that the number of networks is a new dimension of model scaling. We validate our argument with eight different neural architectures on common benchmarks through extensive experiments.
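To make the co-training procedure concrete, below is a minimal PyTorch sketch of one training step and of ensemble inference, written from the abstract alone rather than from the authors' released code. The mutual-learning KL term, the temperature T, the loss weight lam, and the detached targets are assumptions about one plausible instantiation; the paper's actual loss and network-division details are in the full text.

# Minimal co-training sketch (PyTorch), illustrating the idea in the
# abstract. The KL term, temperature T, weight lam, and detached targets
# are assumptions, not the authors' exact formulation.
import torch
import torch.nn.functional as F

def co_training_step(models, optimizer, views, labels, T=3.0, lam=1.0):
    # models : list of small networks obtained by dividing one large
    #          network (each holds a fraction of its parameters).
    # views  : list of tensors, one differently augmented view of the
    #          same batch per network, to increase their diversity.
    logits = [m(v) for m, v in zip(models, views)]

    # Supervised loss: each small network fits the ground-truth labels.
    loss = sum(F.cross_entropy(z, labels) for z in logits)

    # Mutual learning: each network also matches the others' softened
    # predictions, so the networks learn from each other during training.
    for i, zi in enumerate(logits):
        for j, zj in enumerate(logits):
            if i != j:
                loss = loss + lam * (T * T) * F.kl_div(
                    F.log_softmax(zi / T, dim=1),
                    F.softmax(zj / T, dim=1).detach(),
                    reduction="batchmean",
                )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def ensemble_predict(models, x):
    # At inference the small networks can run concurrently; averaging
    # their predictions gives the ensemble output.
    return torch.stack([F.softmax(m(x), dim=1) for m in models]).mean(0)

In this sketch the per-network cross-entropy keeps each small model accurate on its own, while the pairwise KL term is one standard way to realize "networks can also learn from each other"; the T*T factor is the usual scaling that keeps gradient magnitudes comparable across temperatures.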

Updated: 2024-08-26