Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-05-06 , DOI: arxiv-2105.02498 Yue Song, Nicu Sebe, Wei Wang
Global covariance pooling (GCP) aims at exploiting the second-order
statistics of convolutional features. Its effectiveness in boosting the
classification performance of Convolutional Neural Networks (CNNs) has been
demonstrated. Singular Value Decomposition (SVD) is used in GCP to compute
the matrix square root. However, the approximate matrix square root calculated
using Newton-Schulz iteration \cite{li2018towards} outperforms the accurate one
computed via SVD \cite{li2017second}. We empirically analyze the reasons behind
this performance gap from the perspectives of data precision and gradient
smoothness. Various remedies for computing smooth SVD gradients are
investigated. Based on our observations and analyses, a hybrid training protocol
is proposed for SVD-based GCP meta-layers that achieves performance competitive
with Newton-Schulz iteration. Moreover, we propose a new GCP
meta-layer that uses SVD in the forward pass and Pad\'e approximants in the
backward pass to compute the gradients. The proposed meta-layer has been
integrated into different CNN models and achieves state-of-the-art performance
on both large-scale and fine-grained datasets.
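The two matrix-square-root routes the abstract contrasts can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `newton_schulz_sqrt` implements the standard coupled Newton-Schulz iteration (with the Frobenius pre-normalization it requires for convergence), and `svd_sqrt` computes the exact square root from a singular value decomposition. All function names and the example matrix are our own for illustration.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=10):
    """Approximate square root of an SPD matrix via coupled
    Newton-Schulz iteration. A is pre-normalized by its Frobenius
    norm so the iteration converges; the norm is restored at the end."""
    n = A.shape[0]
    norm = np.linalg.norm(A, "fro")
    Y = A / norm
    Z = np.eye(n)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * np.eye(n) - Z @ Y)
        Y = Y @ T          # Y -> sqrt(A / norm)
        Z = T @ Z          # Z -> inverse sqrt(A / norm)
    return np.sqrt(norm) * Y

def svd_sqrt(A):
    """Exact square root of an SPD matrix via SVD
    (which coincides with its eigendecomposition)."""
    U, s, Vt = np.linalg.svd(A)
    return U @ np.diag(np.sqrt(s)) @ Vt

# Compare both routes on a random SPD covariance matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))
A = X.T @ X / 64 + 1e-3 * np.eye(16)   # well-conditioned SPD matrix
S = newton_schulz_sqrt(A)
err = np.linalg.norm(S @ S - A, "fro")  # small after ~10 iterations
```

In GCP meta-layers the practical difference is not the forward accuracy (both routes recover `sqrt(A)` closely here) but the backward pass: the analytic SVD gradient contains `1 / (s_i - s_j)` terms that blow up when singular values are close, whereas backpropagating through the Newton-Schulz iterates involves only matrix products and stays smooth.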
Updated: 2021-05-07