Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
arXiv - CS - Computer Vision and Pattern Recognition Pub Date : 2021-05-06 , DOI: arxiv-2105.02498 Yue Song, Nicu Sebe, Wei Wang
Global covariance pooling (GCP) aims at exploiting the second-order
statistics of convolutional features. Its effectiveness in boosting the
classification performance of Convolutional Neural Networks (CNNs) has been
demonstrated. Singular Value Decomposition (SVD) is used in GCP to compute
the matrix square root. However, the approximate matrix square root calculated
using Newton-Schulz iteration \cite{li2018towards} outperforms the accurate one
computed via SVD \cite{li2017second}. We empirically analyze the reasons behind
this performance gap from the perspectives of data precision and gradient
smoothness. Various remedies for computing smooth SVD gradients are
investigated. Based on our observations and analyses, a hybrid training protocol
is proposed for SVD-based GCP meta-layers that achieves performance competitive
with Newton-Schulz iteration. Moreover, we propose a new GCP
meta-layer that uses SVD in the forward pass and Pad\'e approximants in the
backward pass to compute the gradients. The proposed meta-layer has been
integrated into different CNN models and achieves state-of-the-art performance
on both large-scale and fine-grained datasets.
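The two matrix-square-root routes the abstract contrasts can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `newton_schulz_sqrt` implements the standard coupled Newton-Schulz iteration (with the Frobenius pre-normalization it requires for convergence), and `svd_sqrt` computes the exact square root from a singular value decomposition. All function names and the example matrix are our own for illustration.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=10):
    """Approximate square root of an SPD matrix via coupled
    Newton-Schulz iteration. A is pre-normalized by its Frobenius
    norm so the iteration converges; the norm is restored at the end."""
    n = A.shape[0]
    norm = np.linalg.norm(A, "fro")
    Y = A / norm
    Z = np.eye(n)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * np.eye(n) - Z @ Y)
        Y = Y @ T          # Y -> sqrt(A / norm)
        Z = T @ Z          # Z -> inverse sqrt(A / norm)
    return np.sqrt(norm) * Y

def svd_sqrt(A):
    """Exact square root of an SPD matrix via SVD
    (which coincides with its eigendecomposition)."""
    U, s, Vt = np.linalg.svd(A)
    return U @ np.diag(np.sqrt(s)) @ Vt

# Compare both routes on a random SPD covariance matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))
A = X.T @ X / 64 + 1e-3 * np.eye(16)   # well-conditioned SPD matrix
S = newton_schulz_sqrt(A)
err = np.linalg.norm(S @ S - A, "fro")  # small after ~10 iterations
```

In GCP meta-layers the practical difference is not the forward accuracy (both routes recover `sqrt(A)` closely here) but the backward pass: the analytic SVD gradient contains `1 / (s_i - s_j)` terms that blow up when singular values are close, whereas backpropagating through the Newton-Schulz iterates involves only matrix products and stays smooth.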
Updated: 2021-05-07