SSN: Learning Sparse Switchable Normalization via SparsestMax
International Journal of Computer Vision (IF 19.5). Pub Date: 2019-12-09. DOI: 10.1007/s11263-019-01269-y
Wenqi Shao, Jingyu Li, Jiamin Ren, Ruimao Zhang, Xiaogang Wang, Ping Luo

Normalization methods deal with parameter training in convolutional neural networks (CNNs), which typically contain many convolution layers. Although the layers of a CNN play heterogeneous roles in representing the prediction function, existing works often employ an identical normalizer in every layer, leaving performance short of optimal. To tackle this problem and further boost performance, the recently proposed switchable normalization (SN) offers a new perspective for deep learning: it learns to select different normalizers for different convolution layers of a ConvNet. However, SN uses a softmax function to learn the importance ratios that combine the normalizers, which not only incurs redundant computation compared with a single normalizer but also makes the model less interpretable. This work addresses the issue by presenting sparse switchable normalization (SSN), in which the importance ratios are constrained to be sparse. Unlike $\ell_1$ and $\ell_0$ regularizations, which make it difficult to tune layer-wise regularization coefficients, we turn this sparsity-constrained optimization problem into feed-forward computation by proposing SparsestMax, a sparse version of softmax. SSN has several appealing properties. (1) It inherits all the benefits of SN, such as applicability to various tasks and robustness to a wide range of batch sizes. (2) It is guaranteed to select only one normalizer for each normalization layer, avoiding redundant computation and improving the interpretability of normalizer selection. (3) SSN can be transferred to various tasks in an end-to-end manner. Extensive experiments show that SSN outperforms its counterparts on challenging benchmarks such as ImageNet, COCO, Cityscapes, ADE20K, Kinetics, and MegaFace. Models and code are available at https://github.com/switchablenorms/Sparse_SwitchNorm.
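To make the idea concrete, below is a minimal PyTorch sketch of how a switchable normalization layer combines instance, layer, and batch statistics with learned importance ratios, and how replacing softmax with a sparse projection onto the probability simplex can drive the selection toward a single normalizer. The `SwitchableNorm2d` module and `sparsemax` function here are illustrative assumptions, not the authors' released code: the sketch uses the standard sparsemax projection in place of the paper's SparsestMax (which additionally anneals a circular constraint to guarantee exactly one-hot ratios), and it computes batch statistics on the fly, omitting the running statistics needed at inference time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sparsemax(z):
    # Euclidean projection of logits z (1-D) onto the probability simplex
    # (Martins & Astudillo, 2016). Shown only to illustrate a "sparse softmax";
    # the paper's SparsestMax is a different, annealed construction.
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype, device=z.device)
    cssv = torch.cumsum(z_sorted, dim=0) - 1.0
    support = z_sorted - cssv / k > 0          # coordinates kept in the support
    k_support = support.sum()
    tau = cssv[k_support - 1] / k_support      # threshold for the projection
    return torch.clamp(z - tau, min=0.0)

class SwitchableNorm2d(nn.Module):
    """Toy switchable normalization: a learned convex combination of
    IN / LN / BN statistics per layer. A simplified sketch of the idea in the
    abstract, not the authors' implementation."""
    def __init__(self, num_features, eps=1e-5, sparse=True):
        super().__init__()
        self.eps = eps
        self.sparse = sparse
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # importance logits over the three candidate normalizers (IN, LN, BN),
        # one set for the means and one for the variances
        self.mean_logits = nn.Parameter(torch.zeros(3))
        self.var_logits = nn.Parameter(torch.zeros(3))

    def _ratios(self, logits):
        # sparse projection instead of softmax -> ratios can become exactly 0
        return sparsemax(logits) if self.sparse else F.softmax(logits, dim=0)

    def forward(self, x):
        n, c, h, w = x.shape
        # statistics of the three candidate normalizers
        mean_in = x.mean(dim=(2, 3), keepdim=True)
        var_in = x.var(dim=(2, 3), keepdim=True, unbiased=False)
        mean_ln = x.mean(dim=(1, 2, 3), keepdim=True)
        var_ln = x.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        mean_bn = x.mean(dim=(0, 2, 3), keepdim=True)
        var_bn = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)

        p = self._ratios(self.mean_logits)
        q = self._ratios(self.var_logits)
        mean = p[0] * mean_in + p[1] * mean_ln + p[2] * mean_bn
        var = q[0] * var_in + q[1] * var_ln + q[2] * var_bn

        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight.view(1, c, 1, 1) + self.bias.view(1, c, 1, 1)
```

As a quick smoke test, `SwitchableNorm2d(64)(torch.randn(8, 64, 32, 32))` should return a tensor of the same shape; the importance ratios start uniform (1/3 each) and are free to become sparse as the logits are trained.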

Updated: 2019-12-09