
Boosting binary masks for multi-domain learning through affine transformations

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

In this work, we present a new algorithm for multi-domain learning. Given a pretrained architecture and a set of visual domains received sequentially, the goal of multi-domain learning is to produce a single model that performs the task in all domains. Recent works showed how this problem can be addressed by masking the internal weights of a given original convnet through learned binary variables. In this work, we provide a general formulation of binary mask-based models for multi-domain learning through affine transformations of the original network parameters. Our formulation achieves significantly higher levels of adaptation to new domains, with performance comparable to domain-specific models while requiring slightly more than 1 bit per network parameter per additional domain. Experiments on two popular benchmarks showcase the power of our approach, achieving performance close to state-of-the-art methods on the Visual Decathlon Challenge.
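To make the masking scheme concrete, the following is a minimal sketch, assuming an affine reparameterization of the frozen pretrained weights of the form \(k_0 W + k_1 (W \circ M) + k_2 M\) with a learned binary mask \(M\). This particular form, and the names affine_masked_weights, m_logits, k0, k1, k2, are illustrative assumptions, not the paper's exact equations; per domain, only the 1-bit mask and a few scalars need to be stored, which is consistent with the "slightly more than 1 bit per parameter" cost stated above.

    import numpy as np

    def affine_masked_weights(W, m_logits, k0, k1, k2):
        # Hypothetical affine reparameterization of the frozen pretrained
        # weights W for one domain. M is obtained by hard-thresholding
        # real-valued mask logits (during training, gradients would flow
        # through the thresholding via a straight-through estimator).
        M = (m_logits >= 0).astype(W.dtype)
        return k0 * W + k1 * (W * M) + k2 * M

    # Toy usage: adapt a 3x3 kernel to a new domain.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 3))        # frozen pretrained weights
    m_logits = rng.standard_normal((3, 3)) # learned per-domain mask logits
    W_domain = affine_masked_weights(W, m_logits, k0=1.0, k1=0.5, k2=0.1)
    print(W_domain)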



Notes

  1. We focus on classification tasks, but the proposed method also applies to other tasks.

  2. Fully connected layers are a special case.

  3. If the base architecture contains \(N_p\) parameters and \(A_p\) additional bits are introduced per domain, then \(\#\text{Params} = 1 + \frac{A_p \cdot (T-1)}{32 \cdot N_p}\), where \(T\) denotes the number of domains (including the one used for pretraining the network) and the factor 32 comes from the bits required to store each real-valued parameter. The classifiers are not included in the computation.
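As a worked example of this bookkeeping, the snippet below evaluates the formula for a mask-based model that adds roughly one bit per parameter per domain (\(A_p \approx N_p\)); the function name and the parameter count are illustrative assumptions, not values from the paper.

    def relative_model_size(n_params, bits_per_domain, num_domains):
        # Relative size of the multi-domain model w.r.t. the base network,
        # following #Params = 1 + A_p * (T - 1) / (32 * N_p).
        return 1 + bits_per_domain * (num_domains - 1) / (32 * n_params)

    # Example: ~1 bit per parameter per domain, 10 domains
    # (as in the Visual Decathlon), base weights stored as 32-bit floats.
    n_params = 25_000_000
    print(relative_model_size(n_params, bits_per_domain=n_params, num_domains=10))
    # -> 1.28125, i.e. the full 10-domain model is ~1.28x the base size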


Acknowledgements

We acknowledge financial support from ERC Grant 637076—RoboExNovo and Project DIGIMAP, Grant 860375, funded by the Austrian Research Promotion Agency (FFG).

Author information


Corresponding author

Correspondence to Massimiliano Mancini.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mancini, M., Ricci, E., Caputo, B. et al. Boosting binary masks for multi-domain learning through affine transformations. Machine Vision and Applications 31, 42 (2020). https://doi.org/10.1007/s00138-020-01090-5

