Information Bottleneck Theory on Convolutional Neural Networks

Abstract

In recent years, many studies have attempted to open the black box of deep neural networks, proposing a variety of theories to explain their behavior. Among them, information bottleneck (IB) theory claims that training consists of two distinct phases: a fitting phase and a compression phase. This claim has attracted considerable attention owing to its success in explaining the inner behavior of feedforward neural networks. In this paper, we employ IB theory to understand the dynamic behavior of convolutional neural networks (CNNs) and investigate how fundamental architectural features such as convolutional layer width, kernel size, network depth, pooling layers, and multiple fully connected layers affect their performance. In particular, through a series of experiments on the MNIST and Fashion-MNIST benchmarks, we demonstrate that the compression phase is not observed in all of these cases. This suggests that CNNs exhibit more complicated behavior than feedforward neural networks.

Notes

  1. https://github.com/mrjunjieli/IB_ON_CNN.

References

  1. Advani MS, Saxe AM (2017) High-dimensional dynamics of generalization error in neural networks. Preprint arXiv:1710.03667

  2. Amjad RA, Geiger BC (2019) Learning representations for neural network-based classification using the information bottleneck principle. IEEE Trans Pattern Anal Mach Intell 42:2225–2239

  3. Chechik G, Globerson A, Tishby N, Weiss Y (2005) Information bottleneck for Gaussian variables. J Mach Learn Res 6(Jan):165–188

  4. Dai B, Zhu C, Wipf D (2018) Compressing neural networks using the variational information bottleneck. Preprint arXiv:1802.10399

  5. Elidan G, Friedman N (2005) Learning hidden variable networks: the information bottleneck approach. J Mach Learn Res 6(Jan):81–127

  6. Friedman N, Mosenzon O, Slonim N, Tishby N (2013) Multivariate information bottleneck. Preprint arXiv:1301.2270

  7. Gabrié M, Manoel A, Luneau C, Macris N, Krzakala F, Zdeborová L et al (2018) Entropy and mutual information in models of deep neural networks. In: Advances in neural information processing systems, pp 1821–1831

  8. Goldfeld Z, Berg E, Greenewald K, Melnyk I, Nguyen N, Kingsbury B, Polyanskiy Y (2018) Estimating information flow in deep neural networks. Preprint arXiv:1810.05728

  9. Goldfeld Z, Van Den Berg E, Greenewald K, Melnyk I, Nguyen N, Kingsbury B, Polyanskiy Y (2019) Estimating information flow in deep neural networks. In: Proceedings of the 36th international conference on machine learning, vol 97, pp 2299–2308

  10. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, New York

  11. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D (2019) A survey of methods for explaining black box models. ACM Comput Surv (CSUR) 51(5):93

  12. Hsu WH, Kennedy LS, Chang SF (2006) Video search reranking via information bottleneck principle. In: Proceedings of the 14th ACM international conference on multimedia, pp 35–44

  13. Jónsson H, Cherubini G, Eleftheriou E (2019) Convergence of DNNs with mutual-information-based regularization. In: Proceedings of the Bayesian Deep Learning Workshop at Advances in Neural Information Processing Systems (NeurIPS), Vancouver

  14. Kadmon J, Sompolinsky H (2016) Optimal architectures in a solvable model of deep networks. In: Advances in neural information processing systems, pp 4781–4789

  15. Kolchinsky A, Tracey B (2017) Estimating mixture entropy with pairwise distances. Entropy 19(7):361

  16. Kolchinsky A, Tracey BD, Wolpert DH (2019) Nonlinear information bottleneck. Entropy 21(12):1181

  17. Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto

  18. Painsky A, Tishby N (2017) Gaussian lower bound for the information bottleneck limit. J Mach Learn Res 18(1):7908–7936

  19. Poole B, Ozair S, Oord A, Alemi AA, Tucker G (2019) On variational bounds of mutual information. Preprint arXiv:1905.06922

  20. Saxe AM, Bansal Y, Dapello J, Advani M, Kolchinsky A, Tracey BD, Cox DD (2019) On the information bottleneck theory of deep learning. J Stat Mech Theory Exp 2019(12):124020

  21. Saxe AM, McClelland JL, Ganguli S (2014) Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. In: International conference on learning representations

  22. Shamir O, Sabato S, Tishby N (2010) Learning and generalization with the information bottleneck. Theoret Comput Sci 411(29–30):2696–2711

  23. Shwartz-Ziv R, Tishby N (2017) Opening the black box of deep neural networks via information. Preprint arXiv:1703.00810

  24. Slonim N, Tishby N (2000) Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval, pp 208–215

  25. Strouse D, Schwab DJ (2017) The deterministic information bottleneck. Neural Comput 29(6):1611–1630

  26. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  27. Tishby N, Pereira FC, Bialek W (2000) The information bottleneck method. Preprint arXiv:physics/0004057

  28. Tishby N, Zaslavsky N (2015) Deep learning and the information bottleneck principle. In: 2015 IEEE information theory workshop (ITW). IEEE, New York, pp 1–5

  29. Wang Q, Gao J, Li X (2019) Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes. IEEE Trans Image Process 28(9):4376–4386

  30. Wang Q, Yuan Z, Du Q, Li X (2018) GETNET: a general end-to-end 2-D CNN framework for hyperspectral image change detection. IEEE Trans Geosci Remote Sens 57(1):3–13

  31. Yu S, Principe JC (2019) Understanding autoencoders with information theoretic concepts. Neural Netw 117:104–123

  32. Yu S, Wickstrøm K, Jenssen R, Principe JC (2020) Understanding convolutional neural networks with information theory: an initial exploration. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2020.2968509

  33. Yu Y, Chan KHR, You C, Song C, Ma Y (2020) Learning diverse and discriminative representations via the principle of maximal coding rate reduction. Preprint arXiv:2006.08558

Acknowledgements

Our research was supported by the Tianjin Natural Science Foundation of China (20JCYBJC00500) and the Science and Technology Development Fund of Tianjin Education Commission for Higher Education (2018KJ217).

Author information

Corresponding author

Correspondence to Ding Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In order to further verify our conclusions, we conduct additional experiments on the CIFAR-10 dataset [17]. This dataset consists of 60,000 \(32\times 32\) colour images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images.

In this experiment, the full set of 50,000 training images and 10,000 test images is used as our training and test datasets respectively, which is the only setting that differs from the Experiments and discussion section. Furthermore, because of the way mutual information is computed, we average the three colour channels of each image and use the resulting single-channel image as the input data.
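
The snippet below is a minimal sketch of this preprocessing step, not our released code (see Note 1): it loads CIFAR-10 via tensorflow.keras.datasets and averages the three colour channels; the scaling to [0, 1] is an illustrative choice.

```python
# Minimal sketch of the preprocessing described above (not the released code):
# average the three CIFAR-10 colour channels into a single input channel.
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# (N, 32, 32, 3) -> (N, 32, 32, 1): average over the channel axis.
x_train = x_train.astype("float32").mean(axis=-1, keepdims=True) / 255.0
x_test = x_test.astype("float32").mean(axis=-1, keepdims=True) / 255.0

print(x_train.shape, x_test.shape)  # (50000, 32, 32, 1) (10000, 32, 32, 1)
```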

Fig. 9

(Color figure online) MI path on CNNs with different convolutional layer widths on training data of CIFAR-10. The convolutional layer widths of the three networks are a 3-3-3-3-3-3, b 6-6-6-6-6-6, c 12-12-12-12-12-12

Fig. 10

(Color figure online) MI path on CNNs with different convolutional layer depths on training data of CIFAR-10. The depths of the four networks are a \(\hbox {depth}=2\), b \(\hbox {depth}=4\), c \(\hbox {depth}=7\), d \(\hbox {depth}=10\). The width of all these networks is set to 6

Fig. 11

(Color figure online) MI path on CNNs with a pooling layer on test data of CIFAR-10. a Without pooling layer, b with pooling layer. The widths of the convolutional layers are both 6-6-6 and the kernel size is fixed to \(3\times 3\). Moreover, a MaxPooling2D layer is added after layer 1 and the pooling size is set to \(2\times 2\)
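
For concreteness, the following is a minimal Keras sketch of the pooled network in Fig. 11b; the ReLU activations, "same" padding, softmax classification head and optimizer are illustrative assumptions that the caption does not specify.

```python
# Hedged sketch of the Fig. 11b architecture: three convolutional layers of width 6
# with 3x3 kernels and a 2x2 MaxPooling2D after layer 1 (other details assumed).
from tensorflow.keras import layers, models

def build_pooled_cnn(input_shape=(32, 32, 1), num_classes=10):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(6, (3, 3), padding="same", activation="relu"),  # layer 1
        layers.MaxPooling2D(pool_size=(2, 2)),                        # pooling after layer 1
        layers.Conv2D(6, (3, 3), padding="same", activation="relu"),  # layer 2
        layers.Conv2D(6, (3, 3), padding="same", activation="relu"),  # layer 3
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_pooled_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```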

Figures 9 and 10 show the MI paths with different widths and depths on the training data, respectively, and Fig. 11 shows the MI path with a pooling layer on the test data. These results provide further support for our conclusions regarding IB theory.
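
To reproduce such MI paths, a binning-based estimator in the spirit of Shwartz-Ziv and Tishby [23] can be used; the sketch below estimates I(T; Y) from layer activations and labels, where the bin count and binning scheme are illustrative choices, and the released code in Note 1 remains the reference implementation.

```python
# Hedged sketch of a binning-based mutual information estimator for MI paths,
# in the spirit of Shwartz-Ziv and Tishby [23]; our experiments may use different settings.
import numpy as np

def _entropy(rows):
    """Shannon entropy (in bits) of the empirical distribution over the rows of a 2-D array."""
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def binned_mi_ty(activations, labels, num_bins=30):
    """Estimate I(T; Y) for layer activations T of shape (N, D) and integer labels Y of shape (N,)."""
    edges = np.linspace(activations.min(), activations.max(), num_bins)
    t = np.digitize(activations, edges)                 # discretise each activation dimension
    h_t = _entropy(t)                                   # H(T)
    h_t_given_y = 0.0
    for c in np.unique(labels):
        mask = labels == c
        h_t_given_y += mask.mean() * _entropy(t[mask])  # weighted H(T | Y = c)
    return h_t - h_t_given_y                            # I(T; Y) = H(T) - H(T | Y)

# For a deterministic layer with discrete inputs, I(T; X) reduces to H(T),
# i.e. the same entropy of the binned activations.
```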

About this article

Cite this article

Li, J., Liu, D. Information Bottleneck Theory on Convolutional Neural Networks. Neural Process Lett 53, 1385–1400 (2021). https://doi.org/10.1007/s11063-021-10445-6
