Information Dropout: Learning Optimal Representations Through Noisy Computation
IEEE Transactions on Pattern Analysis and Machine Intelligence (IF 20.8). Pub Date: 1-10-2018. DOI: 10.1109/tpami.2017.2784440
Alessandro Achille, Stefano Soatto

The cross-entropy loss commonly used in deep learning is closely related to the defining properties of optimal representations, but does not enforce some of the key properties. We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common practice of dropout. We show that our regularized loss function can be efficiently minimized using Information Dropout, a generalization of dropout rooted in information-theoretic principles that automatically adapts to the data and can better exploit architectures of limited capacity. When the task is the reconstruction of the input, we show that our loss function yields a Variational Autoencoder as a special case, thus providing a link between representation learning, information theory, and variational inference. Finally, we prove that we can promote the creation of optimal disentangled representations simply by enforcing a factorized prior, a fact that has been observed empirically in recent work. Our experiments validate the theoretical intuitions behind our method, and we find that Information Dropout achieves comparable or better generalization performance than binary dropout, especially on smaller models, since it can automatically adapt the noise to the structure of the network, as well as to the test sample.

Updated: 2024-08-22