Abstract
Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One prevalent approach is to use the reconstruction error of a variational autoencoder (VAE) trained by maximizing the evidence lower bound (ELBO). We revisit the VAE from the perspective of information theory to provide theoretical foundations for using the reconstruction error as an anomaly score, and arrive at a simpler yet effective model for anomaly detection. In addition, to enhance detection effectiveness, we incorporate a practical measure of model uncertainty into the anomaly score. We empirically demonstrate the competitive performance of our approach on benchmark data sets.
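To make the abstract's two ingredients concrete: the negative ELBO decomposes into a distortion term (expected reconstruction error) and a rate term (the KL divergence to the prior), $-\mathrm{ELBO}(x) = \mathbb{E}_{q(z|x)}[-\log p(x|z)] + \mathrm{KL}(q(z|x)\,\|\,p(z))$, which is the rate-distortion reading the paper builds on, and a practical model uncertainty measure can be obtained with Monte Carlo dropout in the style of Gal and Ghahramani, keeping dropout active at test time. The following PyTorch sketch illustrates this recipe only; it is not the authors' architecture or exact score. The layer sizes, dropout rate, and the combination mean error + lam * variance in anomaly_score are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Small fully connected VAE with dropout, so that keeping dropout
    active at test time yields an MC-dropout uncertainty estimate."""
    def __init__(self, x_dim=784, z_dim=16, h_dim=256, p_drop=0.2):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(x_dim, h_dim), nn.ReLU(), nn.Dropout(p_drop))
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def neg_elbo(x, x_hat, mu, logvar, beta=1.0):
    # Distortion (squared reconstruction error) + beta * rate (KL to N(0, I)).
    recon = ((x - x_hat) ** 2).sum(dim=1)
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=1)
    return (recon + beta * kl).mean()

@torch.no_grad()
def anomaly_score(model, x, n_samples=20, lam=1.0):
    """Hypothetical score: mean reconstruction error plus lam times the
    MC-dropout variance of that error across n_samples stochastic passes."""
    model.train()  # keep dropout stochastic at test time
    errs = torch.stack([((x - model(x)[0]) ** 2).sum(dim=1)
                        for _ in range(n_samples)])
    return errs.mean(dim=0) + lam * errs.var(dim=0)
```

Inputs with large, high-variance reconstruction errors are flagged as anomalous; in practice a decision threshold would be chosen on held-out data, for example as a quantile of the scores on normal training examples.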
Acknowledgements
Panos M. Pardalos was supported by a Humboldt Research Award (Germany).
Cite this article
Park, S., Adosoglou, G. & Pardalos, P.M. Interpreting rate-distortion of variational autoencoder and using model uncertainty for anomaly detection. Ann Math Artif Intell 90, 735–752 (2022). https://doi.org/10.1007/s10472-021-09728-4