Interpreting rate-distortion of variational autoencoder and using model uncertainty for anomaly detection

Annals of Mathematics and Artificial Intelligence (Special Issue: LION14)

Abstract

Building a scalable machine learning system for unsupervised anomaly detection via representation learning is highly desirable. One prevalent method uses the reconstruction error of a variational autoencoder (VAE) trained by maximizing the evidence lower bound (ELBO). We revisit the VAE from the perspective of information theory to provide theoretical foundations for using the reconstruction error, and we arrive at a simpler yet effective model for anomaly detection. In addition, to enhance the effectiveness of detecting anomalies, we incorporate a practical model uncertainty measure into the anomaly score. We empirically demonstrate the competitive performance of our approach on benchmark data sets.
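
The abstract's recipe can be made concrete. Below is a minimal PyTorch sketch, not the authors' implementation: a small VAE trained on the negative ELBO (a reconstruction/distortion term plus a KL/rate term), with dropout kept active at test time so the anomaly score can combine the reconstruction error with a Monte Carlo dropout estimate of model uncertainty. The architecture, dropout rate, and the equal weighting of the two score terms are our own assumptions.

```python
# Minimal sketch under the assumptions stated above; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim=784, hidden=400, latent=20, p_drop=0.2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop))
        self.mu = nn.Linear(hidden, latent)       # approximate posterior mean
        self.logvar = nn.Linear(hidden, latent)   # approximate posterior log-variance
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(), nn.Dropout(p_drop),
                                 nn.Linear(hidden, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    # Negative ELBO: distortion (reconstruction error) plus
    # rate (KL divergence to the standard normal prior).
    distortion = F.mse_loss(x_hat, x, reduction="sum")
    rate = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return distortion + rate

@torch.no_grad()
def anomaly_score(model, x, n_samples=10):
    # MC dropout: keep dropout active at test time and sample reconstructions.
    model.train()
    recons = torch.stack([model(x)[0] for _ in range(n_samples)])
    recon_err = ((recons.mean(0) - x) ** 2).sum(dim=1)  # per-example reconstruction error
    uncertainty = recons.var(dim=0).sum(dim=1)          # per-example predictive variance
    return recon_err + uncertainty                      # larger score = more anomalous
```

Note that the variance term here mixes dropout noise with latent-sampling noise; decoding from the posterior mean mu, or normalizing the two score terms, would isolate the model-uncertainty contribution more cleanly.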


Acknowledgements

Panos M. Pardalos was supported by a Humboldt Research Award (Germany).

Author information


Corresponding author

Correspondence to Seonho Park.



About this article


Cite this article

Park, S., Adosoglou, G. & Pardalos, P.M. Interpreting rate-distortion of variational autoencoder and using model uncertainty for anomaly detection. Ann Math Artif Intell 90, 735–752 (2022). https://doi.org/10.1007/s10472-021-09728-4

