Anomaly and Novelty detection for robust semi-supervised learning

Cappozzo, Andrea; Greselin, Francesca; Murphy, Thomas Brendan

doi:10.1007/s11222-020-09959-1

Published: 30 June 2020

Volume 30, pages 1545–1571, (2020)
Statistics and Computing

Abstract

Three important issues are often encountered in Supervised and Semi-Supervised Classification: class memberships are unreliable for some training units (label noise), a proportion of observations might depart from the main structure of the data (outliers), and new groups in the test set may not have been encountered earlier in the learning phase (unobserved classes). The present work introduces a robust and adaptive Discriminant Analysis rule, capable of handling situations in which one or more of the aforementioned problems occur. Two EM-based classifiers are proposed: the first jointly exploits the training and test sets (transductive approach), while the second expands the parameter estimation using the test set, to complete the group structure learned from the training set (inductive approach). Experiments on synthetic and real data, artificially adulterated, are provided to underline the benefits of the proposed method.

References

Aitken, A.C.: A series formula for the roots of algebraic and transcendental equations. Proc. R. Soc. Edinb. 45(01), 14–22 (1926). https://doi.org/10.1017/S0370164600024871
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974). https://doi.org/10.1109/TAC.1974.1100705
Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3), 803 (1993). https://doi.org/10.2307/2532201
Bensmail, H., Celeux, G.: Regularized Gaussian discriminant analysis through eigenvalue decomposition. J. Am. Stat. Assoc. 91(436), 1743–1748 (1996). https://doi.org/10.1080/01621459.1996.10476746
Biernacki, C.: Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures for grouped data and behaviour of the EM algorithm. Scand. J. Stat. 34(3), 569–586 (2007). https://doi.org/10.1111/j.1467-9469.2006.00553.x
Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., Lindsay, B.G.: The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann. Inst. Stat. Math. 46(2), 373–388 (1994). https://doi.org/10.1007/BF01720593
Bokulich, N.A., Thorngate, J.H., Richardson, P.M., Mills, D.A.: Microbial biogeography of wine grapes is conditioned by cultivar, vintage, and climate. Proc. National Acad. Sci. 111(1), E139–E148 (2014). https://doi.org/10.1073/pnas.1317377110
Bokulich, N.A., Collins, T., Masarweh, C., Allen, G., Heymann, H., Ebeler, S.E., Mills, D.A.: Associations among wine grape microbiome, metabolome, and fermentation behavior suggest microbial contribution to regional wine characteristics. mBio 7(3), 1–12 (2016). https://doi.org/10.1128/mBio.00631-16
Bolyen, E., Rideout, J.R., Dillon, M.R., Bokulich, N.A., Abnet, C.C., Al-Ghalith, G.A., Alexander, H., Alm, E.J., Arumugam, M., Asnicar, F., Bai, Y., Bisanz, J.E., Bittinger, K., Brejnrod, A., Brislawn, C.J., Brown, C.T., Callahan, B.J., Caraballo-Rodríguez, A.M., Chase, J., Cope, E.K., Da Silva, R., Diener, C., Dorrestein, P.C., Douglas, G.M., Durall, D.M., Duvallet, C., Edwardson, C.F., Ernst, M., Estaki, M., Fouquier, J., Gauglitz, J.M., Gibbons, S.M., Gibson, D.L., Gonzalez, A., Gorlick, K., Guo, J., Hillmann, B., Holmes, S., Holste, H., Huttenhower, C., Huttley, G.A., Janssen, S., Jarmusch, A.K., Jiang, L., Kaehler, B.D., Kang, K.B., Keefe, C.R., Keim, P., Kelley, S.T., Knights, D., Koester, I., Kosciolek, T., Kreps, J., Langille, M.G., Lee, J., Ley, R., Liu, Y.X., Loftfield, E., Lozupone, C., Maher, M., Marotz, C., Martin, B.D., McDonald, D., McIver, L.J., Melnik, A.V., Metcalf, J.L., Morgan, S.C., Morton, J.T., Naimey, A.T., Navas-Molina, J.A., Nothias, L.F., Orchanian, S.B., Pearson, T., Peoples, S.L., Petras, D., Preuss, M.L., Pruesse, E., Rasmussen, L.B., Rivers, A., Robeson, M.S., Rosenthal, P., Segata, N., Shaffer, M., Shiffer, A., Sinha, R., Song, S.J., Spear, J.R., Swafford, A.D., Thompson, L.R., Torres, P.J., Trinh, P., Tripathi, A., Turnbaugh, P.J., Ul-Hasan, S., van der Hooft, J.J., Vargas, F., Vázquez-Baeza, Y., Vogtmann, E., von Hippel, M., Walters, W., Wan, Y., Wang, M., Warren, J., Weber, K.C., Williamson, C.H., Willis, A.D., Xu, Z.Z., Zaneveld, J.R., Zhang, Y., Zhu, Q., Knight, R., Caporaso, J.G.: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37(8), 852–857 (2019). https://doi.org/10.1038/s41587-019-0209-9
Bouveyron, C.: Adaptive mixture discriminant analysis for supervised learning with unobserved classes. J. Classif. 31(1), 49–84 (2014). https://doi.org/10.1007/s00357-014-9147-x
Bouveyron, C., Girard, S.: Robust supervised classification with mixture models: learning from data with uncertain labels. Pattern Recognit. 42(11), 2649–2658 (2009). https://doi.org/10.1016/j.patcog.2009.03.027
Calle, M.L.: Statistical Analysis of Metagenomics Data. Genom. Inform. 17(1), e6 (2019). https://doi.org/10.5808/GI.2019.17.1.e6
Cappozzo, A., Greselin, F., Murphy, T.B.: A robust approach to model-based classification based on trimming and constraints. Adv. Data Anal. Classif. (2019). https://doi.org/10.1007/s11634-019-00371-w
Celeux, G., Govaert, G.: Gaussian parsimonious clustering models. Pattern Recognit. 28(5), 781–793 (1995). https://doi.org/10.1016/0031-3203(94)00125-6
Cerioli, A., García-Escudero, L.A., Mayo-Iscar, A., Riani, M.: Finding the number of normal groups in model-based clustering via constrained likelihoods. J. Comput. Graph. Stat. 27(2), 404–416 (2018). https://doi.org/10.1080/10618600.2017.1390469
Cerioli, A., Farcomeni, A., Riani, M.: Wild adaptive trimming for robust estimation and cluster analysis. Scand. J. Stat. 46(1), 235–256 (2019). https://doi.org/10.1111/sjos.12349
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection. ACM Comput. Surv. 41(3), 1–58 (2009). https://doi.org/10.1145/1541880.1541882
Chiquet, J., Mariadassou, M., Robin, S.: Variational inference for probabilistic Poisson PCA. Ann. Appl. Stat. 12(4), 2674–2698 (2018). https://doi.org/10.1214/18-AOAS1177
Coretto, P., Hennig, C.: Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for Robust Gaussian clustering. J. Am. Stat. Assoc. 111(516), 1648–1659 (2016). https://doi.org/10.1080/01621459.2015.1100996
Day, N.E.: Estimating the components of a mixture of normal distributions. Biometrika 56(3), 463–474 (1969)
Dean, N., Murphy, T.B., Downey, G.: Using unlabelled data to update classification rules with applications in food authenticity studies. J. R. Stat. Soc. Ser. C Appl. Stat. 55(1), 1–14 (2006). https://doi.org/10.1111/j.1467-9876.2005.00526.x
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. 39(1), 1–38 (1977). https://doi.org/10.2307/2984875
Evangelista, P.F., Embrechts, M.J., Szymanski, B.K.: Taming the curse of dimensionality in kernels and novelty detection. Adv. Soft Comput. 34, 425–438 (2006). https://doi.org/10.1007/3-540-31662-0_33
Fop, M., Mattei, P.A., Murphy, T.B., Bouveyron, C.: Unobserved classes and extra variables in high-dimensional discriminant analysis. In: CASI 2018 Conference Proceedings, pp. 70–72 (2018)
Fraley, C., Raftery, A.E.: Model-based clustering, discriminant analysis, and density estimation. J. Am. Stat. Assoc. 97(458), 611–631 (2002). https://doi.org/10.1198/016214502760047131
Gallegos, M.T., Ritter, G.: Using combinatorial optimization in model-based trimmed clustering with cardinality constraints. Comput. Stat. Data Anal. 54(3), 637–654 (2010). https://doi.org/10.1016/j.csda.2009.08.023
García-Escudero, L., Gordaliza, A., Mayo-Iscar, A., San Martín, R.: Robust clusterwise linear regression through trimming. Comput. Stat. Data Anal. 54(12), 3057–3069 (2010). https://doi.org/10.1016/j.csda.2009.07.002
García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: A general trimming approach to robust cluster analysis. Ann. Stat. 36(3), 1324–1345 (2008). https://doi.org/10.1214/07-AOS515
García-Escudero, L.A., Gordaliza, A., Mayo-Iscar, A.: A constrained robust proposal for mixture modeling avoiding spurious solutions. Adv. Data Anal. Classif. 8(1), 27–43 (2014). https://doi.org/10.1007/s11634-013-0153-3
García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: Avoiding spurious local maximizers in mixture modeling. Stat. Comput. 25(3), 619–633 (2015). https://doi.org/10.1007/s11222-014-9455-3
García-Escudero, L.A., Gordaliza, A., Greselin, F., Ingrassia, S., Mayo-Iscar, A.: The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers. Comput. Stat. Data Anal. 99, 131–147 (2016). https://doi.org/10.1016/j.csda.2016.01.005
García-Escudero, L.A., Gordaliza, A., Greselin, F., Ingrassia, S., Mayo-Iscar, A.: Robust estimation of mixtures of regressions with random covariates, via trimming and constraints. Stat. Comput. 27(2), 377–402 (2017). https://doi.org/10.1007/s11222-016-9628-3
García-Escudero, L.A., Gordaliza, A., Greselin, F., Ingrassia, S., Mayo-Iscar, A.: Eigenvalues and constraints in mixture modeling: geometric and computational issues. Adv. Data Anal. Classif. 12(2), 203–233 (2018a). https://doi.org/10.1007/s11634-017-0293-y
García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: Comments on “The power of monitoring: how to make the most of a contaminated multivariate sample”. Stat. Methods Appl. 27(4), 661–666 (2018b). https://doi.org/10.1007/s10260-018-00436-8
Gordaliza, A.: Best approximations to random variables based on trimming procedures. J. Approx. Theory 64(2), 162–180 (1991). https://doi.org/10.1016/0021-9045(91)90072-I
Greco, L., Agostinelli, C.: Weighted likelihood mixture modeling and model-based clustering. Stat. Comput. (2019). https://doi.org/10.1007/s11222-019-09881-1
Greselin, F., Punzo, A.: Closed likelihood ratio testing procedures to assess similarity of covariance matrices. Am. Stat. 67(3), 117–128 (2013). https://doi.org/10.1080/00031305.2013.791643
Hawkins, D.M., McLachlan, G.J.: High-breakdown linear discriminant analysis. J. Am. Stat. Assoc. 92(437), 136 (1997). https://doi.org/10.2307/2291457
Hawkins, D.M., Liu, L., Young, S.S.: Robust singular value decomposition. National Institute of Statistical Sciences Technical Report 122 (2001)
Hickey, R.J.: Noise modelling and evaluating learning from examples. Artif. Intell. 82(1–2), 157–179 (1996). https://doi.org/10.1016/0004-3702(94)00094-8
Hubert, M., Rousseeuw, P.J., Vanden Branden, K.: ROBPCA: a new approach to robust principal component analysis. Technometrics 47(1), 64–79 (2005). https://doi.org/10.1198/004017004000000563
Ingrassia, S.: A likelihood-based constrained algorithm for multivariate normal mixture models. Stat. Methods Appl. 13(2), 151–166 (2004). https://doi.org/10.1007/s10260-004-0092-4
Ingrassia, S., Rocci, R.: Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints. Comput. Stat. Data Anal. 55(4), 1715–1725 (2011). https://doi.org/10.1016/j.csda.2010.10.026
Kasabov, N., Pang, S.: Transductive support vector machines and applications in bioinformatics for promoter recognition. In: Proceedings of the 2003 International Conference on Neural Networks and Signal Processing, vol. 1, pp. 1–6. IEEE (2003). https://doi.org/10.1109/ICNNSP.2003.1279199
Li, M., Xiang, S., Yao, W.: Robust estimation of the number of components for mixtures of linear regression models. Comput. Stat. 31(4), 1539–1555 (2016). https://doi.org/10.1007/s00180-015-0610-x
Markou, M., Singh, S.: Novelty detection: a review-part 1: statistical approaches. Signal Process. 83(12), 2481–2497 (2003). https://doi.org/10.1016/j.sigpro.2003.07.018
McLachlan, G.J., Rathnayake, S.: On the number of components in a Gaussian mixture model. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 4(5), 341–355 (2014). https://doi.org/10.1002/widm.1135
McNicholas, P., Murphy, T., McDaid, A., Frost, D.: Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput. Stat. Data Anal. 54(3), 711–723 (2010). https://doi.org/10.1016/j.csda.2009.02.011
Mezzasalma, V., Sandionigi, A., Bruni, I., Bruno, A., Lovicu, G., Casiraghi, M., Labra, M.: Grape microbiome as a reliable and persistent signature of field origin and environmental conditions in Cannonau wine production. PLOS ONE 12(9), e0184615 (2017). https://doi.org/10.1371/journal.pone.0184615
Mezzasalma, V., Sandionigi, A., Guzzetti, L., Galimberti, A., Grando, M.S., Tardaguila, J., Labra, M.: Geographical and cultivar features differentiate grape microbiota in northern Italy and Spain Vineyards. Front. Microbiol. 9(MAY), 1–13 (2018). https://doi.org/10.3389/fmicb.2018.00946
Mitchell, T.M.: Machine Learning, 1st edn. McGraw-Hill Inc, New York (1997)
Neykov, N.M., Filzmoser, P., Dimova, R.I., Neytchev, P.N.: Robust fitting of mixtures using the trimmed likelihood estimator. Comput. Stat. Data Anal. 52(1), 299–308 (2007). https://doi.org/10.1016/j.csda.2006.12.024
Nguyen, M.H., de la Torre, F.: Optimal feature selection for support vector machines. Pattern Recognit. 43(3), 584–591 (2010). https://doi.org/10.1016/j.patcog.2009.09.003
Peel, D., McLachlan, G.J.: Robust mixture modelling using the t distribution. Stat. Comput. 10(4), 339–348 (2000). https://doi.org/10.1023/A:1008981510081
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., Lawrence, N.D.: Dataset Shift in Machine Learning. The MIT Press, Cambridge (2009)
R Core Team: R: A Language and Environment for Statistical Computing (2018). https://www.r-project.org/
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846 (1971). https://doi.org/10.2307/2284239
Rousseeuw, P.J., Driessen, K.V.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41(3), 212–223 (1999). https://doi.org/10.1080/00401706.1999.10485670
Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., Platt, J.: Support vector method for novelty detection. Adv. Neural Inf. Process. Syst. 12, 582–588 (2000)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978). https://doi.org/10.1214/aos/1176344136
Pang, S., Kasabov, N.: Inductive vs transductive inference, global vs local models: SVM, TSVM, and SVMT for gene expression classification problems. In: 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), vol. 2, pp. 1197–1202. IEEE (2004). https://doi.org/10.1109/IJCNN.2004.1380112
Tax, D.M.J., Duin, R.P.W.: Outlier detection using classifier instability. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds.) Advances in Pattern Recognition, pp. 593–601. Springer, Berlin (1998)
Todorov, V., Filzmoser, P.: An object-oriented framework for Robust multivariate analysis. J. Stat. Softw. 32(3), 1–47 (2009). https://doi.org/10.18637/jss.v032.i03
Vanden Branden, K., Hubert, M.: Robust classification in high dimensions based on the SIMCA Method. Chemom. Intell. Lab. Syst. 79(1–2), 10–21 (2005). https://doi.org/10.1016/j.chemolab.2005.03.002
Vapnik, V.N.: The Nature of Statistical Learning Theory, vol. 3. Springer, New York (2000). https://doi.org/10.1007/978-1-4757-3264-1
Waldron, L.: Data and statistical methods to analyze the human microbiome. mSystems 3(2), 1–4 (2018). https://doi.org/10.1128/mSystems.00194-17
Zhu, X., Wu, X.: Class noise vs. attribute noise: a quantitative study. Artif. Intell. Rev. 22(3), 177–210 (2004). https://doi.org/10.1007/s10462-004-0751-8

Acknowledgements

The authors are grateful to Anna Sandionigi, Lorenzo Guzzetti, Maurizio Casiraghi and Massimo Labra for fruitful discussions and for sharing their domain knowledge on grapevine microbiome analyses for the detection of provenances and varieties. In particular, the authors thank Anna Sandionigi for her decisive help in performing the routines described in Sect. 5.2.2 and for her support throughout the must samples analysis. We would also like to thank the Editor, Associate Editor and Referees, whose suggestions and comments enhanced the quality of the paper. Brendan Murphy’s work was supported by the Science Foundation Ireland Insight Research Centre (12/RC/2289_P2) and Vistamilk Research Centre (16/RC/3835).

Author information

Authors and Affiliations

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
Andrea Cappozzo & Francesca Greselin
School of Mathematics & Statistics and Insight Research Centre, University College Dublin, Dublin, Ireland
Thomas Brendan Murphy

Corresponding author

Correspondence to Andrea Cappozzo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 586 KB)

Appendix A: Inductive covariance matrices estimation

This appendix provides closed-form solutions for estimating the covariance matrices $\varvec{\varSigma }_h$, $h=G+1,\ldots ,E$, of the unobserved classes via the inductive approach. Our main reference is the seminal paper of Celeux and Govaert (1995), where patterned covariance matrices were first defined and algorithms for their ML estimation were proposed. In the robust discovery phase, only the parameters of the $H=E-G$ new densities need to be estimated, according to the patterned models that remain available given the one considered in the learning phase (see Fig. 5). Denote by ${\varvec{W}}_h=\sum _{m=1}^{M^{*}} \varphi (\mathbf {y}^{*}_m){\hat{z}}^{*}_{mh}\left[ \left( \mathbf {y}^{*}_{m}-\hat{\varvec{\mu }}_{h}\right) \left( \mathbf {y}^{*}_{m}-\hat{\varvec{\mu }}_{h}\right) ^{\prime }\right] $ the weighted scatter matrix of class $h$, and let ${\varvec{W}}_h={\varvec{L}}_h\varvec{\varDelta }_h{\varvec{L}}^{'}_h$ be its eigenvalue decomposition. Further, let $n_h=\sum _{m=1}^{M^{*}} \varphi (\mathbf {y}^{*}_m){\hat{z}}^{*}_{mh}$ for $h=G+1,\ldots , E$. Lastly, a bar denotes the estimates obtained in the robust learning phase for the $G$ known groups: these are kept fixed and are not updated. The formulae needed for the parameter updates are as follows:

VII model: $\varvec{\varSigma }_h=\lambda _h {\varvec{I}}$
$$\begin{aligned} {\hat{\lambda }}_h= \frac{\hbox {tr}({\varvec{W}}_h)}{p \, n_h}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
VEI model: $\varvec{\varSigma }_h=\lambda _h \bar{{\varvec{A}}}$
$$\begin{aligned} {\hat{\lambda }}_h= \frac{\hbox {tr}({\varvec{W}}_h \bar{{\varvec{A}}}^{-1})}{p \, n_h}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
EVI model: $\varvec{\varSigma }_h={\bar{\lambda }} {\varvec{A}}_h$
$$\begin{aligned} \hat{{\varvec{A}}}_h= \frac{\hbox {diag}({\varvec{W}}_h)}{|\hbox {diag}({\varvec{W}}_h)|^{1/p}}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
VVI model: $\varvec{\varSigma }_h=\lambda _h {\varvec{A}}_h$
$$\begin{aligned} {\hat{\lambda }}_h= \frac{|\hbox {diag}({\varvec{W}}_h)|^{1/p}}{n_h}, \qquad h=G+1,\ldots ,E. \\ \hat{{\varvec{A}}}_h= \frac{\hbox {diag}({\varvec{W}}_h)}{|\hbox {diag}({\varvec{W}}_h)|^{1/p}}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
VEE model: $\varvec{\varSigma }_h=\lambda _h \bar{{\varvec{D}}}\bar{{\varvec{A}}}\bar{{\varvec{D}}}^{'}$

Let $\bar{{\varvec{C}}}=\bar{{\varvec{D}}}\bar{{\varvec{A}}}\bar{{\varvec{D}}}^{'}$ and
$$\begin{aligned} {\hat{\lambda }}_h= \frac{\hbox {tr}({\varvec{W}}_h \bar{{\varvec{C}}}^{-1})}{p \, n_h}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
EVE model: $\varvec{\varSigma }_h={\bar{\lambda }} \bar{{\varvec{D}}}{\varvec{A}}_h\bar{{\varvec{D}}}^{'}$
$$\begin{aligned} \hat{{\varvec{A}}}_h= \frac{\hbox {diag}(\bar{{\varvec{D}}}^{'}{\varvec{W}}_h\bar{{\varvec{D}}})}{|\hbox {diag}(\bar{{\varvec{D}}}^{'}{\varvec{W}}_h\bar{{\varvec{D}}})|^{1/p}}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
EEV model: $\varvec{\varSigma }_h={\bar{\lambda }} {\varvec{D}}_h\bar{{\varvec{A}}}{\varvec{D}}_h^{'}$
$$\begin{aligned} \hat{{\varvec{D}}}_h= {\varvec{L}}_h, \qquad h=G+1,\ldots ,E. \end{aligned}$$
VVE model: $\varvec{\varSigma }_h=\lambda _h \bar{{\varvec{D}}}{\varvec{A}}_h\bar{{\varvec{D}}}^{'}$

Let ${\varvec{R}}_h=\lambda _h {\varvec{A}}_h$
$$\begin{aligned} \hat{{\varvec{R}}}_h= \frac{1}{n_h}\hbox {diag}(\bar{{\varvec{D}}}^{'}{\varvec{W}}_h\bar{{\varvec{D}}}), \qquad h=G+1,\ldots ,E. \end{aligned}$$
and, subsequently
$$\begin{aligned} {\hat{\lambda }}_h&= |\hat{{\varvec{R}}}_h|^{1/p}, \qquad h=G+1,\ldots ,E.\\ \hat{{\varvec{A}}}_h&= \frac{1}{{\hat{\lambda }}_h}\hat{{\varvec{R}}}_h, \qquad h=G+1,\ldots ,E. \end{aligned}$$
VEV model: $\varvec{\varSigma }_h=\lambda _h {\varvec{D}}_h\bar{{\varvec{A}}}{\varvec{D}}_h^{'}$
$$\begin{aligned} \hat{{\varvec{D}}}_h&= {\varvec{L}}_h, \qquad h=G+1,\ldots ,E.\\ {\hat{\lambda }}_h&= \frac{\hbox {tr}({\varvec{W}}_h \hat{{\varvec{D}}}_h \bar{{\varvec{A}}}^{-1}\hat{{\varvec{D}}}_h^{'})}{p \, n_h}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
EVV model: $\varvec{\varSigma }_h={\bar{\lambda }} {\varvec{D}}_h{\varvec{A}}_h{\varvec{D}}_h^{'}$

Let ${\varvec{C}}_h= {\varvec{D}}_h{\varvec{A}}_h{\varvec{D}}_h^{'}$
$$\begin{aligned} \hat{{\varvec{C}}}_h= \frac{{\varvec{W}}_h}{|{\varvec{W}}_h|^{1/p}}, \qquad h=G+1,\ldots ,E. \end{aligned}$$
$\hat{{\varvec{A}}}_h$, $\hat{{\varvec{D}}}_h$ are obtained through the eigenvalue decomposition of $\hat{{\varvec{C}}}_h$, $h=G+1,\ldots ,E$.
VVV model: $\varvec{\varSigma }_h=\lambda _h {\varvec{D}}_h{\varvec{A}}_h{\varvec{D}}_h^{'}$
$$\begin{aligned} \hat{\varvec{\varSigma }}_h=\frac{1}{n_h}{\varvec{W}}_h \end{aligned}$$
${\hat{\lambda }}_h$, $\hat{{\varvec{A}}}_h$, $\hat{{\varvec{D}}}_h$ are obtained through the eigenvalue decomposition of $\hat{\varvec{\varSigma }}_h$, $h=G+1,\ldots ,E$.
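
The VVV update and the eigenvalue decomposition it relies on can be sketched in Python with NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `decompose_covariance` and `vvv_update` are hypothetical, and the normalisation $|{\varvec{A}}_h|=1$ with eigenvalues sorted in decreasing order is the usual convention for this parametrisation, assumed here because the appendix does not restate it.

```python
import numpy as np

def decompose_covariance(sigma):
    """Split a symmetric positive-definite matrix into (lam, A, D) such that
    sigma = lam * D @ A @ D.T, with D orthogonal and det(A) = 1 (assumed convention)."""
    eigvals, eigvecs = np.linalg.eigh(sigma)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # reorder to decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.exp(np.mean(np.log(eigvals)))          # |sigma|^(1/p): geometric mean of eigenvalues
    A = np.diag(eigvals / lam)                      # shape matrix, determinant 1
    return lam, A, eigvecs

def vvv_update(W_h, n_h):
    """VVV model: Sigma_h = W_h / n_h, then volume, shape and orientation."""
    sigma_hat = W_h / n_h
    lam, A, D = decompose_covariance(sigma_hat)
    return sigma_hat, lam, A, D
```

Computing the volume as a geometric mean of eigenvalues on the log scale avoids forming the determinant directly, which can underflow or overflow when $p$ is large.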

Lastly, it is easy to see that whenever the model in the discovery phase is EII, EEI or EEE, no extra parameters need to be estimated for the covariance matrices of the hidden groups.
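
As a rough illustration of how the quantities ${\varvec{W}}_h$ and $n_h$ and a few of the updates above might be computed, the following NumPy sketch covers the VII, VVI, VEE and VVE models. Everything here is an assumption made for illustration: the function and argument names are hypothetical, $\varphi (\cdot )$ is taken to be a 0/1 trimming indicator (this appendix does not define it), and the mean $\hat{\varvec{\mu }}_h$ is supplied by the caller because its update is not given in this appendix.

```python
import numpy as np

def weighted_scatter(Y, z_h, phi, mu_h):
    """W_h and n_h for one hidden class h.
    Y: (M, p) test units; z_h: (M,) posterior probabilities for class h;
    phi: (M,) assumed 0/1 trimming indicator; mu_h: (p,) class mean estimate."""
    w = phi * z_h                       # per-unit weight phi(y_m) * z_mh
    R = Y - mu_h                        # centred observations
    W_h = (R * w[:, None]).T @ R        # sum_m w_m (y_m - mu_h)(y_m - mu_h)'
    return W_h, w.sum()

def update_vii(W_h, n_h):
    """VII: Sigma_h = lambda_h * I, with lambda_h = tr(W_h) / (p * n_h)."""
    p = W_h.shape[0]
    return np.trace(W_h) / (p * n_h) * np.eye(p)

def update_vvi(W_h, n_h):
    """VVI: Sigma_h = lambda_h * A_h with A_h diagonal and det(A_h) = 1."""
    d = np.diag(W_h)
    geo = np.exp(np.mean(np.log(d)))    # |diag(W_h)|^(1/p)
    lam, A = geo / n_h, np.diag(d / geo)
    return lam * A

def update_vee(W_h, n_h, C_bar):
    """VEE: Sigma_h = lambda_h * C_bar, with C_bar fixed from the learning phase."""
    p = W_h.shape[0]
    # tr(W_h C_bar^{-1}) equals tr(C_bar^{-1} W_h); solve avoids an explicit inverse
    lam = np.trace(np.linalg.solve(C_bar, W_h)) / (p * n_h)
    return lam * C_bar

def update_vve(W_h, n_h, D_bar):
    """VVE: Sigma_h = lambda_h * D_bar A_h D_bar', with D_bar fixed from the learning phase."""
    r = np.diag(D_bar.T @ W_h @ D_bar) / n_h   # diagonal of R_h = lambda_h * A_h
    lam = np.exp(np.mean(np.log(r)))           # |R_h|^(1/p)
    A = np.diag(r / lam)
    return lam * D_bar @ A @ D_bar.T
```

A trimmed unit (weight `phi = 0`) contributes to neither ${\varvec{W}}_h$ nor $n_h$, so an extreme observation flagged as an outlier leaves all of these updates unchanged.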

About this article

Cite this article

Cappozzo, A., Greselin, F. & Murphy, T.B. Anomaly and Novelty detection for robust semi-supervised learning. Stat Comput 30, 1545–1571 (2020). https://doi.org/10.1007/s11222-020-09959-1

Received: 19 November 2019
Accepted: 16 June 2020
Published: 30 June 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s11222-020-09959-1
