Abstract
There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that in a classified feature with known class label. Hence, in the case where the absence of class labels does not depend on the data, the expected error rate of a classifier formed from the classified and unclassified features in a partially classified sample is greater than it would be if the sample were completely classified. We propose to treat the labels of the unclassified features as missing data and to introduce a framework for their missingness, as in the pioneering work of Rubin (Biometrika 63:581–592, 1976) on missingness in incomplete data analysis. An examination of several partially classified data sets in the literature suggests that the unclassified features do not occur at random in the feature space, but rather tend to be concentrated in regions of relatively high entropy. This suggests that the missingness of the labels can be modelled by representing the conditional probability of a missing label for a feature via a logistic model with a covariate depending on the entropy of the feature, or an appropriate proxy for it. We consider here the case of two normal classes with a common covariance matrix, where for computational convenience the square of the discriminant function is used as the covariate in the logistic model in place of the negative log entropy. Rather paradoxically, we show that the classifier so formed from the partially classified sample may have a smaller expected error rate than if the sample were completely classified.
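The missingness mechanism described above can be illustrated with a minimal simulation. The sketch below is not the authors' implementation: the class means, mixing proportion, and logistic coefficients (`xi0`, `xi1`) are made-up values chosen only to show the qualitative effect, namely that when the probability of a missing label follows a logistic model in the squared discriminant d(x)^2, the unlabelled features concentrate near the decision boundary, i.e. in the high-entropy region.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two univariate normal classes with common variance (illustrative values,
# not taken from the paper).
n, mu0, mu1, sigma = 1000, -1.0, 1.0, 1.0
z = rng.binomial(1, 0.5, size=n)                    # true class labels
x = rng.normal(np.where(z == 1, mu1, mu0), sigma)   # observed features

# Linear discriminant for equal priors and common variance:
# d(x) = (mu1 - mu0)(x - (mu0 + mu1)/2) / sigma^2; d(x) = 0 on the boundary.
d = (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2

# Missingness model: P(label missing | x) is logistic in d(x)^2, which serves
# as a proxy for the negative log entropy. Coefficients are hypothetical.
xi0, xi1 = 1.0, -0.7
p_missing = 1.0 / (1.0 + np.exp(-(xi0 + xi1 * d**2)))
missing = rng.binomial(1, p_missing).astype(bool)

# Unlabelled features should lie closer to the boundary (smaller |d|),
# matching the pattern observed in real partially classified data sets.
print("mean |d|, unlabelled:", np.abs(d[missing]).mean())
print("mean |d|, labelled:  ", np.abs(d[~missing]).mean())
```

With `xi1 < 0`, the missingness probability is largest where d(x)^2 is small, so the mean |d(x)| among the unlabelled features comes out smaller than among the labelled ones.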
References
Aghaeepour, N., Finak, G., Hoos, H., Mosmann, T., Brinkman, R., Gottardo, R., Scheuermann, R.: FlowCAP consortium, dream consortium: critical assessment of automated flow cytometry data analysis techniques. Nat. Methods 10, 228–238 (2013)
Ahfock, D., McLachlan, G.J.: On missing data patterns in semi-supervised learning. arXiv ePreprint arXiv:1904.02883 (2019)
Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.: MixMatch: a holistic approach to semi-supervised learning. In: Advances in Neural Information Processing Systems (2019)
Castelli, V., Cover, T.M.: The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Trans. Inf. Theory 42, 2102–2117 (1996)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning. The MIT Press, Cambridge (2010)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. B 39, 1–22 (1977)
Efron, B.: The efficiency of logistic regression compared to normal discriminant analysis. J. Am. Stat. Assoc. 70, 892–898 (1975)
Efron, B., Tibshirani, R.: Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat. Sci. 1, 54–75 (1986)
Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 213–220 (2008)
Ganesalingam, S., McLachlan, G.J.: The efficiency of a linear discriminant function based on unclassified initial samples. Biometrika 65, 658–665 (1978)
Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in Neural Information Processing Systems, pp. 529–536 (2005)
McLachlan, G.J.: Iterative reclassification procedure for constructing an asymptotically optimal rule of allocation in discriminant analysis. J. Am. Stat. Assoc. 70, 365–369 (1975)
McLachlan, G.J.: Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York (1992)
McLachlan, G.J., Gordon, R.D.: Mixture models for partially unclassified data: a case study of renal venous renin in hypertension. Stat. Med. 8, 1291–1300 (1989)
McLachlan, G.J., Scot, D.: Asymptotic relative efficiency of the linear discriminant function under partial nonrandom classification of the training data. J. Stat. Comput. Simul. 52, 415–426 (1995)
Mealli, F., Rubin, D.B.: Clarifying missing at random and related definitions, and implications when coupled with exchangeability. Biometrika 102, 995–1000 (2015)
Molenberghs, G., Fitzmaurice, G.M., Kenward, M.G., Tsiatis, A.A., Verbeke, G.: Handbook of Missing Data Methodology. CRC Press, Boca Raton (2014)
O’Neill, T.J.: Normal discrimination with unclassified observations. J. Am. Stat. Assoc. 73, 821–826 (1978)
Ratsaby, J., Venkatesh, S.S.: Learning from a mixture of labeled and unlabeled examples with parametric side information. In: Proceedings of the Eighth Annual Conference on Computational Learning Theory, pp. 412–417 (1995)
Rubin, D.B.: Inference and missing data. Biometrika 63, 581–592 (1976)
Shahshahani, B.M., Landgrebe, D.A.: The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Trans. Geosci. Remote Sens. 32, 1087–1095 (1994)
van Engelen, J., Hoos, H.: A survey on semi-supervised learning. Mach. Learn. 109, 373–440 (2020)
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Zhang, T.: The value of unlabeled data for classification problems. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 1191–1198 (2000)
Acknowledgements
The authors are indebted to the Co-ordinating Editor and two Reviewers for their comments that have improved the exposition of the manuscript.
This research was funded by the Australian Government through the Australian Research Council (Project Numbers DP170100907 and IC170100035).
Electronic supplementary material
Cite this article
Ahfock, D., McLachlan, G.J. An apparent paradox: a classifier based on a partially classified sample may have smaller expected error rate than that if the sample were completely classified. Stat Comput 30, 1779–1790 (2020). https://doi.org/10.1007/s11222-020-09971-5