ROC and AUC with a Binary Predictor: a Potentially Misleading Metric

Muschelli, John

doi:10.1007/s00357-019-09345-1

Software Abstract
Published: 23 December 2019

Volume 37, pages 696–708, (2020)

Journal of Classification

John Muschelli III ORCID: orcid.org/0000-0001-6469-1750¹

Abstract

In the analysis of binary outcomes, the receiver operating characteristic (ROC) curve is heavily used to show the performance of a model or algorithm. The ROC curve is informative about performance over a series of thresholds and can be summarized by the area under the curve (AUC), a single number. When a predictor is categorical, the ROC curve has one fewer potential threshold than the number of categories; when the predictor is binary, there is only one threshold. As the AUC may be used in decision-making processes to determine the best model, it is important to discuss how it agrees with the intuition from the ROC curve. We discuss how the interpolation of the curve between thresholds with binary predictors can substantially change the AUC. Overall, we show that the estimated AUC most commonly reported by software corresponds to a linear interpolation of the ROC curve with binary predictors, which we believe can lead to misleading results. We compare R, Python, Stata, and SAS software implementations. We recommend reporting the interpolation used and discuss the merit of using the step function interpolator, also referred to as the “pessimistic” approach by Fawcett (2006).
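To make the contrast between the two interpolators concrete, the following is a minimal sketch in plain Python, not taken from the article or from any of the packages it compares. The function names and the toy data are hypothetical. It assumes a 0/1 predictor, so the ROC curve has a single interior point (FPR, TPR): linear interpolation through (0,0), (FPR, TPR), and (1,1) gives an area of (sensitivity + specificity) / 2, while the step function (“pessimistic”) interpolator gives sensitivity × specificity.

```python
def binary_roc_point(y_true, x_pred):
    """Return (fpr, tpr) for a 0/1 predictor against 0/1 outcomes."""
    tp = sum(1 for y, x in zip(y_true, x_pred) if y == 1 and x == 1)
    fn = sum(1 for y, x in zip(y_true, x_pred) if y == 1 and x == 0)
    fp = sum(1 for y, x in zip(y_true, x_pred) if y == 0 and x == 1)
    tn = sum(1 for y, x in zip(y_true, x_pred) if y == 0 and x == 0)
    return fp / (fp + tn), tp / (tp + fn)


def auc_linear(fpr, tpr):
    """Trapezoidal area under the segments (0,0)-(fpr,tpr)-(1,1)."""
    return 0.5 * fpr * tpr + 0.5 * (1.0 - fpr) * (tpr + 1.0)


def auc_step(fpr, tpr):
    """Area under the step function: 0 up to fpr, then tpr up to 1."""
    return (1.0 - fpr) * tpr


# Made-up data for illustration only: 4 cases and 4 controls.
y = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]

fpr, tpr = binary_roc_point(y, x)
print(auc_linear(fpr, tpr))  # 0.75, i.e. (sensitivity + specificity) / 2
print(auc_step(fpr, tpr))    # 0.5625, i.e. sensitivity * specificity
```

With sensitivity and specificity both equal to 0.75 in this toy example, the two interpolators differ by almost 0.19, which illustrates why the abstract recommends reporting which interpolation was used.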

References

Allaire, J.J., Ushey, K., Tang, Y. (2018). Reticulate: interface to ‘Python’. https://github.com/rstudio/reticulate.
Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4), 387–415.
Blumberg, D.M., De Moraes, C.G., Liebmann, J.M., Garg, R., Chen, C., Theventhiran, A., Hood, D.C. (2016). Technology and the glaucoma suspect. Investigative Ophthalmology & Visual Science, 57(9), OCT80–OCT85.
Budweg, J., Sprenger, T., De Vere-Tyndall, A., Hagenkord, A., Stippich, C., Berger, C.T. (2016). Factors associated with significant MRI findings in medical walk-in patients with acute headache. Swiss Medical Weekly, 146, w14349.
DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837–45.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–74.
Glaveckaite, S., Valeviciene, N., Palionis, D., Skorniakov, V., Celutkiene, J., Tamosiunas, A., Uzdavinys, G., Laucevicius, A. (2011). Value of scar imaging and inotropic reserve combination for the prediction of segmental and global left ventricular functional recovery after revascularisation. Journal of Cardiovascular Magnetic Resonance, 13(1), 35.
Hanley, J.A., & McNeil, B.J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.
Hsu, Y.-C., & Lieli, R. (2014). Inference for ROC curves based on estimated predictive indices: a note on testing AUC = 0.5. Unpublished Manuscript.
Hunter, J.D. (2007). Matplotlib: a 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55.
Kushnir, V.A., Darmon, S.K., Barad, D.H., Gleicher, N. (2018). Degree of mosaicism in trophectoderm does not predict pregnancy potential: a corrected analysis of pregnancy outcomes following transfer of mosaic embryos. Reproductive Biology and Endocrinology, 16(1), 6.
Litvin, T.V., Bresnick, G.H., Cuadros, J.A., Selvin, S., Kanai, K., Ozawa, G.Y. (2017). A revised approach for the detection of sight-threatening diabetic macular edema. JAMA Ophthalmology, 135(1), 62–68. https://doi.org/10.1001/jamaophthalmol.2016.4772.
Maverakis, E., Ma, C., Shinkai, K., et al. (2018). Diagnostic criteria of ulcerative pyoderma gangrenosum: a Delphi consensus of international experts. JAMA Dermatology, 154(4), 461–66. https://doi.org/10.1001/jamadermatol.2017.5980.
Mwipatayi, B.P., Sharma, S., Daneshmand, A., Thomas, S.D., Vijayan, V., Altaf, N., Garbowski, M., et al. (2016). Durability of the balloon-expandable covered versus bare-metal stents in the covered versus balloon expandable stent trial (COBEST) for the treatment of aortoiliac occlusive disease. Journal of Vascular Surgery, 64(1), 83–94.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825–30.
Pepe, M., Longton, G., Janes, H. (2009). Estimation and comparison of receiver operating characteristic curves. The Stata Journal, 9(1), 1.
Peter, E. (2016). Fbroc: fast algorithms to bootstrap receiver operating characteristics curves. https://CRAN.R-project.org/package=fbroc.
R Core Team. (2018). R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., Müller, M. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77.
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3), e0118432.
SAS Institute. (2017). SAS/STAT, Version 9.4 [Computer program]. Cary, NC: SAS Institute.
Shterev, I.D., Dunson, D.B., Chan, C., Sempowski, G.D. (2018). Bayesian multi-plate high-throughput screening of compounds. Scientific Reports, 8(1), 9551.
Sing, T., Sander, O., Beerenwinkel, N., Lengauer, T. (2005). ROCR: visualizing classifier performance in R. Bioinformatics, 21(20), 3940–41. http://rocr.bioinf.mpi-sb.mpg.de.
Snarr, B.S., Liu, M.Y., Zuckerberg, J.C., Falkensammer, C.B., Nadaraj, S., Burstein, D., Ho, D., et al. (2017). The parasternal short-axis view improves diagnostic accuracy for inferior sinus venosus type of atrial septal defects by transthoracic echocardiography. Journal of the American Society of Echocardiography, 30(3), 209–15.
StataCorp. (2013). Stata statistical software: Release 13. College Station, TX: StataCorp LP.
Tuszynski, J. (2018). caTools: Tools: Moving Window Statistics, GIF, Base64, ROC AUC, Etc. https://CRAN.R-project.org/package=caTools.
Veltri, D., Kamath, U., Shehu, A. (2018). Deep learning improves antimicrobial peptide recognition. Bioinformatics, 1, 8.
Xiong, X., Li, Q., Yang, W.-S., Wei, X., Hu, X., Wang, X.-C., Zhu, D., Li, R., Cao, D., Xie, P. (2018). Comparison of swirl sign and black hole sign in predicting early hematoma growth in patients with spontaneous intracerebral hemorrhage. Medical Science Monitor: International Medical Journal of Experimental and Clinical Research, 24, 567.

Funding

This analysis was supported by NIH Grants R01NS060910 and U01NS080824.

Author information

Authors and Affiliations

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
John Muschelli III

Corresponding author

Correspondence to John Muschelli III.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(PDF 118 KB)

About this article

Cite this article

Muschelli, J. ROC and AUC with a Binary Predictor: a Potentially Misleading Metric. J Classif 37, 696–708 (2020). https://doi.org/10.1007/s00357-019-09345-1

Published: 23 December 2019
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00357-019-09345-1
