Model-free posterior inference on the area under the receiver operating characteristic curve
Introduction
First proposed during World War II to assess the performance of radar receiver operators (Calì and Longobardi, 2015), the receiver operating characteristic (ROC) curve is now an essential tool for analyzing the performance of binary classifiers in areas such as signal detection (Green and Swets, 1966), psychological testing (Swets, 1973, Swets, 1986), radiology (Lusted, 1960, Hanley and McNeil, 1982), medical diagnosis (Swets and Pickett, 1982, Hanley, 1989), and data mining (Spackman, 1989, Fawcett, 2006). One informative summary of the ROC curve is the corresponding area under the curve (AUC). This measure provides an overall assessment of a classifier's performance, independent of the choice of threshold, and is, therefore, the preferred method for evaluating classification algorithms (Provost and Fawcett, 1997, Provost et al., 1998, Bradley, 1997, Huang and Ling, 2005). The AUC is an unknown quantity, and our goal is to use the information contained in the data to make inference about it. The specific setup is as follows. Consider a binary classifier that produces a random score indicating the propensity for membership in, say, Group 1; individuals with scores higher than a threshold are classified to Group 1, and the rest are classified to Group 0. Let Y1 and Y0 be independent scores corresponding to Group 1 and Group 0, respectively. Given a threshold t, define the specificity and sensitivity as spec(t) = P(Y0 ≤ t) and sens(t) = P(Y1 > t), respectively. Then the ROC curve is a plot of the parametric curve (1 − spec(t), sens(t)) as t ranges over all possible score values. While the ROC curve summarizes the classifier's tradeoff between sensitivity and specificity as the threshold varies, the AUC measures the probability of correctly ordering the scores of two individuals from the two groups, which equals P(Y1 > Y0) (Bamber, 1975), and is independent of the choice of threshold. Consequently, the AUC is a functional of the joint distribution of (Y0, Y1), denoted by P, so the ROC curve is actually not needed to identify the AUC.
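To make the identity AUC = P(Y1 > Y0) concrete, the following minimal sketch computes the empirical (Mann-Whitney) counterpart of the AUC from observed scores; the function name and the tie-handling convention (ties count 1/2) are our illustrative choices, not from the paper.

```python
import numpy as np

def empirical_auc(y0, y1):
    """Mann-Whitney estimate of AUC = P(Y1 > Y0), with ties counted as 1/2.

    y0: scores for Group 0; y1: scores for Group 1.
    """
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    # Compare every Group-1 score against every Group-0 score.
    greater = (y1[:, None] > y0[None, :]).mean()
    ties = (y1[:, None] == y0[None, :]).mean()
    return greater + 0.5 * ties
```

A perfectly separating classifier gives `empirical_auc` equal to 1, a perfectly reversed one gives 0, and identical score distributions push it toward 1/2.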
In the context of inference on the AUC, when the scores are continuous, it is common to assume that satisfies a so-called binormality assumption, which states that there exists a monotone increasing transformation that maps both and to normal random variables (Hanley, 1988). For most medical diagnostic tests, where the classifiers are simple and ready-to-use without training, such an assumption serves well (Hanley, 1988, Metz et al., 1998, Cai and Moskowitz, 2004), although it has been argued that other distributions can be more appropriate for some specific tests (e.g., Guignard and Salehi, 1983, Goddard and Hinberg, 1990). But for complicated classifiers which involve multiple predictors, as often arise in machine learning applications, binormality – or any other model assumption for that matter – becomes a burden. This motivates our pursuit of a “model-free” approach to inference about the AUC.
Specifically, our goal is the construction of a type of posterior distribution for the AUC. The most familiar such construction is via Bayes's formula, but this requires a likelihood function and, hence, a statistical model. The only way one can be effectively "model-free" within a Bayesian framework is to make the model extra flexible, which requires many parameters. In the extreme case, a so-called Bayesian nonparametric approach would take the distribution P itself as the model parameter (e.g., Ghosal and van der Vaart, 2017, Gu et al., 2008). When the model includes many parameters, the analyst bears the burden of specifying prior distributions for them, based on little or no genuine prior information, and also that of computing a high-dimensional posterior. But since the AUC is just a one-dimensional feature of this complicated set of parameters, there is no obvious return on the investment in prior specification and posterior computation. A better approach would be to construct the posterior distribution for the AUC directly, using available prior information about the AUC only, without specifying a model and without the introduction of artificial model parameters. That way, the data analyst can avoid the burdens of prior specification and posterior computation, bias due to model misspecification, and issues that can arise as a result of non-linear marginalization (e.g., Martin, 2019, Fraser, 2011).
As an alternative to the traditional Bayesian approach, we consider here the construction of a so-called Gibbs posterior for the AUC. In general, the Gibbs posterior construction proceeds by defining the quantity of interest as the minimizer of a suitable risk function, treating an empirical version of that risk function like a negative log-likelihood, and then combining it with a prior distribution as in Bayes's formula. General discussion of Gibbs posteriors can be found in Zhang (2006a, 2006b), Bissiri et al. (2016), and Alquier et al. (2016), and some statistical applications are discussed in Jiang and Tanner (2008) and Syring and Martin (2017, 2019a, 2019b). Again, the advantage is that Gibbs posteriors avoid model misspecification bias and the need to deal with nuisance parameters. Moreover, under suitable conditions, Gibbs posteriors can be shown to have desirable asymptotic concentration properties (e.g., Syring and Martin, 2020, Bhattacharya and Martin, 2020, Chernozhukov and Hong, 2003), with theory that parallels that of Bayesian posteriors under model misspecification (e.g., Kleijn and van der Vaart, 2006, Kleijn and van der Vaart, 2012).
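In symbols, with empirical risk R_n, learning rate η, and prior π, the Gibbs posterior density is proportional to π(θ) exp{−η n R_n(θ)}. A minimal grid-based sketch of this construction follows; the function and argument names are ours, and the flat prior and quadratic empirical risk in the illustration are assumptions for demonstration only.

```python
import numpy as np

def gibbs_posterior(theta_grid, emp_risk, log_prior, eta, n):
    """Gibbs posterior density on an equally spaced grid:
    proportional to pi(theta) * exp(-eta * n * R_n(theta))."""
    log_post = log_prior(theta_grid) - eta * n * emp_risk(theta_grid)
    log_post = log_post - log_post.max()   # guard against numerical underflow
    dens = np.exp(log_post)
    dx = theta_grid[1] - theta_grid[0]
    return dens / (dens.sum() * dx)        # normalize so the density integrates to 1

# Illustration: flat prior, empirical risk (theta - 0.7)^2, so the posterior
# concentrates around the risk minimizer 0.7.
grid = np.linspace(0, 1, 1001)
dens = gibbs_posterior(grid, lambda t: (t - 0.7) ** 2,
                       lambda t: np.zeros_like(t), eta=1.0, n=100)
```

Note that the posterior mode sits at the empirical risk minimizer, while η and n jointly control how tightly the posterior concentrates around it.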
A subtle point is that, while the risk minimization problem that defines the quantity of interest is independent of the scale of the loss function, the Gibbs posterior is not. This scale factor is often referred to as the learning rate (e.g., Grünwald, 2012) and, because it controls the spread of the Gibbs posterior, its specification needs to be handled carefully. There are various approaches to the specification of the learning rate parameter (e.g., Grünwald, 2012, Grünwald and Van Ommen, 2017, Bissiri et al., 2016, Holmes and Walker, 2017, Lyddon et al., 2019). Here we adopt the approach in Syring and Martin (2019a) that aims to set the learning rate so that, in addition to its robustness to model misspecification and asymptotic concentration properties, the Gibbs posterior credible sets have the nominal frequentist coverage probability. When the sample size is large, we recommend an (asymptotically) equivalent calibration method that is simpler to compute.
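The coverage-matching idea can be sketched as a simple grid search over candidate learning rates, using the bootstrap to estimate credible-interval coverage. This is a deliberate simplification of the stochastic-approximation algorithm in Syring and Martin (2019a), and the squared-error loss used below (whose empirical risk is quadratic in θ about the empirical AUC) is our assumption for illustration.

```python
import numpy as np

def credible_interval(y0, y1, eta, grid, alpha=0.05):
    """Central 100(1-alpha)% credible interval from a Gibbs posterior built
    from the squared-error loss, evaluated on a grid over [0, 1]."""
    y0, y1 = np.asarray(y0, float), np.asarray(y1, float)
    n = len(y0) + len(y1)
    auc_hat = (y1[:, None] > y0[None, :]).mean()
    # Under the squared-error loss, R_n(theta) = (theta - auc_hat)^2 + const.
    log_post = -eta * n * (grid - auc_hat) ** 2
    w = np.exp(log_post - log_post.max())
    cdf = np.cumsum(w) / w.sum()
    lo = grid[np.searchsorted(cdf, alpha / 2)]
    hi = grid[min(np.searchsorted(cdf, 1 - alpha / 2), len(grid) - 1)]
    return lo, hi

def calibrate_eta(y0, y1, etas, n_boot=200, alpha=0.05, seed=0):
    """Pick the candidate learning rate whose bootstrap coverage of the
    observed empirical AUC is closest to the nominal level 1 - alpha."""
    rng = np.random.default_rng(seed)
    y0, y1 = np.asarray(y0, float), np.asarray(y1, float)
    grid = np.linspace(0, 1, 501)
    auc_hat = (y1[:, None] > y0[None, :]).mean()
    best, best_gap = etas[0], np.inf
    for eta in etas:
        cover = 0
        for _ in range(n_boot):
            b0 = rng.choice(y0, size=len(y0), replace=True)
            b1 = rng.choice(y1, size=len(y1), replace=True)
            lo, hi = credible_interval(b0, b1, eta, grid, alpha)
            cover += (lo <= auc_hat <= hi)
        gap = abs(cover / n_boot - (1 - alpha))
        if gap < best_gap:
            best, best_gap = eta, gap
    return best
```

Larger η shrinks the credible interval, so the bootstrap coverage criterion effectively trades posterior spread against frequentist validity.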
The present paper is organized as follows. In Section 2.1, we review some methods for making inference on the AUC based on the binormality assumption, in particular, the Bayesian approach in Gu and Ghosal (2009) that involves a suitable rank-based likelihood. In Section 2.2, we argue that the binormality assumption is generally inappropriate in machine learning applications, and provide one illustrative example involving a support vector machine. This difficulty with model specification leads us to the Gibbs posterior, a model-free alternative to a Bayesian posterior, which is reviewed in Section 2.3. In Section 3, we develop the Gibbs posterior for inference on the AUC, derive its asymptotic concentration properties, and investigate how to properly scale the risk function. Simulation experiments are carried out in Section 4, where the Gibbs posterior estimator performs favorably compared with the Bayesian approach based on a rank-based likelihood and two other Bayesian nonparametric methods. We also apply the Gibbs posterior to a real dataset for evaluating the performance of a biomarker for pancreatic cancer and compare our results with those based on some existing Bayesian methods. Finally, we give some concluding remarks in Section 5.
Section snippets
Binormality and related methods
Following Hanley (1988), the scores Y0 and Y1 satisfy the binormality assumption if their distribution functions are F0(y) = Φ(H(y)) and F1(y) = Φ({H(y) − μ}/σ), respectively, where μ is real, σ > 0, H is a monotone increasing function, and Φ denotes the standard normal distribution function; this implies that Y0 and Y1 can be transformed to N(0, 1) and N(μ, σ²) random variables via H. If P denotes the distribution of (Y0, Y1) under this assumption, then the ROC curve and the AUC, respectively, are given by ROC(t) = Φ(a + b Φ⁻¹(t)), with a = μ/σ and b = 1/σ, and AUC = Φ(μ/√(1 + σ²)).
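Under binormality, both the ROC curve and the AUC have closed forms in the parameters (a, b) = (μ/σ, 1/σ). A short sketch of these formulas (function names are ours):

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}

def binormal_roc(t, mu, sigma):
    """Binormal ROC curve: ROC(t) = Phi(a + b * Phi^{-1}(t)),
    with a = mu/sigma and b = 1/sigma."""
    a, b = mu / sigma, 1.0 / sigma
    return phi.cdf(a + b * phi.inv_cdf(t))

def binormal_auc(mu, sigma):
    """Binormal AUC: Phi(mu / sqrt(1 + sigma^2)) = P(Y1 > Y0)."""
    return phi.cdf(mu / sqrt(1.0 + sigma ** 2))
```

For example, μ = 0 gives AUC = 1/2 (scores carry no information), and the AUC increases toward 1 as μ grows relative to σ.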
Definition
As mentioned, the AUC is a functional of the joint distribution P of (Y0, Y1), i.e., θ = θ(P), given by θ = P(Y1 > Y0). Recall that the data consist of m independent copies of Y0 and n independent copies of Y1. To construct a Gibbs posterior distribution for θ as discussed above, we need an appropriate loss function. That is, we need a function ℓ_θ such that the corresponding risk function, R(θ) = E{ℓ_θ(Y0, Y1)}, is minimized at the true AUC. If we define the squared-error loss ℓ_θ(y0, y1) = (1{y1 > y0} − θ)², then it is easy to verify that the corresponding risk is minimized at the true AUC.
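A quick numerical check of the squared-error loss idea: the empirical risk, averaged over all Group-0/Group-1 pairs, is minimized at the Mann-Whitney estimate of the AUC. The sketch below is ours, under that assumed loss; the simulated N(0,1) and N(1,1) score distributions are illustrative only.

```python
import numpy as np

def empirical_risk(theta, y0, y1):
    """Empirical risk for the squared-error loss
    l_theta(y0, y1) = (1{y1 > y0} - theta)^2, averaged over all pairs."""
    ind = (np.asarray(y1)[:, None] > np.asarray(y0)[None, :]).astype(float)
    return float(np.mean((ind - theta) ** 2))

# As a function of theta, R_n(theta) = (theta - auc_hat)^2 + constant,
# so its minimizer over a fine grid should match the empirical AUC.
rng = np.random.default_rng(2)
y0 = rng.normal(0.0, 1.0, 40)   # illustrative Group-0 scores
y1 = rng.normal(1.0, 1.0, 40)   # illustrative Group-1 scores
auc_hat = (y1[:, None] > y0[None, :]).mean()
grid = np.linspace(0, 1, 2001)
theta_min = grid[int(np.argmin([empirical_risk(t, y0, y1) for t in grid]))]
```

Here `theta_min` agrees with `auc_hat` up to the grid resolution, which is exactly the property the Gibbs construction needs from a loss.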
Simulation studies
Since the AUC is invariant when the random variables Y0 and Y1 undergo the same monotone increasing transformation, we fix the distribution of Y0 to be standard normal and consider four examples for the distribution of Y1:
- Example 1.
- Example 2. Y1 follows a skew-normal distribution.
- Example 3.
- Example 4.
Fig. 2 provides a visualization of the two densities in each of the four examples. Note that these four examples
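The invariance claim above is easy to check numerically: applying the same strictly increasing transformation (here, exp) to both groups preserves every pairwise comparison, and hence the empirical AUC, exactly. The N(1, 1) choice for the Group-1 scores below is an illustrative assumption, not necessarily one of the paper's four examples.

```python
import numpy as np

rng = np.random.default_rng(1)
y0 = rng.normal(0.0, 1.0, size=500)   # Group-0 scores: standard normal, as in the examples
y1 = rng.normal(1.0, 1.0, size=500)   # Group-1 scores: an assumed illustrative choice

def auc(a, b):
    """Empirical AUC: fraction of pairs with Group-1 score above Group-0 score."""
    return (b[:, None] > a[None, :]).mean()

# Same strictly increasing map applied to both groups => identical AUC.
assert auc(y0, y1) == auc(np.exp(y0), np.exp(y1))
```

This is why the simulation design can fix Y0 to be standard normal without loss of generality.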
Conclusion
In certain applications, the parameters of interest can be defined as minimizers of an appropriate risk function, separate from any statistical model. In such cases, one can avoid potential model misspecification biases by working with some kind of "model-free" approach. The present paper considered one such example, namely, inference on the AUC, where the state-of-the-art statistical model is one that depends on an infinite-dimensional nuisance parameter. As an alternative, we propose to construct a Gibbs posterior distribution for the AUC directly, based on a suitable loss function rather than a likelihood.
CRediT authorship contribution statement
Zhe Wang: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Ryan Martin: Conceptualization, Methodology, Funding acquisition, Writing - review & editing.
Acknowledgments
The authors thank the editors and anonymous reviewers for their helpful feedback on a previous version of the manuscript. This work is partially supported by the U.S. National Science Foundation, DMS–1811802.
References (55)
Bamber (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J. Math. Psychol.
Bradley (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit.
Chernozhukov and Hong (2003). An MCMC approach to classical estimation. J. Econometrics.
Fawcett (2006). An introduction to ROC analysis. Pattern Recognit. Lett.
Gu and Ghosal (2009). Bayesian ROC curve estimation under binormality using a rank likelihood. J. Stat. Plan. Inference.
Spackman (1989). Signal detection theory: Valuable tools for evaluating inductive learning.
Syring and Martin (2017). Gibbs posterior inference on the minimum clinically important difference. J. Stat. Plan. Inference.
Alquier et al. (2016). On the properties of variational approximations of Gibbs posteriors. J. Mach. Learn. Res.
Bhattacharya and Martin (2020). Gibbs posterior inference on multivariate quantiles.
Bissiri et al. (2016). A general framework for updating belief distributions. J. R. Stat. Soc. Ser. B Stat. Methodol.
The binormal assumption on precision-recall curves.
Cai and Moskowitz (2004). Semi-parametric estimation of the binormal ROC curve for a continuous diagnostic test. Biostatistics.
Calì and Longobardi (2015). Some mathematical properties of the ROC curve and their applications. Ricerche Mat.
Bayesian nonparametric ROC regression modeling. Bayesian Anal.
Fast calibrated additive quantile regression.
Fraser (2011). Is Bayes posterior just quick and dirty confidence? Stat. Sci.
Goddard and Hinberg (1990). Receiver operator characteristic (ROC) curves and non-normal data: an empirical study. Stat. Med.
Green and Swets (1966). Signal Detection Theory and Psychophysics, Vol. 1.
Grünwald (2012). The safe Bayesian.
Grünwald and Van Ommen (2017). Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. Bayesian Anal.
Gu et al. (2008). Bayesian bootstrap estimation of ROC curve. Stat. Med.
Guignard and Salehi (1983). Validity of the Gaussian assumption in the analysis of ROC data obtained from scintigraphic-like images. Phys. Med. Biol.
Hanley (1988). The robustness of the 'binormal' assumptions used in fitting ROC curves. Med. Decis. Mak.
Hanley (1989). Receiver operating characteristic (ROC) methodology: the state of the art. Crit. Rev. Diagn. Imaging.
Hanley and McNeil (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology.
Hoeffding (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat.