The Importance of Being External: Methodological insights for the external validation of machine learning models in medicine
Computer Methods and Programs in Biomedicine (IF 6.1) Pub Date: 2021-07-22, DOI: 10.1016/j.cmpb.2021.106288
Federico Cabitza, Andrea Campagner, Felipe Soares, Luis García de Guadiana-Romualdo, Feyissa Challa, Adela Sulejmani, Michela Seghezzi, Anna Carobene

Background and Objective: Medical machine learning (ML) models tend to perform better on data from the training cohort than on new data, often because of overfitting or covariate shift. For these reasons, external validation (EV) is a necessary practice in the evaluation of medical ML models. However, there is still a gap in the literature on how to interpret EV results and hence assess the robustness of ML models.

Methods: We fill this gap by proposing a meta-validation method to assess the soundness of EV procedures. In doing so, we complement the usual way of assessing EV by considering both the cardinality of the EV dataset and its similarity to the training set. We then investigate how the notions of cardinality and similarity can inform the reliability of a validation procedure by integrating them into two summative data visualizations.

Results: We illustrate our methodology by applying it to the validation of a state-of-the-art COVID-19 diagnostic model on 8 EV sets collected across 3 continents. The model's performance was moderately affected by data similarity (Pearson ρ = .38, p < .001). In the EV, the validated model reported good discrimination (average AUC: .84), acceptable calibration (average: .17) and utility (average: .50). The validation datasets were adequate in terms of cardinality and similarity, suggesting that the results are sound. We also provide a qualitative guideline for evaluating the reliability of validation procedures, and we discuss the importance of proper external validation in light of the obtained results.

Conclusions: In this paper, we propose a novel, lean methodology to: 1) study how the similarity between training and validation sets impacts the generalizability of an ML model; 2) assess the soundness of EV evaluations along three complementary performance dimensions: discrimination, utility and calibration; 3) draw conclusions on the robustness of the model under validation. We applied this methodology to a state-of-the-art model for the diagnosis of COVID-19 from routine blood tests and showed how to interpret the results in light of the presented framework.
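
To make the meta-validation idea more concrete, the sketch below simulates the workflow on synthetic data: a model is trained on one cohort, scored on several external cohorts with increasing covariate shift, and the train/EV similarity is correlated with discrimination, alongside calibration (Brier score) and a utility proxy (net benefit). The similarity score, the net-benefit definition and the synthetic cohorts are illustrative assumptions for this sketch, not the metrics prescribed by the paper.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)

# One synthetic population, split into a training cohort and a pool
# from which the external validation (EV) cohorts are drawn.
X, y = make_classification(n_samples=3000, n_features=10, n_informative=5,
                           random_state=0)
X_train, y_train = X[:1000], y[:1000]
X_pool, y_pool = X[1000:], y[1000:]

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def similarity(train, ev):
    """Assumed similarity score: inverse of the mean standardized distance
    between the feature means of the two cohorts (higher = more similar)."""
    mu_t, sd_t = train.mean(axis=0), train.std(axis=0) + 1e-9
    return 1.0 / (1.0 + np.mean(np.abs(mu_t - ev.mean(axis=0)) / sd_t))

def net_benefit(y_true, y_prob, threshold=0.5):
    """Simple net benefit at one decision threshold, used here as a utility proxy."""
    pred = y_prob >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    n = len(y_true)
    return tp / n - fp / n * threshold / (1.0 - threshold)

# Simulate 8 EV cohorts with increasing covariate shift.
sims, aucs = [], []
for i, shift in enumerate(np.linspace(0.0, 2.0, 8)):
    X_ev = X_pool[i * 250:(i + 1) * 250].copy()
    y_ev = y_pool[i * 250:(i + 1) * 250]
    X_ev += shift * rng.normal(size=X_ev.shape)   # induce distribution shift
    p = model.predict_proba(X_ev)[:, 1]
    sims.append(similarity(X_train, X_ev))
    aucs.append(roc_auc_score(y_ev, p))
    print(f"cohort {i}: n={len(y_ev)}  similarity={sims[-1]:.3f}  "
          f"AUC={aucs[-1]:.3f}  Brier={brier_score_loss(y_ev, p):.3f}  "
          f"net benefit={net_benefit(y_ev, p):.3f}")

# Meta-validation step: does discrimination track train/EV similarity?
rho, pval = pearsonr(sims, aucs)
print(f"Pearson rho between similarity and AUC: {rho:.2f} (p = {pval:.3g})")
```

Reporting per-cohort cardinality and similarity next to the performance metrics, as in the print statements above, is what allows a reader to judge whether a strong (or weak) EV result is supported by an adequate validation set.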



Updated: 2021-07-23