Biased accuracy in multisite machine-learning studies due to incomplete removal of the effects of the site,Psychiatry Research: Neuroimaging



Psychiatry Research: Neuroimaging (IF 2.1), Pub Date: 2021-05-29, DOI: 10.1016/j.pscychresns.2021.111313
Aleix Solanes ₁, Pol Palau ₂, Lydia Fortea ₃, Raymond Salvador ₄, Laura González-Navarro ₅, Cristian Daniel Llach ₆, Marc Valentí ₆, Eduard Vieta ₆, Joaquim Radua ₇

Affiliations

1. Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Department of Psychiatry and Forensic Medicine, Autonomous University of Barcelona, Barcelona, Spain.
2. FIDMAG Research Foundation, Barcelona, Spain; CASM Benito Menni Granollers-Hospital General de Granollers, Barcelona, Spain.
3. Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; Institute of Neurosciences, University of Barcelona, Barcelona, Spain.
4. FIDMAG Research Foundation, Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain.
5. Faculty of Biology, University of Barcelona, Barcelona, Spain.
6. Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; Institute of Neurosciences, University of Barcelona, Barcelona, Spain; Barcelona Bipolar Disorders and Depressive Unit, Institute of Neurosciences, Hospital Clinic, Barcelona, Spain.
7. Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; Department of Psychosis Studies, Institute of Psychiatry, Psychology, and Neuroscience, King's College London, London, United Kingdom; Centre for Psychiatric Research and Education, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden.

Brain MRI researchers conducting multisite studies, such as within the ENIGMA Consortium, are very aware of the importance of controlling the effects of the site (EoS) in the statistical analysis. Conversely, authors of the novel machine-learning MRI studies may remove the EoS when training the machine-learning models but not control them when estimating the models' accuracy, potentially leading to severely biased estimates. We show examples from a toy simulation study and real MRI data in which we remove the EoS from both the "training set" and the "test set" during the training and application of the model. However, the accuracy is still inflated (or occasionally shrunk) unless we further control the EoS during the estimation of the accuracy. We also provide several methods for controlling the EoS during the estimation of the accuracy, and a simple R package ("multisite.accuracy") that smoothly does this task for several accuracy estimates (e.g., sensitivity/specificity, area under the curve, correlation, hazard ratio, etc.).
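The bias described in the abstract can be illustrated with a small toy simulation. The sketch below is not the authors' simulation and does not use the "multisite.accuracy" package; it is a hypothetical Python example in which two sites differ in their mean outcome and the predictions carry only site-level information. The function names, site means, and the choice of a sample-size-weighted average of within-site correlations are all assumptions made for illustration, and that weighted average is only one possible way to control the EoS when estimating accuracy.

```python
import random
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_per_site=200, seed=0):
    """Hypothetical data: two sites with different mean outcomes.

    Predictions follow the site mean but contain no individual-level
    signal, so any apparent accuracy comes from the site alone.
    """
    rng = random.Random(seed)
    rows = []
    for site, site_mean in (("A", 30.0), ("B", 60.0)):
        for _ in range(n_per_site):
            observed = rng.gauss(site_mean, 5.0)
            predicted = rng.gauss(site_mean, 5.0)  # independent of observed
            rows.append((site, observed, predicted))
    return rows


def pooled_correlation(rows):
    """Accuracy estimated on all sites pooled, ignoring the site."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])


def site_controlled_correlation(rows):
    """Within-site correlations averaged with weights proportional to
    site sample size (an assumed, simple way to control the site)."""
    total = len(rows)
    estimate = 0.0
    for site in sorted({r[0] for r in rows}):
        sub = [r for r in rows if r[0] == site]
        r_site = pearson([r[1] for r in sub], [r[2] for r in sub])
        estimate += len(sub) / total * r_site
    return estimate


rows = simulate()
naive = pooled_correlation(rows)
controlled = site_controlled_correlation(rows)
print(f"pooled correlation:          {naive:.2f}")
print(f"site-controlled correlation: {controlled:.2f}")
```

Under these assumptions the pooled correlation is high, because between-site differences dominate the total variance, while the site-controlled estimate stays near zero, which reflects the fact that the predictions say nothing about individuals within a site.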


Updated: 2021-06-04
