Measuring the effects of confounders in medical supervised classification problems: the Confounding Index (CI). Artificial Intelligence in Medicine



Artificial Intelligence in Medicine (IF 6.1), Pub Date: 2020-01-13, DOI: 10.1016/j.artmed.2020.101804
Elisa Ferrari, Alessandra Retico, Davide Bacciu


Over the years, there has been growing interest in using machine learning techniques for biomedical data processing. When tackling these tasks, one needs to bear in mind that biomedical data depend on a variety of characteristics, such as demographic aspects (age, gender, etc.) or the acquisition technology, which might be unrelated to the target of the analysis. In supervised tasks, failing to match the ground-truth targets with respect to such characteristics, called confounders, may lead to very misleading estimates of predictive performance. Many strategies have been proposed to handle confounders, ranging from data selection, to normalization techniques, up to the use of training algorithms for learning with imbalanced data. However, all these solutions require the confounders to be known a priori. To this end, we introduce a novel index that is able to measure the confounding effect of a data attribute in a bias-agnostic way. This index can be used to quantitatively compare the confounding effects of different variables and to inform correction methods such as normalization procedures or ad hoc learning algorithms. The effectiveness of this index is validated on both simulated data and real-world neuroimaging data.
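
The claim that unmatched confounders can inflate performance estimates can be illustrated with a toy simulation. The sketch below is not the Confounding Index and does not reproduce the authors' method. It assumes NumPy, a single synthetic feature driven only by a binary confounder (for example, acquisition site), and a simple midpoint-threshold classifier; the helper names `make_data`, `fit_threshold`, and `accuracy` are hypothetical.

```python
import numpy as np


def make_data(n, p_conf_given_pos, p_conf_given_neg, rng):
    """Simulate one feature that depends only on a binary confounder c.

    The label y never influences x directly; y and x are linked only
    through how often c = 1 occurs in each class.
    """
    y = rng.integers(0, 2, size=n)
    p = np.where(y == 1, p_conf_given_pos, p_conf_given_neg)
    c = (rng.random(n) < p).astype(int)
    x = 2.0 * c + rng.normal(0.0, 1.0, size=n)
    return x, y


def fit_threshold(x, y):
    """Midpoint between the two class means, plus which side is 'positive'."""
    mu0, mu1 = x[y == 0].mean(), x[y == 1].mean()
    return (mu0 + mu1) / 2.0, bool(mu1 > mu0)


def accuracy(x, y, threshold, pos_above):
    """Fraction of samples whose thresholded prediction equals the label."""
    pred = (x > threshold) if pos_above else (x <= threshold)
    return float((pred.astype(int) == y).mean())


rng = np.random.default_rng(0)

# Training and "biased" test sets: the confounder is unbalanced across classes.
x_train, y_train = make_data(5000, 0.9, 0.1, rng)
x_biased, y_biased = make_data(5000, 0.9, 0.1, rng)

# "Matched" test set: the confounder is equally frequent in both classes.
x_matched, y_matched = make_data(5000, 0.5, 0.5, rng)

threshold, pos_above = fit_threshold(x_train, y_train)
acc_biased = accuracy(x_biased, y_biased, threshold, pos_above)
acc_matched = accuracy(x_matched, y_matched, threshold, pos_above)

print(f"accuracy on confounded test set: {acc_biased:.3f}")
print(f"accuracy on matched test set:    {acc_matched:.3f}")
```

Under these assumptions the classifier scores well above chance on the confounded test set and close to chance on the matched one, because the only signal it has learned is the confounder.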


Updated: 2020-01-13
